Haskell's WebDriver package is back in business

TL;DR

The webdriver package has undergone some major changes and now supports Selenium 3 and 4 with the release of version 0.13.0.0!

Some history

This project has been on autopilot for a while now. Originally written by @erratic-pattern, the library spoke the Selenium JSON wire protocol for controlling browsers via Selenium 2. However, the world began to change when Selenium 3 came out, with support for a new W3C protocol with some notable new features like Action chains. Then Selenium 4 came out, which marked the legacy wire protocol as deprecated. It was time for some major updates to keep this package up to date.

As a heavy WebDriver user, I got Hackage maintainer access in November 2022 to help keep it running, making small tweaks to support dependency updates and new GHC versions. In April 2023 we got the haskell-webdriver organization created on GitHub as the new home of the project. That year @dten made a couple of nice contributions.

The push towards real progress was started by PR #144, which was first opened by @cfraz89 in 2018. This PR updated some API calls and data structures to use W3C-compatible versions, and served as a starting point when I began trying to do the upgrade in earnest.

In the rest of this blog post I’ll mention some interesting aspects of this update, for posterity’s sake and hopefully to inspire some people to migrate to the new version and use the package.

Process lifecycle management

One of the hardest things about using this package in the past was that you had to manage Selenium yourself! This meant installing the Selenium server, plus the driver program(s) such as ChromeDriver or geckodriver, plus whatever browsers you needed. In particular, you had to make sure the driver programs were compatible with the browsers. In the bad old days, you had to access an obscure Google server to find a table of which ChromeDriver versions were compatible with which Chrome versions.

After obtaining all the executables, you then had to launch the Selenium process yourself, make sure it started up successfully, and obtain the port number it was running on. Selenium’s default port is 4444, but if you wanted to be able to run more than one test simultaneously, you had to get it to use a random port and then figure out what port it chose. To this day there isn’t a good programmatic way to do this; you have to parse the output of the Selenium process. Once you had the port, you could configure this library to speak to it.

Now, the world is somewhat different. You can still use Selenium, but ChromeDriver and geckodriver are W3C-compliant WebDriver servers in their own right. So if you want to avoid some complexity and don’t care about any Selenium-specific features, you can just run against a driver directly. But you can also run in the traditional mode where you start a Selenium server, and it starts the drivers for you.

To make this easy to work with, version 0.13.0.0 handles process launching for you. First you make a WebDriverContext by calling mkEmptyWebDriverContext. Then, you can launch a WebDriver session with the startSession function:

  
startSession :: (WebDriverBase m, MonadMask m, MonadLogger m)
  => WebDriverContext
  -> DriverConfig -- Desired driver config (Selenium standalone, ChromeDriver, GeckoDriver, etc.)
  -> Capabilities -- W3C Capabilities
  -> String -- Session name
  -> m Session

The WebDriverContext is an opaque type in which we track all processes that we launch and sessions we create. When you’re done, just call teardownWebDriverContext to tear down all sessions and stop all processes.
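To make the lifecycle idea concrete, here is a toy model of a context that remembers what was launched and stops all of it on teardown. It is only a sketch: ToyContext, mkToyContext, launchToyProcess, teardownToyContext and withToyContext are hypothetical names invented for this illustration, not the library's API, and no real processes are started. The point is the bracket pattern, which you would presumably apply around the real mkEmptyWebDriverContext and teardownWebDriverContext in the same way.

```haskell
module Main where

import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- | Toy stand-in for an opaque context (hypothetical, not the library's type):
-- it only remembers the names of the "processes" registered through it.
newtype ToyContext = ToyContext (IORef [String])

-- | Create a context that tracks nothing yet.
mkToyContext :: IO ToyContext
mkToyContext = ToyContext <$> newIORef []

-- | Pretend to launch a driver process and register it in the context.
launchToyProcess :: ToyContext -> String -> IO ()
launchToyProcess (ToyContext ref) name = modifyIORef' ref (name :)

-- | Forget everything that was registered and return it, most recent first.
-- A real implementation would stop the processes here.
teardownToyContext :: ToyContext -> IO [String]
teardownToyContext (ToyContext ref) = do
  running <- readIORef ref
  writeIORef ref []
  pure running

-- | Run an action with a fresh context. Thanks to 'bracket', teardown runs
-- even if the action throws an exception.
withToyContext :: (ToyContext -> IO a) -> IO a
withToyContext = bracket mkToyContext teardownToyContext

main :: IO ()
main = withToyContext $ \ctx -> do
  launchToyProcess ctx "geckodriver-1"
  launchToyProcess ctx "geckodriver-2"
  putStrLn "running tests..."
```

Wrapping setup and teardown in a single bracket-style helper means a failing test cannot leave stray driver processes behind.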

This design allows us to smooth over some oddities between different drivers. For example, a single geckodriver instance can’t start multiple Firefox sessions (see here). So, we automatically spin up a separate geckodriver process for every session.

Tests!

The original package didn’t have any tests. Now we have a test suite, testing a matrix of Selenium 3 and 4 against Firefox and Chrome. Almost all of the WebDriver commands are tested.

One aspect of this I’m excited about is the use of sandwich-contexts, a companion package to my Sandwich test framework. This package enables Sandwich tests to use Nix to obtain dependencies from the extensive package set available in Nixpkgs. You can see the basics of what this looks like in the documentation. To see how nicely this works in the webdriver tests, see this part of the test contexts code.

Using Nix packages for testing is great. It works well in GitHub CI. It allows us to pin a specific version of our dependencies for all time. We can easily test a variety of different Selenium or browser versions to make sure the library works properly with all of them.

Code restructuring

Modules

I took the opportunity of a major new release to clean up some technical debt and rethink some previous design decisions. One target was the sheer number of modules. If you compare version 0.12.0.0 to 0.13.0.0, you can see that the module hierarchy has been cleaned up quite a bit. Importing Test.WebDriver gives you everything you need for normal usage. If you want to go beyond that, everything you need for working with Firefox/Opera profiles is in Test.WebDriver.Profile, all the commands are exported from Test.WebDriver.Commands, and so on.

Profiles

Speaking of Firefox/Opera profiles, the system for working with them has been simplified. Formerly, it was possible to configure a set of on-disk paths that would get included in your browser profile. The profile would then need to be “prepared” before sending it to WebDriver. Preparing meant actually reading all of those files and combining them into a zip archive in memory. (Incidentally, the code that did this actually called setCurrentDirectory, temporarily changing the current directory of your whole test process and potentially causing head-scratching race conditions.)

Now the profile system is simpler: you just provide any extra files as a HashMap FilePath ByteString, and we zip everything up in memory.

Monads

Another change was to the monads. Formerly, the WebDriver class depended on MonadBaseControl IO, which I felt was a bit overengineered for a class with a single method, doCommand, that is morally just an IO function making an HTTP request. Now the equivalent class is called WebDriverBase, and it depends on MonadUnliftIO.

It should be about as easy as before for test frameworks to integrate with this. For example, here’s how sandwich-webdriver does it, in the process providing nice logging of all requests to and from the WebDriver server.

  
-- This implementation of 'W.WebDriverBase' provides logging for the requests/responses.
instance (MonadUnliftIO m) => W.WebDriverBase (ExampleT context m) where
  doCommandBase driver method path args = do
    let req = W.mkDriverRequest driver method path args
    debug [i|--> #{HC.method req} #{HC.path req}#{HC.queryString req} (#{showRequestBody (HC.requestBody req)})|]
    response <- tryAny (liftIO $ HC.httpLbs req (W._driverManager driver)) >>= either throwIO return
    let (N.Status code _) = HC.responseStatus response

    if | code >= 200 && code < 300 -> case A.eitherDecode (HC.responseBody response) of
           -- For successful responses, try to pull out the "value" and show it
           Right (A.Object (aesonLookup "value" -> Just value)) -> debug [i|<-- #{code} #{A.encode value}|]
           _ -> debug [i|<-- #{code} #{HC.responseBody response}|]
       -- For failed responses, log the entire response.
       | otherwise -> debug [i|<-- #{code} #{response}|]
    return response

To complete the integration with a test framework, you also have to provide an implementation for SessionState, which provides a single method getSession :: m Session. This is a little different from before, when there was also putSession :: WDSession -> m (). I wanted to get us out of the business of providing what is essentially a state monad, and was able to restructure things so it’s not needed. For more on the philosophy here, see this article on exceptions and monad transformers.
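As a rough illustration of why a getter alone is enough, here is a minimal sketch of a read-only session class backed by ReaderT. The names ToySession, ToySessionState, getToySession and describeSession are hypothetical stand-ins made up for this example; the real SessionState class and Session type live in the library and may differ in detail.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

-- | Stand-in for the library's session type (hypothetical).
newtype ToySession = ToySession { toySessionId :: String }

-- | Read-only access to the current session: a getter, but no setter.
class Monad m => ToySessionState m where
  getToySession :: m ToySession

-- | A ReaderT that carries the session satisfies the class directly.
instance Monad m => ToySessionState (ReaderT ToySession m) where
  getToySession = ask

-- | A "command" that only needs to know which session it runs in.
describeSession :: ToySessionState m => m String
describeSession = do
  session <- getToySession
  pure ("session " ++ toySessionId session)

main :: IO ()
main = runReaderT describeSession (ToySession "abc123") >>= putStrLn
```

Because nothing ever writes the session back, a plain reader environment suffices, and a test framework can supply the session from whatever context it already carries.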

New stuff

Action chains support

The new W3C WebDriver Actions API is defined by the spec like this:

The Actions API provides a low-level interface for providing virtualized device input to the web browser. Conceptually, the Actions commands divide time into a series of ticks. The local end sends a series of actions which correspond to the change in state, if any, of each input device during each tick.

We use this to define some useful new commands, like clickCenter and doubleClickCenter. The former is defined like this:

  
-- | Helper to click the center of an element.
clickCenter :: (HasCallStack, WebDriver wd) => Element -> wd ()
clickCenter el = performActions [PointerSource "mouse1" [
  ActionPointer $ PointerMove (PointerElement el) 0 0 movementTimeMs
  , ActionPointer $ PointerDown LeftButton
  , ActionPointer $ PointerUp LeftButton
  ]]

You can define your own action chains to do arbitrary sequences of keyboard and mouse actions, or even test things like pinch-zoom.
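The "ticks" idea from the spec can be modeled in a few lines. The sketch below uses simplified, hypothetical types (ToyAction, ToySource) rather than the library's own, and only shows how per-device action lists line up into ticks: tick n holds the n-th action of every input source that still has one.

```haskell
module Main where

import Data.List (transpose)

-- | Hypothetical, simplified action type (not the library's data type).
data ToyAction = Pause | KeyDown Char | KeyUp Char | PointerDown | PointerUp
  deriving (Eq, Show)

-- | One virtual input device together with its ordered list of actions.
data ToySource = ToySource
  { sourceId      :: String
  , sourceActions :: [ToyAction]
  }

-- | Group actions by tick: tick n contains the n-th action of every source
-- that has at least n+1 actions. 'transpose' handles lists of unequal length
-- by simply omitting the sources that have run out.
ticks :: [ToySource] -> [[(String, ToyAction)]]
ticks sources =
  transpose [ [ (sourceId s, a) | a <- sourceActions s ] | s <- sources ]

main :: IO ()
main = do
  -- Hold a key down on the keyboard while the mouse clicks, then release it.
  let keyboard = ToySource "keyboard1" [KeyDown 'x', Pause, Pause, KeyUp 'x']
      mouse    = ToySource "mouse1"    [Pause, PointerDown, PointerUp]
  mapM_ print (ticks [keyboard, mouse])
```

Here the Pause entries are what keep the two devices in step: the mouse waits one tick for the key to go down, and the keyboard waits two ticks for the click to finish.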

Browser logs retrieval

Retrieving browser logs is a useful thing to do during tests, especially if you want to check for warnings or errors. The W3C WebDriver spec doesn’t actually contain a mechanism for doing it. However, an even newer spec called WebDriver BiDi does. This spec involves connecting a WebSocket to the WebDriver server so that you can exchange bidirectional commands.

To make use of this, I added a function called withRecordLogsViaBiDi (Selenium 4 only). I also added a function called getLogs, which attempts to use older, browser-specific methods to retrieve logs.

WD monad

The WD monad has been moved out of Test.WebDriver and into Test.WebDriver.WD (and internally converted from a StateT to a ReaderT). De-emphasizing it is meant to make clear that it’s not a core construct and is just one possible monad implementing the core classes WebDriverBase and SessionState. In case it’s helpful, you can find a full example of using it in app/Main.hs.

Misc

For a (hopefully complete) list of all the changes, see the CHANGELOG.

Conclusion

The webdriver package is now better tested than it has ever been, so I hope you’ll try it out! I’ve done my best to keep functions consistent with the previous versions, so most code won’t require much modification. It’s available in Stackage nightly as well. Happy WebDriver testing!