
Conversation

@jmchilton
Member

Automated Test Cases - Bad!

AI-generated test cases are noisy and generally just bad - this is not that. The context needed to pull in our whole client codebase, all the test functions, etc. would be huge, and repeatedly re-running tests as the AI guessed and checked would be painfully slow.

A semi-automated approach I looked into was recording a script and having an AI convert it into Selenium commands - that wasn't very promising either.

The selectors it chose didn't seem great, and most test cases can be bootstrapped and set up with the huge mountain of existing test helpers we already have - reducing all of that to bare sequences of selectors would result in a ton of duplication, unreadable code, and less robustness (our helpers have a lot of good retry logic, adaptive waiting, rich debug messages, etc.).

AI Assistance in Building Test Cases - Good!

The semi-automatic approach that I think is more promising is to have the AI agent set up a rich environment for manually testing the UI and then provide a mechanism for turning that exploration directly into a test case. I've implemented that here using Claude slash commands.

This PR adds a Claude slash command, `/setup-selenium-test-notebook <feature description OR GitHub PR>`. It can take a description of the feature to test or just be given a PR.

It will set up a Jupyter notebook with cells filled out for setting up the Selenium environment and talking with Galaxy. It tells the user about the config file they need to create, if it isn't present, to talk with a standing Galaxy development server, and it tells them how to run Jupyter. All of this is based on my prior work in #11177.
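
To give a sense of what those setup cells look like, here is a minimal sketch - the file name, config keys, and plain `webdriver.Chrome()` call are illustrative assumptions, not the actual cells the command emits:

```python
# Minimal sketch of the kind of setup cells the command writes into the
# notebook. File name, config keys, and driver choice are illustrative
# assumptions, not the real template from this PR.
import yaml
from selenium import webdriver

# Cell 1: load the config that points at a standing Galaxy development server.
with open("galaxy_selenium_context.yml") as f:
    config = yaml.safe_load(f)
galaxy_url = config["galaxy_url"]  # e.g. http://localhost:8080

# Cell 2: open a browser session that later cells (and screenshots) reuse.
driver = webdriver.Chrome()
driver.get(galaxy_url)
```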

The agent will pull down the PR description and try to come up with an idea for how to test it. The manual testing instructions we already provide are great for this. It will also "research" the codebase, find related tests, and include potentially relevant code from existing tests as Markdown comments right in the notebook - so you have a good idea of what helpers and components are already implemented that might help with testing the PR.
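
The "research" output ends up looking something like the cell sketched below (rendered here as Python comments for brevity; in the notebook it arrives as Markdown). The helper names are examples of the sort of thing it surfaces, not verified excerpts from our test suite:

```python
# Illustrative shape of a "research" cell the agent adds to the notebook; the
# helper calls below are examples of what it might surface from existing
# Selenium tests, not verified excerpts.
#
# Potentially relevant helpers from existing Selenium tests:
#
#   self.login()                           # register/log in a test user
#   self.home()                            # navigate to the Galaxy home page
#   self.history_panel_wait_for_hid_ok(1)  # adaptive wait on a dataset state
```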

Some screenshots from the notebook it set up for me for testing #20886.

(Three screenshots of the generated notebook, 2025-10-10.)

The precondition code it generated from existing helpers worked pretty well out of the box, while anything unlike what's already in the test framework came in commented form. It didn't pretend to know things it didn't know - that was kind of refreshing.

The agent seems smart enough to reason about when a managed history annotation is needed, how to deal with user login, etc.
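
For context, those concerns map onto hooks in our existing Selenium test framework - the decorator and attribute names below are from memory and may not match the current source exactly:

```python
# Rough sketch of the framework hooks involved; names are recalled from
# Galaxy's Selenium test framework and may not match the current source.
from galaxy_test.selenium.framework import SeleniumTestCase, managed_history, selenium_test


class TestSomeNewFeature(SeleniumTestCase):
    ensure_registered = True  # make sure a logged-in user exists up front

    @selenium_test
    @managed_history  # give the test its own history to work in
    def test_feature(self):
        ...
```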

Developing in Jupyter is nice because it can sustain a persistent connection to the browser automation application. You don't have to re-run the whole test - you can work a line or two at a time in cells, preserve progress, and re-run only what is needed as components are annotated, etc.

I think the screenshots are a cool part of the framework we have - and these will appear right inside the notebook.

After the notebook test case is ready to go, Claude seems pretty good at converting it directly into a test case. This can be done with `/extract-selenium-test <notebook path or description>`.

I generated the test case for the test in #21040 and it worked on the first try for me (I did move it into an existing file because I thought that is where it belonged - so that was a manual step, but no big deal).
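
To give a sense of the conversion, here is a hypothetical sketch of what an extracted test might look like - the component and helper names are illustrative, and this is not the actual test added in #21040:

```python
# Hypothetical illustration of the notebook-to-test conversion: each notebook
# cell becomes a few lines of a test method. Component and helper names are
# illustrative, not the actual test from the referenced PR.
from galaxy_test.selenium.framework import SeleniumTestCase, selenium_test


class TestFeatureFromNotebook(SeleniumTestCase):

    @selenium_test
    def test_feature(self):
        # Cell: navigate to the page under test
        self.home()
        # Cell: interact with the new UI element
        self.components.some_feature.open_button.wait_for_and_click()
        # Cell: assert on the result, reusing the framework's adaptive waits
        self.components.some_feature.result_panel.wait_for_visible()
        self.screenshot("feature_result")
```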

How to test the changes?

(Select all options that apply)

  • Instructions for manual testing are as follows:
    1. Pay for Claude and try out the workflow described above.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jmchilton jmchilton force-pushed the ai_assisted_selenium_workflow branch from 7b639e7 to b1ef9ce on October 10, 2025 21:16
@jmchilton jmchilton marked this pull request as ready for review October 12, 2025 19:54
@github-actions github-actions bot added this to the 26.0 milestone Oct 12, 2025