
Conversation

@jmchilton
Member

Automated Test Cases - Bad!

AI-generated test cases are noisy and generally just bad - this is not that. The context needed to pull in our whole client codebase, all the test functions, etc. would be huge, and repeatedly re-running tests as the AI guessed and checked would be painfully slow.

A semi-automated approach I looked into was recording a script and having an AI convert it into Selenium commands - that wasn't very promising either.

The selectors it chose didn't seem great, and most test cases can be bootstrapped and set up with the huge mountain of existing test helpers we already have - reducing all of that to bare sequences of selectors would result in a ton of duplication, unreadable code, and less robustness (our helpers have a lot of good retry logic, adaptive waiting, rich debug messages, etc.).

AI Assistance in Building Test Cases - Good!

The semi-automatic approach that I think is more promising is to have the AI agent set up a rich environment for manually testing the UI and then provide a mechanism for turning that exploration directly into a test case. I've implemented that here using Claude slash commands.

This PR adds a Claude slash command, `/setup-selenium-test-notebook <feature description OR GitHub PR>`. It can take a description of the feature to test or just be given a PR.

It will set up a Jupyter notebook with cells filled out for setting up the Selenium environment and talking with Galaxy. It tells the user about the config file they need to create, if it isn't present, to talk with a standing Galaxy development server, and it tells them how to run Jupyter. All of this is based on my prior work in #11177.
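
To give a sense of what those setup cells look like, here is a minimal sketch - the file name, config keys, and plain `webdriver.Chrome()` call are illustrative assumptions, not the actual cells the command emits:

```python
# Minimal sketch of the kind of setup cells the command writes into the
# notebook. File name, config keys, and driver choice are illustrative
# assumptions, not the real template from this PR.
import yaml
from selenium import webdriver

# Cell 1: load the config that points at a standing Galaxy development server.
with open("galaxy_selenium_context.yml") as f:
    config = yaml.safe_load(f)
galaxy_url = config["galaxy_url"]  # e.g. http://localhost:8080

# Cell 2: open a browser session that later cells (and screenshots) reuse.
driver = webdriver.Chrome()
driver.get(galaxy_url)
```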

The agent will pull down the PR description and try to come up with an idea for how to test it. The manual testing instructions we already provide are great for this. It will also "research" the codebase, find related tests, and include potentially relevant code from existing tests as Markdown comments right in the notebook - so you have a good idea of what helpers and components are already implemented that might help with testing the PR.
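
The "research" output ends up looking something like the cell sketched below (rendered here as Python comments for brevity; in the notebook it arrives as Markdown). The helper names are examples of the sort of thing it surfaces, not verified excerpts from our test suite:

```python
# Illustrative shape of a "research" cell the agent adds to the notebook; the
# helper calls below are examples of what it might surface from existing
# Selenium tests, not verified excerpts.
#
# Potentially relevant helpers from existing Selenium tests:
#
#   self.login()                           # register/log in a test user
#   self.home()                            # navigate to the Galaxy home page
#   self.history_panel_wait_for_hid_ok(1)  # adaptive wait on a dataset state
```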

Some screenshots from the notebook it set up for me for testing #20886.

(Three screenshots of the generated notebook, 2025-10-10.)

The precondition code it generated from existing helpers worked pretty well out of the box, while anything unlike what's already in the test framework came in commented form. It didn't pretend to know things it didn't know - that was kind of refreshing.

The agent seems smart enough to reason about when a managed history annotation is needed, how to deal with user login, etc.
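
For context, those concerns map onto hooks in our existing Selenium test framework - the decorator and attribute names below are from memory and may not match the current source exactly:

```python
# Rough sketch of the framework hooks involved; names are recalled from
# Galaxy's Selenium test framework and may not match the current source.
from galaxy_test.selenium.framework import SeleniumTestCase, managed_history, selenium_test


class TestSomeNewFeature(SeleniumTestCase):
    ensure_registered = True  # make sure a logged-in user exists up front

    @selenium_test
    @managed_history  # give the test its own history to work in
    def test_feature(self):
        ...
```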

Developing in Jupyter is nice because it can sustain a persistent connection to the browser automation application. You don't have to re-run the whole test - you can work a line or two at a time in cells, preserve progress, and re-run only what is needed as components are annotated, etc.

I think the screenshots are a cool part of the framework we have - and these will appear right inside the notebook.

After the notebook test case is ready to go, Claude seems pretty good at converting it directly into a test case. This can be done with `/extract-selenium-test <notebook path or description>`.

I generated the test case for the test in #21040 and it worked on the first try for me (I did move it into an existing file because I thought that is where it belonged - so that was a manual step, but no big deal).
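
To give a sense of the conversion, here is a hypothetical sketch of what an extracted test might look like - the component and helper names are illustrative, and this is not the actual test added in #21040:

```python
# Hypothetical illustration of the notebook-to-test conversion: each notebook
# cell becomes a few lines of a test method. Component and helper names are
# illustrative, not the actual test from the referenced PR.
from galaxy_test.selenium.framework import SeleniumTestCase, selenium_test


class TestFeatureFromNotebook(SeleniumTestCase):

    @selenium_test
    def test_feature(self):
        # Cell: navigate to the page under test
        self.home()
        # Cell: interact with the new UI element
        self.components.some_feature.open_button.wait_for_and_click()
        # Cell: assert on the result, reusing the framework's adaptive waits
        self.components.some_feature.result_panel.wait_for_visible()
        self.screenshot("feature_result")
```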

How to test the changes?

(Select all options that apply)

  • Instructions for manual testing are as follows:
    1. Pay for Claude and try out the workflow described above.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jmchilton jmchilton force-pushed the ai_assisted_selenium_workflow branch from 7b639e7 to b1ef9ce on October 10, 2025 21:16
@jmchilton jmchilton marked this pull request as ready for review October 12, 2025 19:54
@github-actions github-actions bot added this to the 26.0 milestone Oct 12, 2025