
Conversation


@JoshuaSBrown JoshuaSBrown commented Nov 18, 2025

Ticket

Description

How Has This Been Tested?

Artifacts (if appropriate):

Tasks

  • A description of the PR has been provided, and a diagram included if it is a new feature.
  • Formatter has been run.
  • CHANGELOG comment has been added.
  • Labels have been assigned to the PR.
  • A reviewer has been added.
  • A user has been assigned to work on the PR.
  • If it is a new feature, a unit test has been added.

Summary by Sourcery

Add end-to-end query tests and align backend query handling and proxy configuration to support them.

New Features:

  • Introduce Python end-to-end test that validates creating and executing saved queries against metadata.
  • Document how to run Playwright-based web UI end-to-end tests locally and via Docker, including headless and headed modes.

Bug Fixes:

  • Ensure user creation route sends the created user in the HTTP response.
  • Fix ZeroMQ body size error reporting to avoid accessing message size after the message is closed.
  • Correct handling of query parameters by treating them as JSON objects end-to-end instead of JSON strings.

Enhancements:

  • Switch server and mock server to use the custom proxy implementation for message routing.
  • Adjust database search payload construction and parsing to pass query parameters in the expected format.
  • Refine server factory to map the custom proxy type to the basic ZeroMQ proxy implementation.
  • Clean up obsolete debug logging in Foxx query routes and Playwright configuration.
  • Improve CMake Python client requirements copying path and register the new end-to-end query test with appropriate fixtures.
  • Add diagnostic output for enabling end-to-end web UI tests in the web-UI CMake configuration.

Tests:

  • Add a new Python-based end-to-end test that provisions repository and allocation resources and verifies query create/execute behavior via the DataFed API.
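
For orientation, here is a minimal sketch (not the test itself) of the query round-trip the new test exercises, using the datafed CommandLib calls quoted later in this conversation; the server host and credentials are placeholders, whereas the real test reads them from the environment.

```python
from datafed.CommandLib import API

# Placeholder host and credentials; test_api_query.py reads these from
# DATAFED_DOMAIN and DATAFED_USER89_PASSWORD instead.
api = API({"server_host": "datafed.example.org"})
api.loginByPassword("datafed89", "example-password")

# Save a query that filters records on a metadata field.
query = api.queryCreate(
    title="Search for Adamantium",
    coll=["root"],
    meta="md.creator == 'Lex Luther'",
)
query_id = query[0].id

# Execute the saved query and collect the aliases of matching records.
result = api.queryExec(query_id)
aliases = [item.alias for item in result[0].item]

# Remove the saved query again.
api.queryDelete(query_id)
```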


sourcery-ai bot commented Nov 18, 2025

Reviewer's Guide

Adds end-to-end tests for the Query API (Python and web), wires them into the CMake test suite, and fixes/aligns server, Foxx, and messaging behavior around query params and proxy selection needed for these tests to pass.

Sequence diagram for end-to-end Query API request with structured params

```mermaid
sequenceDiagram
  actor PythonClient
  participant CoreDatabaseAPI as Core_DatabaseAPI
  participant CoreServer as Core_Server
  participant ProxyBasicZMQ as Proxy_Basic_ZMQ
  participant ZeroMQCommunicator as ZeroMQ_Communicator
  participant ArangoFoxxQueryRouter as Foxx_QueryRouter

  PythonClient->>CoreDatabaseAPI: generalSearch(SearchRequest)
  CoreDatabaseAPI->>CoreDatabaseAPI: build JSON payload
  Note over CoreDatabaseAPI: payload[params] = params (already JSON)
  CoreDatabaseAPI->>CoreServer: HTTP POST /query with payload

  CoreServer->>ProxyBasicZMQ: forward database request
  ProxyBasicZMQ->>ZeroMQCommunicator: send message frame
  ZeroMQCommunicator->>ZeroMQCommunicator: validate frame_size
  ZeroMQCommunicator-->>ProxyBasicZMQ: deliver message to Foxx
  ProxyBasicZMQ-->>ArangoFoxxQueryRouter: HTTP request /query

  ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: validate body with joi
  Note over ArangoFoxxQueryRouter: params: joi.object().required()
  ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: execQuery(client, mode, published, query)
  ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: enforce page size using query.params.cnt
  ArangoFoxxQueryRouter-->>ProxyBasicZMQ: HTTP response with results

  ProxyBasicZMQ-->>CoreServer: response from Foxx
  CoreServer-->>CoreDatabaseAPI: HTTP response with query results
  CoreDatabaseAPI-->>PythonClient: parsed results
```

Class diagram for ServerFactory, proxy selection, and DatabaseAPI params handling

```mermaid
classDiagram
  class CoreServer {
    +void msgRouter(LogContext log_context, int thread_count)
    +void ioInsecure(LogContext log_context, int thread_count)
  }

  class ServerFactory {
    -LogContext m_log_context
    +ServerFactory(LogContext log_context)
    +std::unique_ptr~IServer~ create(ServerType type, SocketOptions socket_options, SocketCredentials socket_credentials)
  }

  class IServer {
    <<interface>>
    +void run()
  }

  class Proxy {
    +Proxy(SocketOptions socket_options, SocketCredentials socket_credentials, LogContext log_context)
    +void run()
  }

  class ProxyBasicZMQ {
    +ProxyBasicZMQ(SocketOptions socket_options, SocketCredentials socket_credentials, LogContext log_context)
    +void run()
  }

  class ZeroMQCommunicator {
    +void receiveBody(IMessage msg, Buffer buffer, ProtoBufFactory factory, size_t frame_size)
  }

  class DatabaseAPI {
    +void generalSearch(Auth_SearchRequest a_request, string a_user_id, bool a_is_admin)
    +uint32_t parseSearchRequest(Auth_SearchRequest a_request, string a_qry_begin, string a_qry_end, string a_qry_filter, string a_params)
  }

  class Foxx_QueryRouter {
    +execQuery(client, mode, published, query)
  }

  CoreServer --> ServerFactory
  ServerFactory ..> IServer
  Proxy ..|> IServer
  ProxyBasicZMQ ..|> IServer

  CoreServer ..> ProxyBasicZMQ : uses PROXY_CUSTOM
  ServerFactory ..> ProxyBasicZMQ : returns for PROXY_CUSTOM

  ZeroMQCommunicator ..> ProxyBasicZMQ : message transport

  DatabaseAPI ..> Foxx_QueryRouter : HTTP query requests

  note for DatabaseAPI "generalSearch now sends params as JSON object; parseSearchRequest wraps a_params with braces before sending"
```

File-Level Changes

Align query API payload shape so params are treated as structured JSON objects end-to-end instead of encoded strings.
  • Tightened Joi validation for query create/update/execute routes to require params as an object rather than any/string
  • Stopped parsing params from JSON strings in the query exec route and passed the object through directly
  • Updated DatabaseAPI search request handling so params is serialized once at the API boundary and passed through without extra braces
  • Adjusted parseSearchRequest to wrap raw params in braces for Foxx consumption instead of wrapping at generalSearch call site
Files: core/database/foxx/api/query_router.js, core/server/DatabaseAPI.cpp
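
As a quick illustration of the payload-shape change described in this item, the sketch below contrasts sending params as a JSON-encoded string with sending it as a structured object. The payload dictionary and field values are hypothetical stand-ins for the DatabaseAPI request body, not the actual C++ code.

```python
import json

# Hypothetical stand-in for the query parameters built by the caller.
params = {"cnt": 10, "off": 0}

# Before: params is serialized separately, so the request body carries a
# JSON *string* that Foxx would have to JSON.parse before use.
payload_before = {
    "qry_filter": "md.creator == 'Lex Luther'",
    "params": json.dumps(params),
}

# After: params stays a structured object and is serialized exactly once,
# so Foxx's joi.object() validation sees a real JSON object.
payload_after = {
    "qry_filter": "md.creator == 'Lex Luther'",
    "params": params,
}

print(json.dumps(payload_before))  # ... "params": "{\"cnt\": 10, \"off\": 0}"
print(json.dumps(payload_after))   # ... "params": {"cnt": 10, "off": 0}
```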
Switch server components to use the custom proxy implementation instead of the basic ZMQ proxy and fix server factory wiring accordingly.
  • Changed CoreServer and MockCoreServer to request PROXY_CUSTOM from ServerFactory instead of PROXY_BASIC_ZMQ
  • Updated ServerFactory::create to construct ProxyBasicZMQ when PROXY_BASIC_ZMQ is requested, maintaining support for both types
  • Included ProxyBasicZMQ header in ServerFactory implementation
Files: core/server/CoreServer.cpp, tests/mock_core/MockCoreServer.cpp, common/source/ServerFactory.cpp
Harden ZeroMQ body reception error reporting to avoid using freed message objects.
  • Captured zmq_msg_size into a local variable before closing the message and used that in the error message when frame size mismatches are detected
Files: common/source/communicators/ZeroMQCommunicator.cpp
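
The use-after-close hazard behind this fix is language-agnostic; below is a minimal Python sketch of the pattern, with a hypothetical Message class standing in for the ZeroMQ message wrapper (the real fix is in C++ against libzmq).

```python
class Message:
    """Hypothetical stand-in for a ZeroMQ message wrapper."""

    def __init__(self, size: int):
        self._size = size
        self._closed = False

    def size(self) -> int:
        if self._closed:
            raise RuntimeError("size() called on a closed message")
        return self._size

    def close(self) -> None:
        self._closed = True


def receive_body(msg: Message, expected_size: int) -> None:
    # Capture the size *before* closing the message, then report the local
    # copy, instead of calling msg.size() again on the closed object.
    actual_size = msg.size()
    if actual_size != expected_size:
        msg.close()
        raise ValueError(
            f"frame size mismatch: expected {expected_size}, got {actual_size}"
        )
```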
Add end-to-end Query API test and hook it into the CTest-based E2E suite with proper fixture ordering.
  • Introduced test_api_query.py that performs environment-driven setup (login, repo creation/allocation, record creation) and runs a metadata-based query to verify CRUD aspects of the Query API
  • Registered the new end_to_end_query test in tests/end-to-end/CMakeLists.txt
  • Used CTest fixtures so record tests set up state needed by query tests and ensured proper sequential execution
Files: tests/end-to-end/test_api_query.py, tests/end-to-end/CMakeLists.txt
Improve E2E test tooling: Playwright usage docs, Python client build path fix, and web UI test harness tweaks.
  • Extended tests/end-to-end/README with detailed Playwright setup and Docker-based execution instructions, including headless/headed configuration tips
  • Simplified Playwright config by removing commented-out Firefox/WebKit project scaffolding
  • Fixed CMake file copy command so Python requirements.txt is copied into the target directory rather than renamed
  • Added a debug message to the web-UI CMakeLists when E2E web tests are processed
Files: tests/end-to-end/README.md, tests/end-to-end/web-UI/playwright.config.js, CMakeLists.txt, tests/end-to-end/web-UI/CMakeLists.txt
Fix user creation route to actually return the created user in the HTTP response.
  • Ensured the user_router.js route sends the accumulated result array back in the response before logging success
Files: core/database/foxx/api/user_router.js

Assessment against linked issues

Issue and objective addressed:

  • #1774: Add end-to-end tests that exercise the query functionality (query router / Query API), including creating and executing queries.
  • #1774: Integrate the new query end-to-end tests into the existing automated end-to-end test suite so they run with the other API tests.
  • #1775: Ensure query parameters are sent, interpreted, and stored as a JSON object (not a string) throughout the Core and Foxx query pipeline.
  • #1775: Tighten Foxx Joi validation so that the query 'params' field is required to be a JSON object, preventing arbitrary or string-only inputs.
  • #1777: Fix the backend/user API so that updating a user password from the web UI completes successfully and returns a proper response instead of an error.
  • #1777: Add or update automated tests (end-to-end/web UI) to cover the user password change flow to prevent regressions.
  • #1790: Prevent the ZMQ assertion failure by avoiding calls to zmq_msg_size on an invalid/closed message when receiving a message body of unexpected size.
  • #1791: Update the CMake configuration so that requirements.txt is copied directly into python/datafed_pkg/ (and not into a nested python/datafed_pkg/requirements.txt/requirements.txt path).

Possibly linked issues

  • #N/A: PR directly implements the requested end-to-end tests for query routing, adding query CRUD tests and wiring in CMake.


@JoshuaSBrown added the "Type: Test" label Nov 18, 2025
@JoshuaSBrown self-assigned this Nov 18, 2025
@JoshuaSBrown added the "Priority: High", "Component: Database", "Component: Core", "Component: CLI", "Component: Python API", "Component: CI", and "Component: Foxx" labels Nov 18, 2025

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • tearDown references an undefined task_id and attempts to delete a hardcoded 'krpytonite' alias instead of the created 'adamantium' record—please correct the alias and properly capture the task ID for cleanup.
  • setUp is very long and interleaves environment parsing, login, repo creation, and allocation; consider breaking it into helper methods or fixtures to improve readability and reuse.
  • The test relies on hardcoded user credentials and environment variables—parameterizing these inputs or moving them into a config helper would make the test more flexible and secure.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `tests/end-to-end/test_api_query.py:181-183` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 2
<location> `tests/end-to-end/test_api_query.py:182-183` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 3
<location> `tests/end-to-end/test_api_query.py:198-204` </location>
<code_context>
        while status < 3:
            if count > 20:
                break
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1

</code_context>

<issue_to_address>
**suggestion (code-quality):** Move a guard clause in a while statement's body into its test ([`while-guard-to-condition`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/while-guard-to-condition))

```suggestion
        while status < 3 and not count > 20:
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1

```

<br/><details><summary>Explanation</summary>Removing the guard clause simplifies the code and makes clearer the intention of
the loop.
</details>
</issue_to_address>

### Comment 4
<location> `tests/end-to-end/test_api_query.py:90` </location>
<code_context>
    def setUp(self):
        path_of_file = os.path.abspath(__file__)
        current_folder = os.path.dirname(path_of_file)
        path_to_python_datafed_module = os.path.normpath(
            current_folder
            + os.sep
            + ".."
            + os.sep
            + ".."
            + os.sep
            + "python/datafed_pkg"
        )
        sys.path.insert(0, path_to_python_datafed_module)
        try:
            from datafed.CommandLib import API
        except ImportError:
            print(
                "datafed was not found, make sure you are running script with "
                "PYTHONPATH set to the location of the package in the datafed repo"
            )
            sys.exit(1)

        from datafed import version as df_ver

        print(df_ver)

        datafed_domain = os.environ.get("DATAFED_DOMAIN")
        opts = {"server_host": datafed_domain}

        if datafed_domain is None:
            print("DATAFED_DOMAIN must be set before the end-to-end tests can be run")
            sys.exit(1)

        self._df_api = API(opts)

        self._username = "datafed89"
        password = os.environ.get("DATAFED_USER89_PASSWORD")

        self._timeout = int(os.environ.get('DATAFED_TEST_TIMEOUT_OVERRIDE', '1'))
        count = 0
        while True:
            try:
                self._df_api.loginByPassword(self._username, password)
                break
            except BaseException:
                pass
            count += 1
            # Try three times to authenticate
            assert count < 3

        path_to_repo_form = os.environ.get("DATAFED_REPO_FORM_PATH")
        if path_to_repo_form is None:
            self.fail("DATAFED_REPO_FORM_PATH env variable is not defined")

        if not path_to_repo_form.endswith(".json"):
            self.fail(
                "repo create test requires that the repo form exist and be "
                "provided as a json file, the test uses the environment "
                "variable DATAFED_REPO_PATH to search for the repo form"
            )

        self._repo_form = {}
        with open(path_to_repo_form) as json_file:
            self._repo_form = json.load(json_file)

        if len(self._repo_form["exp_path"]) == 0:
            print(
                "exp_path is empty, we will set it to / for the test. This is "
                "cruft and should be removed anyway"
            )
            self._repo_form["exp_path"] = "/"

        self._repo_form["admins"] = ["u/" + self._username]

        # Create the repositories
        result = self._df_api.repoCreate(
            repo_id=self._repo_form["id"],
            title=self._repo_form["title"],
            desc=self._repo_form["desc"],
            domain=self._repo_form["domain"],
            capacity=self._repo_form["capacity"],
            pub_key=self._repo_form["pub_key"],
            address=self._repo_form["address"],
            endpoint=self._repo_form["endpoint"],
            path=self._repo_form["path"],
            exp_path=self._repo_form["exp_path"],
            admins=self._repo_form["admins"],
        )

        result = self._df_api.repoList(list_all=True)
        count = 0
        while len(result[0].repo) == 0:
            time.sleep(self._timeout)
            result = self._df_api.repoList(list_all=True)
            count = count + 1
            if count > 3:
                self.fail("Setup failed with repo create")

        self._repo_id = self._repo_form["id"]
        if not self._repo_id.startswith("repo/"):
            self._repo_id = "repo/" + self._repo_id

        # Will return a task
        result = self._df_api.repoAllocationCreate(
            repo_id=self._repo_id,
            subject="datafed89",
            data_limit=1000000000,
            rec_limit=100,
        )

        task_id = result[0].task[0].id

        # Check the status of the task
        task_result = self._df_api.taskView(task_id)

        # If status is less than 3 it is in the works
        status = task_result[0].task[0].status
        count = 0
        while status < 3:
            if count > 2:
                print(task_result)
                self.fail(
                    "Something went wrong task was unable to complete, attempt "
                    "to create an allocation after 3 seconds failed, make sure "
                    "all services are running."
                )
                break
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1


        parameters = {
            "testing_tempareture": 900000,
            "voltage": [1, 2, -4, 7.123],
            "creator": "Lex Luther",
            "occupation": "super villan",
            "equipment_serial_numbers": {"SEM": 14, "AFM": 9134},
        }

        title = "Adamantium"
        alias = "adamantium"
        data_result = self._df_api.dataCreate(
            title=title,
            alias=alias,
            metadata=json.dumps(parameters),
            tags=["material"],
            parent_id="root",
        )

</code_context>

<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>

### Comment 5
<location> `tests/end-to-end/test_api_query.py:173` </location>
<code_context>
    def test_query_create_delete(self):

        search_query = self._df_api.queryCreate(title="Search for Adamantium", owner="u/" + self._username, coll=["root"], meta="md.creator == 'Lex Luther'")

        print("Search query create")
        print(search_query)

        query_result = self._df_api.queryExec(search_query[0].id)

        material = ""
        for model in query_result[0].item:
            if model.alias.startswith("adamantium"):
                material = model.alias

        print(f"Query found {material}")

        self.assertEqual(material, "adamantium")

        self._df_api.queryDelete(search_query[0].id) 

</code_context>

<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>

### Comment 6
<location> `tests/end-to-end/test_api_query.py:240` </location>
<code_context>
    def tearDown(self):

        task_result = self._df_api.dataDelete("krpytonite")

        status = task_result[0].task[0].status
        count = 0
        while status < 3:
            if count > 20:
                break
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1

        print("Delete record result.")
        print(task_result)

        result = self._df_api.repoAllocationDelete(
            repo_id=self._repo_id, subject="datafed89"
        )

        task_id = result[0].task[0].id

        # Check the status of the task
        task_result = self._df_api.taskView(task_id)

        # If status is less than 3 it is in the works
        status = task_result[0].task[0].status
        count = 0
        while status < 3:
            if count > 2:
                print(task_result)
                self.fail(
                    "Something went wrong task was unable to complete, attempt"
                    " to delete an allocation after 3 seconds failed, make sure"
                    " all services are running."
                )
                break
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1

        print("Delete Allocations")
        print(result)

        repo_id = self._repo_form["id"]
        if not repo_id.startswith("repo/"):
            repo_id = "repo/" + repo_id
        result = self._df_api.repoDelete(repo_id)
        result = self._df_api.repoList(list_all=True)

</code_context>

<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
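
A minimal sketch of a corrected tearDown for the two problems flagged above (the undefined `task_id` and the 'krpytonite' alias), assuming `dataDelete` returns a task result shaped like the other task-producing calls quoted in this review, and that `time` is imported and `self._df_api` / `self._timeout` come from setUp:

```python
def tearDown(self):
    # Delete the record actually created in setUp ("adamantium", not
    # "krpytonite") and capture the task id from the returned result so
    # the polling loop below has something defined to poll.
    task_result = self._df_api.dataDelete("adamantium")
    task_id = task_result[0].task[0].id

    status = task_result[0].task[0].status
    count = 0
    while status < 3 and count <= 20:
        time.sleep(self._timeout)
        task_result = self._df_api.taskView(task_id)
        status = task_result[0].task[0].status
        count += 1
```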



@JoshuaSBrown JoshuaSBrown changed the title [Test] add query end to end tests [DAPS-1774] - Core, Python, Database, Foxx, Test add query end to end tests Nov 18, 2025
@JoshuaSBrown
Collaborator Author

JoshuaSBrown commented Nov 24, 2025

It looks like this PR is failing when it should now be passing: the fix was merged in, but the query end-to-end test is not displaying any results. We are going to run the CI without cleaning up afterwards, log in under the test user account, and check whether the record and the query both exist. One possibility is that the record document is not yet registered in the database before the query tries to find it, even though the record is created beforehand. This could be the result of isolation levels and sandboxing interfering with the test execution.
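
If the root cause does turn out to be a lag between record creation and index visibility, one possible mitigation (a sketch only; the helper name, retry count, and delay are made up here) is to poll the query a few times before asserting:

```python
import time

def exec_query_with_retry(df_api, query_id, expected_alias, attempts=5, delay=1.0):
    """Poll a saved query until the expected record appears or we give up.

    Tolerates a short lag between record creation and the record becoming
    visible to query execution.
    """
    result = None
    for _ in range(attempts):
        result = df_api.queryExec(query_id)
        aliases = [item.alias for item in result[0].item]
        if expected_alias in aliases:
            break
        time.sleep(delay)
    return result
```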

JoshuaSBrown and others added 4 commits November 25, 2025 12:11
…se. (#1778)

[DAPS-1786] - Web tests, refactor: add test for hitting password reset. (#1787)
…XCEPT call due to calling zmq_msg with invalid state after closing it.
…oved to a folder named requirements.txt during cmake configure script.
@JoshuaSBrown
Collaborator Author

#1277

@JoshuaSBrown
Collaborator Author

@sourcery-ai review


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The change to use ServerType::PROXY_CUSTOM when instantiating ProxyBasicZMQ in ServerFactory and Server::msgRouter/ioInsecure is a bit confusing; consider aligning the enum value name with the concrete implementation or adding a brief comment to explain why PROXY_CUSTOM maps to ProxyBasicZMQ.
  • The Foxx query routes now validate params as an object while DatabaseAPI still manipulates params as strings (e.g., wrapping with braces in parseSearchRequest); it would be safer to normalize params to a single structured representation across the C++ and Foxx layers to avoid subtle serialization issues.
  • The message("ENABLE END TO END!!!!!!!") in tests/end-to-end/web-UI/CMakeLists.txt looks like a leftover debug print; consider removing or gating it with a verbose/debug option to keep the CMake output clean.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `core/server/DatabaseAPI.cpp:1311` </location>
<code_context>
   payload["qry_begin"] = qry_begin;
   payload["qry_end"] = qry_end;
   payload["qry_filter"] = qry_filter;
-  payload["params"] = "{" + params + "}";
+  payload["params"] = params;
   payload["limit"] = to_string(cnt);

</code_context>

<issue_to_address>
**issue (bug_risk):** The params type between DatabaseAPI and Foxx query_router now appears inconsistent (string vs object).

On the Foxx side, `params` is now required to be a Joi object and is passed directly into `execQuery` without `JSON.parse`. On the C++ side, `params` is still a `std::string`, so the request body will contain `"params": "{...}"` (a JSON string that looks like an object), while Foxx expects a real JSON object. This mismatch will likely break Joi validation or cause runtime errors when accessing fields like `query.params.cnt`.

To fix this, either:
- Change `params` in `DatabaseAPI` to a structured JSON value and assign it directly so `payload["params"]` is an object, or
- Keep `params` as a string but revert to parsing it in Foxx and loosen the Joi schema accordingly.
</issue_to_address>

### Comment 2
<location> `core/server/DatabaseAPI.cpp:3944` </location>
<code_context>
   a_qry_begin = a_qry_begin;
   a_qry_end = a_qry_end;
   a_qry_filter = a_qry_filter;
-
+  a_params = "{" + a_params + "}";
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Wrapping a_params in braces here may double-wrap or otherwise distort the params string.

Given that parseSearchRequest now does `a_params = "{" + a_params + "}";`, this can: (1) generate invalid JSON if `a_params` is already a complete JSON object string (e.g. `"{\"cnt\":10}"` becomes `"{{...}}"`), and (2) change semantics when `a_params` is empty (becoming `"{}"`). It would be more robust to build `a_params` as a proper JSON object once in the earlier builder logic, rather than wrapping a string here. If wrapping must happen here, please guard against cases where `a_params` is already brace-wrapped.

Suggested implementation:

```cpp
  a_qry_begin = a_qry_begin;
  a_qry_end = a_qry_end;
  a_qry_filter = a_qry_filter;
  // Ensure we only wrap a_params in braces if it is not already a JSON object string.
  // This avoids double-wrapping (e.g. "{\"cnt\":10}" -> "{{...}}") and preserves
  // existing well-formed JSON. If a_params is empty or whitespace, leave it as-is.
  if (!a_params.empty()) {
    const auto first = a_params.find_first_not_of(" \t\n\r");
    const auto last = a_params.find_last_not_of(" \t\n\r");

    if (first != string::npos && last != string::npos) {
      const char first_char = a_params[first];
      const char last_char = a_params[last];

      // Only wrap if it doesn't already look like a JSON object.
      if (!(first_char == '{' && last_char == '}')) {
        a_params = "{" + a_params + "}";
      }
    }
  }
  return cnt;
}

```

- Ideally, the code constructing `a_params` earlier (e.g., in the builder logic or `parseSearchRequest`) should be updated so that it always produces a valid JSON object string. Once that is done, the wrapping/normalization logic here can be removed entirely, and this function can assume `a_params` is already well-formed JSON.
- If there are other call sites or helper functions that also wrap `a_params` with braces, they should be updated to either:
  - Follow the same guard pattern used here, or
  - Be refactored so that only one place is responsible for producing a JSON object for `a_params`.
</issue_to_address>

### Comment 3
<location> `tests/end-to-end/test_api_query.py:17-26` </location>
<code_context>
+class TestDataFedPythonAPIQueryCRUD(unittest.TestCase):
</code_context>

<issue_to_address>
**issue (testing):** Commented-out tearDown and cleanup can lead to test flakiness and environment pollution

The tests create repos, allocations, records, and queries, but all cleanup (`tearDown` and `queryDelete`) is commented out. In a shared end-to-end environment this can leave persistent state behind, causing conflicts between runs and flaky, non-idempotent tests.

Please:
- Restore and validate `tearDown` so it reliably removes any repos, allocations, records, and queries created by the tests.
- Move `queryDelete` into `tearDown` (or another shared cleanup path) if that matches the test design.

If full cleanup isn’t feasible due to shared resources, add a brief comment explaining the constraint and choose IDs/aliases that minimize interference with other tests or runs.
</issue_to_address>
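
One way to satisfy this, sketched under the assumption that the API calls behave as quoted elsewhere in this review, is to register cleanup with unittest's addCleanup as each resource is created, so teardown runs even if a later setup step fails:

```python
# Sketch: register cleanup immediately after creating each resource.
record = self._df_api.dataCreate(
    title=title,
    alias=alias,
    metadata=json.dumps(parameters),
    tags=["material"],
    parent_id="root",
)
self.addCleanup(self._df_api.dataDelete, alias)

search_query = self._df_api.queryCreate(
    title="Search for Adamantium",
    coll=["root"],
    meta="md.creator == 'Lex Luther'",
)
self.addCleanup(self._df_api.queryDelete, search_query[0].id)
```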

### Comment 4
<location> `tests/end-to-end/test_api_query.py:53-62` </location>
<code_context>
+        self._username = "datafed89"
</code_context>

<issue_to_address>
**suggestion (testing):** Hardcoded user and record identifiers may cause conflicts across runs or environments

The test’s reliance on a specific account (`datafed89`) and fixed alias (`adamantium`) makes it brittle and environment-dependent. It can cause collisions with other tests using the same values and fail in environments where that user isn’t provisioned.

To improve robustness:
- Read the username from an environment variable so each environment can control which account is used.
- Generate a unique alias per run (e.g., with a timestamp or random suffix) and assert against that constructed value instead of a hardcoded string.

Suggested implementation:

```python
        self._df_api = API(opts)

        # Use environment variables for credentials to avoid hardcoded, environment-specific values
        self._username = os.environ.get("DATAFED_USERNAME")
        if not self._username:
            print("DATAFED_USERNAME must be set before the end-to-end tests can be run")
            sys.exit(1)

        password = os.environ.get("DATAFED_PASSWORD")
        if not password:
            print("DATAFED_PASSWORD must be set before the end-to-end tests can be run")
            sys.exit(1)

```

To fully implement your comment about avoiding hardcoded record identifiers (e.g., alias "adamantium"), you should:
1. Locate where the hardcoded alias is used in this test file (likely something like `alias="adamantium"` or assertions against the string `"adamantium"`).
2. In the common setup for these tests (e.g., in the test class `__init__`, `setup_method`, or a fixture), generate a unique alias per run, such as:
   - `self._alias = f"adamantium-{int(time.time())}"` or
   - `self._alias = f"adamantium-{uuid.uuid4().hex}"`.
3. Replace all hardcoded usages of `"adamantium"` with this dynamically generated `self._alias` (or the equivalent variable in your setup/fixture).
4. Update assertions to compare against the constructed alias variable instead of the literal string `"adamantium"` to keep the tests stable and unique across runs.
</issue_to_address>

### Comment 5
<location> `tests/end-to-end/test_api_query.py:150-159` </location>
<code_context>
+            status = task_result[0].task[0].status
+            count = count + 1
+
+        parameters = {
+            "testing_tempareture": 900000,
+            "voltage": [1, 2, -4, 7.123],
+            "creator": "Lex Luther",
+            "occupation": "super villan",
+            "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
+        }
+
+        title = "Adamantium"
+        alias = "adamantium"
+        record = self._df_api.dataCreate(
+            title=title,
+            alias=alias,
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding assertions that the query is actually filtering on the intended metadata fields

Right now the test only checks that one of the results has alias `adamantium`, so it doesn’t really prove that the metadata filter is what’s driving the result. Please add at least one record that should not match the filter (e.g., a different `creator`) and assert that it’s excluded from the results. It would also help to assert key metadata fields on the returned record to confirm that the correct item is being selected based on the metadata filter, not other incidental state.

Suggested implementation:

```python
        parameters = {
            "testing_tempareture": 900000,
            "voltage": [1, 2, -4, 7.123],
            "creator": "Lex Luther",
            "occupation": "super villan",
            "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
        }

        title = "Adamantium"
        alias = "adamantium"
        matching_record = self._df_api.dataCreate(
            title=title,
            alias=alias,
            metadata=json.dumps(parameters),
            tags=["material"],
            parent_id="root",
        )

        # Create a similar record that should NOT match the metadata filter
        non_matching_parameters = {
            "testing_tempareture": 900000,
            "voltage": [1, 2, -4, 7.123],
            # Different creator to ensure the metadata filter on "creator" excludes this record
            "creator": "Bruce Wayne",
            "occupation": "super villan",
            "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
        }

        non_matching_title = "Adamantium (control)"
        non_matching_alias = "adamantium-control"
        non_matching_record = self._df_api.dataCreate(
            title=non_matching_title,
            alias=non_matching_alias,
            metadata=json.dumps(non_matching_parameters),
            tags=["material"],
            parent_id="root",
        )

        print("Created matching record")
        print(matching_record)
        print("Created non-matching record")
        print(non_matching_record)

    def test_query_create_delete(self):

```

To fully implement the suggestion about asserting that the query is actually filtering on the intended metadata fields, you should also update the part of this test that performs the metadata query and asserts on the results (not shown in the snippet). In that section:
1) Ensure the query explicitly filters on at least one metadata field that differs between the two records, e.g. `creator == "Lex Luther"`.
2) Assert that:
   - Exactly one record is returned (or, more generally, that the non-matching record is not included).
   - The returned record’s alias is `"adamantium"` (or matches `alias` from `matching_record`).
   - Key metadata fields on the returned record (`creator`, `testing_tempareture`, `voltage`, etc.) match the values in `parameters`, so you know the correct item was selected because of the metadata filter.
3) Optionally, if the API allows, assert that the record with alias `"adamantium-control"` (or `non_matching_alias`) is not present in the query result set.
You’ll need to adapt these assertions to the exact query/response shape used later in this test file (e.g. whether results come back as a list, generator, or paginated object).
</issue_to_address>

### Comment 6
<location> `tests/end-to-end/test_api_query.py:183-185` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 7
<location> `tests/end-to-end/test_api_query.py:184-185` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 8
<location> `tests/end-to-end/test_api_query.py:90` </location>
<code_context>
    def setUp(self):
        path_of_file = os.path.abspath(__file__)
        current_folder = os.path.dirname(path_of_file)
        path_to_python_datafed_module = os.path.normpath(
            current_folder
            + os.sep
            + ".."
            + os.sep
            + ".."
            + os.sep
            + "python/datafed_pkg"
        )
        sys.path.insert(0, path_to_python_datafed_module)
        try:
            from datafed.CommandLib import API
        except ImportError:
            print(
                "datafed was not found, make sure you are running script with "
                "PYTHONPATH set to the location of the package in the datafed repo"
            )
            sys.exit(1)

        from datafed import version as df_ver

        print(df_ver)

        datafed_domain = os.environ.get("DATAFED_DOMAIN")
        opts = {"server_host": datafed_domain}

        if datafed_domain is None:
            print("DATAFED_DOMAIN must be set before the end-to-end tests can be run")
            sys.exit(1)

        self._df_api = API(opts)

        self._username = "datafed89"
        password = os.environ.get("DATAFED_USER89_PASSWORD")

        self._timeout = int(os.environ.get('DATAFED_TEST_TIMEOUT_OVERRIDE', '1'))
        count = 0
        while True:
            try:
                self._df_api.loginByPassword(self._username, password)
                break
            except BaseException:
                pass
            count += 1
            # Try three times to authenticate
            assert count < 3

        path_to_repo_form = os.environ.get("DATAFED_REPO_FORM_PATH")
        if path_to_repo_form is None:
            self.fail("DATAFED_REPO_FORM_PATH env variable is not defined")

        if not path_to_repo_form.endswith(".json"):
            self.fail(
                "repo create test requires that the repo form exist and be "
                "provided as a json file, the test uses the environment "
                "variable DATAFED_REPO_PATH to search for the repo form"
            )

        self._repo_form = {}
        with open(path_to_repo_form) as json_file:
            self._repo_form = json.load(json_file)

        if len(self._repo_form["exp_path"]) == 0:
            print(
                "exp_path is empty, we will set it to / for the test. This is "
                "cruft and should be removed anyway"
            )
            self._repo_form["exp_path"] = "/"

        self._repo_form["admins"] = ["u/" + self._username]

        # Create the repositories
        result = self._df_api.repoCreate(
            repo_id=self._repo_form["id"],
            title=self._repo_form["title"],
            desc=self._repo_form["desc"],
            domain=self._repo_form["domain"],
            capacity=self._repo_form["capacity"],
            pub_key=self._repo_form["pub_key"],
            address=self._repo_form["address"],
            endpoint=self._repo_form["endpoint"],
            path=self._repo_form["path"],
            exp_path=self._repo_form["exp_path"],
            admins=self._repo_form["admins"],
        )

        result = self._df_api.repoList(list_all=True)
        count = 0
        while len(result[0].repo) == 0:
            time.sleep(self._timeout)
            result = self._df_api.repoList(list_all=True)
            count = count + 1
            if count > 3:
                self.fail("Setup failed with repo create")

        self._repo_id = self._repo_form["id"]
        if not self._repo_id.startswith("repo/"):
            self._repo_id = "repo/" + self._repo_id

        # Will return a task
        result = self._df_api.repoAllocationCreate(
            repo_id=self._repo_id,
            subject=self._username,
            data_limit=1000000000,
            rec_limit=100,
        )

        task_id = result[0].task[0].id

        # Check the status of the task
        task_result = self._df_api.taskView(task_id)

        # If status is less than 3 it is in the works
        status = task_result[0].task[0].status
        count = 0
        while status < 3:
            if count > 2:
                print(task_result)
                self.fail(
                    "Something went wrong task was unable to complete, attempt "
                    "to create an allocation after 3 seconds failed, make sure "
                    "all services are running."
                )
                break
            time.sleep(self._timeout)
            task_result = self._df_api.taskView(task_id)
            status = task_result[0].task[0].status
            count = count + 1

        parameters = {
            "testing_tempareture": 900000,
            "voltage": [1, 2, -4, 7.123],
            "creator": "Lex Luther",
            "occupation": "super villan",
            "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
        }

        title = "Adamantium"
        alias = "adamantium"
        record = self._df_api.dataCreate(
            title=title,
            alias=alias,
            metadata=json.dumps(parameters),
            tags=["material"],
            parent_id="root",
        )

        print("Created record")
        print(record)

</code_context>

<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>

### Comment 9
<location> `tests/end-to-end/test_api_query.py:173` </location>
<code_context>
    def test_query_create_delete(self):

        search_query = self._df_api.queryCreate(title="Search for Adamantium", owner="u/" + self._username, coll=["root"], meta="md.creator == 'Lex Luther'")

        print("Search query create")
        print(search_query)

        query_result = self._df_api.queryExec(search_query[0].id)
        print("Query result")
        print(query_result)

        material = ""
        for model in query_result[0].item:
            if model.alias.startswith("adamantium"):
                material = model.alias

        print(f"Query found {material}")

        self.assertEqual(material, "adamantium")

</code_context>

<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>




@JoshuaSBrown JoshuaSBrown merged commit fb1a800 into staging Dec 1, 2025
14 checks passed
