[DAPS-1774] - Core, Python, Database, Foxx, Test add query end to end tests #1779
Conversation
Reviewer's Guide

Adds end-to-end tests for the Query API (Python and web), wires them into the CMake test suite, and fixes/aligns server, Foxx, and messaging behavior around query params and proxy selection needed for these tests to pass.

Sequence diagram for end-to-end Query API request with structured params

sequenceDiagram
actor PythonClient
participant CoreDatabaseAPI as Core_DatabaseAPI
participant CoreServer as Core_Server
participant ProxyBasicZMQ as Proxy_Basic_ZMQ
participant ZeroMQCommunicator as ZeroMQ_Communicator
participant ArangoFoxxQueryRouter as Foxx_QueryRouter
PythonClient->>CoreDatabaseAPI: generalSearch(SearchRequest)
CoreDatabaseAPI->>CoreDatabaseAPI: build JSON payload
Note over CoreDatabaseAPI: payload[params] = params (already JSON)
CoreDatabaseAPI->>CoreServer: HTTP POST /query with payload
CoreServer->>ProxyBasicZMQ: forward database request
ProxyBasicZMQ->>ZeroMQCommunicator: send message frame
ZeroMQCommunicator->>ZeroMQCommunicator: validate frame_size
ZeroMQCommunicator-->>ProxyBasicZMQ: deliver message to Foxx
ProxyBasicZMQ-->>ArangoFoxxQueryRouter: HTTP request /query
ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: validate body with joi
Note over ArangoFoxxQueryRouter: params: joi.object().required()
ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: execQuery(client, mode, published, query)
ArangoFoxxQueryRouter->>ArangoFoxxQueryRouter: enforce page size using query.params.cnt
ArangoFoxxQueryRouter-->>ProxyBasicZMQ: HTTP response with results
ProxyBasicZMQ-->>CoreServer: response from Foxx
CoreServer-->>CoreDatabaseAPI: HTTP response with query results
CoreDatabaseAPI-->>PythonClient: parsed results
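For orientation, here is a minimal Python sketch of the flow this diagram describes, using the CommandLib calls that appear in the new end-to-end test; `df_api` is assumed to be an authenticated `datafed.CommandLib.API` instance, as in the test's setUp.

```python
# Sketch only: df_api is assumed to be an authenticated
# datafed.CommandLib.API instance, as in the test's setUp.
search_query = df_api.queryCreate(
    title="Search for Adamantium",
    coll=["root"],
    meta="md.creator == 'Lex Luther'",
)
query_result = df_api.queryExec(search_query[0].id)  # runs the saved query
df_api.queryDelete(search_query[0].id)               # clean up the saved query
```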
Class diagram for ServerFactory, proxy selection, and DatabaseAPI params handling

classDiagram
class CoreServer {
+void msgRouter(LogContext log_context, int thread_count)
+void ioInsecure(LogContext log_context, int thread_count)
}
class ServerFactory {
-LogContext m_log_context
+ServerFactory(LogContext log_context)
+std::unique_ptr~IServer~ create(ServerType type, SocketOptions socket_options, SocketCredentials socket_credentials)
}
class IServer {
<<interface>>
+void run()
}
class Proxy {
+Proxy(SocketOptions socket_options, SocketCredentials socket_credentials, LogContext log_context)
+void run()
}
class ProxyBasicZMQ {
+ProxyBasicZMQ(SocketOptions socket_options, SocketCredentials socket_credentials, LogContext log_context)
+void run()
}
class ZeroMQCommunicator {
+void receiveBody(IMessage msg, Buffer buffer, ProtoBufFactory factory, size_t frame_size)
}
class DatabaseAPI {
+void generalSearch(Auth_SearchRequest a_request, string a_user_id, bool a_is_admin)
+uint32_t parseSearchRequest(Auth_SearchRequest a_request, string a_qry_begin, string a_qry_end, string a_qry_filter, string a_params)
}
class Foxx_QueryRouter {
+execQuery(client, mode, published, query)
}
CoreServer --> ServerFactory
ServerFactory ..> IServer
Proxy ..|> IServer
ProxyBasicZMQ ..|> IServer
CoreServer ..> ProxyBasicZMQ : uses PROXY_CUSTOM
ServerFactory ..> ProxyBasicZMQ : returns for PROXY_CUSTOM
ZeroMQCommunicator ..> ProxyBasicZMQ : message transport
DatabaseAPI ..> Foxx_QueryRouter : HTTP query requests
note for DatabaseAPI "generalSearch now sends params as JSON object
parseSearchRequest wraps a_params with braces before sending"
File-Level Changes
Assessment against linked issues
Possibly linked issues
Hey there - I've reviewed your changes - here's some feedback:
- tearDown references an undefined `task_id` and attempts to delete a hardcoded 'krpytonite' alias instead of the created 'adamantium' record—please correct the alias and properly capture the task ID for cleanup.
- setUp is very long and interleaves environment parsing, login, repo creation, and allocation; consider breaking it into helper methods or fixtures to improve readability and reuse.
- The test relies on hardcoded user credentials and environment variables—parameterizing these inputs or moving them into a config helper would make the test more flexible and secure.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- tearDown references an undefined `task_id` and attempts to delete a hardcoded 'krpytonite' alias instead of the created 'adamantium' record—please correct the alias and properly capture the task ID for cleanup.
- setUp is very long and interleaves environment parsing, login, repo creation, and allocation; consider breaking it into helper methods or fixtures to improve readability and reuse.
- The test relies on hardcoded user credentials and environment variables—parameterizing these inputs or moving them into a config helper would make the test more flexible and secure.
## Individual Comments
### Comment 1
<location> `tests/end-to-end/test_api_query.py:181-183` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
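As a hedged illustration of the fix this rule suggests (the helper name is hypothetical, not from the PR), the loop-and-conditional scan over query results could move into a small helper so the test body stays assertion-only:

```python
def find_alias(items, prefix):
    """Test helper: return the first alias starting with prefix, or "".

    Keeps the loop and conditional out of the test body itself.
    """
    return next((item.alias for item in items if item.alias.startswith(prefix)), "")

# The test body then reduces to a single readable assertion:
#     material = find_alias(query_result[0].item, "adamantium")
#     self.assertEqual(material, "adamantium")
```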
### Comment 2
<location> `tests/end-to-end/test_api_query.py:182-183` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 3
<location> `tests/end-to-end/test_api_query.py:198-204` </location>
<code_context>
while status < 3:
if count > 20:
break
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
</code_context>
<issue_to_address>
**suggestion (code-quality):** Move a guard clause in a while statement's body into its test ([`while-guard-to-condition`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/while-guard-to-condition))
```suggestion
while status < 3 and not count > 20:
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
```
<br/><details><summary>Explanation</summary>Removing the guard clause simplifies the code and makes clearer the intention of
the loop.
</details>
</issue_to_address>
### Comment 4
<location> `tests/end-to-end/test_api_query.py:90` </location>
<code_context>
def setUp(self):
path_of_file = os.path.abspath(__file__)
current_folder = os.path.dirname(path_of_file)
path_to_python_datafed_module = os.path.normpath(
current_folder
+ os.sep
+ ".."
+ os.sep
+ ".."
+ os.sep
+ "python/datafed_pkg"
)
sys.path.insert(0, path_to_python_datafed_module)
try:
from datafed.CommandLib import API
except ImportError:
print(
"datafed was not found, make sure you are running script with "
"PYTHONPATH set to the location of the package in the datafed repo"
)
sys.exit(1)
from datafed import version as df_ver
print(df_ver)
datafed_domain = os.environ.get("DATAFED_DOMAIN")
opts = {"server_host": datafed_domain}
if datafed_domain is None:
print("DATAFED_DOMAIN must be set before the end-to-end tests can be run")
sys.exit(1)
self._df_api = API(opts)
self._username = "datafed89"
password = os.environ.get("DATAFED_USER89_PASSWORD")
self._timeout = int(os.environ.get('DATAFED_TEST_TIMEOUT_OVERRIDE', '1'))
count = 0
while True:
try:
self._df_api.loginByPassword(self._username, password)
break
except BaseException:
pass
count += 1
# Try three times to authenticate
assert count < 3
path_to_repo_form = os.environ.get("DATAFED_REPO_FORM_PATH")
if path_to_repo_form is None:
self.fail("DATAFED_REPO_FORM_PATH env variable is not defined")
if not path_to_repo_form.endswith(".json"):
self.fail(
"repo create test requires that the repo form exist and be "
"provided as a json file, the test uses the environment "
"variable DATAFED_REPO_PATH to search for the repo form"
)
self._repo_form = {}
with open(path_to_repo_form) as json_file:
self._repo_form = json.load(json_file)
if len(self._repo_form["exp_path"]) == 0:
print(
"exp_path is empty, we will set it to / for the test. This is "
"cruft and should be removed anyway"
)
self._repo_form["exp_path"] = "/"
self._repo_form["admins"] = ["u/" + self._username]
# Create the repositories
result = self._df_api.repoCreate(
repo_id=self._repo_form["id"],
title=self._repo_form["title"],
desc=self._repo_form["desc"],
domain=self._repo_form["domain"],
capacity=self._repo_form["capacity"],
pub_key=self._repo_form["pub_key"],
address=self._repo_form["address"],
endpoint=self._repo_form["endpoint"],
path=self._repo_form["path"],
exp_path=self._repo_form["exp_path"],
admins=self._repo_form["admins"],
)
result = self._df_api.repoList(list_all=True)
count = 0
while len(result[0].repo) == 0:
time.sleep(self._timeout)
result = self._df_api.repoList(list_all=True)
count = count + 1
if count > 3:
self.fail("Setup failed with repo create")
self._repo_id = self._repo_form["id"]
if not self._repo_id.startswith("repo/"):
self._repo_id = "repo/" + self._repo_id
# Will return a task
result = self._df_api.repoAllocationCreate(
repo_id=self._repo_id,
subject="datafed89",
data_limit=1000000000,
rec_limit=100,
)
task_id = result[0].task[0].id
# Check the status of the task
task_result = self._df_api.taskView(task_id)
# If status is less than 3 it is in the works
status = task_result[0].task[0].status
count = 0
while status < 3:
if count > 2:
print(task_result)
self.fail(
"Something went wrong task was unable to complete, attempt "
"to create an allocation after 3 seconds failed, make sure "
"all services are running."
)
break
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
parameters = {
"testing_tempareture": 900000,
"voltage": [1, 2, -4, 7.123],
"creator": "Lex Luther",
"occupation": "super villan",
"equipment_serial_numbers": {"SEM": 14, "AFM": 9134},
}
title = "Adamantium"
alias = "adamantium"
data_result = self._df_api.dataCreate(
title=title,
alias=alias,
metadata=json.dumps(parameters),
tags=["material"],
parent_id="root",
)
</code_context>
<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
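For concreteness, a minimal standalone sketch of the two concatenations flagged here, rewritten as f-strings (the example values are made up; the expressions mirror the setUp code above):

```python
username = "datafed89"
repo_id = "core_repo"  # illustrative value

# f-string replacements for the flagged concatenations
admins = [f"u/{username}"]
if not repo_id.startswith("repo/"):
    repo_id = f"repo/{repo_id}"

assert admins == ["u/datafed89"] and repo_id == "repo/core_repo"
```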
### Comment 5
<location> `tests/end-to-end/test_api_query.py:173` </location>
<code_context>
def test_query_create_delete(self):
search_query = self._df_api.queryCreate(title="Search for Adamantium", owner="u/" + self._username, coll=["root"], meta="md.creator == 'Lex Luther'")
print("Search query create")
print(search_query)
query_result=self._df_api.queryExec(search_query[0].id)
material = ""
for model in query_result[0].item:
if model.alias.startswith("adamantium"):
material = model.alias
print(f"Query found {material}")
self.assertEqual(material, "adamantium")
self._df_api.queryDelete(search_query[0].id)
</code_context>
<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
### Comment 6
<location> `tests/end-to-end/test_api_query.py:240` </location>
<code_context>
def tearDown(self):
task_result = self._df_api.dataDelete("krpytonite")
status = task_result[0].task[0].status
count = 0
while status < 3:
if count > 20:
break
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
print("Delete record result.")
print(task_result)
result = self._df_api.repoAllocationDelete(
repo_id=self._repo_id, subject="datafed89"
)
task_id = result[0].task[0].id
# Check the status of the task
task_result = self._df_api.taskView(task_id)
# If status is less than 3 it is in the works
status = task_result[0].task[0].status
count = 0
while status < 3:
if count > 2:
print(task_result)
self.fail(
"Something went wrong task was unable to complete, attempt"
" to delete an allocation after 3 seconds failed, make sure"
" all services are running."
)
break
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
print("Delete Allocations")
print(result)
repo_id = self._repo_form["id"]
if not repo_id.startswith("repo/"):
repo_id = "repo/" + repo_id
result = self._df_api.repoDelete(repo_id)
result = self._df_api.repoList(list_all=True)
</code_context>
<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
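Several of the comments above flag the repeated poll-the-task loops in setUp and tearDown. As one hedged option (the helper name and failure message are hypothetical), the loop could live in a single helper so each call site captures the right task ID and fails with a clear message:

```python
import time

def wait_for_task(df_api, task_id, timeout, max_attempts):
    """Poll taskView until the task leaves the in-progress range (status < 3).

    Returns the final task result; raises if it never completes.
    """
    task_result = df_api.taskView(task_id)
    status = task_result[0].task[0].status
    attempts = 0
    while status < 3 and attempts < max_attempts:
        time.sleep(timeout)
        task_result = df_api.taskView(task_id)
        status = task_result[0].task[0].status
        attempts += 1
    if status < 3:
        raise TimeoutError(f"task {task_id} did not complete after {attempts} polls")
    return task_result
```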
tests/end-to-end/test_api_query.py (outdated)
```python
for model in query_result[0].item:
    if model.alias.startswith("adamantium"):
        material = model.alias
```
issue (code-quality): Avoid loops in tests. (no-loop-in-tests); see the explanation under Comment 1 above.
tests/end-to-end/test_api_query.py (outdated)
```python
if model.alias.startswith("adamantium"):
    material = model.alias
```
issue (code-quality): Avoid conditionals in tests. (no-conditionals-in-tests); see the explanation under Comment 2 above.
```python
)
self._repo_form["exp_path"] = "/"

self._repo_form["admins"] = ["u/" + self._username]
```
issue (code-quality): Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
```python
def test_query_create_delete(self):
    search_query = self._df_api.queryCreate(title="Search for Adamantium", owner="u/" + self._username, coll=["root"], meta="md.creator == 'Lex Luther'")
```
issue (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)
…params object schema to routes. (#1781)
It looks like this PR is failing when it should now be passing; the fix was merged in, but the query end-to-end test is not displaying any results. We are going to run the CI without cleaning up afterwards, log in under the test user account, and check whether the record and query both exist. One possibility is that the record document is not yet registered in the database before the query tries to find it, even though the record is created beforehand. This could be the result of isolation levels and sandboxing interfering with the test execution.
…OM correctly defined
…XCEPT call due to calling zmq_msg with invalid state after closing it.
…oved to a folder named requirements.txt during cmake configure script.
@sourcery-ai review
Hey there - I've reviewed your changes - here's some feedback:
- The change to use ServerType::PROXY_CUSTOM when instantiating ProxyBasicZMQ in ServerFactory and Server::msgRouter/ioInsecure is a bit confusing; consider aligning the enum value name with the concrete implementation or adding a brief comment to explain why PROXY_CUSTOM maps to ProxyBasicZMQ.
- The Foxx query routes now validate `params` as an object while DatabaseAPI still manipulates `params` as strings (e.g., wrapping with braces in parseSearchRequest); it would be safer to normalize `params` to a single structured representation across the C++ and Foxx layers to avoid subtle serialization issues.
- The `message("ENABLE END TO END!!!!!!!")` in tests/end-to-end/web-UI/CMakeLists.txt looks like a leftover debug print; consider removing or gating it with a verbose/debug option to keep the CMake output clean.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The change to use ServerType::PROXY_CUSTOM when instantiating ProxyBasicZMQ in ServerFactory and Server::msgRouter/ioInsecure is a bit confusing; consider aligning the enum value name with the concrete implementation or adding a brief comment to explain why PROXY_CUSTOM maps to ProxyBasicZMQ.
- The Foxx query routes now validate `params` as an object while DatabaseAPI still manipulates `params` as strings (e.g., wrapping with braces in parseSearchRequest); it would be safer to normalize `params` to a single structured representation across the C++ and Foxx layers to avoid subtle serialization issues.
- The `message("ENABLE END TO END!!!!!!!")` in tests/end-to-end/web-UI/CMakeLists.txt looks like a leftover debug print; consider removing or gating it with a verbose/debug option to keep the CMake output clean.
## Individual Comments
### Comment 1
<location> `core/server/DatabaseAPI.cpp:1311` </location>
<code_context>
payload["qry_begin"] = qry_begin;
payload["qry_end"] = qry_end;
payload["qry_filter"] = qry_filter;
- payload["params"] = "{" + params + "}";
+ payload["params"] = params;
payload["limit"] = to_string(cnt);
</code_context>
<issue_to_address>
**issue (bug_risk):** The params type between DatabaseAPI and Foxx query_router now appears inconsistent (string vs object).
On the Foxx side, `params` is now required to be a Joi object and is passed directly into `execQuery` without `JSON.parse`. On the C++ side, `params` is still a `std::string`, so the request body will contain `"params": "{...}"` (a JSON string that looks like an object), while Foxx expects a real JSON object. This mismatch will likely break Joi validation or cause runtime errors when accessing fields like `query.params.cnt`.
To fix this, either:
- Change `params` in `DatabaseAPI` to a structured JSON value and assign it directly so `payload["params"]` is an object, or
- Keep `params` as a string but revert to parsing it in Foxx and loosen the Joi schema accordingly.
</issue_to_address>
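To make the mismatch concrete, here is a small Python illustration (field name from the diff above; values are made up): a JSON string that merely looks like an object is a different wire type from an actual object, which is what the joi schema checks.

```python
import json

params = '"cnt": 10'  # the inner fragment DatabaseAPI used to brace-wrap

# Old behavior: params travels as a JSON *string* in the body.
old_body = json.dumps({"params": "{" + params + "}"})
# -> {"params": "{\"cnt\": 10}"}   (a string; joi.object() rejects this)

# Fixed behavior: params travels as a JSON *object*.
new_body = json.dumps({"params": {"cnt": 10}})
# -> {"params": {"cnt": 10}}       (an object; query.params.cnt is reachable)

print(old_body)
print(new_body)
```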
### Comment 2
<location> `core/server/DatabaseAPI.cpp:3944` </location>
<code_context>
a_qry_begin = a_qry_begin;
a_qry_end = a_qry_end;
a_qry_filter = a_qry_filter;
-
+ a_params = "{" + a_params + "}";
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Wrapping a_params in braces here may double-wrap or otherwise distort the params string.
Given that parseSearchRequest now does `a_params = "{" + a_params + "}";`, this can: (1) generate invalid JSON if `a_params` is already a complete JSON object string (e.g. `"{\"cnt\":10}"` → `"{{...}}"`), and (2) change semantics when `a_params` is empty (becoming `"{}"`). It would be more robust to build `a_params` as a proper JSON object once in the earlier builder logic, rather than wrapping a string here. If wrapping must happen here, please guard against cases where `a_params` is already brace-wrapped.
Suggested implementation:
```cpp
a_qry_begin = a_qry_begin;
a_qry_end = a_qry_end;
a_qry_filter = a_qry_filter;
// Ensure we only wrap a_params in braces if it is not already a JSON object string.
// This avoids double-wrapping (e.g. "{\"cnt\":10}" -> "{{...}}") and preserves
// existing well-formed JSON. If a_params is empty or whitespace, leave it as-is.
if (!a_params.empty()) {
const auto first = a_params.find_first_not_of(" \t\n\r");
const auto last = a_params.find_last_not_of(" \t\n\r");
if (first != string::npos && last != string::npos) {
const char first_char = a_params[first];
const char last_char = a_params[last];
// Only wrap if it doesn't already look like a JSON object.
if (!(first_char == '{' && last_char == '}')) {
a_params = "{" + a_params + "}";
}
}
}
return cnt;
}
```
- Ideally, the code constructing `a_params` earlier (e.g., in the builder logic or `parseSearchRequest`) should be updated so that it always produces a valid JSON object string. Once that is done, the wrapping/normalization logic here can be removed entirely, and this function can assume `a_params` is already well-formed JSON.
- If there are other call sites or helper functions that also wrap `a_params` with braces, they should be updated to either:
- Follow the same guard pattern used here, or
- Be refactored so that only one place is responsible for producing a JSON object for `a_params`.
</issue_to_address>
### Comment 3
<location> `tests/end-to-end/test_api_query.py:17-26` </location>
<code_context>
+class TestDataFedPythonAPIQueryCRUD(unittest.TestCase):
</code_context>
<issue_to_address>
**issue (testing):** Commented-out tearDown and cleanup can lead to test flakiness and environment pollution
The tests create repos, allocations, records, and queries, but all cleanup (`tearDown` and `queryDelete`) is commented out. In a shared end-to-end environment this can leave persistent state behind, causing conflicts between runs and flaky, non-idempotent tests.
Please:
- Restore and validate `tearDown` so it reliably removes any repos, allocations, records, and queries created by the tests.
- Move `queryDelete` into `tearDown` (or another shared cleanup path) if that matches the test design.
If full cleanup isn’t feasible due to shared resources, add a brief comment explaining the constraint and choose IDs/aliases that minimize interference with other tests or runs.
</issue_to_address>
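As one hedged sketch of what a restored cleanup path could look like (the ordering mirrors the setUp/tearDown code quoted elsewhere in this review; `_wait_for_task` is a hypothetical polling helper, not a confirmed implementation):

```python
def tearDown(self):
    # Delete the record created in setUp, capturing the task ID from the
    # delete call itself instead of referencing an undefined variable.
    task_result = self._df_api.dataDelete("adamantium")
    task_id = task_result[0].task[0].id
    self._wait_for_task(task_id)  # hypothetical polling helper

    # Remove the allocation and repository created in setUp.
    result = self._df_api.repoAllocationDelete(
        repo_id=self._repo_id, subject=self._username
    )
    self._wait_for_task(result[0].task[0].id)
    self._df_api.repoDelete(self._repo_id)
```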
### Comment 4
<location> `tests/end-to-end/test_api_query.py:53-62` </location>
<code_context>
+ self._username = "datafed89"
</code_context>
<issue_to_address>
**suggestion (testing):** Hardcoded user and record identifiers may cause conflicts across runs or environments
The test’s reliance on a specific account (`datafed89`) and fixed alias (`adamantium`) makes it brittle and environment-dependent. It can cause collisions with other tests using the same values and fail in environments where that user isn’t provisioned.
To improve robustness:
- Read the username from an environment variable so each environment can control which account is used.
- Generate a unique alias per run (e.g., with a timestamp or random suffix) and assert against that constructed value instead of a hardcoded string.
Suggested implementation:
```python
self._df_api = API(opts)
# Use environment variables for credentials to avoid hardcoded, environment-specific values
self._username = os.environ.get("DATAFED_USERNAME")
if not self._username:
print("DATAFED_USERNAME must be set before the end-to-end tests can be run")
sys.exit(1)
password = os.environ.get("DATAFED_PASSWORD")
if not password:
print("DATAFED_PASSWORD must be set before the end-to-end tests can be run")
sys.exit(1)
```
To fully implement your comment about avoiding hardcoded record identifiers (e.g., alias "adamantium"), you should:
1. Locate where the hardcoded alias is used in this test file (likely something like `alias="adamantium"` or assertions against the string `"adamantium"`).
2. In the common setup for these tests (e.g., in the test class `__init__`, `setup_method`, or a fixture), generate a unique alias per run, such as:
- `self._alias = f"adamantium-{int(time.time())}"` or
- `self._alias = f"adamantium-{uuid.uuid4().hex}"`.
3. Replace all hardcoded usages of `"adamantium"` with this dynamically generated `self._alias` (or the equivalent variable in your setup/fixture).
4. Update assertions to compare against the constructed alias variable instead of the literal string `"adamantium"` to keep the tests stable and unique across runs.
</issue_to_address>
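A tiny sketch of the unique-alias idea (the suffix scheme is illustrative only):

```python
import uuid

# Per-run alias so repeated or concurrent runs cannot collide on a
# hardcoded "adamantium" record.
alias = f"adamantium-{uuid.uuid4().hex[:8]}"
title = alias.capitalize()
```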
### Comment 5
<location> `tests/end-to-end/test_api_query.py:150-159` </location>
<code_context>
+ status = task_result[0].task[0].status
+ count = count + 1
+
+ parameters = {
+ "testing_tempareture": 900000,
+ "voltage": [1, 2, -4, 7.123],
+ "creator": "Lex Luther",
+ "occupation": "super villan",
+ "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
+ }
+
+ title = "Adamantium"
+ alias = "adamantium"
+ record = self._df_api.dataCreate(
+ title=title,
+ alias=alias,
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding assertions that the query is actually filtering on the intended metadata fields
Right now the test only checks that one of the results has alias `adamantium`, so it doesn’t really prove that the metadata filter is what’s driving the result. Please add at least one record that should not match the filter (e.g., a different `creator`) and assert that it’s excluded from the results. It would also help to assert key metadata fields on the returned record to confirm that the correct item is being selected based on the metadata filter, not other incidental state.
Suggested implementation:
```python
parameters = {
"testing_tempareture": 900000,
"voltage": [1, 2, -4, 7.123],
"creator": "Lex Luther",
"occupation": "super villan",
"equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
}
title = "Adamantium"
alias = "adamantium"
matching_record = self._df_api.dataCreate(
title=title,
alias=alias,
metadata=json.dumps(parameters),
tags=["material"],
parent_id="root",
)
# Create a similar record that should NOT match the metadata filter
non_matching_parameters = {
"testing_tempareture": 900000,
"voltage": [1, 2, -4, 7.123],
# Different creator to ensure the metadata filter on "creator" excludes this record
"creator": "Bruce Wayne",
"occupation": "super villan",
"equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
}
non_matching_title = "Adamantium (control)"
non_matching_alias = "adamantium-control"
non_matching_record = self._df_api.dataCreate(
title=non_matching_title,
alias=non_matching_alias,
metadata=json.dumps(non_matching_parameters),
tags=["material"],
parent_id="root",
)
print("Created matching record")
print(matching_record)
print("Created non-matching record")
print(non_matching_record)
def test_query_create_delete(self):
```
To fully implement the suggestion about asserting that the query is actually filtering on the intended metadata fields, you should also update the part of this test that performs the metadata query and asserts on the results (not shown in the snippet). In that section:
1) Ensure the query explicitly filters on at least one metadata field that differs between the two records, e.g. `creator == "Lex Luther"`.
2) Assert that:
- Exactly one record is returned (or, more generally, that the non-matching record is not included).
- The returned record’s alias is `"adamantium"` (or matches `alias` from `matching_record`).
- Key metadata fields on the returned record (`creator`, `testing_tempareture`, `voltage`, etc.) match the values in `parameters`, so you know the correct item was selected because of the metadata filter.
3) Optionally, if the API allows, assert that the record with alias `"adamantium-control"` (or `non_matching_alias`) is not present in the query result set.
You’ll need to adapt these assertions to the exact query/response shape used later in this test file (e.g. whether results come back as a list, generator, or paginated object).
</issue_to_address>
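Based on the result shape used in the existing test (`query_result[0].item`, each item carrying an `.alias` field), the exclusion assertions might look like this fragment for the test body:

```python
aliases = [item.alias for item in query_result[0].item]
# The record whose metadata matches the filter must be returned...
self.assertIn("adamantium", aliases)
# ...and the control record with a different creator must be excluded.
self.assertNotIn("adamantium-control", aliases)
```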
### Comment 6
<location> `tests/end-to-end/test_api_query.py:183-185` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 7
<location> `tests/end-to-end/test_api_query.py:184-185` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 8
<location> `tests/end-to-end/test_api_query.py:90` </location>
<code_context>
def setUp(self):
path_of_file = os.path.abspath(__file__)
current_folder = os.path.dirname(path_of_file)
path_to_python_datafed_module = os.path.normpath(
current_folder
+ os.sep
+ ".."
+ os.sep
+ ".."
+ os.sep
+ "python/datafed_pkg"
)
sys.path.insert(0, path_to_python_datafed_module)
try:
from datafed.CommandLib import API
except ImportError:
print(
"datafed was not found, make sure you are running script with "
"PYTHONPATH set to the location of the package in the datafed repo"
)
sys.exit(1)
from datafed import version as df_ver
print(df_ver)
datafed_domain = os.environ.get("DATAFED_DOMAIN")
opts = {"server_host": datafed_domain}
if datafed_domain is None:
print("DATAFED_DOMAIN must be set before the end-to-end tests can be run")
sys.exit(1)
self._df_api = API(opts)
self._username = "datafed89"
password = os.environ.get("DATAFED_USER89_PASSWORD")
self._timeout = int(os.environ.get('DATAFED_TEST_TIMEOUT_OVERRIDE', '1'))
count = 0
while True:
try:
self._df_api.loginByPassword(self._username, password)
break
except BaseException:
pass
count += 1
# Try three times to authenticate
assert count < 3
path_to_repo_form = os.environ.get("DATAFED_REPO_FORM_PATH")
if path_to_repo_form is None:
self.fail("DATAFED_REPO_FORM_PATH env variable is not defined")
if not path_to_repo_form.endswith(".json"):
self.fail(
"repo create test requires that the repo form exist and be "
"provided as a json file, the test uses the environment "
"variable DATAFED_REPO_PATH to search for the repo form"
)
self._repo_form = {}
with open(path_to_repo_form) as json_file:
self._repo_form = json.load(json_file)
if len(self._repo_form["exp_path"]) == 0:
print(
"exp_path is empty, we will set it to / for the test. This is "
"cruft and should be removed anyway"
)
self._repo_form["exp_path"] = "/"
self._repo_form["admins"] = ["u/" + self._username]
# Create the repositories
result = self._df_api.repoCreate(
repo_id=self._repo_form["id"],
title=self._repo_form["title"],
desc=self._repo_form["desc"],
domain=self._repo_form["domain"],
capacity=self._repo_form["capacity"],
pub_key=self._repo_form["pub_key"],
address=self._repo_form["address"],
endpoint=self._repo_form["endpoint"],
path=self._repo_form["path"],
exp_path=self._repo_form["exp_path"],
admins=self._repo_form["admins"],
)
result = self._df_api.repoList(list_all=True)
count = 0
while len(result[0].repo) == 0:
time.sleep(self._timeout)
result = self._df_api.repoList(list_all=True)
count = count + 1
if count > 3:
self.fail("Setup failed with repo create")
self._repo_id = self._repo_form["id"]
if not self._repo_id.startswith("repo/"):
self._repo_id = "repo/" + self._repo_id
# Will return a task
result = self._df_api.repoAllocationCreate(
repo_id=self._repo_id,
subject=self._username,
data_limit=1000000000,
rec_limit=100,
)
task_id = result[0].task[0].id
# Check the status of the task
task_result = self._df_api.taskView(task_id)
# If status is less than 3 it is in the works
status = task_result[0].task[0].status
count = 0
while status < 3:
if count > 2:
print(task_result)
self.fail(
"Something went wrong task was unable to complete, attempt "
"to create an allocation after 3 seconds failed, make sure "
"all services are running."
)
break
time.sleep(self._timeout)
task_result = self._df_api.taskView(task_id)
status = task_result[0].task[0].status
count = count + 1
parameters = {
"testing_tempareture": 900000,
"voltage": [1, 2, -4, 7.123],
"creator": "Lex Luther",
"occupation": "super villan",
"equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
}
title = "Adamantium"
alias = "adamantium"
record = self._df_api.dataCreate(
title=title,
alias=alias,
metadata=json.dumps(parameters),
tags=["material"],
parent_id="root",
)
print("Created record")
print(record)
</code_context>
<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation [×2] ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
### Comment 9
<location> `tests/end-to-end/test_api_query.py:173` </location>
<code_context>
def test_query_create_delete(self):
search_query = self._df_api.queryCreate(title="Search for Adamantium", owner="u/" + self._username, coll=["root"], meta="md.creator == 'Lex Luther'")
print("Search query create")
print(search_query)
query_result = self._df_api.queryExec(search_query[0].id)
print("Query result")
print(query_result)
material = ""
for model in query_result[0].item:
if model.alias.startswith("adamantium"):
material = model.alias
print(f"Query found {material}")
self.assertEqual(material, "adamantium")
</code_context>
<issue_to_address>
**issue (code-quality):** Use f-string instead of string concatenation ([`use-fstring-for-concatenation`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-fstring-for-concatenation/))
</issue_to_address>
core/server/DatabaseAPI.cpp
```cpp
a_qry_begin = a_qry_begin;
a_qry_end = a_qry_end;
a_qry_filter = a_qry_filter;
```
suggestion (bug_risk): Wrapping a_params in braces here may double-wrap or otherwise distort the params string; see Comment 2 above for the full analysis and suggested implementation.
tests/end-to-end/test_api_query.py
```python
self._username = "datafed89"
password = os.environ.get("DATAFED_USER89_PASSWORD")

self._timeout = int(os.environ.get('DATAFED_TEST_TIMEOUT_OVERRIDE', '1'))
count = 0
while True:
    try:
        self._df_api.loginByPassword(self._username, password)
        break
    except BaseException:
```
suggestion (testing): Hardcoded user and record identifiers may cause conflicts across runs or environments; see Comment 4 above for the full suggestion and implementation sketch.
tests/end-to-end/test_api_query.py
```python
parameters = {
    "testing_tempareture": 900000,
    "voltage": [1, 2, -4, 7.123],
    "creator": "Lex Luther",
    "occupation": "super villan",
    "equipment_serial_numbers": {"SEM": 14, "AFM": 9134}
}

title = "Adamantium"
alias = "adamantium"
```
suggestion (testing): Consider adding assertions that the query is actually filtering on the intended metadata fields; see Comment 5 above for the full suggestion and implementation sketch.
Ticket
Description
How Has This Been Tested?
Artifacts (if appropriate):
Tasks
Summary by Sourcery
Add end-to-end query tests and align backend query handling and proxy configuration to support them.
New Features:
Bug Fixes:
Enhancements:
Tests: