Update ADA notebook to v4.5 Artifex‑Aether research manifest #5
base: main
Summary of Changes
Hello @Tuesdaythe13th, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request significantly upgrades the Accessible Deep Agent (ADA) demo to version 4.5, codenamed "Artifex-Aether." The primary goal is to modernize the demo to align with anticipated 2026 workflows and tooling, providing a comprehensive, Colab-ready pipeline. This update introduces advanced data-processing capabilities, including sophisticated ingestion, neural-network-based mapping, and integrated bias detection, culminating in agentic synthesis powered by Gemini 2.0. The changes aim to offer a robust and forward-looking demonstration of neuroadaptive research methodologies.
Code Review
This pull request replaces the existing ADA demo notebook with a new, more comprehensive version. The new notebook is well-structured and demonstrates a full data-processing pipeline. My review focuses on improving code robustness, correctness, and maintainability. I've identified several critical issues in the handling of empty DataFrames, which could crash the notebook, as well as some high-severity bugs in package installation and UI rendering. I've also included some medium-severity suggestions to improve code clarity and maintainability.
```python
embeddings = model_emb.encode(df_ada['text'].tolist(), show_progress_bar=True)

log_ada("Discovering Topology via HDBSCAN...", "PROC")
clusterer = HDBSCAN(min_cluster_size=min(len(df_ada), 3), metric='euclidean')
```
The HDBSCAN min_cluster_size parameter must be greater than 1. The current code min(len(df_ada), 3) can result in 0 or 1 if df_ada has 0 or 1 rows, which will cause a crash. The entire cell's logic (clustering, PCA) is not robust against cases with fewer than 2 nodes. You should add a guard clause at the beginning of the cell to handle this gracefully.
For example:
```python
if len(df_ada) < 2:
    log_ada("Not enough nodes for clustering. Skipping.", "WARN")
    df_ada['cluster'] = -1
    df_ada['x'] = 0
    df_ada['y'] = 0
else:
    # ... existing clustering and PCA logic ...
    clusterer = HDBSCAN(min_cluster_size=max(2, min(len(df_ada), 3)), metric='euclidean')
    # ...
```

```python
    status = "✅ PARITY SECURED" if parity_score > 0.88 else "🚨 BIAS DETECTED (ALEXITHYMIC MASK)"
    return {"prediction": prediction, "recon": reconstruction, "score": parity_score, "status": status}

audit_res = run_parity_audit(df_ada['text'].iloc[0])
```
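Both the clustering cell and the `run_parity_audit(df_ada['text'].iloc[0])` call assume `df_ada` has rows: `.iloc[0]` on an empty Series raises `IndexError`, and HDBSCAN (presumably scikit-learn's, given the `scikit-learn>=1.3.0` pin) rejects a `min_cluster_size` below 2. A minimal stdlib-only sketch of the guard logic, assuming the notebook's preferred size of 3; the helper names `safe_min_cluster_size` and `first_node_or_none` are hypothetical, not part of the notebook.

```python
from typing import Optional, Sequence

# Hypothetical default mirroring the notebook's min(len(df_ada), 3).
PREFERRED_MIN_CLUSTER_SIZE = 3


def safe_min_cluster_size(n_rows: int, preferred: int = PREFERRED_MIN_CLUSTER_SIZE) -> Optional[int]:
    """Return a min_cluster_size HDBSCAN accepts, or None if clustering should be skipped.

    HDBSCAN requires min_cluster_size >= 2; the value is also capped at the
    number of rows, as the notebook's min() already intends.
    """
    if n_rows < 2:
        return None  # caller should skip clustering and fill placeholder columns
    return max(2, min(n_rows, preferred))


def first_node_or_none(texts: Sequence[str]) -> Optional[str]:
    """Return the first text node, or None instead of raising on an empty sequence."""
    return texts[0] if len(texts) > 0 else None


# Assumed usage in the notebook:
# size = safe_min_cluster_size(len(df_ada))
# if size is None:
#     log_ada("Not enough nodes for clustering. Skipping.", "WARN")
# node = first_node_or_none(df_ada['text'].tolist())
```

Returning `None` rather than raising keeps the decision (skip, warn, or stop) in the notebook cell, where `log_ada` is available.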
```html
    </div>
    <div class="ax-card" style="margin:0; text-align:center; border-left:none; border-top: 3px solid var(--ax-cyan);">
        <div class="ax-title" style="font-size:0.7em;">Neural Clusters</div>
        <div class="ax-stat" style="font-size:1.8em; color:var(--ax-cyan);">{df_ada['cluster'].nunique()}</div>
```
The code df_ada['cluster'].nunique() will raise a KeyError if the df_ada DataFrame is empty, because the 'cluster' column will not exist. This can happen if no file was uploaded or no text nodes were extracted. You should handle this case to prevent the notebook from crashing.
```html
<div class="ax-stat" style="font-size:1.8em; color:var(--ax-cyan);">{df_ada['cluster'].nunique() if 'cluster' in df_ada else 0}</div>
```
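A related detail: HDBSCAN labels noise points `-1`, so `nunique()` counts noise as if it were a cluster, and a `cluster` column filled with `-1` (as in the guard suggested for the clustering cell) would display 1. If the card is meant to show real clusters, a sketch like this one excludes the noise label and returns 0 for empty input; `count_clusters` is a hypothetical helper that works on any iterable of labels, including a pandas Series.

```python
from typing import Iterable

NOISE_LABEL = -1  # HDBSCAN's label for points not assigned to any cluster


def count_clusters(labels: Iterable[int]) -> int:
    """Count distinct cluster labels, ignoring HDBSCAN noise; 0 for empty input."""
    return len({label for label in labels if label != NOISE_LABEL})


# Assumed usage in the notebook:
# n_clusters = count_clusters(df_ada['cluster']) if 'cluster' in df_ada else 0
```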
```python
    "pymupdf4llm==0.2.9", "sentence-transformers", "google-generativeai",
    "plotly", "pandas", "scikit-learn>=1.3.0", "python-docx", "loguru", "watermark"
]
subprocess.run([sys.executable, "-m", "pip", "install", "-q"] + pkgs)
```
The subprocess.run command that installs packages does not check for errors. If pip install fails for any reason (e.g., network issues, a package not found), the non-zero exit code is ignored, and because the -q flag reduces pip's output the failure is easy to miss; the script continues, only to fail later when it tries to import a missing package. Passing check=True makes subprocess.run raise CalledProcessError when the command fails:
```python
subprocess.run([sys.executable, "-m", "pip", "install", "-q"] + pkgs, check=True)
```
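With check=True alone the raised CalledProcessError carries the exit code but not pip's message. A sketch, assuming you want pip's stderr surfaced in the notebook when installation fails; `run_checked` and `install_packages` are hypothetical helper names.

```python
import subprocess
import sys
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> str:
    """Run a command, return its stdout, and raise RuntimeError with stderr on failure."""
    try:
        result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as exc:
        # exc.stderr is populated because capture_output=True
        raise RuntimeError(f"command failed ({exc.returncode}): {exc.stderr.strip()}") from exc
    return result.stdout


def install_packages(pkgs: Sequence[str]) -> None:
    """Install packages into the running interpreter, failing loudly."""
    run_checked([sys.executable, "-m", "pip", "install", "-q", *pkgs])
```

In the notebook this would replace the bare call with `install_packages(pkgs)`; the trade-off is that pip's progress output is captured rather than streamed.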
```python
    if ext == 'pdf':
        with open(fname, "wb") as f: f.write(content)
        return pymupdf4llm.to_markdown(fname)
```
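The parse_asset function writes the PDF bytes to a file on disk and then reads the file back with pymupdf4llm. This is unnecessary disk I/O and can cause issues in multi-user or concurrent environments. pymupdf4llm.to_markdown accepts either a file path or an open PyMuPDF Document, so the uploaded bytes can be opened in memory with pymupdf.open(stream=..., filetype="pdf") instead:
```python
import pymupdf  # PyMuPDF, installed as a dependency of pymupdf4llm

if ext == 'pdf':
    doc = pymupdf.open(stream=content, filetype="pdf")
    return pymupdf4llm.to_markdown(doc)
```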
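The same no-temporary-file approach applies to the other formats the notebook ingests (pdf/docx/csv/xlsx per the PR description). A stdlib-only sketch for a CSV branch that decodes the uploaded bytes through `io` instead of writing them to disk; the function name `parse_csv_bytes`, the UTF-8 assumption, and the row-per-line rendering are illustrative, not the notebook's actual parser.

```python
import csv
import io


def parse_csv_bytes(content: bytes, encoding: str = "utf-8-sig") -> str:
    """Render uploaded CSV bytes as newline-separated rows, entirely in memory."""
    # utf-8-sig decodes plain UTF-8 too, and strips a leading BOM if one is present
    text = content.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""))
    # csv.reader yields [] for blank lines; skip them
    rows = [" | ".join(cell.strip() for cell in row) for row in reader if row]
    return "\n".join(rows)
```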
```python
display(HTML(f"""
<div class='ax-card' style='border-color: {"var(--ax-neon)" if audit_res['score'] > 0.88 else "var(--ax-red)"}'>
  <div class='ax-title'>Bidirectional Parity Audit // Node_0</div>
  <p><span class='ax-stat'>FORWARD PREDICTION:</span> {audit_res['prediction']}</p>
  <p><span class='ax-stat'>REVERSE RECONSTRUCTION:</span> {audit_res['recon']}</p>
  <p><span class='ax-stat'>PARITY SCORE:</span> {audit_res['score']:.4f}</p>
  <div style='margin-top:10px; font-weight:bold; color:{"var(--ax-neon)" if audit_res['score'] > 0.88 else "var(--ax-red)"}'>{audit_res['status']}</div>
</div>
"""))
```
The f-string evaluates the same conditional twice to pick the CSS border-color and color. Inside an f-string the braces delimit a replacement field rather than a set literal, so each expression already yields the plain string var(--ax-neon) or var(--ax-red); the problem is maintainability, since the condition and the 0.88 threshold are duplicated inline. Computing the color once is clearer:
```python
border_color = "var(--ax-neon)" if audit_res['score'] > 0.88 else "var(--ax-red)"
display(HTML(f"""
<div class='ax-card' style='border-color: {border_color}'>
  <div class='ax-title'>Bidirectional Parity Audit // Node_0</div>
  <p><span class='ax-stat'>FORWARD PREDICTION:</span> {audit_res['prediction']}</p>
  <p><span class='ax-stat'>REVERSE RECONSTRUCTION:</span> {audit_res['recon']}</p>
  <p><span class='ax-stat'>PARITY SCORE:</span> {audit_res['score']:.4f}</p>
  <div style='margin-top:10px; font-weight:bold; color:{border_color}'>{audit_res['status']}</div>
</div>
"""))
```
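A self-contained sketch of the card as a function, so the threshold lives in one named constant and the model-generated text is HTML-escaped before interpolation. The escaping is an added assumption (prediction and reconstruction text could contain `<` or `&`), and `render_audit_card` and `PARITY_SCORE_THRESHOLD` are hypothetical names; in the notebook the returned string would be passed to `display(HTML(...))`.

```python
import html

PARITY_SCORE_THRESHOLD = 0.88  # the cut-off the notebook hard-codes


def render_audit_card(audit_res: dict, threshold: float = PARITY_SCORE_THRESHOLD) -> str:
    """Build the parity-audit card HTML from a run_parity_audit-style result dict."""
    color = "var(--ax-neon)" if audit_res["score"] > threshold else "var(--ax-red)"
    # Escape free text so model output cannot break the surrounding markup
    prediction = html.escape(str(audit_res["prediction"]))
    recon = html.escape(str(audit_res["recon"]))
    status = html.escape(str(audit_res["status"]))
    return f"""
<div class='ax-card' style='border-color: {color}'>
  <div class='ax-title'>Bidirectional Parity Audit // Node_0</div>
  <p><span class='ax-stat'>FORWARD PREDICTION:</span> {prediction}</p>
  <p><span class='ax-stat'>REVERSE RECONSTRUCTION:</span> {recon}</p>
  <p><span class='ax-stat'>PARITY SCORE:</span> {audit_res['score']:.4f}</p>
  <div style='margin-top:10px; font-weight:bold; color:{color}'>{status}</div>
</div>
"""
```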
```python
#@title ⟁ 1.0 KERNEL INITIALIZATION { display-mode: "form" }
import sys, subprocess, os, json, time, io
```
```python
    fname = list(uploaded.keys())[0]
    raw_md = parse_asset(fname, uploaded[fname])
    # Semantic Chunking by Markdown Double-Newline
    nodes = [n.strip() for n in raw_md.split('\n\n') if len(n.strip()) > 50]
```
The number 50 is used to filter out small text nodes. This is a "magic number". It's better to define it as a constant with a descriptive name at the beginning of the cell (e.g., MIN_NODE_LENGTH = 50). This improves readability and makes it easier to configure the value later.
```python
MIN_NODE_LENGTH = 50  # defined once at the top of the cell

nodes = [n.strip() for n in raw_md.split('\n\n') if len(n.strip()) > MIN_NODE_LENGTH]
```
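The chunking step with its threshold named, as a small testable function; `split_into_nodes` is a hypothetical name, and the strict `>` comparison matches the notebook's `len(n.strip()) > 50`.

```python
from typing import List

MIN_NODE_LENGTH = 50  # nodes must be strictly longer than this many characters


def split_into_nodes(raw_md: str, min_length: int = MIN_NODE_LENGTH) -> List[str]:
    """Split markdown on blank lines (double newline) and drop short fragments."""
    nodes = []
    for chunk in raw_md.split("\n\n"):
        text = chunk.strip()
        if len(text) > min_length:
            nodes.append(text)
    return nodes
```

Pulling the step into a function also makes the empty case explicit: an upload with no qualifying paragraphs yields `[]`, which the downstream cells then need to handle.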
```python
    v2 = model_emb.encode([reconstruction])
    parity_score = cosine_similarity(v1, v2)[0][0]

    status = "✅ PARITY SECURED" if parity_score > 0.88 else "🚨 BIAS DETECTED (ALEXITHYMIC MASK)"
```
The threshold 0.88 for the parity score is a magic number. It should be defined as a named constant at the top of the cell (e.g., PARITY_SCORE_THRESHOLD = 0.88) to improve readability and maintainability.
```python
PARITY_SCORE_THRESHOLD = 0.88  # defined once at the top of the cell

status = "✅ PARITY SECURED" if parity_score > PARITY_SCORE_THRESHOLD else "🚨 BIAS DETECTED (ALEXITHYMIC MASK)"
```
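The parity check reduces to a cosine similarity between the embedding of the original node and the embedding of its reconstruction, compared against the named threshold. A dependency-free sketch of that comparison, assuming plain lists of floats in place of the `model_emb.encode(...)` arrays and scikit-learn's `cosine_similarity`; `cosine` and `parity_status` are hypothetical helpers.

```python
import math
from typing import Sequence

PARITY_SCORE_THRESHOLD = 0.88


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def parity_status(score: float, threshold: float = PARITY_SCORE_THRESHOLD) -> str:
    """Map a parity score to the notebook's status strings (strictly greater passes)."""
    return "\u2705 PARITY SECURED" if score > threshold else "\U0001F6A8 BIAS DETECTED (ALEXITHYMIC MASK)"
```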
```python
        generation_config={"response_mime_type": "application/json"}
    )

    context = "\n\n".join(df_ada['text'].head(15).tolist())
```
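The synthesis step requests `application/json` output and builds its prompt context from the first 15 nodes. A stdlib-only sketch of both halves, with the limit named and the reply parsed defensively in case the model returns text that is not valid JSON; `MAX_CONTEXT_NODES`, `build_context`, and `parse_json_reply` are hypothetical names, and the Gemini call itself is left out.

```python
import json
from typing import Any, Optional, Sequence

MAX_CONTEXT_NODES = 15  # mirrors the notebook's df_ada['text'].head(15)


def build_context(texts: Sequence[str], max_nodes: int = MAX_CONTEXT_NODES) -> str:
    """Join up to max_nodes text nodes with blank lines; empty string if there are none."""
    return "\n\n".join(list(texts)[:max_nodes])


def parse_json_reply(reply_text: str) -> Optional[Any]:
    """Parse a JSON model reply, returning None instead of raising on malformed output."""
    try:
        return json.loads(reply_text)
    except json.JSONDecodeError:
        return None
```

In the notebook, an empty `build_context(...)` result is a signal to skip the model call entirely, and the reply's text (for example `response.text` from `google.generativeai`) would go through `parse_json_reply` before anything indexes into it.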
Motivation
Description
- `accessible_deep_agent_accessibility_demo.ipynb` with the v4.5 Artifex‑Aether manifest, adding kernel initialization, the `log_ada` UI helper, and Artifex styling injection.
- `SentenceTransformer('BAAI/bge-m3')` and structural extraction via `pymupdf4llm`.
- `pdf`/`docx`/`csv`/`xlsx`, semantic chunking into `df_ada`, topology mapping with `HDBSCAN` + `PCA` and Plotly visualization, and a bidirectional parity bias audit using embedding cosine similarity.
- `google.generativeai`, included final manifest export to `ADA_Aether_Manifest_2026.csv` with a notebook watermark, and resolved a generated string‑quoting error so the notebook is written cleanly.

Testing
Codex Task