85 commits
e37894b
Implement initial version of Leann KVS with semantic search functiona…
Yoshiki0319 Nov 9, 2025
cd95cbb
Add Leann KVS implementation with semantic search and initial data en…
Yoshishi380 Nov 9, 2025
ed6ee60
Refactor semantic search implementation and improve error handling
Yoshishi380 Nov 10, 2025
e7ef14f
Add README for LEANN Semantic Search demo
Yoshiki0319 Nov 10, 2025
924f16c
Beginning of Installation instructions
steveoVivo Nov 11, 2025
9ac5e37
Remove bazelversion (local) and carraige returns from shell scripts
steveoVivo Nov 13, 2025
8b4a954
Added instructions for dealing with C++ errors to the README
steveoVivo Nov 14, 2025
33b2182
Reorganized README and added more bugfixes for installation
steveoVivo Nov 14, 2025
c2e2dbe
Better markdown formatting for README
steveoVivo Nov 14, 2025
7076ad7
Add installation instructions for Bazel 6.0.0
Yoshiki0319 Nov 19, 2025
8a21778
Add instructions for running ResDB-ORM
Yoshiki0319 Nov 19, 2025
c0d035b
Update Step 4 reference to match previous steps
Yoshiki0319 Nov 19, 2025
d1dec34
Fix wget command formatting in README
Yoshiki0319 Nov 20, 2025
62028c0
Fix git clone commands in README
Yoshiki0319 Nov 20, 2025
b0806cf
Fix formatting in README for virtual environment setup
Yoshiki0319 Nov 20, 2025
e9edddb
Add resdb-orm installation to README
Yoshiki0319 Nov 20, 2025
88bed2b
Fix clone command directory in README
Yoshiki0319 Nov 20, 2025
8deaef8
Update README for virtual environment instructions
Yoshiki0319 Nov 20, 2025
7f41ddc
Add configuration, indexing, and searching functionality for ResDB-ORM
Yoshiki0319 Nov 20, 2025
05390f8
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
Yoshiki0319 Nov 20, 2025
60e5802
Create README for ResilientDB x LEANN integration
Yoshiki0319 Nov 20, 2025
6a1177e
Improve README formatting and section headings
Yoshiki0319 Nov 20, 2025
bb646f6
Format commands in troubleshooting section as code
Yoshiki0319 Nov 20, 2025
6ddbbef
Clarify access instructions for ResilientDB KV Service
Yoshiki0319 Nov 20, 2025
ef8fccb
Update README.md for clarity and conciseness
Yoshiki0319 Nov 20, 2025
58205aa
Remove checkmark from index update print statement
Yoshiki0319 Nov 20, 2025
4121b47
Refactor ResDB configuration path and enhance indexer functionality w…
Yoshiki0319 Nov 21, 2025
97648b7
Revise README with version update notice
Yoshiki0319 Nov 21, 2025
deb6bcf
Enhance SafeResDBORM functionality with improved error handling, soft…
Yoshiki0319 Nov 21, 2025
e0408b5
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
Yoshiki0319 Nov 21, 2025
ec4b2f4
Refactor diagnose_db.py with English comments, enhance indexer.py ope…
Yoshiki0319 Nov 22, 2025
7761faf
Enhance README with detailed service documentation
Yoshiki0319 Nov 22, 2025
05458e1
Fix formatting in README.md for data manager section
Yoshiki0319 Nov 22, 2025
a6852fd
Remove obsolete id_mapping.json, resdb.ids.txt, and related files to …
Yoshiki0319 Nov 22, 2025
20e2c98
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
Yoshiki0319 Nov 22, 2025
9b7e2da
Mark integration as under construction in README
Yoshiki0319 Nov 22, 2025
693f6bb
Added bash file to recursively clear all carraige returns from all .s…
steveoVivo Nov 24, 2025
e8ddf71
Updated the README to use the local builds of ecosystem tools / Expla…
steveoVivo Nov 24, 2025
6760e53
Fixed code typos found after re-building resdb-orm
steveoVivo Nov 24, 2025
74d018b
Added script to start all helper tools + ResDB-orm, minor changes to …
steveoVivo Nov 26, 2025
9246bfd
Correct Bazel build command in README
Yoshiki0319 Nov 26, 2025
a92965a
Added the ability to generate embeddings, save them to ResDB, and sea…
steveoVivo Nov 29, 2025
4993977
Cleared all TODOs in vector_add
steveoVivo Nov 30, 2025
40b48a5
Progress on TODOs in vector_get, fixed issue in vector_Add that would…
steveoVivo Nov 30, 2025
e7ab97e
Complete all TODOs in add/get/library, fix major bug with encoding/de…
steveoVivo Dec 1, 2025
1645588
Add stress test script and update vector add/get for binary file hand…
Yoshiki0319 Dec 2, 2025
f35d0ae
Check for saved_data folder before trying to create embedding files
steveoVivo Dec 2, 2025
14e0bfc
Added hard delete functionality along with writing embeddings as bytes
Dec 2, 2025
3a5a08e
Update encoding method for text passages in vector_add.py
Yoshiki0319 Dec 3, 2025
fadec90
Change content decoding to base64 for ResDB
Yoshiki0319 Dec 3, 2025
3ed5d4a
Add base64 import to vector_delete.py
Yoshiki0319 Dec 3, 2025
b631ecd
Update encoding method for file writing
Yoshiki0319 Dec 3, 2025
e04f98f
Moved JSONScalar to separate file allowing app.py to run. Added test …
tichingkao Dec 5, 2025
fe8bd06
Fix base64 encoding for content bytes in vector_add.py, vector_delete…
Yoshiki0319 Dec 5, 2025
dc5a7a6
After converting to base64, the behavior became abnormal, so I revert…
Yoshiki0319 Dec 5, 2025
8f35987
Update encoding to base64 for content bytes in vector_add.py, vector_…
Yoshiki0319 Dec 5, 2025
fa292c7
Add vector client and proxy server implementation for vector indexing
Yoshiki0319 Dec 5, 2025
561852d
Fix exit behavior on failed deletion in vector_delete.py
Yoshiki0319 Dec 5, 2025
7767fa7
Remove unnecessary configurations from .bazelrc for cleaner build setup
Yoshiki0319 Dec 6, 2025
ab1bfa9
feat: Implement vector indexing and search functionality
Yoshiki0319 Dec 6, 2025
ac4cb55
stress test KV
SideCoin Dec 6, 2025
32b7ce0
bazel
SideCoin Dec 6, 2025
92e768d
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
SideCoin Dec 6, 2025
692f772
Added kv_vector, a CLI tool that interfaces with the GraphQL proxy
steveoVivo Dec 6, 2025
da66f0f
readme update
SideCoin Dec 7, 2025
b4e1ab8
feat: Refactor vector management for persistent execution and optimiz…
Yoshiki0319 Dec 7, 2025
534d7ba
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
Yoshiki0319 Dec 7, 2025
6fc137e
feat: Optimize vector search query and clean up vector_add script
Yoshiki0319 Dec 7, 2025
badd412
getAll now works when calling the proxy
steveoVivo Dec 7, 2025
d30b862
Stress test
SideCoin Dec 7, 2025
c733d42
fix: Adjust similarity score formatting in response output
Yoshiki0319 Dec 8, 2025
bae91c1
Add stress test results CSV and remove vector client and proxy scripts
Yoshiki0319 Dec 9, 2025
b19ecbf
Add README for Vector Indexing SDK
Yoshiki0319 Dec 9, 2025
4475cd3
Update README to remove default endpoint information
Yoshiki0319 Dec 9, 2025
7c0f355
Fix demo script command in README
Yoshiki0319 Dec 9, 2025
21fe92c
Add demo script to add texts to ResilientDB
Yoshiki0319 Dec 9, 2025
4e6e2b6
Update README with demo script execution instructions
Yoshiki0319 Dec 9, 2025
54d3920
Changed the commands in the instructions to run the project
steveoVivo Dec 10, 2025
4253b82
Added a better installation guide to the README
steveoVivo Dec 12, 2025
8c521fa
Merge branch 'master' of https://github.com/Yoshiki0319/indexers-ECS2…
steveoVivo Dec 12, 2025
6a9b85f
Added a correct commands for using the tooling
steveoVivo Dec 12, 2025
6858534
Extra changes to the readme
steveoVivo Dec 12, 2025
d1c7177
Cleaning up unclear code in the ReadME
steveoVivo Dec 12, 2025
651f22c
Moved all indexing files to the ecosystem folder or deleted unused to…
steveoVivo Dec 13, 2025
3f44eaa
Removed outdated spinup file
steveoVivo Dec 13, 2025
1 change: 0 additions & 1 deletion .bazelrc
@@ -1,4 +1,3 @@
build --cxxopt='-std=c++17' --copt=-O3 --jobs=40
#build --action_env=PYTHON_BIN_PATH="/usr/bin/python3.10"
#build --action_env=PYTHON_LIB_PATH="/usr/include/python3.10"

1 change: 0 additions & 1 deletion .bazelversion

This file was deleted.

565 changes: 198 additions & 367 deletions README.md

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions WORKSPACE
@@ -20,6 +20,16 @@ workspace(name = "com_resdb_nexres")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
name = "bazel_skylib",
sha256 = "74d544d96f4a5bb630d465ca8bbcfe231e3594e5aae57e1edbf17a6eb3ca2506",
urls = [
"https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
"https://github.com/bazelbuild/bazel-skylib/releases/download/1.3.0/bazel-skylib-1.3.0.tar.gz",
],
)
load("@bazel_skylib//:workspace.bzl", "bazel_skylib_workspace")
bazel_skylib_workspace()
http_archive(
name = "hedron_compile_commands",
#Replace the commit hash (4f28899228fb3ad0126897876f147ca15026151e) with the latest commit hash from the repo
3 changes: 2 additions & 1 deletion ecosystem/README.md
@@ -39,7 +39,8 @@ ecosystem/
├── sdk/ # Software Development Kits
│ ├── rust-sdk/ # Rust SDK
│ ├── resvault-sdk/ # ResVault SDK
│ └── resdb-orm/ # Python ORM
│ ├── resdb-orm/ # Python ORM
│ └── vector-indexing/ # Vector indexing for semantic search
├── deployment/ # Deployment and infrastructure
│ ├── ansible/ # Ansible playbooks
│ └── orbit/ # Orbit deployment tool
175 changes: 156 additions & 19 deletions ecosystem/graphql/app.py
@@ -18,37 +18,84 @@
#
#

import tempfile
import os
import sys
import subprocess
import re
import json
import strawberry
import typing
import ast
from pathlib import Path
from typing import Optional, List, Any
from flask import Flask
from flask_cors import CORS
from strawberry.flask.views import GraphQLView

# --- Local Imports ---
from resdb_driver import Resdb
from resdb_driver.crypto import generate_keypair
from json_scalar import JSONScalar

# --- Vector Indexing Imports ---
from sentence_transformers import SentenceTransformer

# --- Configuration ---
db_root_url = "localhost:18000"
protocol = "http://"
fetch_all_endpoint = "/v1/transactions"
db = Resdb(db_root_url)

import strawberry
import typing
import ast
import json
# --- Vector Indexing Scripts Path Configuration ---
CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
VECTOR_SCRIPT_DIR = os.path.abspath(os.path.join(CURRENT_DIR, "../sdk/vector-indexing"))
PYTHON_EXE = sys.executable

from typing import Optional, List, Any
from flask import Flask
from flask_cors import CORS
# Add vector script dir to sys.path to allow imports
sys.path.append(VECTOR_SCRIPT_DIR)

app = Flask(__name__)
CORS(app) # This will enable CORS for all routes
# Try importing the manager classes
try:
    from vector_add import VectorIndexManager
    from vector_get import VectorSearchManager
    from vector_delete import VectorDeleteManager
except ImportError as e:
    print(f"Warning: Could not import vector modules. Error: {e}")
    VectorIndexManager = None
    VectorSearchManager = None
    VectorDeleteManager = None

from strawberry.flask.views import GraphQLView
# --- Initialize AI Model & Managers (Run Once) ---
print("Initializing Vector Managers...")
vector_index_manager = None
vector_search_manager = None
vector_delete_manager = None

try:
    # Load model into memory once at startup to avoid per-request overhead
    GLOBAL_MODEL = SentenceTransformer('all-MiniLM-L6-v2')

    script_path = Path(VECTOR_SCRIPT_DIR)

    if VectorIndexManager:
        vector_index_manager = VectorIndexManager(script_path, GLOBAL_MODEL)

    if VectorSearchManager:
        vector_search_manager = VectorSearchManager(script_path, GLOBAL_MODEL)

    if VectorDeleteManager:
        vector_delete_manager = VectorDeleteManager(script_path, GLOBAL_MODEL)

    print("Vector Managers initialized successfully.")
except Exception as e:
    print(f"Error initializing vector managers: {e}")

@strawberry.scalar(description="Custom JSON scalar")
class JSONScalar:
@staticmethod
def serialize(value: Any) -> Any:
return value # Directly return the JSON object

@staticmethod
def parse_value(value: Any) -> Any:
return value # Accept JSON as is
app = Flask(__name__)
CORS(app) # This will enable CORS for all routes

# --- GraphQL Types ---

@strawberry.type
class RetrieveTransaction:
@@ -76,6 +123,14 @@ class PrepareAsset:
recipientPublicKey: str
asset: JSONScalar

# New Type for Vector Search Results
@strawberry.type
class VectorSearchResult:
text: str
score: float

# --- Query ---

@strawberry.type
class Query:
@strawberry.field
@@ -94,6 +149,69 @@ def getTransaction(self, id: strawberry.ID) -> RetrieveTransaction:
asset=data["asset"]
)
return payload

    @strawberry.field
    def count_cats(self) -> str:
        # Create a temporary file and write the sample lines to it
        with tempfile.NamedTemporaryFile(mode="w+", delete=False) as tmp_file:
            tmp_path = tmp_file.name

            # Write to file
            lines = ["cat", "cat", "cat", "mouse", "cat"]
            for line in lines:
                tmp_file.write(line + "\n")

        # Count number of cats (the file is closed, so all writes are flushed)
        cat_count = 0
        with open(tmp_path, "r") as f:
            for line in f:
                if "cat" in line.strip():
                    cat_count += 1

        # Delete temporary file
        os.remove(tmp_path)

        # Return number of cats
        return f'The word "cat" appears {cat_count} times'

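    @strawberry.field
    def getAllVectors(self) -> List[VectorSearchResult]:
        """Return every text stored in the index."""
        results = []

        # Guard against a failed startup, matching the check in searchVector
        if not vector_search_manager:
            print("Error: Vector search manager not initialized.")
            return []

        raw_values = vector_search_manager.get_all_values()
        for val in raw_values:
            # Listing everything has no similarity score, so 1.0 is a placeholder
            results.append(VectorSearchResult(text=val, score=1.0))
        return results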

    # --- New: Vector Search Query (Optimized) ---
    @strawberry.field
    def searchVector(self, text: Optional[str] = None, k: int = 1) -> List[VectorSearchResult]:
        """Search for similar texts using the in-memory manager."""
        results = []

        if not vector_search_manager:
            print("Error: Vector search manager not initialized.")
            return []

        if text is None:
            # Show all functionality
            raw_values = vector_search_manager.get_all_values()
            for val in raw_values:
                # Listing everything has no similarity score, so 1.0 is a placeholder
                results.append(VectorSearchResult(text=val, score=1.0))
        else:
            # Search functionality
            search_results = vector_search_manager.search(text, k)
            for item in search_results:
                results.append(VectorSearchResult(
                    text=item['text'],
                    score=item['score']
                ))

        return results

# --- Mutation ---

@strawberry.type
class Mutation:
@@ -115,6 +233,25 @@ def postTransaction(self, data: PrepareAsset) -> CommitTransaction:
)
return payload

    # --- New: Vector Add Mutation (Optimized) ---
    @strawberry.mutation
    def addVector(self, text: str) -> str:
        """Add a text to the vector index using the in-memory manager."""
        if vector_index_manager:
            return vector_index_manager.add_value(text)
        else:
            return "Error: Vector index manager not initialized."

    # --- New: Vector Delete Mutation (Optimized) ---
    @strawberry.mutation
    def deleteVector(self, text: str) -> str:
        """Delete a text from the vector index using the in-memory manager."""
        if vector_delete_manager:
            return vector_delete_manager.delete_value(text)
        else:
            return "Error: Vector delete manager not initialized."


schema = strawberry.Schema(query=Query, mutation=Mutation)

app.add_url_rule(
@@ -123,4 +260,4 @@ def postTransaction(self, data: PrepareAsset) -> CommitTransaction:
)

if __name__ == "__main__":
app.run(port="8000")
app.run(port="8000")
9 changes: 9 additions & 0 deletions ecosystem/graphql/json_scalar.py
@@ -0,0 +1,9 @@
import strawberry
from typing import Any

@strawberry.scalar(
    name="JSONScalar",
    description="Custom JSON scalar"
)
def JSONScalar(value: Any) -> Any:
    return value
67 changes: 67 additions & 0 deletions ecosystem/sdk/vector-indexing/README.md
@@ -0,0 +1,67 @@
# Vector Indexing SDK for ResilientDB
This directory contains a Python SDK for performing vector indexing and similarity search using ResilientDB as the storage backend.

The primary interface for users is the ```kv_vector.py``` CLI tool, which interacts with the ResilientDB GraphQL service to manage vector embeddings.

## Architecture
- ```kv_vector.py```: The CLI frontend. It sends GraphQL mutations and queries to the proxy.
- ```kv_vector_library.py```: Handles the HTTP requests to the GraphQL endpoint.
### Backend Scripts
- ```vector_add.py```, ```vector_get.py```, ```vector_delete.py```: These scripts run on the server side, in the same environment as the GraphQL application, and handle embedding generation (via SentenceTransformers) and HNSW index management.
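
The request flow described above can be sketched as follows. This is a minimal illustration, not the contents of ```kv_vector_library.py```: the helper names `build_payload` and `post_graphql` are hypothetical, and the URL is an assumption (the app listens on port 8000, but the route path is not shown here). The `addVector` and `searchVector` operations and their arguments are taken from ```ecosystem/graphql/app.py```.

```python
import json
import urllib.request

# Assumption: the GraphQL route is served at this URL; adjust to your deployment.
GRAPHQL_URL = "http://localhost:8000/graphql"

# Operations exposed by ecosystem/graphql/app.py
ADD_VECTOR = "mutation($text: String!) { addVector(text: $text) }"
SEARCH_VECTOR = (
    "query($text: String!, $k: Int!) "
    "{ searchVector(text: $text, k: $k) { text score } }"
)


def build_payload(query: str, variables: dict) -> bytes:
    """Encode a GraphQL operation as the JSON body of an HTTP POST (hypothetical helper)."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


def post_graphql(query: str, variables: dict, url: str = GRAPHQL_URL) -> dict:
    """Send the operation to the GraphQL endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(query, variables),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `post_graphql(ADD_VECTOR, {"text": "hello"})` would add one text, and `post_graphql(SEARCH_VECTOR, {"text": "hello", "k": 3})` would request the three closest matches.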

## Prerequisites
Before using this SDK, please ensure the entire ResilientDB stack is up and running. Specifically, you need:
1. ResilientDB KV Store: The core blockchain storage service must be running. [How to Setup](https://github.com/apache/incubator-resilientdb)
2. GraphQL Server (```ecosystem/graphql```): The backend service handling GraphQL schemas and resolvers. [How to Setup](https://github.com/apache/incubator-resilientdb/tree/master/ecosystem/graphql)
3. GraphQL Application (```ecosystem/graphql/app.py```): The Python web server (Strawberry/Flask) that exposes the GraphQL endpoint. [How to Setup](https://github.com/apache/incubator-resilientdb/tree/master/ecosystem/graphql)
4. GraphQL virtual environment: In a terminal whose working directory is ```ecosystem/sdk/vector-indexing```, activate the virtual environment used by the GraphQL service.

## Installation
Install the required Python dependencies:
```
pip install requests pyyaml numpy hnswlib sentence-transformers
```

## Quick Start: Demo Data
A shell script is provided to quickly populate the database with sample data for testing purposes. This is the fastest way to verify your environment is set up correctly.
1. Make sure you are in the ```ecosystem/sdk/vector-indexing``` directory.
2. Run the demo script:
```
chmod +x demo_add.sh
./demo_add.sh
```
**What this does:** The script iterates through a predefined list of sentences (covering topics like biology, sports, and art) and adds them to the ResilientDB vector index one by one using ```kv_vector.py```.

## Usage (CLI)
The ```kv_vector.py``` script is the main entry point. It allows you to add text (which is automatically vectorized), search for similar text, and manage records via the GraphQL endpoint.

### 1. Adding Data
Add a text string. This generates an embedding for the text and stores it in ResilientDB.
```
python3 kv_vector.py --add "<TEXT>"
```

### 2. Searching
Find the ```k``` strings most similar to your query using HNSW similarity search.
```
# Get the single most similar record (default k=1)
python3 kv_vector.py --get "<SEARCH WORDS>"

# Get the top 3 matches
python3 kv_vector.py --get "<SEARCH WORDS>" --k_matches 3
```
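To make the meaning of "top ```k``` matches" concrete, here is a small sketch of ranking stored texts by similarity to a query vector. It is a brute-force stand-in, not the HNSW index the SDK uses, and it assumes cosine similarity (the metric used by ```vector_get.py``` is not shown here). The three-dimensional vectors are toy values, not real SentenceTransformers embeddings, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=1):
    """Return the k records most similar to query_vec, best first.

    records is a list of (text, vector) pairs; this scans every record,
    whereas an HNSW index finds approximate neighbours without a full scan.
    """
    scored = [
        {"text": text, "score": cosine_similarity(query_vec, vec)}
        for text, vec in records
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:k]


# Toy data standing in for embedded sentences
records = [
    ("cats purr", [1.0, 0.0, 0.0]),
    ("dogs bark", [0.0, 1.0, 0.0]),
    ("kittens meow", [0.9, 0.1, 0.0]),
]
matches = top_k([1.0, 0.0, 0.0], records, k=2)
```

The result has the same shape as the `text` and `score` fields returned by the `searchVector` query.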

### 3. Listing All Data
Retrieve all text values currently stored in the index.
```
python3 kv_vector.py --getAll
```

### 4. Deleting Data
Remove a specific value and its embedding from the index.
```
python3 kv_vector.py --delete "<TEXT>"
```

## Configuration
If your GraphQL service is running on a different host or port, update the endpoint configuration in ```kv_vector_library.py``` or in the ```config.yaml``` file, depending on your deployment mode.
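
One possible way to avoid editing source files for each deployment is to resolve the endpoint from the environment. The sketch below is an illustration only: the variable names `GRAPHQL_HOST` and `GRAPHQL_PORT`, the function `graphql_endpoint`, and the `/graphql` path are all assumptions and are not read by the SDK as shipped. The default port matches the `app.run(port="8000")` call in ```app.py```.

```python
import os
from urllib.parse import urlunsplit


def graphql_endpoint(env=None) -> str:
    """Build the GraphQL URL from optional overrides (hypothetical helper).

    env defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    host = env.get("GRAPHQL_HOST", "localhost")  # hypothetical variable name
    port = env.get("GRAPHQL_PORT", "8000")       # hypothetical variable name
    # Assumption: the GraphQL route is mounted at /graphql
    return urlunsplit(("http", f"{host}:{port}", "/graphql", "", ""))
```

With no overrides set, the function returns a localhost URL on port 8000; setting the two variables redirects the client to another host or port.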
17 changes: 17 additions & 0 deletions ecosystem/sdk/vector-indexing/config.yaml
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

database:
db_root_url: http://0.0.0.0:18000
26 changes: 26 additions & 0 deletions ecosystem/sdk/vector-indexing/demo_add.sh
@@ -0,0 +1,26 @@
#!/bin/bash

echo "=== Adding 10 demo texts to ResilientDB ==="

texts=(
"Large language models can generate human-like text and assist with tasks such as summarization, translation, and code generation."
"Photosynthesis allows plants to convert sunlight into chemical energy, producing oxygen as a byproduct."
"Kyoto is known for its ancient temples, traditional wooden houses, and beautiful seasonal landscapes."
"Strong branding helps companies build customer trust and differentiate themselves in competitive markets."
"Regular exercise improves cardiovascular health, increases muscle strength, and reduces stress levels."
"Active learning encourages students to participate, discuss ideas, and apply knowledge rather than passively listen."
"Sourdough bread develops its unique flavor through natural fermentation using wild yeast and lactic acid bacteria."
"Reducing plastic waste requires better recycling systems and increased use of biodegradable materials."
"Impressionist painters focused on capturing light and movement rather than creating precise, realistic details."
"Basketball requires teamwork, quick decision-making, and precise coordination between players on the court."
)

for text in "${texts[@]}"
do
echo "→ Adding:"
echo " \"$text\""
python3 kv_vector.py --add "$text"
echo ""
done

echo "=== Done: All demo texts added ==="