@philipithomas
Member

This PR introduces an official Ruby client to Chroma, compatible with both local
Chroma and the latest Chroma Cloud APIs.

It’s currently published on RubyGems as the chromadb-experimental gem. Please try it
and provide feedback; note that the package name may change, which would be a breaking
update. https://rubygems.org/gems/chromadb-experimental

Long‑term, the gem will move to chromadb or chromadb-official. We’re awaiting RubyGems
support on using the chromadb name (the name is currently unclaimed, but registering it
is rejected due to similarity with the chroma-db gem).

Notes:

  • Does not include a “default embedding function”. Use Cloud embedding functions or
    embed outside the library instead.
  • Does not bundle a local Chroma server (no in‑memory mode like the Python client).
  • See basic usage example in clients/ruby/spec/integration/single_node_spec.rb
  • See full Cloud usage example in clients/ruby/spec/integration/cloud_spec.rb
  • Basic docs in clients/ruby/README.md (docs.trychroma.com will need a separate update)
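
As a rough illustration of the "embed outside the library" note above, the sketch below computes vectors before anything reaches the client and assembles the keyword arguments that `collection.add` takes in the examples in this description. `toy_embed` is a hypothetical stand-in for a real embedding model (it only hashes tokens into buckets) and is not part of the gem.

```ruby
require "zlib"

# Hypothetical toy embedder, NOT part of the gem: hashes each token into one
# of `dims` buckets and L2-normalizes the counts. Swap in a real model here.
def toy_embed(text, dims: 8)
  vector = Array.new(dims, 0.0)
  text.downcase.scan(/\w+/) { |token| vector[Zlib.crc32(token) % dims] += 1.0 }
  norm = Math.sqrt(vector.sum { |x| x * x })
  norm.zero? ? vector : vector.map { |x| x / norm }
end

documents = { "a" => "hello", "b" => "world" }

# Same keyword names as the collection.add calls shown in this description.
payload = {
  ids: documents.keys,
  documents: documents.values,
  embeddings: documents.values.map { |doc| toy_embed(doc) }
}

# With a running server this would be passed as: collection.add(**payload)
pp payload
```

Every vector must have the same dimensionality, which is why `dims` is fixed in one place.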

Here is a basic example with local Chroma:

```ruby
require "chromadb"

client = Chroma::HttpClient.new(host: "localhost", port: 8000)
collection = client.get_or_create_collection(name: "ruby_local_docs")

collection.add(
  ids: ["a", "b"],
  embeddings: [[0.1, 0.2, 0.3], [0.2, 0.1, 0.0]],
  documents: ["hello", "world"],
  metadatas: [{ "topic" => "greeting" }, { "topic" => "farewell" }],
)

get_result = collection.get(ids: ["a"], include: ["documents", "metadatas"])
pp get_result.to_h

query_result = collection.query(
  query_embeddings: [[0.1, 0.2, 0.25]],
  n_results: 1,
)
pp query_result.to_h
```
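
To make the `query` step in the local example concrete, here is a minimal plain-Ruby sketch of a nearest-neighbor lookup over the same two vectors. It assumes the collection ranks by squared Euclidean (L2) distance. The actual metric depends on the collection's configuration, and the server uses an index rather than this brute-force scan. `squared_l2` is a helper written for this sketch, not a gem method.

```ruby
# Squared Euclidean (L2) distance between two equal-length vectors.
def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# The embeddings added in the local example above.
stored = {
  "a" => [0.1, 0.2, 0.3],
  "b" => [0.2, 0.1, 0.0]
}
query = [0.1, 0.2, 0.25]

# Brute-force ranking: smallest distance first.
ranked = stored
  .map { |id, vector| [id, squared_l2(query, vector)] }
  .sort_by { |_id, distance| distance }

n_results = 1
pp ranked.first(n_results)
```

Under this assumption the query vector is closest to id `"a"`, since it differs only slightly in the third component.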

Here is a more advanced example with Cloud Chroma:

```ruby
require "chromadb"
require "securerandom"

Chroma.configure do |config|
  config.cloud_api_key = ENV.fetch("CHROMA_API_KEY")
  # For admin operations, tenant is required:
  config.cloud_tenant = ENV.fetch("CHROMA_TENANT")
end

cloud = Chroma::CloudClient.new

admin = Chroma::AdminClient.new(
  host: Chroma.configuration.cloud_host,
  port: Chroma.configuration.cloud_port,
  ssl: Chroma.configuration.cloud_ssl,
  headers: { "x-chroma-token" => Chroma.configuration.cloud_api_key },
  tenant: Chroma.configuration.cloud_tenant
)

database = "ruby_cloud_#{SecureRandom.hex(4)}"
admin.create_database(database, tenant: Chroma.configuration.cloud_tenant)

Chroma.configure do |config|
  config.cloud_database = database
end

client = Chroma::CloudClient.new

dense = Chroma::EmbeddingFunctions::ChromaCloudQwenEmbeddingFunction.new
sparse = Chroma::EmbeddingFunctions::ChromaCloudSpladeEmbeddingFunction.new

schema = Chroma::Schema.new
schema.create_index(config: Chroma::VectorIndexConfig.new(embedding_function: dense))
schema.create_index(
  config: Chroma::SparseVectorIndexConfig.new(
    embedding_function: sparse,
    source_key: Chroma::DOCUMENT_KEY
  ),
  key: "sparse_embedding"
)

collection = client.create_collection(
  name: "ruby_cloud_docs_#{SecureRandom.hex(4)}",
  schema: schema
)

collection.add(
  ids: %w[alpha beta gamma],
  documents: ["alpha document", "beta document", "gamma document"],
  metadatas: [
    { "category" => "alpha" },
    { "category" => "beta" },
    { "category" => "gamma" }
  ],
)

query_result = collection.query(
  query_texts: ["alpha"],
  n_results: 2,
  include: ["documents", "metadatas", "distances"]
)
pp query_result.to_h

# Hybrid search w/ RRF
limit = 2
rank_limit = [limit * 5, limit, 128].max

dense_knn = Chroma::Search.Knn(
  query: "alpha",
  key: Chroma::Search::K::EMBEDDING,
  limit: rank_limit,
  return_rank: true
)
sparse_knn = Chroma::Search.Knn(
  query: "alpha",
  key: "sparse_embedding",
  limit: rank_limit,
  return_rank: true
)

rrf = Chroma::Search.Rrf(
  ranks: [dense_knn, sparse_knn],
  k: 60,
  weights: [1.0, 1.0]
)

search = Chroma::Search::Search.new
  .rank(rrf)
  .limit([limit * 3, limit].max)
  .select(
    Chroma::Search::K::DOCUMENT,
    Chroma::Search::K::SCORE,
    Chroma::Search::K::METADATA
  )

search_result = collection.search(search)
pp search_result.rows

# Fork collections
forked = collection.fork(name: "ruby_cloud_fork_#{SecureRandom.hex(4)}")
forked.add(ids: ["fork-only"], documents: ["fork only doc"])
```
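
For readers unfamiliar with the `Rrf` step in the hybrid search above, the following plain-Ruby sketch shows the textbook reciprocal rank fusion formula, where each document scores the sum of `weight / (k + rank)` over the rankings it appears in. The two input rankings are made up for illustration, and `rrf_scores` is a helper written for this sketch. Chroma's server-side `Rrf` may differ in details such as sign convention, rank base, or weight normalization.

```ruby
# Textbook RRF: score(doc) = sum over rankings of weight / (k + rank),
# with rank starting at 1. A document absent from a ranking contributes 0.
def rrf_scores(rankings, k: 60, weights: nil)
  weights ||= Array.new(rankings.length, 1.0)
  scores = Hash.new(0.0)
  rankings.each_with_index do |ranking, i|
    ranking.each_with_index do |id, position|
      scores[id] += weights[i] / (k + position + 1)
    end
  end
  # Highest fused score first.
  scores.sort_by { |_id, score| -score }
end

# Hypothetical result orders for the dense and sparse KNN rankings.
dense_ranking  = %w[alpha gamma beta]
sparse_ranking = %w[alpha beta]

fused = rrf_scores([dense_ranking, sparse_ranking], k: 60, weights: [1.0, 1.0])
pp fused.first(2)
```

With these made-up inputs, `beta` overtakes `gamma` after fusion because it appears in both rankings, which is the effect hybrid search relies on.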

@philipithomas requested a review from HammadB on January 7, 2026 at 15:16
@github-actions

github-actions bot commented Jan 7, 2026

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@propel-code-bot
Contributor

The Ruby client layers on top of the generated OpenAPI models with higher-level abstractions for collections, search, schemas, and embedding functions, while extending configuration for tenants, databases, and authentication. The PR also expands CI to build and test the gem and adds integration suites and scripts that validate both single-node and cloud workflows.

Affected Areas

• clients/ruby/lib/chromadb
• clients/ruby/lib/chromadb/openapi
• clients/ruby/spec
• clients/ruby/scripts
• clients/ruby/README.md
• .github/workflows/_ruby-client-tests.yml
• bin/ruby-single-node-integration-test.sh
• bin/ruby-cloud-integration-test.sh

This summary was automatically generated by @propel-code-bot

```ruby
search = Chroma::Search::Search.new
  .where(Chroma::Search::K["type"].eq("doc"))
  .rank(Chroma::Search.Knn(query: "ruby", key: "#embedding", limit: 10))
```

Advisory

[Documentation] Fix namespace access: change Chroma::Search.Knn to Chroma::Search::Knn so the example references the Knn class correctly.

File: clients/ruby/README.md
Line: 129
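
Some Ruby background that bears on this advisory, sketched with a hypothetical `Demo` module rather than the gem's code: when a capitalized name is followed by parenthesized arguments, both `Demo.Knn(...)` and `Demo::Knn(...)` are parsed as method calls, while a bare `Demo::Knn` resolves a constant. Whether `Chroma::Search` defines `Knn` as a factory method, a class, or both is an assumption here. The PR description's use of `Chroma::Search.Knn(...)` suggests a method exists.

```ruby
module Demo
  # A class named Knn (hypothetical, for illustration only).
  class Knn
    attr_reader :query

    def initialize(query:)
      @query = query
    end
  end

  # A module-level factory method that shares the class's name.
  def self.Knn(**kwargs)
    Knn.new(**kwargs)
  end
end

via_dot   = Demo.Knn(query: "ruby")   # method call
via_colon = Demo::Knn(query: "ruby")  # also a method call, because of the parentheses
constant  = Demo::Knn                 # constant lookup: the class itself

pp via_dot.class, via_colon.class, constant
```

If only the class existed and no factory method, both call forms with arguments would raise `NoMethodError`, and `Demo::Knn.new(...)` would be the working spelling.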

@philipithomas
Member Author

philipithomas commented Jan 7, 2026

Schema is working with Cloud UI:

[Image: Screenshot 2026-01-07 at 09 56 36]
