integration: add Isaacus docs #1005

Open: umarbutler wants to merge 8 commits into langchain-ai:main from isaacus-dev:main (+193 −0)

Commits (all by umarbutler):
- f74a552 chore: add ignore for .code-workspace
- 2758db4 feat: add isaacus docs
- fdcf289 fix: typo in code snippet
- bd50d63 fix: typo in docs link
- 5f5fb0b docs: remove redundant link
- c377719 docs: remove redundant link
- bfed65f fix: typo
- cd53e1f Merge branch 'main' into main

```diff
@@ -182,3 +182,6 @@ Chinook.db
 .editorconfig
 
 *.swp
+
+# VS Code
+*.code-workspace
```

@@ -0,0 +1,65 @@ (new file)

---
title: Isaacus
---

[Isaacus](https://isaacus.com/) is a foundational legal AI research company building AI models, apps, and tools for the legal tech ecosystem.

Isaacus' offering includes [Kanon 2 Embedder](https://isaacus.com/blog/introducing-kanon-2-embedder), the most accurate legal embedding model on the [Massive Legal Embedding Benchmark](https://isaacus.com/blog/introducing-mleb) as of 20 October 2025, as well as [legal zero-shot classification](https://docs.isaacus.com/models/introduction#universal-classification) and [legal extractive question answering](https://docs.isaacus.com/models/introduction#answer-extraction) models.

Isaacus offers first-class support for LangChain's embeddings interface through the [`langchain-isaacus`](https://pypi.org/project/langchain-isaacus/) integration package.

## Setup
To get started using Isaacus models with LangChain, head to the [Isaacus Platform](https://platform.isaacus.com/accounts/signup/) and create a new account.

Once signed up, [add a payment method](https://platform.isaacus.com/billing/) (thereby claiming your [free credits](https://docs.isaacus.com/pricing/credits)) and [generate an API key](https://platform.isaacus.com/users/api-keys/).

Next, install the [`langchain-isaacus`](https://pypi.org/project/langchain-isaacus/) integration package:
<CodeGroup>
```bash pip
pip install langchain-isaacus
```

```bash uv
uv add langchain-isaacus
```
</CodeGroup>

Then set the `ISAACUS_API_KEY` environment variable to your Isaacus API key:
<CodeGroup>
```bash bash
export ISAACUS_API_KEY="your_api_key_here"
```
```powershell powershell
$env:ISAACUS_API_KEY="your_api_key_here"
```
</CodeGroup>
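
If you'd rather not export the key in your shell, a common pattern is to set it from Python before creating the client. The sketch below is a hypothetical convenience helper, `ensure_env_var`, and is not part of `langchain-isaacus`. It returns the variable if it is already set. Otherwise it prompts for the value with `getpass`, which does not echo the key to the terminal, and stores the result in `os.environ` for the current process only.

```python
import getpass
import os
from typing import Callable


def ensure_env_var(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return the environment variable `name`, prompting for it if it is unset.

    Hypothetical helper: the value is stored in `os.environ` for the current
    process only, so clients created later in the same process can read it.
    """
    value = os.environ.get(name)
    if not value:
        # getpass.getpass does not echo the typed key to the terminal.
        value = prompt(f"Enter your {name}: ")
        os.environ[name] = value
    return value
```

For example, call `ensure_env_var("ISAACUS_API_KEY")` once at the top of a notebook, before constructing `IsaacusEmbeddings`.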

## Embeddings
The code snippet below shows how you might use Isaacus' Kanon 2 Embedder model with LangChain to assess how semantically similar several legal queries are to a legal document. For a more detailed walkthrough of generating embeddings with the Isaacus LangChain integration, see the [Isaacus text embedding guide](/oss/integrations/text_embedding/isaacus).

```python
import numpy as np  # NOTE: you may need to `pip install numpy`.

from langchain_isaacus import IsaacusEmbeddings

# Create an Isaacus API client for Kanon 2 Embedder.
# Your API key is read from the `ISAACUS_API_KEY` environment variable set above.
client = IsaacusEmbeddings(
    model="kanon-2-embedder",
    # dimensions=1792,  # You may optionally wish to specify a lower dimension.
)

# Embed a dummy document.
document_embedding = client.embed_documents(texts=["These are GitHub's billing policies."])[0]

# Embed our search queries.
relevant_query_embedding = client.embed_query(text="What are GitHub's billing policies?")
irrelevant_query_embedding = client.embed_query(text="What are Microsoft's billing policies?")

# Compute the similarity between the queries and the document.
# The dot product equals cosine similarity here because the embeddings are L2-normalized.
relevant_similarity = np.dot(relevant_query_embedding, document_embedding)
irrelevant_similarity = np.dot(irrelevant_query_embedding, document_embedding)

# Log the results.
print(f"Similarity of relevant query to the document: {relevant_similarity * 100:.2f}")
print(f"Similarity of irrelevant query to the document: {irrelevant_similarity * 100:.2f}")
```

@@ -0,0 +1,98 @@ (new file)

---
title: Isaacus
---

This guide explains how to get started generating legal embeddings with [Isaacus'](/oss/integrations/providers/isaacus) LangChain integration.

## 1. Set up your account

Head to the [Isaacus Platform](https://platform.isaacus.com/accounts/signup/) to create a new account.

Once signed up, [add a payment method](https://platform.isaacus.com/billing/) to claim your [free credits](https://docs.isaacus.com/pricing/credits).

After adding a payment method, [create a new API key](https://platform.isaacus.com/users/api-keys/).

Store your API key somewhere safe, because you won't be able to view it again after you create it. If you lose it, you can always generate a new one.

## 2. Install the Isaacus API client

Now that your account is set up, install the [Isaacus LangChain](https://pypi.org/project/langchain-isaacus/) integration package.

<CodeGroup>
```bash pip
pip install langchain-isaacus
```

```bash uv
uv add langchain-isaacus
```
</CodeGroup>

## 3. Embed a document

With our API client installed, let's embed our first legal query and document.

To start, you need to **initialize the client with your API key**. You can do this either by setting the `ISAACUS_API_KEY` environment variable or by passing the key directly through the `api_key` argument, which is what we do in this example.

We're going to use [Kanon 2 Embedder](https://isaacus.com/blog/introducing-kanon-2-embedder), the world's most accurate legal embedding model on the [Massive Legal Embedding Benchmark](https://isaacus.com/blog/introducing-mleb) as of 20 October 2025.

```python
from langchain_isaacus import IsaacusEmbeddings

# Create an Isaacus API client for Kanon 2 Embedder.
client = IsaacusEmbeddings(
    model="kanon-2-embedder",
    api_key="PASTE_YOUR_API_KEY_HERE",
    # dimensions=1792,  # You may optionally wish to specify a lower dimension.
)
```

Next, let's grab a legal document to embed. For this example, we'll use [GitHub's terms of service](https://github.com/terms).

```python
import isaacus

# Download the raw Markdown text of GitHub's terms of service.
# Note: `isaacus.Isaacus()` may expect your API key in the `ISAACUS_API_KEY` environment variable.
tos = isaacus.Isaacus().get(path="https://examples.isaacus.com/github-tos.md", cast_to=str)
```

We're interested in retrieving the GitHub terms of service given a search query about it.

To do that, we'll first embed the document using the `.embed_documents()` method of our API client. This method tells the model that we're embedding a document rather than a search query, which is important for ensuring that the embedding is optimized for retrieval rather than for other tasks such as classification or sentence similarity.

```python
document_embedding = client.embed_documents(texts=[tos])[0]
```

Now, let's embed two search queries: one that is clearly relevant to the document and another that is clearly irrelevant. This time we'll use the `.embed_query()` method of our API client, which indicates that we're embedding a search query.

```python
relevant_query_embedding = client.embed_query(text="What are GitHub's billing policies?")
irrelevant_query_embedding = client.embed_query(text="What are Microsoft's billing policies?")
```

To assess the relevance of the queries to the document, we can compute the cosine similarity between each query embedding and the document embedding.

Cosine similarity measures how similar two vectors are. Specifically, it is the cosine of the angle between them in an inner product space. In theory, it ranges from $$-1$$ to $$1$$: $$1$$ indicates that the vectors point in the same direction, $$0$$ indicates that they are orthogonal (i.e., unrelated), and $$-1$$ indicates that they point in diametrically opposite directions. In practice, however, similarity scores between text embeddings tend to fall between $$0$$ and $$1$$, because embeddings of real text rarely point in opposite directions.

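Concretely, the cosine similarity of vectors $$\mathbf{a}$$ and $$\mathbf{b}$$ is $$\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$$. The dependency-free sketch below implements that formula on plain Python lists and checks the three reference points described above. The toy 2-D vectors are made up for illustration and are not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors `a` and `b`: (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors illustrating the three reference points.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # close to 1.0: same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))    # 0.0: orthogonal
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # close to -1.0: opposite directions
```

Because of floating-point rounding, the first and third results may differ from exactly $$1$$ and $$-1$$ in the last few digits.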
Isaacus' embedders have been optimized so that the cosine similarity of the embeddings they produce roughly corresponds to how similar the original texts are in meaning. Unlike the scores of Isaacus' universal classifiers, however, these similarity scores have not been calibrated to be interpreted as probabilities. They should be read only as relative measures of similarity, which makes them most useful for ranking search results.

For the sake of convenience, our Python example uses [`numpy`](https://numpy.org/)'s `dot` function to compute the dot product of our embeddings. This is equivalent to their cosine similarity because all our embeddings are L2-normalized. If you prefer, you can use another library to compute the cosine similarity of the embeddings (e.g., [`torch`](https://pytorch.org/) via `torch.nn.functional.cosine_similarity`), or you could write your own implementation.

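To see why the plain dot product is enough here, note that dividing a vector by its L2 norm scales it to length $$1$$, and the cosine similarity of two unit-length vectors is simply their dot product. The dependency-free sketch below checks this on made-up vectors. Plain lists stand in for embeddings, and `l2_normalize` and `dot` are hypothetical helpers, not part of `langchain-isaacus`. You would only need to normalize yourself when working with vectors that aren't already L2-normalized.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale `v` to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0, 0.0]  # length 5
b = [1.0, 2.0, 2.0]  # length 3

# Cosine similarity computed directly from the unnormalized vectors...
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# ...matches the plain dot product of their L2-normalized versions.
dot_of_normalized = dot(l2_normalize(a), l2_normalize(b))

print(cosine, dot_of_normalized)
print(math.isclose(cosine, dot_of_normalized))  # True
```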
```python
import numpy as np

relevant_similarity = np.dot(relevant_query_embedding, document_embedding)
irrelevant_similarity = np.dot(irrelevant_query_embedding, document_embedding)

print(f"Similarity of relevant query to the document: {relevant_similarity * 100:.2f}")
print(f"Similarity of irrelevant query to the document: {irrelevant_similarity * 100:.2f}")
```

The output should look something like this:
```
Similarity of relevant query to the document: 52.87
Similarity of irrelevant query to the document: 24.86
```

As you can see, the relevant query has a much higher similarity score to the document than the irrelevant query, indicating that our embedder has successfully captured the semantic meaning of the texts.

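Because these scores are relative measures of similarity rather than calibrated probabilities, the same comparison extends naturally to ranking many documents against one query: score every document embedding against the query embedding and sort. The sketch below is illustrative only. It assumes the embeddings are already L2-normalized, so the dot product equals cosine similarity, and it uses a hypothetical helper, `rank_documents`, with made-up unit-length vectors. In practice the vectors would come from `client.embed_documents()` and `client.embed_query()`.

```python
from typing import List, Sequence, Tuple


def rank_documents(
    query_embedding: Sequence[float],
    document_embeddings: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar.

    Assumes all embeddings are L2-normalized, so the dot product is the cosine similarity.
    """
    scores = [
        (i, sum(q * d for q, d in zip(query_embedding, doc)))
        for i, doc in enumerate(document_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up unit-length vectors standing in for real embeddings.
query = [0.6, 0.8]
documents = [
    [1.0, 0.0],  # document 0
    [0.0, 1.0],  # document 1
    [0.8, 0.6],  # document 2
]

# Expected order: document 2, then document 1, then document 0.
for index, score in rank_documents(query, documents):
    print(f"document {index}: {score * 100:.2f}")
```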
And that's it! You've just successfully embedded a legal document and queries using the Isaacus API with LangChain.