@jeremybmerrill
Contributor

As described in #2225, when nodes are persisted to PostgreSQL and then re-hydrated by a query, they do not retain their `relationships` key/value pairs. That means nodes lose their relationship to their SOURCE document. This behavior is (as described in my comment in #2225) specific to the PGVectorStore class; SimpleVectorStore and WeaviateVectorStore properly retain `relationships`.

The root of the bug is that PGVectorStore doesn't use nodeToMetadata to dehydrate the node to a single dict and doesn't use metadataDictToNode to re-hydrate it. Instead, PGVectorStore stores only the node's metadata and discards the rest of the node data.

This PR uses nodeToMetadata and metadataDictToNode, imitating WeaviateVectorStore. Now, it works for me -- I can dehydrate and rehydrate nodes, and have them retain their relationships.
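
The round trip described above can be sketched roughly as follows. This is a simplified illustration, not the library's actual code: `SketchNode`, `dehydrate`, and `rehydrate` are hypothetical stand-ins for the real node classes and for nodeToMetadata / metadataDictToNode, and the sketch assumes the node text lives in its own column. The `_node_content` and `_node_type` keys mirror the stored metadata sample in this description.

```typescript
// Hypothetical, simplified stand-ins; the real library types carry more fields.
interface RelatedNodeInfo {
  nodeId: string;
}

interface SketchNode {
  id_: string;
  text: string;
  metadata: Record<string, unknown>;
  relationships: Record<string, RelatedNodeInfo>;
}

type StoredMetadata = Record<string, unknown>;

// Dehydrate: keep user metadata flat and pack the rest of the node into
// a JSON string, so that relationships survive the trip to the database.
function dehydrate(node: SketchNode): StoredMetadata {
  const content = { id_: node.id_, relationships: node.relationships, text: "" };
  return {
    ...node.metadata,
    _node_content: JSON.stringify(content),
    _node_type: "TextNode",
  };
}

// Rehydrate: unpack _node_content and restore the text, which is assumed
// to be stored separately from the metadata.
function rehydrate(stored: StoredMetadata, text: string): SketchNode {
  const raw = stored["_node_content"];
  if (typeof raw !== "string") {
    throw new Error("row has no _node_content; it was stored without dehydration");
  }
  const parsed = JSON.parse(raw) as Pick<SketchNode, "id_" | "relationships">;
  const metadata: Record<string, unknown> = {};
  for (const key of Object.keys(stored)) {
    // Everything except the two bookkeeping keys is user metadata.
    if (key !== "_node_content" && key !== "_node_type") {
      metadata[key] = stored[key];
    }
  }
  return { id_: parsed.id_, text, metadata, relationships: parsed.relationships };
}

const original: SketchNode = {
  id_: "node-1",
  text: "hello world",
  metadata: { cik: "23217" },
  relationships: { SOURCE: { nodeId: "doc-1" } },
};
const roundTripped = rehydrate(dehydrate(original), original.text);
console.log(roundTripped.relationships["SOURCE"]?.nodeId);
```

Storing only `node.metadata`, as the old behavior did, corresponds to skipping `_node_content` entirely, which is why `rehydrate` would have nothing to rebuild the relationships from.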

Closes #2225

Before, incorrect value of the metadata field in Postgres -- all of these keys are arbitrary user-specific metadata, nothing LlamaIndexTS-related except create_date:

{
    "fn": "filings_raw/0000023217-2024Q1-10-Q-cag20231126_10q.html",
    "cik": "23217",
    "font-family": "\"Times New Roman\"",
    "create_date": "2025-09-25T01:00:52.659Z",
    ...
}

After, with this PR:

{
    "fn": "filings_raw/0000023217-2024Q1-10-Q-cag20231126_10q.html",
    "cik": "23217",
    "doc_id": "d515c040-7ff9-482a-88f9-2e1c68f7e4d2",
    "_node_type": "TextNode",
    "ref_doc_id": "d515c040-7ff9-482a-88f9-2e1c68f7e4d2",
    "create_date": "2025-10-26T01:17:29.250Z",
    "document_id": "d515c040-7ff9-482a-88f9-2e1c68f7e4d2",
    "font-family": "\"Times New Roman\"",
    "_node_content": "{\"id_\":\"1400530c-ab43-4791-a1a5-ad5479ab848f\",\"excludedEmbedMetadataKeys\":[\"\",\"Unnamed: 0\",\"cik\",\"fn\",\"classification\",\"paragraph_index\",\"font-size\",\"font-family\",\"font-style\",\"font-weight\",\"line-height\",\"text-align\",\"width\",\"margin-bottom\",\"margin-top\",\"text-indent\",\"vertical-align\",\"color\",\"text_len\",\"pct_numbers\"],\"excludedLlmMetadataKeys\":[],\"relationships\":{\"SOURCE\":{\"nodeId\":\"d515c040-7ff9-482a-88f9-2e1c68f7e4d2\",\"metadata\":{\"\":\"1825\",\"Unnamed: 0\":\"1325\",\"cik\":\"23217\",\"fn\":\"filings_raw/0000023217-2024Q1-10-Q-cag20231126_10q.html\",\"classification\":\"body\",\"paragraph_index\":\"405\",\"font-size\":\"10pt\",\"font-family\":\"\\\"Times New Roman\\\"\",\"font-style\":\"\",\"font-weight\":\"\",\"line-height\":\"\",\"text-align\":\"justify\",\"width\":\"\",\"margin-bottom\":\"\",\"margin-top\":\"\",\"text-indent\":\"27pt\",\"vertical-align\":\"\",\"color\":\"\",\"text_len\":\"478.0\",\"pct_numbers\":\"0.0\"},\"hash\":\"K+SksiqFAH0gXsK3F0isP0/kPjXWT3/9F/dJJybN4sE=\"}},\"text\":\"\",\"textTemplate\":\"\",\"metadataSeparator\":\"\n\",\"startCharIdx\":0,\"endCharIdx\":478,\"type\":\"TEXT\",\"hash\":\"mAn2tHceWJ8PrhydvePeEfPrDdK571IaOqwY5tPukBs=\"}",
   ... 
}

I have not added tests with this PR because I'm bad at unit tests and there aren't any tests that I see for PGVectorStore. If required, I will futz around with Copilot and try to get some meaningful tests that pass. Let me know!

@changeset-bot

changeset-bot bot commented Oct 26, 2025

🦋 Changeset detected

Latest commit: 58d4731

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@llamaindex/postgres Minor

@pkg-pr-new

pkg-pr-new bot commented Oct 26, 2025

@llamaindex/autotool

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/autotool@2232

@llamaindex/community

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/community@2232

@llamaindex/core

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/core@2232

@llamaindex/env

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/env@2232

@llamaindex/experimental

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/experimental@2232

llamaindex

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/llamaindex@2232

@llamaindex/node-parser

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/node-parser@2232

@llamaindex/readers

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/readers@2232

@llamaindex/tools

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/tools@2232

@llamaindex/wasm-tools

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/wasm-tools@2232

@llamaindex/workflow

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/workflow@2232

@llamaindex/anthropic

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/anthropic@2232

@llamaindex/assemblyai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/assemblyai@2232

@llamaindex/aws

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/aws@2232

@llamaindex/clip

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/clip@2232

@llamaindex/cohere

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/cohere@2232

@llamaindex/deepinfra

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/deepinfra@2232

@llamaindex/deepseek

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/deepseek@2232

@llamaindex/discord

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/discord@2232

@llamaindex/excel

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/excel@2232

@llamaindex/fireworks

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/fireworks@2232

@llamaindex/google

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/google@2232

@llamaindex/groq

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/groq@2232

@llamaindex/huggingface

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/huggingface@2232

@llamaindex/jinaai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/jinaai@2232

@llamaindex/mistral

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/mistral@2232

@llamaindex/mixedbread

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/mixedbread@2232

@llamaindex/notion

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/notion@2232

@llamaindex/ollama

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/ollama@2232

@llamaindex/openai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/openai@2232

@llamaindex/ovhcloud

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/ovhcloud@2232

@llamaindex/perplexity

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/perplexity@2232

@llamaindex/portkey-ai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/portkey-ai@2232

@llamaindex/replicate

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/replicate@2232

@llamaindex/together

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/together@2232

@llamaindex/vercel

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/vercel@2232

@llamaindex/vllm

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/vllm@2232

@llamaindex/voyage-ai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/voyage-ai@2232

@llamaindex/xai

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/xai@2232

@llamaindex/astra

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/astra@2232

@llamaindex/azure

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/azure@2232

@llamaindex/chroma

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/chroma@2232

@llamaindex/elastic-search

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/elastic-search@2232

@llamaindex/firestore

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/firestore@2232

@llamaindex/milvus

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/milvus@2232

@llamaindex/mongodb

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/mongodb@2232

@llamaindex/pinecone

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/pinecone@2232

@llamaindex/postgres

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/postgres@2232

@llamaindex/qdrant

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/qdrant@2232

@llamaindex/supabase

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/supabase@2232

@llamaindex/upstash

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/upstash@2232

@llamaindex/weaviate

npm i https://pkg.pr.new/run-llama/LlamaIndexTS/@llamaindex/weaviate@2232

commit: 58d4731

@jeremybmerrill
Contributor Author

hi @marcusschiesser : Do you need anything from me to make this PR or #2237 ready to be merged? (I think those test failures are unrelated.)

I'd love for this to be merged because I'm having real trouble getting my forked version to build in my build process.

  "name": "@llamaindex/core",
  "type": "module",
- "version": "0.6.22",
+ "version": "0.6.23",
Collaborator

The PR shouldn't change the versions; Changesets takes care of that.

Contributor Author

fixed

Collaborator

sorry @jeremybmerrill there are also other files changing the versions, please revert them

Contributor Author

geez, sorry about that. I think I've fixed it.

@marcusschiesser
Collaborator

@jeremybmerrill please have a look, tests are failing:

location: '/home/runner/work/LlamaIndexTS/LlamaIndexTS/e2e/node/vector-store/pg-vector-store.e2e.ts:1:1726'
  failureType: 'testCodeFailure'
  error: |-
    Expected values to be strictly deep-equal:
    + actual - expected

      {
    +   embedding: undefined,
    +   excludedEmbedMetadataKeys: [],
    +   excludedLlmMetadataKeys: [],
    +   hash: '0h/Yo44sleekSLV0SrlweNZLdL1/MzxLsRUiSpWVmpQ=',
    +   id_: '5bb16627-f6c0-459c-bb18-71642813ef21',
    +   metadata: {
    +     create_date: '2025-11-28T03:44:07.865Z'
    +   },
    +   metadataSeparator: '\n',
    +   relationships: {},
    +   text: 'hello world',
    +   textTemplate: '',
    +   type: 'TEXT'
    -   embedding: [
    -     0.1,
    -     0.2,
    -     0.3
    -   ],
    -   excludedEmbedMetadataKeys: [],
    -   excludedLlmMetadataKeys: [],
    -   hash: '0h/Yo44sleekSLV0SrlweNZLdL1/MzxLsRUiSpWVmpQ=',
    -   id_: '5bb16627-f6c0-459c-bb18-71642813ef21',
    -   metadata: {
    -     create_date: '2025-11-28T03:44:07.865Z'
    -   },
    -   metadataSeparator: '\n',
    -   relationships: {},
    -   text: 'hello world',
    -   textTemplate: '',
    -   type: 'DOCUMENT'
      }
  code: 'ERR_ASSERTION'
  name: 'AssertionError'
  expected:
    id_: '5bb16627-f6c0-459c-bb18-71642813ef21'
    metadata:
      create_date: '2025-11-28T03:44:07.865Z'
    excludedEmbedMetadataKeys:
    excludedLlmMetadataKeys:
    relationships:
    embedding:
      0: 0.1
      1: 0.2
      2: 0.3
    text: 'hello world'
    textTemplate: ''
    metadataSeparator: |-


    type: 'DOCUMENT'
    hash: '0h/Yo44sleekSLV0SrlweNZLdL1/MzxLsRUiSpWVmpQ='
  actual:
    id_: '5bb16627-f6c0-459c-bb18-71642813ef21'
    metadata:
      create_date: '2025-11-28T03:44:07.865Z'
    excludedEmbedMetadataKeys:
    excludedLlmMetadataKeys:
    relationships:
    text: 'hello world'
    textTemplate: ''
    metadataSeparator: |-


    type: 'TEXT'
    hash: '0h/Yo44sleekSLV0SrlweNZLdL1/MzxLsRUiSpWVmpQ='
  operator: 'deepStrictEqual'
  stack: |-
    TestContext.<anonymous> (/home/runner/work/LlamaIndexTS/LlamaIndexTS/e2e/node/vector-store/pg-vector-store.e2e.ts:92:12)
    process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    async Test.run (node:internal/test_runner/test:797:9)
    async startSubtest (node:internal/test_runner/harness:259:3)
    async <anonymous> (/home/runner/work/LlamaIndexTS/LlamaIndexTS/e2e/node/vector-store/pg-vector-store.e2e.ts:63:1)
  ...

@jeremybmerrill
Contributor Author

@marcusschiesser BLUF: I resolved the first issue; the second one raises a question for you. I have two other recommendations/questions.

I've edited this branch to re-hydrate embedding. That resolves one of the two failing pieces of the test. I'd note that other vector stores seem to be inconsistent about whether they populate that key in the returned Node (Chroma and Weaviate don't; Supabase does). Recommendation #1: it may be good for the library to enforce consistency here across the various vector store implementations.
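
For illustration, re-hydrating an embedding could look something like the sketch below. `parseEmbedding` is a hypothetical helper, not the code in this branch. It assumes the database driver hands back a pgvector value either as its text form (for example `[0.1,0.2,0.3]`, when no custom type parser is registered) or as an already-parsed array.

```typescript
// Hypothetical helper: normalise an embedding column value to number[].
function parseEmbedding(value: unknown): number[] | undefined {
  // A missing column value means there is no embedding to restore.
  if (value === null || value === undefined) {
    return undefined;
  }
  // pgvector's text representation happens to be valid JSON array syntax.
  const parsed: unknown = typeof value === "string" ? JSON.parse(value) : value;
  if (!Array.isArray(parsed) || !parsed.every((x) => typeof x === "number")) {
    throw new Error("unexpected embedding format");
  }
  return parsed as number[];
}

console.log(parseEmbedding("[0.1,0.2,0.3]"));
```

Returning `undefined` for a missing value keeps the sketch in line with stores that do not populate the key at all, so a caller can tell "no embedding returned" apart from an empty vector.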

Running this test requires some undocumented local PostgreSQL setup -- there's a ".env.ci" that assumes a Postgres user named runner and a database named llamaindex_node_test. I missed the test failure because it failed both before and after my change -- because I hadn't run the setup. I didn't see any reference to this in testing instructions -- Recommendation #2: that might be a helpful addition, along with a note that the tests create a unique schema per run.

The second failure (expected type: 'DOCUMENT', actual type: 'TEXT') is much trickier. I think the e2e test may be testing an idiosyncratic functionality of PGVectorStore that is not present in other vector store implementations -- that functionality is the ability to rehydrate a node of type other than INDEX or TEXT. I see traces of an implicit expectation that LlamaIndexTS and the Python implementation produce mutually readable artifacts -- as well as the expectation that consumers can expect identical behavior from different vector store backends. However, I don't know where to find tests (if they exist) for either of these expectations.

metadataDictToNode, which is used by many vector store implementations, only returns nodes of type TEXT or INDEX. I don't see any code to change the type of a created node to anything else; type is a getter without a setter on BaseNode, so we can't change it after creation.
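
A minimal sketch of why the type cannot be patched after the fact is shown below. All names (`SketchBaseNode`, `SketchTextNode`, `SketchIndexNode`, `SketchDocument`, `sketchRehydrate`) and the stored type strings are hypothetical stand-ins chosen to mirror the behavior described here; this is not the library's code.

```typescript
type SketchObjectType = "TEXT" | "INDEX" | "DOCUMENT";

abstract class SketchBaseNode {
  readonly text: string;
  constructor(text: string) {
    this.text = text;
  }
  // Getter only: the type is fixed by the class, not by assignment.
  abstract get type(): SketchObjectType;
}

class SketchTextNode extends SketchBaseNode {
  get type(): SketchObjectType {
    return "TEXT";
  }
}

class SketchIndexNode extends SketchTextNode {
  get type(): SketchObjectType {
    return "INDEX";
  }
}

class SketchDocument extends SketchTextNode {
  get type(): SketchObjectType {
    return "DOCUMENT";
  }
}

// Mirrors the limitation described above: only two branches exist, so
// anything that is not an index node comes back as a plain text node.
function sketchRehydrate(storedType: string, text: string): SketchBaseNode {
  if (storedType === "IndexNode") {
    return new SketchIndexNode(text);
  }
  return new SketchTextNode(text);
}

const doc = new SketchDocument("hello world");
const back = sketchRehydrate("Document", doc.text);
console.log(doc.type, back.type);
```

Because `type` has no setter, the only way to get `DOCUMENT` back in this sketch is to construct a `SketchDocument` in the first place, which means adding a branch to the rehydration function.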

One possible solution would be to edit metadataDictToNode to create other kinds of nodes (like Document). However, this could have other downstream effects I cannot anticipate; and without e2e tests for other vector stores, it's hard to know.

I think the best solution to this is to edit the test to accept type: 'TEXT' -- but you know a lot more about the expected behavior, so I look forward to hearing back from you. Question #1: When a Document is rehydrated from the vector store, is it expected to be returned as a Document or a TextNode? If it's expected to be rehydrated as a Document, can you give me an example that I can base my edit on?

@marcusschiesser
Collaborator

> I didn't see any reference to this in testing instructions

you're very welcome to add this in this PR

> I think the best solution to this is to edit the test to accept type: 'TEXT' -- but you know a lot more about the expected behavior, so I look forward to hearing back from you.

thanks for looking into this - totally agree for this PR

> Question #1: When a Document is rehydrated from the vector store, is it expected to be returned as a Document or a TextNode? If it's expected to be rehydrated as a Document, can you give me an example that I can base my edit on?

I think it should be a Document

@jeremybmerrill
Contributor Author

@marcusschiesser Thanks! I will edit this PR to add some instructions for testing and to edit the test to accept TEXT for now. Hope to get to it this week.

@jeremybmerrill
Contributor Author

jeremybmerrill commented Dec 8, 2025

I've made the first two changes in the commit above -- and the e2e tests now pass. I agree that Documents should be returned as Documents when rehydrated/roundtripped through a VectorStore. I will open a bug to keep track of this problem, as it's outside the scope of this PR (but will link to this PR from the bug).

Development

Successfully merging this pull request may close these issues.

PGVectorStore and column external_id