POSTGRES_PORT_LANTERN=5436
SSH_PORT_LANTERN=2222
PINECONE_API_KEY='YOUR_PINECONE_API_KEY_HERE'
I'm coming from a JS background, so I'm following some conventions from there. What do you think about:
- putting this in .env.local
- putting .env.local in .gitignore
- adding a check that these two variables are defined when testing Pinecone
I'm worried about accidentally committing these variables.
Yes, I think adding .env to .gitignore and adding a .env.example may be an option. Also, having checks before using these variables will help provide a better user-facing error message.
def get_cloud_provider(provider_name):
    if provider_name == Cloud.PINECONE:
        return Pinecone(os.environ['PINECONE_API_KEY'], os.environ['PINECONE_ENV'])
We could do the "variables exist" check here
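One way the check suggested here could look, as a minimal sketch. The helper name require_env and the error wording are hypothetical; only the PINECONE_API_KEY and PINECONE_ENV variable names come from the diff above.

```python
import os


def require_env(*names, environ=None):
    """Return the values of the named environment variables, in order.

    Raises one error that lists every missing or empty variable, so the
    user sees a clear message instead of a bare KeyError.
    """
    # `environ` is injectable only to make the helper easy to test.
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variable(s): " + ", ".join(missing)
        )
    return [env[name] for name in names]
```

get_cloud_provider could then unpack the result, for example api_key, environment = require_env('PINECONE_API_KEY', 'PINECONE_ENV'), before constructing the client.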
core/utils/constants.py
Extension.PGVECTOR_HNSW: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.LANTERN: {'m': 32, 'ef_construction': 128, 'ef': 10},
Extension.NEON: {'m': 32, 'ef_construction': 128, 'ef': 10},
Cloud.PINECONE: { 'name': '', 'metric': 'cosine', 'pods': 1, 'replicas': 1, 'pod_type': 'p2' },
I think all the other indices use L2 by default, and SIFT uses the L2 metric for ground truth.
There was a Euclidean distance option, which seems to be the L2 norm, but I think our index uses squared L2. Will it work as expected?
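On the L2 versus squared L2 question: squaring is monotonic for non-negative values, so both metrics rank neighbors in the same order and only the reported distance values differ. A small sketch with made-up vectors (not data from the benchmark):

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(l2_squared(a, b))


def rank(query, vectors, distance):
    """Return vector indices sorted from nearest to farthest."""
    return sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))


# Illustrative data only.
query = [0.0, 0.0]
vectors = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0], [5.0, 0.5]]

order_l2 = rank(query, vectors, l2)
order_l2_squared = rank(query, vectors, l2_squared)
```

Both orderings are identical here, so recall measured against L2 ground truth should not depend on which of the two is used. The 'cosine' metric is a different matter: it can rank neighbors differently from L2 unless the vectors are normalized to unit length.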
Added benchmarking for Pinecone
Currently this only benchmarks index creation latency (create index + upsert data in batches).
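The measurement described above (create the index, then upsert in batches) could be timed roughly as follows. This is a sketch, not the PR's code: create_index and upsert_batch are hypothetical callables standing in for the real Pinecone calls, and the default batch size is an arbitrary placeholder.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def time_index_creation(create_index, upsert_batch, vectors, batch_size=100):
    """Time index creation plus batched upserts.

    `create_index` and `upsert_batch` are placeholders supplied by the caller.
    Returns (elapsed_seconds, number_of_batches).
    """
    started = time.perf_counter()
    create_index()
    batches = 0
    for batch in batched(vectors, batch_size):
        upsert_batch(batch)
        batches += 1
    elapsed = time.perf_counter() - started
    return elapsed, batches
```

Passing the two operations in as callables keeps the timing logic independent of any particular client library, which also makes it easy to exercise with stubs.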
The core/utils/pinecone_async_index.py module is just a wrapper around the Pinecone Index class, so async requests will be supported when querying the index. Reference
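For illustration only, one generic way to expose a blocking query call to asyncio code is to run it in a worker thread. The class below is hypothetical and is not the contents of pinecone_async_index.py; it assumes only that the wrapped object has a blocking query method.

```python
import asyncio


class AsyncQueryWrapper:
    """Hypothetical wrapper that makes a blocking `query` method awaitable."""

    def __init__(self, index):
        self._index = index

    async def query(self, *args, **kwargs):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._index.query, *args, **kwargs)


async def query_all(wrapper, queries):
    """Issue all queries concurrently and return results in input order."""
    return await asyncio.gather(*(wrapper.query(q) for q in queries))
```

With a wrapper like this, many queries can be in flight at once, which is what a query-latency benchmark would need.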