Releases: ssc-oscar/python-woc
v0.4.1
- Fixes: decode_value hangs when decoding a long string that contains non-utf8 characters.
Full Changelog (2025-11-30)
v0.4.0
- python-woc is now fork safe! That means you can do this:
    from multiprocessing import Pool
    from woc.local import WocMapsLocal

    # Create the instance once in the parent process; forked workers inherit it.
    woc = WocMapsLocal()

    def worker(idx):
        return woc.show_content("tree", idx)

    with Pool(8) as pool:
        for r in pool.imap(worker, [
            "706aa4dedb560358bff21c3120a0b09532d3484d",
            "3ccf6f8320740a1afec68b38b3b9ba46cedef368",
            "e5798457aebae7c84eff7b80b50c3a938cc4cb63",
            "836f04d5b374033b1608269e2f3aaabae263a0db",
            "f54cb5527226aa2096307c08e15c62248b98f763",
            "da65e1401d11a955686b8a49e46b9a457f3febab",
            "a28f1558be9867d35cc1fa17477565c08786cf83",
            "4db2ad30097924cbe5da9c0f2c49350fdc19c3a4",
            "1cf86145b4a9492ebbe0fa640638504946315ca6",
            "29a422c19251aeaeb907175e9b3219a9bed6c616",
            "51968a7a4e67fd2696ffd5ccc041560a4d804f5d",
        ]):
            print(r)

- When on_bad is "error", python-woc raises a KeyError when querying bad keys.
    >>> from woc.local import WocMapsLocal
    >>> woc = WocMapsLocal(on_bad='error')
    >>> woc.get_values("c2p","0a36c08880da83a84209efe5aa90ca3f9b1dc453")
    KeyError: 'Key 0a36c08880da83a84209efe5aa90ca3f9b1dc453 is marked as bad: tons of fake blobs'

Bad keys are stored in wocprofile.json, and you need to regenerate the profile to reflect this change.
    {
      "bads": {
        "p": {
          "bitzhoumy_helloworld": "damaged trees",
          "thachmai_mobiliPlay": "single repo for torvalds alias"
        },
        "c": {
          "3f631f976149d8702d0b1496df7b98f16a9357ed": "2013166 blobs",
          "14bde94da008ac1c65e0c066ee269315e47c0987": "Completed Search Engine with Cosine Similarity and Champion Lists, storing the entire inverted index in terms, with each term having its own pickle file."
        }
      }
    }

- Fixes a bug where RootProject.commits queries the wrong map.
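The "bads" section shown above is plain JSON, so a script can check a key before querying it. The following is a minimal sketch, assuming the layout in the example (a "bads" object with one sub-object per entity letter); `bad_reason` is a hypothetical helper and not part of python-woc, and the profile text is a shortened illustrative copy.

```python
import json

# Shortened copy of the "bads" layout from the release note (illustrative only).
PROFILE_TEXT = """
{
  "bads": {
    "p": {"bitzhoumy_helloworld": "damaged trees"},
    "c": {"3f631f976149d8702d0b1496df7b98f16a9357ed": "2013166 blobs"}
  }
}
"""


def bad_reason(profile: dict, entity: str, key: str):
    """Return the recorded reason if `key` is marked bad, else None.

    Hypothetical helper, not part of python-woc. `entity` is the section
    name inside "bads", e.g. "p" or "c" in the example above.
    """
    return profile.get("bads", {}).get(entity, {}).get(key)


profile = json.loads(PROFILE_TEXT)
print(bad_reason(profile, "c", "3f631f976149d8702d0b1496df7b98f16a9357ed"))
print(bad_reason(profile, "p", "some_unlisted_project"))
```

Chained `dict.get` calls keep the lookup safe for profiles that have no "bads" section at all, which would be the case for a profile generated before this release.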
Full Changelog (2025-11-05)
v0.3.2
- Supports Python up to 3.13
- Fixes data type detection for po2pn and b2fa
Full changelog (2025-09-30)
v0.3.1
Added support for:
- c2tag: commit to tag
- b2cff: file renames by git mv, blob -> (commit, old_file, new_file)
- commit/tree.tch: get_values('tree.tch','X') = show_content('tree','X')
Full changelog (2025-08-07)
v0.2.6
- Add support for tag (show_content) and c2tag (get_values)
Full Changelog: v0.2.5...v0.2.6
v0.2.5
- Fixes an index error when parsing cs3 large maps whose entries are not aligned to a multiple of 3
- Encoding falls back to latin-1 when chardet fails
- Fixes iter_values not terminating when on_large='head'
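The latin-1 fallback mentioned above can be illustrated with a minimal sketch. This is not python-woc's actual decoding code: `decode_with_fallback` is a hypothetical helper, and the charset-detection step from the note is left out so the example needs only the standard library.

```python
def decode_with_fallback(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1.

    Hypothetical helper for illustration only. The detector step
    mentioned in the release note is omitted here.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a code point, so this never raises.
        return raw.decode("latin-1")


print(decode_with_fallback("héllo".encode("utf-8")))
print(decode_with_fallback(b"caf\xe9"))
```

latin-1 works as a last resort because it always produces some string, though the result may be wrong for text that was written in another encoding.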
Full list of changes (2024-12-21)
v0.2.4
- Add iter_values: Iterates over values rather than returning a list; can be useful when querying large maps.
- Add all_keys: Iterates over keys in a map.
- Switch from builtin gzip to rapidgzip for fast, random access to large maps.
- exclude_large has been removed; use on_large='ignore' instead.
v0.2.2
v0.2.1
Add /home/wocprofile.json to paths.
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Woc is a huuugee dataset of hundreds of files and hundreds of terabytes. To make sure everything is in good shape after transmission, wocProfile v2 adds an optional digest field to verify the integrity of each of the files.
To create a wocProfile with file digests:
python3 -m woc.detect /woc --with-digest > wocprofile.json
To verify the digests:
python3 -m woc.verify --profile wocprofile.json
This version does not break the old profile schema (v1).
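The idea behind a per-file digest check can be sketched as follows. This is an illustration under assumptions, not the logic of woc.verify: the release note does not say which hash algorithm or profile field layout is used, so SHA-256 and the helper names `file_digest` and `verify` are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit in memory."""
    h = hashlib.sha256()  # assumption: algorithm chosen for illustration only
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Return True if the file's current digest matches the recorded one."""
    return file_digest(path) == expected


# Demo on a throwaway file standing in for a map file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello woc")
    demo_path = tmp.name

recorded = file_digest(demo_path)      # what a profile would store
intact = verify(demo_path, recorded)   # file unchanged since recording

with open(demo_path, "ab") as f:       # simulate corruption in transit
    f.write(b"!")
corrupted = verify(demo_path, recorded)

os.remove(demo_path)
print(intact, corrupted)
```

Reading in chunks matters at this scale: a single map file can be far larger than available memory, so the digest is updated incrementally instead of hashing one big bytes object.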
Full Changelog: v0.1.2...v0.2.0