-
Notifications
You must be signed in to change notification settings - Fork 1
[WIP] Add BiSeqDict, MultiSeqDict, and MultiBiSeqDict modules #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add three new dictionary types built on top of SeqDict: - BiSeqDict: Many-to-one bidirectional dictionary with reverse lookups - MultiSeqDict: One-to-many dictionary (keys map to multiple values) - MultiBiSeqDict: Many-to-many bidirectional dictionary These modules preserve insertion order and work with any types (not just comparable), leveraging SeqDict's FNV hashing implementation.
|
Thanks for this PR! I checked the BiSeqDict module. It looks like you still have comparable for all of the type vars which will needlessly restrict the user. |
|
@MartinSStewart Thanks for the feedback! You're absolutely right. I've completely fixed all three modules:
All modules now work with any types, fully leveraging SeqDict and SeqSet's FNV hashing. No more comparable restrictions anywhere! 🎉 Latest commit: e94d10b |
Fixes feedback from @MartinSStewart: The modules were unnecessarily restricting type variables to 'comparable'. Changes: - Replace Set with SeqSet (no comparable constraint needed) - Rename type vars from 'comparable1/comparable2' to 'k/v' for clarity - All three modules (BiSeqDict, MultiSeqDict, MultiBiSeqDict) now work with any types, not just comparables This fully leverages SeqDict's FNV hashing implementation.
Add comprehensive tests proving the modules work with non-comparable types: - BiSeqDict tests with custom types and records - MultiSeqDict tests with custom types - MultiBiSeqDict tests with custom types and bidirectional lookups Note: Tests cannot be run due to pre-existing lamdera/codecs dependency issue in the repo.
supermario
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need a very thorough review as it's a lot of presumably AI-genn'ed code addition to primitive concepts we want to be both correct and performant.
Can you update the readme as well please with another section for these additional types and their uses?
I'd also like to see the tests be a little more comprehensive as well to match the existing level of testing for SeqDict/SeqSet.
…dency As suggested by @supermario, inlined gatherEqualsBy and gatherWith into Internal.ListHelpers module to keep dependency tree minimal for a core package. This removes the elm-community/list-extra dependency.
- Add encodeBiSeqDict/decodeBiSeqDict to BiSeqDict.elm - Add encodeMultiSeqDict/decodeMultiSeqDict to MultiSeqDict.elm - Add encodeMultiBiSeqDict/decodeMultiBiSeqDict to MultiBiSeqDict.elm - Enhance README with realistic opaque ID type examples - Add many-to-many chat/documents example for MultiBiSeqDict - Document known codec generation issue with Lamdera compiler
This adds Wire3 codec generation support for three new container types from lamdera/containers: - BiSeqDict: Bidirectional many-to-one dictionary - MultiSeqDict: One-to-many dictionary - MultiBiSeqDict: Many-to-many bidirectional dictionary Changes: - Add DiffableType constructors for the three new types - Add Wire3 encoder/decoder patterns following SeqDict pattern - Add TypeHash support for type hashing and migrations - Add Evergreen migration helpers for all three types Related PR: lamdera/containers#1 Test repo: https://github.com/supermario/qwertytrewq All three types use the same 2-parameter (key, value) structure as SeqDict and work with opaque types that aren't comparable.
Related PRs and TestingCompiler support PR: lamdera/compiler#69 This PR requires the compiler changes in lamdera/compiler#69 to properly generate Wire3 codecs for the three new container types when used in BackendModel. Test repository: https://github.com/CharlonTank/qwertytrewq The test repo demonstrates real-world usage with Testing Instructions# Clone test repo
git clone https://github.com/CharlonTank/qwertytrewq
cd qwertytrewq
# Use override script
./override-dillon.sh /path/to/containers
# Compile (requires compiler from lamdera/compiler#69)
LDEBUG=1 EXPERIMENTAL=1 LOVR="$(pwd)/overrides" lamdera make src/Backend.elm✅ With both PRs: Compilation succeeds |
- Add Id.elm with proper opaque ID types (ChatId, DocumentId) - Update Backend.elm to use Id types instead of Never constructors - Update Types.elm to use Id ChatId and Id DocumentId - Update README with links to both PRs and test results - Add COMPILER_CHANGES_NEEDED.md documentation - Install UUID package dependency Test now compiles successfully with modified compiler! Related PRs: - lamdera/containers#1 - lamdera/compiler#69
I'm working on making extensive tests for these modules but how do you run the tests locally with the modified Lamdera compiler? I tried |
- BiSeqDictTests.elm: 93 → 424 lines (45+ tests covering all operations plus bidirectional lookups) - MultiSeqDictTests.elm: 82 → 598 lines (51 tests for one-to-many relationships) - MultiBiSeqDictTests.elm: 91 → 913 lines (70 tests for many-to-many with reverse index consistency) All test suites now match SeqDict/SeqSet coverage level with: - Build, query, transform, and combine operations - Custom type tests (proving no comparable constraint) - Comprehensive fuzz tests for property-based testing - Reverse lookup tests specific to bidirectional types Updated README to confirm Wire3 codec support is working. Co-Authored-By: Claude <noreply@anthropic.com>
|
@CharlonTank sorry missed your questions. I think you'll need to make sure you compile the project with should work, assuming that |
supermario
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments to discuss, apologies I've not had the time to do some proper evaluation before this.
README.md
Outdated
| --> Just workspace1 | ||
|
|
||
| -- Reverse lookup: Who are all members of workspace1? | ||
| BiSeqDict.getReverse workspace1 userWorkspaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps BiSeqDict.getKeys workspace1 userWorkspaces would be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Though still reads a bit awkward with the collection name - "get the keys for doc1 chat documents"... open to other ideas
README.md
Outdated
| |> MultiSeqDict.insert property2 unit201 | ||
|
|
||
| -- Get all units for property1 | ||
| MultiSeqDict.get property1 propertyUnits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| MultiSeqDict.get property1 propertyUnits | |
| MultiSeqDict.getAll property1 propertyUnits |
A little torn on this one. .get might be what you reach for naturally when writing code.
But .getAll will be much more helpful and clear when reading code – all other .get are singular in other collections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed 👍
README.md
Outdated
| --> SeqSet.fromList [doc1, doc2] | ||
|
|
||
| -- Which chats contain doc1? | ||
| MultiBiSeqDict.getReverse doc1 chatDocuments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, consideration for change to both .get -> .getAll and .getReverse -> .getKeys.
But when reading through the example code, the insertion/removal reads beautifully, but the retrieval reads clunkily, my brain is juggling which direction things are going.
MultiBiSeqDict.getAll chat1 chatDocuments
Reads to me as "get all chat1 documents" -> not bad.
MultiBiSeqDict.getReverse doc1 chatDocuments
MultiBiSeqDict.getKeys doc1 chatDocuments
Reads to me as "get the reverse document chat documents?"
Or "get the keys for doc1 chat documents...?"
Feel like there's an opportunity for something much clearer here. Will think a bit more but ideas welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same - open to ideas if something reads better
| remove from to (MultiSeqDict d) = | ||
| MultiSeqDict <| | ||
| SeqDict.update from (Maybe.andThen (SeqSet.remove to >> normalizeSet)) d | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we're missing a removeValues version – i.e. for the x-to-many variants, we'd want to be able to remove all values of some given value on the right hand side.
propertyUnits |> OneToMany.removeValues unit102
chatDocuments |> ManyToMany.removeValues doc1There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will add
| <sup>*Non-equatable Elm values are currently: functions, `Bytes`, `Html`, `Json.Value`, `Task`, `Cmd`, `Sub`, `Never`, `Texture`, `Shader`, and any datastructures containing these types.</sup> | ||
|
|
||
|
|
||
| ## BiSeqDict, MultiSeqDict, and MultiBiSeqDict (bidirectional and multi-value dictionaries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general thing I felt reading through as a whole was even after fully understanding what's going on, reading BiSeqDict and MultiBiSeqDict and MultiSeqDict I kept thinking "wait hold on which one is that?", scrolling back to the titles below to see the BiSeqDict (Many-to-One) and going "Ah right yes, this is the many-to-one one.
That just kept happening over and over... 😅
So what if instead, we names them the thing they are?
- ManyToOne
- OneToMany
- ManyToMany
I'm not 100% a fan of this naming in general, but after redoing the examples with this naming at least to me reads a lot nicer and helps keep clarity in my head of what's going on, so I think it is a step better naming than the SeqDict versions:
ManyToOne (formerly BiSeqDict)
import ManyToOne exposing (ManyToOne)
type UserId = UserId Never
type WorkspaceId = WorkspaceId Never
userWorkspaces : ManyToOne (Id UserId) (Id WorkspaceId)
userWorkspaces =
ManyToOne.empty
|> ManyToOne.insert aliceId workspace1
|> ManyToOne.insert bobId workspace1
|> ManyToOne.insert charlieId workspace2
-- Forward lookup
ManyToOne.get aliceId userWorkspaces
--> Just workspace1
-- Reverse lookup (renamed API)
ManyToOne.getKeys workspace1 userWorkspaces
--> SeqSet.fromList [ aliceId, bobId ]OneToMany (formerly MultiSeqDict)
import OneToMany exposing (OneToMany)
type PropertyId = PropertyId Never
type UnitId = UnitId Never
propertyUnits : OneToMany (Id PropertyId) (Id UnitId)
propertyUnits =
OneToMany.empty
|> OneToMany.insert property1 unit101
|> OneToMany.insert property1 unit102
|> OneToMany.insert property2 unit201
-- Forward lookup
OneToMany.getAll property1 propertyUnits
--> SeqSet.fromList [ unit101, unit102 ]
-- Reverse lookup (renamed API)
OneToMany.getKeys unit101 propertyUnits
--> SeqSet.fromList [ property1 ]
-- Remove a specific association
OneToMany.remove property1 unit102 propertyUnitsManyToMany (formerly MultiBiSeqDict)
import ManyToMany exposing (ManyToMany)
type ChatId = ChatId Never
type DocumentId = DocumentId Never
chatDocuments : ManyToMany (Id ChatId) (Id DocumentId)
chatDocuments =
ManyToMany.empty
|> ManyToMany.insert chat1 doc1
|> ManyToMany.insert chat1 doc2
|> ManyToMany.insert chat2 doc1
-- Forward lookup
ManyToMany.getAll chat1 chatDocuments
--> SeqSet.fromList [ doc1, doc2 ]
-- Reverse lookup (renamed API)
ManyToMany.getKeys doc1 chatDocuments
--> SeqSet.fromList [ chat1, chat2 ]
-- Transfer
chatDocuments
|> ManyToMany.remove chat1 doc2
|> ManyToMany.insert chat3 doc2There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this naming because it naturally fits in with OneToOne.elm module I'd like to add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. Will wait for team consensus on exact names
|
Yes - will implement API changes ( |
- get → getAll for x-to-many (MultiSeqDict, MultiBiSeqDict) - getReverse → getKeys for bidirectional types (BiSeqDict, MultiBiSeqDict) - Add removeValues to MultiSeqDict and MultiBiSeqDict
miniBill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass review. In general, I think these data structures could be very useful, but I wonder if putting them inside lamdera/containers is the best path forward, especially considering that versioning lamdera/* packages is currently... interesting.
Ideally I'd love to see a containers-extra package, but at the moment it would be unpublishable which would make it hard to use.
| gatherEqualsBy : (a -> b) -> List a -> List ( a, List a ) | ||
| gatherEqualsBy extract list = | ||
| gatherWith (\a b -> extract a == extract b) list | ||
|
|
||
|
|
||
| {-| Group equal elements together using a custom equality function. Elements will be | ||
| grouped in the same order as they appear in the original list. The same applies to | ||
| elements within each group. | ||
| gatherWith (==) [1,2,1,3,2] | ||
| --> [(1,[1]),(2,[2]),(3,[])] | ||
| -} | ||
| gatherWith : (a -> a -> Bool) -> List a -> List ( a, List a ) | ||
| gatherWith testFn list = | ||
| let | ||
| helper : List a -> List ( a, List a ) -> List ( a, List a ) | ||
| helper scattered gathered = | ||
| case scattered of | ||
| [] -> | ||
| List.reverse gathered | ||
|
|
||
| toGather :: population -> | ||
| let | ||
| ( gathering, remaining ) = | ||
| List.partition (testFn toGather) population | ||
| in | ||
| helper remaining (( toGather, gathering ) :: gathered) | ||
| in | ||
| helper list [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is O(n²). We can do better with hashing
| ) | ||
| in | ||
| reverseWithoutOld | ||
| |> SeqDict.update to (Maybe.withDefault SeqSet.empty >> SeqSet.insert from >> Just) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>> is inefficient, let's expand this to a single lambda with a pipeline inside. Actually, because update is just get + insert, we can ~inline them and avoid touching the dictionary at all if not needed
| d.reverse | ||
| |> SeqDict.update oldTo_ | ||
| (Maybe.map (SeqSet.remove from) | ||
| >> Maybe.andThen normalizeSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace usage of >> with a pipeline
| SeqDict.update from fn d.forward | ||
| |> fromDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be much slower and have worse memory costs than implementing the full logic. To avoid duplication we could implement it in terms of insert and remove, but let's not call fromDict
| BiSeqDict | ||
| { d | ||
| | forward = SeqDict.remove from d.forward | ||
| , reverse = SeqDict.filterMap (\_ set -> SeqSet.remove from set |> normalizeSet) d.reverse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is O(n), we can do better
| SeqDict.update value | ||
| (\maybeKeys -> | ||
| Just <| | ||
| case maybeKeys of | ||
| Nothing -> | ||
| SeqSet.singleton key | ||
|
|
||
| Just keys_ -> | ||
| SeqSet.insert key keys_ | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably ~inline update here for simplicity?
| -} | ||
| filter : (k -> v -> Bool) -> BiSeqDict k v -> BiSeqDict k v | ||
| filter fn (BiSeqDict d) = | ||
| -- TODO diff instead of throwing away and creating from scratch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really depends on how much the filter keeps, unfortunately, and we can't predict that. If filter only keeps a small portion of items, fromDict is faster, if it keeps most then doing a folded remove is going to be faster.
| union : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v | ||
| union (BiSeqDict left) (BiSeqDict right) = | ||
| -- TODO diff instead of throwing away and creating from scratch? | ||
| SeqDict.union left.forward right.forward | ||
| |> fromDict | ||
|
|
||
|
|
||
| {-| Keep a key-value pair when its key appears in the second dictionary. | ||
| Preference is given to values in the first dictionary. | ||
| -} | ||
| intersect : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v | ||
| intersect (BiSeqDict left) (BiSeqDict right) = | ||
| -- TODO diff instead of throwing away and creating from scratch? | ||
| SeqDict.intersect left.forward right.forward | ||
| |> fromDict | ||
|
|
||
|
|
||
| {-| Keep a key-value pair when its key does not appear in the second dictionary. | ||
| -} | ||
| diff : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v | ||
| diff (BiSeqDict left) (BiSeqDict right) = | ||
| -- TODO diff instead of throwing away and creating from scratch? | ||
| SeqDict.diff left.forward right.forward | ||
| |> fromDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For union/intersect/diff the algorithm for updating the reverse mapping might actually be nontrivial, will need to think about it
Summary
This PR adds three new bidirectional and multi-value dictionary types built on top of SeqDict:
Key Features
comparableconstraint (uses FNV hashing)Changes
elm.jsonto expose new moduleselm-community/list-extradependency (required for MultiSeqDict.fromFlatList)Status
The modules compile successfully but comprehensive tests need to be added.
Example Usage
BiSeqDict (Many-to-One)
MultiSeqDict (One-to-Many)
MultiBiSeqDict (Many-to-Many)
TODO
🤖 This PR is based on adapting the elm-bidict library to use SeqDict instead of Dict.