Skip to content

Conversation

@CharlonTank
Copy link

@CharlonTank CharlonTank commented Nov 4, 2025

Summary

This PR adds three new bidirectional and multi-value dictionary types built on top of SeqDict:

  • BiSeqDict: Many-to-one bidirectional dictionary with reverse lookups
  • MultiSeqDict: One-to-many dictionary (keys map to multiple values)
  • MultiBiSeqDict: Many-to-many bidirectional dictionary

Key Features

  • ✅ Built directly on SeqDict's HAMT implementation
  • ✅ No comparable constraint (uses FNV hashing)
  • ✅ Preserves insertion order
  • ✅ Better performance than AssocList-based implementations

Changes

  • Added 3 new modules (1,246 lines of code)
  • Updated elm.json to expose new modules
  • Added elm-community/list-extra dependency (required for MultiSeqDict.fromFlatList)

Status

⚠️ Work In Progress - Needs Testing

The modules compile successfully but comprehensive tests need to be added.

Example Usage

BiSeqDict (Many-to-One)

manyToOne : BiSeqDict String Int
manyToOne =
    BiSeqDict.empty
        |> BiSeqDict.insert "A" 1
        |> BiSeqDict.insert "B" 2
        |> BiSeqDict.insert "C" 1  -- Multiple keys can map to same value

BiSeqDict.getReverse 1 manyToOne
--> Set.fromList ["A", "C"]

MultiSeqDict (One-to-Many)

oneToMany : MultiSeqDict String Int
oneToMany =
    MultiSeqDict.empty
        |> MultiSeqDict.insert "A" 1
        |> MultiSeqDict.insert "A" 2  -- One key can map to multiple values

MultiSeqDict.get "A" oneToMany
--> Set.fromList [1, 2]

MultiBiSeqDict (Many-to-Many)

manyToMany : MultiBiSeqDict String Int
manyToMany =
    MultiBiSeqDict.empty
        |> MultiBiSeqDict.insert "A" 1
        |> MultiBiSeqDict.insert "B" 2
        |> MultiBiSeqDict.insert "A" 2  -- Both directions support multiple mappings

MultiBiSeqDict.get "A" manyToMany
--> Set.fromList [1, 2]

MultiBiSeqDict.getReverse 2 manyToMany
--> Set.fromList ["A", "B"]

TODO

  • Add comprehensive test suite
  • Add documentation examples
  • Performance benchmarks
  • Review API naming consistency with SeqDict

🤖 This PR is based on adapting the elm-bidict library to use SeqDict instead of Dict.

Add three new dictionary types built on top of SeqDict:
- BiSeqDict: Many-to-one bidirectional dictionary with reverse lookups
- MultiSeqDict: One-to-many dictionary (keys map to multiple values)
- MultiBiSeqDict: Many-to-many bidirectional dictionary

These modules preserve insertion order and work with any types
(not just comparable), leveraging SeqDict's FNV hashing implementation.
@MartinSStewart
Copy link
Member

Thanks for this PR! I checked the BiSeqDict module. It looks like you still have comparable for all of the type vars which will needlessly restrict the user.

@CharlonTank
Copy link
Author

CharlonTank commented Nov 4, 2025

@MartinSStewart Thanks for the feedback! You're absolutely right.

I've completely fixed all three modules:

  • ✅ Replaced Set with SeqSet everywhere (function calls, type signatures, doc examples)
  • ✅ Renamed type variables from comparable1/comparable2 to k/v for clarity
  • ✅ Fixed normalizeSet function signatures and implementations

All modules now work with any types, fully leveraging SeqDict and SeqSet's FNV hashing. No more comparable restrictions anywhere! 🎉

Latest commit: e94d10b

Fixes feedback from @MartinSStewart: The modules were unnecessarily
restricting type variables to 'comparable'.

Changes:
- Replace Set with SeqSet (no comparable constraint needed)
- Rename type vars from 'comparable1/comparable2' to 'k/v' for clarity
- All three modules (BiSeqDict, MultiSeqDict, MultiBiSeqDict) now work
  with any types, not just comparables

This fully leverages SeqDict's FNV hashing implementation.
Add comprehensive tests proving the modules work with non-comparable types:
- BiSeqDict tests with custom types and records
- MultiSeqDict tests with custom types
- MultiBiSeqDict tests with custom types and bidirectional lookups

Note: Tests cannot be run due to pre-existing lamdera/codecs dependency issue in the repo.
Copy link
Member

@supermario supermario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a very thorough review as it's a lot of presumably AI-genn'ed code addition to primitive concepts we want to be both correct and performant.

Can you update the readme as well please with another section for these additional types and their uses?

I'd also like to see the tests be a little more comprehensive as well to match the existing level of testing for SeqDict/SeqSet.

…dency

As suggested by @supermario, inlined gatherEqualsBy and gatherWith into
Internal.ListHelpers module to keep dependency tree minimal for a core package.

This removes the elm-community/list-extra dependency.
- Add encodeBiSeqDict/decodeBiSeqDict to BiSeqDict.elm
- Add encodeMultiSeqDict/decodeMultiSeqDict to MultiSeqDict.elm
- Add encodeMultiBiSeqDict/decodeMultiBiSeqDict to MultiBiSeqDict.elm
- Enhance README with realistic opaque ID type examples
- Add many-to-many chat/documents example for MultiBiSeqDict
- Document known codec generation issue with Lamdera compiler
CharlonTank added a commit to CharlonTank/compiler that referenced this pull request Nov 5, 2025
This adds Wire3 codec generation support for three new container types
from lamdera/containers:
- BiSeqDict: Bidirectional many-to-one dictionary
- MultiSeqDict: One-to-many dictionary
- MultiBiSeqDict: Many-to-many bidirectional dictionary

Changes:
- Add DiffableType constructors for the three new types
- Add Wire3 encoder/decoder patterns following SeqDict pattern
- Add TypeHash support for type hashing and migrations
- Add Evergreen migration helpers for all three types

Related PR: lamdera/containers#1
Test repo: https://github.com/supermario/qwertytrewq

All three types use the same 2-parameter (key, value) structure as SeqDict
and work with opaque types that aren't comparable.
@CharlonTank
Copy link
Author

CharlonTank commented Nov 5, 2025

Related PRs and Testing

Compiler support PR: lamdera/compiler#69

This PR requires the compiler changes in lamdera/compiler#69 to properly generate Wire3 codecs for the three new container types when used in BackendModel.

Test repository: https://github.com/CharlonTank/qwertytrewq

The test repo demonstrates real-world usage with MultiBiSeqDict (Id ChatId) (Id DocumentId) and compiles successfully with both PRs applied.

Testing Instructions

# Clone test repo  
git clone https://github.com/CharlonTank/qwertytrewq
cd qwertytrewq

# Use override script
./override-dillon.sh /path/to/containers

# Compile (requires compiler from lamdera/compiler#69)
LDEBUG=1 EXPERIMENTAL=1 LOVR="$(pwd)/overrides" lamdera make src/Backend.elm

✅ With both PRs: Compilation succeeds
❌ Without compiler PR: Codec generation errors

CharlonTank added a commit to CharlonTank/qwertytrewq that referenced this pull request Nov 5, 2025
- Add Id.elm with proper opaque ID types (ChatId, DocumentId)
- Update Backend.elm to use Id types instead of Never constructors
- Update Types.elm to use Id ChatId and Id DocumentId
- Update README with links to both PRs and test results
- Add COMPILER_CHANGES_NEEDED.md documentation
- Install UUID package dependency

Test now compiles successfully with modified compiler!

Related PRs:
- lamdera/containers#1
- lamdera/compiler#69
@CharlonTank CharlonTank marked this pull request as ready for review November 5, 2025 06:55
@CharlonTank
Copy link
Author

CharlonTank commented Nov 5, 2025

This will need a very thorough review as it's a lot of presumably AI-genn'ed code addition to primitive concepts we want to be both correct and performant.

Can you update the readme as well please with another section for these additional types and their uses?

I'd also like to see the tests be a little more comprehensive as well to match the existing level of testing for SeqDict/SeqSet.

I'm working on making extensive tests for these modules but how do you run the tests locally with the modified Lamdera compiler? I tried LDEBUG=1 elm-test --compiler and LDEBUG=1 elm-test-rs --compiler but both fail on the kernel module imports (Elm.Kernel.FNV, Elm.Kernel.JsArray2). What's the recommended approach for testing Lamdera packages with kernel code?

- BiSeqDictTests.elm: 93 → 424 lines (45+ tests covering all operations plus bidirectional lookups)
- MultiSeqDictTests.elm: 82 → 598 lines (51 tests for one-to-many relationships)
- MultiBiSeqDictTests.elm: 91 → 913 lines (70 tests for many-to-many with reverse index consistency)

All test suites now match SeqDict/SeqSet coverage level with:
- Build, query, transform, and combine operations
- Custom type tests (proving no comparable constraint)
- Comprehensive fuzz tests for property-based testing
- Reverse lookup tests specific to bidirectional types

Updated README to confirm Wire3 codec support is working.

Co-Authored-By: Claude <noreply@anthropic.com>
@supermario
Copy link
Member

@CharlonTank sorry missed your questions. I think you'll need to make sure you compile the project with lamdera make successfully first, and then after that

LDEBUG=1 elm-test --compiler `which lamdera`

should work, assuming that lamdera --version-full you have on PATH is the correct compiled one with the additions you have.

Copy link
Member

@supermario supermario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments to discuss, apologies I've not had the time to do some proper evaluation before this.

README.md Outdated
--> Just workspace1

-- Reverse lookup: Who are all members of workspace1?
BiSeqDict.getReverse workspace1 userWorkspaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps BiSeqDict.getKeys workspace1 userWorkspaces would be clearer?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Though still reads a bit awkward with the collection name - "get the keys for doc1 chat documents"... open to other ideas

README.md Outdated
|> MultiSeqDict.insert property2 unit201

-- Get all units for property1
MultiSeqDict.get property1 propertyUnits
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MultiSeqDict.get property1 propertyUnits
MultiSeqDict.getAll property1 propertyUnits

A little torn on this one. .get might be what you reach for naturally when writing code.

But .getAll will be much more helpful and clear when reading code – all other .get are singular in other collections.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍

README.md Outdated
--> SeqSet.fromList [doc1, doc2]

-- Which chats contain doc1?
MultiBiSeqDict.getReverse doc1 chatDocuments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, consideration for change to both .get -> .getAll and .getReverse -> .getKeys.

But when reading through the example code, the insertion/removal reads beautifully, but the retrieval reads clunkily, my brain is juggling which direction things are going.

MultiBiSeqDict.getAll chat1 chatDocuments

Reads to me as "get all chat1 documents" -> not bad.

MultiBiSeqDict.getReverse doc1 chatDocuments
MultiBiSeqDict.getKeys doc1 chatDocuments

Reads to me as "get the reverse document chat documents?"
Or "get the keys for doc1 chat documents...?"

Feel like there's an opportunity for something much clearer here. Will think a bit more but ideas welcome.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same - open to ideas if something reads better

remove from to (MultiSeqDict d) =
MultiSeqDict <|
SeqDict.update from (Maybe.andThen (SeqSet.remove to >> normalizeSet)) d

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're missing a removeValues version – i.e. for the x-to-many variants, we'd want to be able to remove all values of some given value on the right hand side.

propertyUnits |> OneToMany.removeValues unit102

chatDocuments |> ManyToMany.removeValues doc1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will add

<sup>*Non-equatable Elm values are currently: functions, `Bytes`, `Html`, `Json.Value`, `Task`, `Cmd`, `Sub`, `Never`, `Texture`, `Shader`, and any datastructures containing these types.</sup>


## BiSeqDict, MultiSeqDict, and MultiBiSeqDict (bidirectional and multi-value dictionaries)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general thing I felt reading through as a whole was even after fully understanding what's going on, reading BiSeqDict and MultiBiSeqDict and MultiSeqDict I kept thinking "wait hold on which one is that?", scrolling back to the titles below to see the BiSeqDict (Many-to-One) and going "Ah right yes, this is the many-to-one one.

That just kept happening over and over... 😅

So what if instead, we names them the thing they are?

  • ManyToOne
  • OneToMany
  • ManyToMany

I'm not 100% a fan of this naming in general, but after redoing the examples with this naming at least to me reads a lot nicer and helps keep clarity in my head of what's going on, so I think it is a step better naming than the SeqDict versions:

ManyToOne (formerly BiSeqDict)

import ManyToOne exposing (ManyToOne)

type UserId = UserId Never
type WorkspaceId = WorkspaceId Never

userWorkspaces : ManyToOne (Id UserId) (Id WorkspaceId)
userWorkspaces =
    ManyToOne.empty
        |> ManyToOne.insert aliceId workspace1
        |> ManyToOne.insert bobId workspace1
        |> ManyToOne.insert charlieId workspace2

-- Forward lookup
ManyToOne.get aliceId userWorkspaces
--> Just workspace1

-- Reverse lookup (renamed API)
ManyToOne.getKeys workspace1 userWorkspaces
--> SeqSet.fromList [ aliceId, bobId ]

OneToMany (formerly MultiSeqDict)

import OneToMany exposing (OneToMany)

type PropertyId = PropertyId Never
type UnitId = UnitId Never

propertyUnits : OneToMany (Id PropertyId) (Id UnitId)
propertyUnits =
    OneToMany.empty
        |> OneToMany.insert property1 unit101
        |> OneToMany.insert property1 unit102
        |> OneToMany.insert property2 unit201

-- Forward lookup
OneToMany.getAll property1 propertyUnits
--> SeqSet.fromList [ unit101, unit102 ]

-- Reverse lookup (renamed API)
OneToMany.getKeys unit101 propertyUnits
--> SeqSet.fromList [ property1 ]

-- Remove a specific association
OneToMany.remove property1 unit102 propertyUnits

ManyToMany (formerly MultiBiSeqDict)

import ManyToMany exposing (ManyToMany)

type ChatId = ChatId Never
type DocumentId = DocumentId Never

chatDocuments : ManyToMany (Id ChatId) (Id DocumentId)
chatDocuments =
    ManyToMany.empty
        |> ManyToMany.insert chat1 doc1
        |> ManyToMany.insert chat1 doc2
        |> ManyToMany.insert chat2 doc1

-- Forward lookup
ManyToMany.getAll chat1 chatDocuments
--> SeqSet.fromList [ doc1, doc2 ]

-- Reverse lookup (renamed API)
ManyToMany.getKeys doc1 chatDocuments
--> SeqSet.fromList [ chat1, chat2 ]

-- Transfer
chatDocuments
    |> ManyToMany.remove chat1 doc2
    |> ManyToMany.insert chat3 doc2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this naming because it naturally fits in with OneToOne.elm module I'd like to add

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. Will wait for team consensus on exact names

@CharlonTank
Copy link
Author

Yes - will implement API changes (getAll, getKeys, removeValues) and wait for naming consensus before renaming

- get → getAll for x-to-many (MultiSeqDict, MultiBiSeqDict)
- getReverse → getKeys for bidirectional types (BiSeqDict, MultiBiSeqDict)
- Add removeValues to MultiSeqDict and MultiBiSeqDict
Copy link
Member

@miniBill miniBill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review. In general, I think these data structures could be very useful, but I wonder if putting them inside lamdera/containers is the best path forward, especially considering that versioning lamdera/* packages is currently... interesting.

Ideally I'd love to see a containers-extra package, but at the moment it would be unpublishable which would make it hard to use.

Comment on lines +16 to +45
gatherEqualsBy : (a -> b) -> List a -> List ( a, List a )
gatherEqualsBy extract list =
gatherWith (\a b -> extract a == extract b) list


{-| Group equal elements together using a custom equality function. Elements will be
grouped in the same order as they appear in the original list. The same applies to
elements within each group.
gatherWith (==) [1,2,1,3,2]
--> [(1,[1]),(2,[2]),(3,[])]
-}
gatherWith : (a -> a -> Bool) -> List a -> List ( a, List a )
gatherWith testFn list =
let
helper : List a -> List ( a, List a ) -> List ( a, List a )
helper scattered gathered =
case scattered of
[] ->
List.reverse gathered

toGather :: population ->
let
( gathering, remaining ) =
List.partition (testFn toGather) population
in
helper remaining (( toGather, gathering ) :: gathered)
in
helper list []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is O(n²). We can do better with hashing

)
in
reverseWithoutOld
|> SeqDict.update to (Maybe.withDefault SeqSet.empty >> SeqSet.insert from >> Just)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>> is inefficient, let's expand this to a single lambda with a pipeline inside. Actually, because update is just get + insert, we can ~inline them and avoid touching the dictionary at all if not needed

d.reverse
|> SeqDict.update oldTo_
(Maybe.map (SeqSet.remove from)
>> Maybe.andThen normalizeSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace usage of >> with a pipeline

Comment on lines +145 to +146
SeqDict.update from fn d.forward
|> fromDict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be much slower and have worse memory costs than implementing the full logic. To avoid duplication we could implement it in terms of insert and remove, but let's not call fromDict

BiSeqDict
{ d
| forward = SeqDict.remove from d.forward
, reverse = SeqDict.filterMap (\_ set -> SeqSet.remove from set |> normalizeSet) d.reverse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is O(n), we can do better

Comment on lines +304 to +313
SeqDict.update value
(\maybeKeys ->
Just <|
case maybeKeys of
Nothing ->
SeqSet.singleton key

Just keys_ ->
SeqSet.insert key keys_
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably ~inline update here for simplicity?

-}
filter : (k -> v -> Bool) -> BiSeqDict k v -> BiSeqDict k v
filter fn (BiSeqDict d) =
-- TODO diff instead of throwing away and creating from scratch?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really depends on how much the filter keeps, unfortunately, and we can't predict that. If filter only keeps a small portion of items, fromDict is faster, if it keeps most then doing a folded remove is going to be faster.

Comment on lines +382 to +405
union : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v
union (BiSeqDict left) (BiSeqDict right) =
-- TODO diff instead of throwing away and creating from scratch?
SeqDict.union left.forward right.forward
|> fromDict


{-| Keep a key-value pair when its key appears in the second dictionary.
Preference is given to values in the first dictionary.
-}
intersect : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v
intersect (BiSeqDict left) (BiSeqDict right) =
-- TODO diff instead of throwing away and creating from scratch?
SeqDict.intersect left.forward right.forward
|> fromDict


{-| Keep a key-value pair when its key does not appear in the second dictionary.
-}
diff : BiSeqDict k v -> BiSeqDict k v -> BiSeqDict k v
diff (BiSeqDict left) (BiSeqDict right) =
-- TODO diff instead of throwing away and creating from scratch?
SeqDict.diff left.forward right.forward
|> fromDict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For union/intersect/diff the algorithm for updating the reverse mapping might actually be nontrivial, will need to think about it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants