Commit 0b83c0f
Squashed commit of the following:
commit 43c4dc2
Author: sua yoo <sua@webrecorder.org>
Date: Mon Nov 3 09:12:50 2025 -0800
task: Add dedupe form control to workflow (#2932)
- Adds new "Deduplication" section to workflows
- Allows users to use a collection for deduplication
- Various refactors for consistency
commit 2fcf6d7
Author: Ilya Kreymer <ikreymer@users.noreply.github.com>
Date: Tue Oct 28 09:47:25 2025 -0700
Dedup Backend Initial Implementation (#2868)
Fixes #2867
The backend implementation involves:
Operator:
- A new CollIndex CRD type; btrix-crds updated to 0.2.0
- An operator that manages the new CRD type, creating a new Redis instance
when the index should exist (uses the redis_dedupe_memory and redis_dedupe_storage chart values)
- dedupe_importer_channel can configure the crawler channel for index imports
- The operator starts the crawler in 'indexer' mode
Workflows & Crawls:
- Workflows have a new `dedupeCollId` field for dedupe while crawling.
The `dedupeCollId` must also be a collection that the crawl is
auto-added to.
- There is a new waiting state, `waiting_for_dedupe_index`, that is
entered if a crawl is starting but the index is not yet ready.
- Each crawl has bi-directional links for crawls that it requires for
dedupe via `requiresCrawls` and other crawls for which this crawl is
required via `requiredByCrawls`.
- `autoAddCollections` is automatically updated to always include
the `dedupeCollId` collection.
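The two invariants above (the dedupe collection is always an auto-add collection, and crawl dependencies are recorded in both directions) can be sketched as follows. This is a minimal illustration only: the `Workflow` and `Crawl` dataclasses and both helper functions are hypothetical stand-ins, not the actual btrixcloud models or code. Only the field names `dedupeCollId`, `autoAddCollections`, `requiresCrawls`, and `requiredByCrawls` come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow config (not the real model)."""

    autoAddCollections: list[str] = field(default_factory=list)
    dedupeCollId: Optional[str] = None


@dataclass
class Crawl:
    """Hypothetical stand-in for a crawl record (not the real model)."""

    id: str
    requiresCrawls: set[str] = field(default_factory=set)
    requiredByCrawls: set[str] = field(default_factory=set)


def normalize_auto_add(workflow: Workflow) -> None:
    """Ensure the dedupe collection is also listed as an auto-add collection."""
    coll_id = workflow.dedupeCollId
    if coll_id and coll_id not in workflow.autoAddCollections:
        workflow.autoAddCollections.append(coll_id)


def link_dependency(crawl: Crawl, dependency: Crawl) -> None:
    """Record both directions of a dedupe dependency between two crawls."""
    crawl.requiresCrawls.add(dependency.id)
    dependency.requiredByCrawls.add(crawl.id)
```

Keeping both directions on the crawl records means either side of the relationship can be looked up without scanning all crawls, at the cost of having to update two records together.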
Collection:
- Collection has a new `hasDedupeIndex` field
- Adding items to or removing items from a collection marks the CollIndex object for update by updating the `collItemsUpdatedAt` timestamp, which triggers a reindex
- The CollIndex object is deleted when the collection is deleted
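The timestamp-based reindex trigger can be sketched roughly as below. This is an assumption-laden illustration, not the operator's actual logic: `CollIndexState`, `lastIndexedAt`, `mark_items_updated`, and `needs_reindex` are hypothetical names, and only `collItemsUpdatedAt` is taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CollIndexState:
    """Hypothetical view of the timestamps tracked for a collection index."""

    collItemsUpdatedAt: Optional[datetime] = None
    # Assumed bookkeeping field, not named in the commit message.
    lastIndexedAt: Optional[datetime] = None


def mark_items_updated(state: CollIndexState, now: Optional[datetime] = None) -> None:
    """Bump the timestamp whenever items are added to or removed from the collection."""
    state.collItemsUpdatedAt = now or datetime.now(timezone.utc)


def needs_reindex(state: CollIndexState) -> bool:
    """A reindex is due if items changed after the last completed index run."""
    if state.collItemsUpdatedAt is None:
        return False
    if state.lastIndexedAt is None:
        return True
    return state.collItemsUpdatedAt > state.lastIndexedAt
```

Comparing timestamps rather than queueing individual changes means several additions and removals in quick succession collapse into a single reindex.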
Indexing depends on the crawler version from
webrecorder/browsertrix-crawler#884,
which supports indexing mode.
---------
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
1 parent ca3f226 commit 0b83c0f
File tree
61 files changed, +1658 -266 lines changed
- backend/btrixcloud
- migrations
- operator
- chart
- app-templates
- btrix-crds
- templates
- charts
- templates
- test
- frontend
- docs/docs/user-guide
- src
- components/ui
- context/search-org
- events
- features
- archived-items
- collections
- linked-collections
- crawl-workflows
- pages/org
- settings/components
- strings/crawl-workflows
- types
- utils