Bitcoin graph dataset #2912

VJalili · 2025-10-19T23:21:16Z

Description of changes:

This PR adds a YAML file that describes the Bitcoin Graph dataset resource.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

pschmied · 2025-11-05T18:55:06Z

Hi @VJalili, fantastic start on your 101 tutorial notebook! Our assumption is that, for the release version, you'll repoint the examples to work from the full corpus you're making available on AWS. Other than that, looks great.

VJalili · 2025-11-06T03:19:17Z

Hi @pschmied, thanks for the review!

Your assumption is correct. To reiterate, the Bitcoin graph we're making available on AWS is a single large-scale graph (>2.4B nodes and ~40B edges). A common practice for training ML models on a graph of this size is to train on sampled communities. The 101 tutorial focuses on pre-sampled communities, which let the ML community quickly explore the dataset and "smoke test" its compatibility with various graph neural network architectures. The pre-sampled communities will be hosted on AWS, and for the release we will update the links in the notebook to point to buckets on AWS. We'll also update the notebook to guide users toward using batches of the data (i.e., independent sub-graphs in TSV files).
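For readers who want a feel for working with one such batch, here is a minimal sketch of reading a single sub-graph TSV into an in-memory adjacency map. It is an illustration only: the column names `source` and `target`, and the sample rows, are hypothetical and not taken from the dataset, so check the released files for the actual schema.

```python
import csv
import io
from collections import defaultdict


def load_edge_list(tsv_file, source_col="source", target_col="target"):
    """Build an adjacency map from a TSV edge list that has a header row.

    The column names are assumptions for this sketch, not the dataset's schema.
    """
    adjacency = defaultdict(set)
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        adjacency[row[source_col]].add(row[target_col])
        # Register the target as a node even if it has no outgoing edges.
        adjacency.setdefault(row[target_col], set())
    return dict(adjacency)


# Hypothetical three-edge sub-graph standing in for one batch file.
sample = "source\ttarget\tvalue\nA\tB\t0.5\nB\tC\t1.2\nA\tC\t0.1\n"
graph = load_edge_list(io.StringIO(sample))
```

In practice the file object would come from `open(path, newline="")` on a downloaded batch rather than an in-memory string.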

Moreover, the dataset is also prepared for usage in graph databases (e.g., Neo4j or Amazon Neptune). We recommend the community load the dataset into a graph database, as it provides them with the option of sampling application-specific communities for their ML pipeline (we provide both methods and tutorials on this page). Since this use-case involves using specialized graph databases, runs on ~1TB of data, and takes days to run, we provide dedicated documentation and guidelines, and these resources will also point to the dataset on AWS (e.g., on the data release page).
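To make "sampling a community" concrete, the sketch below does a bounded breadth-first expansion around a seed node on a small in-memory adjacency map. This is a generic technique shown under stated assumptions, not the sampling method used to produce the pre-sampled communities; on the full graph the equivalent step would be a query against the graph database. The function name and the `max_hops` and `max_nodes` parameters are hypothetical.

```python
from collections import deque


def sample_community(adjacency, seed, max_hops=2, max_nodes=1000):
    """Return the set of nodes within max_hops of seed, capped at max_nodes."""
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        # Sorting keeps the traversal order deterministic for this sketch.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited:
                if len(visited) >= max_nodes:
                    return visited
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


# Toy chain a -> b -> c -> d used only to exercise the function.
toy = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": set()}
two_hop = sample_community(toy, "a", max_hops=2)
capped = sample_community(toy, "a", max_hops=3, max_nodes=2)
```

The node cap matters on a graph of this scale, where a few hops from a high-degree node can reach a very large neighborhood.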

VJalili · 2025-11-11T04:03:01Z

@pschmied I prepared a more comprehensive notebook that covers all the data hosted on this dataset's AWS bucket. Here is the link to the notebook: https://github.com/B1AAB/GraphStudio/blob/main/g101/g101.ipynb

If you find this more comprehensive and focused than the other notebook, I can update the link in the YAML file to refer to g101.

pschmied · 2025-11-11T16:36:56Z

@VJalili I love it—perhaps combine the content? We really do want to make sure the community challenge question / problem remains. In general, the more data providers can demonstrate opinionated usage of a given dataset, the more help it is to would-be data users. Really appreciate your efforts here!

VJalili · 2025-11-11T17:14:59Z

@pschmied Glad you liked it!

> perhaps combine the content?

I like that, we can merge.

> We really do want to make sure the community challenge question / problem remains.

Are you referring to the question "Q: What is one question that you have answered using these data? Can you show us how you came to that answer?" Also, does it need to be in the same words, or can we rephrase it to better match the dataset?

> In general, the more data providers can demonstrate opinionated usage of a given dataset, the more help it is to would-be data users.

Sure! We can keep Kaggle as the alternative option.

pschmied · 2025-11-11T17:19:44Z

I was thinking more of the last question:

> Q: What is one unanswered question that you think could be answered using these data? Do you have any recommendations or advice for someone wanting to answer this question?

You are doing a great job of illustrating things you have done / can do with the data.

And no, we're not wed to the literal template format. We generally want a basic intro notebook to have those elements, but we intentionally left room for improvement / expansion :-)

VJalili · 2025-11-12T02:16:59Z

I like that, it will be very helpful, thanks @pschmied

Please take a look at the updated notebook in the following PR; I would warmly appreciate any feedback!

B1AAB/GraphStudio#1

VJalili added 2 commits October 19, 2025 19:19

Draft bitcoin graph dataset yaml.

0b6095b

Merge branch 'main' into bitcoin-graph

fe7c91d

kmk142789 approved these changes Oct 28, 2025


VJalili added 2 commits October 28, 2025 18:57

Add publication details.

4b895b7

Update AuthorURL to include 'https://' prefix

5ac9359

VJalili marked this pull request as ready for review October 28, 2025 22:58

Merge branch 'main' into bitcoin-graph

305c3c6

VJalili added 3 commits November 10, 2025 23:04

Update resource details in eba.yaml

00b90f3

Specify S3 Bucket type in eba.yaml

3e8cb26

Enhance dataset description for Bitcoin Graph

bc7b955
