Rewrite "About" page to better reflect Git's current position #2103
I like it!
There's just one thing I have thoughts about: We shouldn't oversell Git's scalability.
        
          
content/about/_index.html (outdated)

    large repositories from day one.
    Speed and performance has always been a primary design goal of Git.
    </p>

    <p>
    Git also stores repository history efficiently. As of 2025, the current version
    of the Linux kernel's source code is 1.7 GB.
    Git stores the full history of the Linux project (1.4 million commits) in only 5.5 GB.
I am hesitant to agree with "built to handle large repositories from day one". It's not only that the largest Git repo cannot be handled well by Git (and larger monorepos exist, e.g. Google's, that have no prayer of living inside a Git repository, given that the Git project does not recognize this shortcoming as something that should be fixed). It's also that Git was built to handle the Linux repository, on the assumption that it is a large repository.
Today, only people on the Git repository consider the Linux repository a large one.
But even on the Git mailing list, core contributors had been made aware of larger repositories and how Git struggles with them. I am thinking, for example, about GCC's attempts as early as 2007, which failed (it took until 2020 for them to be able to make the switch, partially because Git's support for larger repositories had been improved so much). I am also thinking about Git's initial total lack of support for subprojects (and Git's introduction of submodules might not strike everybody equally as a net improvement). I am thinking about the need to introduce shallow clones (which are now a maintenance drag and a CPU hog; the partial clone feature should have helped with that but did not, because the core Git reviewers rejected the idea of fetching older commits on demand and insisted on having all reachable commits at all times). And I am thinking about the necessity to introduce Scalar, because plain, regular Git was utterly ill-prepared to support repository sizes of the likes of Microsoft Office's.
To a large extent, Git is still quite ill-prepared to support large repositories, as the largeness sometimes stems from large binary files that are still, to this date, in dear want of seeing better recognition of their needs in the Git project.
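For readers unfamiliar with the two clone modes mentioned above, here is a minimal sketch of how they differ in practice. It builds a throwaway three-commit repository and clones it twice; the directory names (`src`, `shallow`, `partial`) and the demo identity are made up for illustration, and `uploadpack.allowFilter` is set on the source repository on the assumption that the local `file://` transport may otherwise ignore the filter.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; nothing here touches a real project.
work=$(mktemp -d)
cd "$work"

# Build a tiny source repository with three commits.
git init -q src
cd src
git config user.email "demo@example.invalid"
git config user.name "Demo"
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
# Allow partial-clone filters when this repository is served over file://.
git config uploadpack.allowFilter true
cd ..

# Shallow clone: history is truncated, only the newest commit is transferred.
git clone -q --depth=1 "file://$work/src" shallow
shallow_commits=$(git -C shallow rev-list --count HEAD)

# Partial clone: every commit is transferred, but file contents (blobs)
# are fetched lazily when a checkout or diff needs them.
git clone -q --filter=blob:none "file://$work/src" partial
partial_commits=$(git -C partial rev-list --count HEAD)

echo "shallow: $shallow_commits commit(s), partial: $partial_commits commit(s)"
```

The shallow clone sees a single commit while the partial clone still sees all three, which matches the point above: partial clone defers blobs, not commits.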
Interesting, that makes sense! I like the idea of adding a paragraph about Git's performance limitations. My impression from reading what you wrote is that:
- If you have many large binary files, you may need to use an extension like Git LFS or something else, even if your repository is not that big. (I'm not really sure how to quantify any of that since I've never worked with a lot of large binary files; would love ideas.)
- Git can struggle with repositories that are much larger than the Linux kernel. For example, if your repository has 300GB of code, it may have performance issues.
Does that sound right? I guess I'd like to put some kind of number on it, like "if your repo is more than X GB, Git may start to have issues"
> - If you have many large binary files, you may need to use an extension like Git LFS or something else, even if your repository is not that big.
Correct. And Git LFS is not really a solution, I consider it more of a "proof gone wrong" that Git's core architecture needs some lovin'.
> - Git can struggle with repositories that are much larger than the Linux kernel, for example if your repository has 300GB of code, it may have performance issues.
Right. If your repository weighs more than 50GB, most hosting services (except Azure Repos) will gently try to guide you elsewhere.
> I guess I'd like to put some kind of number on it, like "if your repo is more than X GB, Git may start to have issues"
Nobody has firm numbers. There is this project called git-sizer, which I once thought had done proper research and had well-founded evidence of what works and what doesn't, but it turned out that they also made up their thresholds from thin air like the rest of us (the issue reply suggests that there was data, but if there really was data about public repositories, that data would have been easily produced by way of an answer instead of the handwaving I got).
In short: I think this is all a bit too contentious for an "About" page. I would simply remove the claim that Git handles large repositories well, and then go on with our merry lives.
How about this?
Git was built to work on the Linux kernel, meaning that it was built to handle repositories with tens of millions of lines of code from the start.
This doesn't make any claims about whether a repo with tens of millions of lines of code is "large" or not.
Personally I have never worked on a project bigger than the Linux kernel and (though I know Git has limits) I find it helpful to know that it works well on a project of that scale, since that means it'll probably work well for my project too.
That sounds good to me!
great, updated the PR to use that phrasing!
These pages are trying to convince the reader that Git is better than Subversion, which isn't the relevant thing to communicate to people learning about Git today.

Write a single short About page, and make the rest of the About pages redirect to /about.

Also:
- remove special "About" template
- remove tests for "small-and-fast"

Signed-off-by: Julia Evans <julia@jvns.ca>

Update the copy on the homepage to reflect the new content of the About page, and remove the now-deleted about pages from the sidebar.

Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Julia Evans <julia@jvns.ca>
Thank you! Let's merge it, and iterate from there if needed.
Nice work here! Clean and simple.
As discussed in #2058, Git is no longer an upstart; instead, it's by far the most widely used version control system. I have a few goals here:
I removed "Git is easy to learn" on the homepage because I'm not so sure that's accurate, and all of the other "About" pages because they were framed as a comparison to Subversion, and it seemed very hard to rework them. If folks think that some of the old About content shouldn't be deleted I'd be happy to talk about that though.
You can see it in action here: about page, homepage
Fixes #2058