|
| 1 | +# What are git hashes and why are they alphanumeric? |
| 2 | + |
| 3 | + |
| 4 | +## What is a hash? |
| 5 | + |
| 6 | +- a hash is a fixed-size value that can be used to represent data of arbitrary size |
| 7 | +- the *output* of a hashing function |
| 8 | +- often used as an index into a hash table |
| 9 | + |
| 10 | +Common uses of hashing are lookup tables and security applications that rely on a cryptographic hash. |
| 11 | + |
| 12 | +A cryptographic hash is additionally: |
| 13 | +- unique in practice (finding two inputs with the same hash is infeasible) |
| 14 | +- not reversible |
| 15 | +- sensitive to small changes: similar inputs hash to very different values, so they appear uncorrelated |
| 16 | + |
| 17 | + |
| 18 | +Hashes can then be used for a lot of purposes: |
| 19 | +- message integrity (when sending a message, the unhashed message and its hash are both sent; the message is intact if the received message can be hashed to produce the same hash) |
| 20 | +- password verification (password selected by the user is hashed and the hash is stored; when attempting to login, the input is hashed and the hashes are compared) |
| 21 | +- file or data identifier (eg in git) |
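The properties above can be observed directly. This is a minimal sketch using Python's standard `hashlib` module with SHA-1; the helper name `sha1_hex` and the sample strings are made up for illustration.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Fixed size: the digest length does not depend on the input length.
short = sha1_hex("a")
long = sha1_hex("a" * 10_000)
print(len(short), len(long))

# Similar inputs (one character changed) give very different digests.
first = sha1_hex("learning hashes")
second = sha1_hex("learning hashez")
differing = sum(a != b for a, b in zip(first, second))
print(first)
print(second)
print(f"{differing} of {len(first)} characters differ")
```

Both lengths print as 40 because a SHA-1 digest is always 160 bits, written as 40 hexadecimal characters.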
| 22 | + |
| 23 | + |
| 24 | + |
| 25 | +## Hashing in Git |
| 26 | + |
| 27 | + |
| 28 | +In git, hashes are 40 characters that uniquely represent each object and are used as the key to retrieve the object as a value. Recall there are multiple types of objects: tree, commit, blob. |
| 29 | + |
| 30 | + |
| 31 | + |
| 32 | + |
| 33 | +Git was originally designed to use SHA-1. |
| 34 | + |
| 35 | +SHA-1 is weak. Git switched to hardened SHA-1 in response to a collision attack. Hardened SHA-1 checks each input for signs of a collision attack: |
| 36 | + |
| 37 | +> In that case it adjusts the SHA-1 computation to result in a safe hash. This means that it will compute the regular SHA-1 hash for files without a collision attack, but produce a special hash for files with a collision attack, where both files will have a different unpredictable hash. |
| 38 | +[from](https://crypto.stackexchange.com/questions/44141/what-is-hardened-sha-1-how-does-it-work-and-how-much-protection-does-it-offer). |
| 39 | + |
| 40 | +Learn more about the [SHA-1 collision attack](https://shattered.io/). |
| 41 | + |
| 42 | +Git plans to [change hash functions again](https://git-scm.com/docs/hash-function-transition/), this time to SHA-256. |
| 43 | + |
| 44 | + |
| 45 | +For now though it's still SHA-1 like. |
| 46 | + |
| 47 | + |
| 48 | +``` |
| 49 | +cd ../github-inclass-brownsarahm/ |
| 50 | +``` |
| 51 | + |
| 52 | + |
| 53 | +Usually, a shorter version of the commit hash is enough to be unique, so we can refer to commits by just a few characters: |
| 54 | +- minimum 4 |
| 55 | +- must be unique |
| 56 | + |
| 57 | +For most projects 7 characters is enough. By default, git will give you 7 characters if you use `--abbrev-commit`, and it will automatically use more if needed. |
| 58 | + |
| 59 | +````{margin} |
| 60 | +```{admonition} Further Reading |
| 61 | +[the pro git book](https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection#Short-SHA-1) has more details on this |
| 62 | +``` |
| 63 | +```` |
| 64 | + |
| 65 | +``` |
| 66 | +git log --abbrev-commit --pretty=oneline |
| 67 | +``` |
| 68 | + |
| 69 | +``` |
| 70 | +313c1d7 (HEAD -> main) Revert "c4" |
| 71 | +a47d362 c5 |
| 72 | +94d236b c4 |
| 73 | +7d17fdc c3 |
| 74 | +c1807f4 revert chang 1 |
| 75 | +948eda1 change 2 |
| 76 | +83690bc change 1 |
| 77 | +a399978 (origin/new_feature, new_feature) new feature ready for PR |
| 78 | +3c9980a (origin/main, origin/HEAD) create about |
| 79 | +ec3dd02 Merge pull request #4 from introcompsys/create_readme |
| 80 | +db2e41d (origin/create_readme) Create README.md |
| 81 | +1613072 Initial commit |
| 82 | +``` |
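To see why a few characters are usually enough, here is a small sketch that finds the shortest prefix length that keeps the abbreviated hashes from the log above distinct. The function name `shortest_unique_length` is made up for illustration, and the check only considers the twelve commits listed. Git itself compares against every object in the repository and still requires at least 4 characters.

```python
# Abbreviated hashes copied from the log output above.
abbreviated = [
    "313c1d7", "a47d362", "94d236b", "7d17fdc", "c1807f4", "948eda1",
    "83690bc", "a399978", "3c9980a", "ec3dd02", "db2e41d", "1613072",
]


def shortest_unique_length(hashes: list[str]) -> int:
    """Return the smallest prefix length at which all hashes are distinct."""
    longest = max(len(h) for h in hashes)
    for length in range(1, longest + 1):
        prefixes = {h[:length] for h in hashes}
        if len(prefixes) == len(hashes):
            return length
    raise ValueError("hashes are not unique even at full length")


print(shortest_unique_length(abbreviated))  # 3
```

With two characters `94d236b` and `948eda1` still collide, so three is the first length that separates all twelve.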
| 83 | + |
| 84 | +Git uses the SHA hash primarily for uniqueness, not privacy. |
| 85 | + |
| 86 | + |
| 87 | +It does provide some *security* assurances, because we can check the content |
| 88 | +against the hash to make sure that they match. |
| 89 | + |
| 90 | + |
| 91 | + |
| 92 | + |
| 93 | +We can use git to hash things for us, without writing them: |
| 94 | + |
| 95 | +``` |
| 96 | +echo "learning hashes" | git hash-object --stdin |
| 97 | +``` |
| 98 | + |
| 99 | +``` |
| 100 | +3ba345cea32208d6612e97830fdb0b1ae70ea8bd |
| 101 | +``` |
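For a blob, git does not hash the raw content alone: it hashes a short header (`blob`, a space, the content length in bytes, and a null byte) followed by the content. The sketch below reproduces that with Python's `hashlib`; the function name `git_blob_hash` is made up for illustration. Note that `echo` appends a newline, so the content hashed above is `learning hashes\n`.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object name git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known hash; it appears later on this page
# as the entry for the empty file about.md.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Same content as the echo command above, including the trailing newline.
print(git_blob_hash(b"learning hashes\n"))
```

If the second line matches the output of `git hash-object --stdin`, that confirms the hash depends only on the content, not on the file name or location.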
| 102 | + |
| 103 | + |
| 104 | + |
| 105 | +The SHA-1 digest is 20 bytes or 160 bits, which is 40 characters in hexadecimal. The number of randomly hashed objects needed to ensure a 50% probability of a single collision is about $2^{80}$ (the formula for determining collision probability is $p = \frac{n(n-1)}{2} \times \frac{1}{2^{160}}$). $2^{80}$ is $1.2 \times 10^{24}$ or 1 million billion billion. That’s 1,200 times the number of grains of sand on the earth. |
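The arithmetic above can be checked with exact integer math. This is a sketch that plugs $n = 2^{80}$ into the approximation formula quoted in the paragraph; the function name `collision_probability` is made up for illustration.

```python
from fractions import Fraction

HASH_BITS = 160  # size of a SHA-1 digest in bits


def collision_probability(n: int, bits: int = HASH_BITS) -> Fraction:
    """Approximate collision probability: p = (n(n-1)/2) * (1/2^bits)."""
    return Fraction(n * (n - 1), 2) * Fraction(1, 2**bits)


n = 2**80
print(n)            # 1208925819614629174706176
print(f"{n:.1e}")   # 1.2e+24
print(float(collision_probability(n)))  # very close to 0.5
```

Using `Fraction` keeps the calculation exact until the final conversion, which avoids rounding a number as small as $1/2^{160}$ too early.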
| 106 | + |
| 107 | +## What is a Number? |
| 108 | + |
| 109 | + |
| 110 | +A number is a mathematical object used to count, measure and label. |
| 111 | + |
| 112 | +## What is a number system? |
| 113 | + |
| 114 | +While numbers represent quantities that, conceptually, exist everywhere, the way we write them is a cultural artifact. For example, we all have a value representing a single item, but how it is written varies. |
| 115 | + |
| 116 | + |
| 117 | +In modern, western cultures our number system is the Hindu-Arabic system: |
| 118 | + |
| 119 | +- invented by Hindu mathematicians in India in 600 CE or earlier |
| 120 | +- called "Arabic" numerals in the West because Arab merchants introduced them to Europeans |
| 121 | +- slow adoption |
| 122 | + |
| 123 | +We use a **place based** system. That means that the position or place of the symbol changes its meaning. So 1, 10, and 100 are all different values. This system is also a decimal system, or base 10. So we refer to the places as the ones ($10^0$), the tens ($10^1$), the hundreds ($10^2$), etc. for all powers of 10. |
| 124 | + |
| 125 | +### Roman Numerals |
| 126 | + |
| 127 | +Not all systems are place based, for example Roman numerals. In this system a symbol is added if the symbol after it has the same or a smaller value, and subtracted if the symbol after it has a larger value. There are symbols for specific values: I=1, V=5, X=10, L=50, C=100, D=500, M=1000. Then III = 1+1+1 = 3 and IV = -1 + 5 = 4, VI = 5+1 = 6 and XLIX = -10 + 50 - 1 + 10 = 49. |
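The add-or-subtract rule can be written out in a few lines. This is a minimal sketch, with the made-up function name `roman_to_int`, that assumes the input is a well-formed numeral and does no validation.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral to an integer."""
    total = 0
    for index, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Look at the next symbol, if there is one.
        has_next = index + 1 < len(numeral)
        next_value = ROMAN_VALUES[numeral[index + 1]] if has_next else 0
        if value < next_value:
            total -= value  # smaller symbol before a larger one: subtract
        else:
            total += value  # same or decreasing: add
    return total


for numeral in ["III", "IV", "VI", "XLIX"]:
    print(numeral, roman_to_int(numeral))
```

Notice that the code never uses the position of a symbol to scale its value, only to compare neighbors; that is what makes this system not place based.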
| 128 | + |
| 129 | +### Binary |
| 130 | + |
| 131 | +Binary is any base two system, and it can be represented using any two distinct characters. |
| 132 | + |
| 133 | +Binary number systems have origins in ancient cultures: |
| 134 | +- Egypt (fractions) 1200 BC |
| 135 | +- China 9th century BC |
| 136 | +- India 2nd century BC |
| 137 | + |
| 138 | +In computer science we use binary because mechanical computers began using relays (open/closed) to implement logical (boolean) operations and then digital computers use on and off in their circuits. |
| 139 | + |
| 140 | +We represent binary using the same Hindu-Arabic symbols that we use for other numbers, but only 0 and 1 (the first two). We also keep it as a place-based number system, so the places are the ones ($2^0$), twos ($2^1$), fours ($2^2$), eights ($2^3$), etc. for all powers of 2. |
| 141 | + |
| 142 | +So in binary, the number of characters in the word binary is 110. |
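A place-based system works the same way in any base: each digit is multiplied by the base raised to the power of its place. The sketch below, with the made-up function name `place_value`, spells that out rather than relying on Python's built-in `int(text, base)` for the whole string.

```python
def place_value(digits: str, base: int) -> int:
    """Evaluate a string of digits in the given base, place by place."""
    total = 0
    # Reverse so that position 0 is the rightmost (ones) place.
    for position, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base**position
    return total


print(place_value("110", 2))    # 6, the number of letters in "binary"
print(place_value("100", 10))   # 100
print(len("binary"))            # 6
```

For `110` in base 2 the sum is $0 \times 2^0 + 1 \times 2^1 + 1 \times 2^2 = 6$.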
| 143 | + |
| 144 | + |
| 145 | +### Octal |
| 146 | + |
| 147 | +Octal is base 8. This too has history in other cultures, not only in computer science. It is rooted in cultures that counted using the spaces *between* fingers instead of counting using fingers. |
| 148 | + |
| 149 | + |
| 150 | +[use by native americans from present day CA](https://www.jstor.org/stable/2686959?origin=crossref&seq=1#metadata_info_tab_contents) |
| 151 | + |
| 152 | +and |
| 153 | + |
| 154 | +[ Pamean languages in Mexico ](http://linguistics.berkeley.edu/~avelino/Avelino_2006.pdf) |
| 155 | + |
| 156 | + |
| 157 | +As in binary, we use Hindu-Arabic symbols: 0,1,2,3,4,5,6,7 (the first eight). Then nine is 11. |
| 158 | + |
| 159 | +In computer science we use octal a lot because it reduces every 3 bits of a binary number to a single character. So a large binary number, say `101110001100`, can be written as `5614`, which is easier for a person to read. |
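The conversion is just grouping: split the bits into groups of three from the right and replace each group with one octal digit. A minimal sketch, using the made-up function name `binary_to_octal`:

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to octal by grouping bits in threes."""
    # Pad with leading zeros so the length is a multiple of 3.
    width = -(-len(bits) // 3) * 3  # round up to a multiple of 3
    padded = bits.zfill(width)
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    # Each 3-bit group has a value from 0 to 7: exactly one octal digit.
    return "".join(str(int(group, 2)) for group in groups)


print(binary_to_octal("101110001100"))  # 5614
```

The groups here are `101`, `110`, `001`, `100`, which are 5, 6, 1, and 4.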
| 160 | + |
| 161 | +### Hexadecimal |
| 162 | + |
| 163 | + |
| 164 | +Hexadecimal is base 16, common in CS because each character represents 4 bits. We use 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. |
| 165 | + |
| 166 | +This is why the git hash is 160 bits, or 20 bytes (one byte is 8 bits), but we represent it as 40 characters: 160/4=40. |
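The same counting can be checked on the hash produced by `git hash-object` earlier. This sketch only uses built-in Python string and bytes operations; the variable names are made up for illustration.

```python
# Hash copied from the git hash-object output earlier on this page.
digest_hex = "3ba345cea32208d6612e97830fdb0b1ae70ea8bd"

# Two hexadecimal characters make one byte.
digest = bytes.fromhex(digest_hex)

print(len(digest_hex))   # 40 characters
print(len(digest))       # 20 bytes
print(len(digest) * 8)   # 160 bits

# Each hexadecimal character expands to exactly 4 bits.
print(format(int("b", 16), "04b"))  # 1011
```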
| 167 | + |
| 168 | + |
| 169 | +## Review today's class |
| 170 | + |
| 171 | +```{include} ../_review/2022-10-12.md |
| 172 | +``` |
| 173 | + |
| 174 | + |
| 175 | + |
| 176 | +## Prepare for Next Class |
| 177 | + |
| 178 | +```{include} ../_prepare/2022-10-12.md |
| 179 | +``` |
| 180 | + |
| 181 | + |
| 182 | + |
| 183 | +## More Practice |
| 184 | + |
| 185 | +```{include} ../_practice/2022-10-12.md |
| 186 | +``` |
| 187 | + |
| 188 | + |
| 189 | +## Questions After Class |
| 190 | + |
| 191 | + |
| 192 | +### Are git commit hashes predictable/significant in any way? Or is the risk of using a low-bit hash generator negligible for github usage? |
| 193 | + |
| 194 | + |
| 195 | + |
| 196 | +### I am still confused on what TRULY a git hash is. Is it an object? Is it the string to the object? |
| 197 | + |
| 198 | +This is a place where being casual with language makes things hard. A hash, in general, is the output of a hashing algorithm. In git, everything that is stored gets hashed, and git uses the hashes as the unique values to refer to each stored object. So a git hash is not the object itself; it is the key git uses to look the object up. For example, if I take the hash that is the *commit number* from the most recent commit above, I can use `git cat-file` to look at the git object for that commit. |
| 199 | + |
| 200 | + |
| 201 | +``` |
| 202 | +git cat-file -p 313c1d7 |
| 203 | +``` |
| 204 | + |
| 205 | +``` |
| 206 | +tree 47182af2099fc6075832c13798ee16b713ebb285 |
| 207 | +parent a47d362409bc4ad0cb57e73b0e7a4c1a1a586f43 |
| 208 | +author Sarah M Brown <brownsarahm@uri.edu> 1664849752 -0400 |
| 209 | +committer Sarah M Brown <brownsarahm@uri.edu> 1664849752 -0400 |
| 210 | +
|
| 211 | +Revert "c4" |
| 212 | +
|
| 213 | +This reverts commit 94d236b459d5035ab3f2c2676a888b27cd77e80d. |
| 214 | +``` |
| 215 | +The "contents" of the commit are several things: |
| 216 | +- the hash of the snapshot of the contents as a tree object |
| 217 | +- the hash of the previous commit |
| 218 | +- author information |
| 219 | +- commit message |
| 220 | + |
| 221 | +We can also check the type: |
| 222 | +``` |
| 223 | +git cat-file -t 313c1d7 |
| 224 | +``` |
| 225 | + |
| 226 | +``` |
| 227 | +commit |
| 228 | +``` |
| 229 | + |
| 230 | +We can check the type of the parent to verify that it points to another commit, the previous one. |
| 231 | + |
| 232 | +``` |
| 233 | +git cat-file -t a47d3624 |
| 234 | +``` |
| 235 | + |
| 236 | +``` |
| 237 | +commit |
| 238 | +``` |
| 239 | + |
| 240 | + |
| 241 | +We can then look at the contents of the tree as well, using its hash as the name, first checking the type: |
| 242 | +``` |
| 243 | +git cat-file -t 47182af |
| 244 | +``` |
| 245 | + |
| 246 | +``` |
| 247 | +tree |
| 248 | +``` |
| 249 | +then display: |
| 250 | + |
| 251 | +``` |
| 252 | +git cat-file -p 47182af |
| 253 | +``` |
| 254 | + |
| 255 | +``` |
| 256 | +040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c .github |
| 257 | +100644 blob bcc1d9287b5cae329629cf3e0779b065b72dad7a README.md |
| 258 | +100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 about.md |
| 259 | +100644 blob 96d35c05b19fc42bc44153b8a97512001c37a09e new_feature.md |
| 260 | +``` |
| 261 | +This is all of the contents of the directory at the time of that commit: one blob object for each file and one tree object for the single directory. |
| 262 | + |
| 263 | +We can also look at a file: |
| 264 | + |
| 265 | +``` |
| 266 | +git cat-file -p bcc1d9 |
| 267 | +``` |
| 268 | + |
| 269 | +``` |
| 270 | +# github-inclass-brownsarahm |
| 271 | +github-inclass-brownsarahm created by GitHub Classroom |
| 272 | +``` |
| 273 | +and verify that it has the same contents as the file on disk: |
| 274 | + |
| 275 | +```bash |
| 276 | +cat README.md |
| 277 | +``` |
| 278 | + |
| 279 | +``` |
| 280 | +# github-inclass-brownsarahm |
| 281 | +github-inclass-brownsarahm created by GitHub Classroom |
| 282 | +(base) brownsarahm@github-inclass-brownsarahm $ |
| 283 | +``` |
| 284 | + |
| 285 | + |
| 286 | + |
| 287 | +### What are uses of Octal |
| 288 | + |
| 289 | +Octal is convenient *because* it maps easily onto binary representations. 1 place in octal is 3 bits (and 1 place in hexadecimal is 4 bits). |
| 290 | + |
| 291 | +We will see it for file permissions, because they have 3 parts, each of which can be on or off (i.e., it's 3 bits of information). |
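As a sketch of that idea, assuming the 3 parts are the usual Unix read, write, and execute permissions, each octal digit of a mode such as `644` decodes into three on/off flags. The function name `permission_flags` is made up for illustration; `stat.filemode` is from Python's standard library.

```python
import stat


def permission_flags(digit: int) -> str:
    """Decode one octal digit (3 bits) into an rwx-style string."""
    flags = [("r", 0b100), ("w", 0b010), ("x", 0b001)]
    return "".join(letter if digit & mask else "-" for letter, mask in flags)


mode = 0o644
for name, shift in [("owner", 6), ("group", 3), ("other", 0)]:
    digit = (mode >> shift) & 0b111  # pick out one octal digit
    print(name, digit, format(digit, "03b"), permission_flags(digit))

# The standard library decodes a full mode, like the 100644 in the tree listing above.
print(stat.filemode(0o100644))  # -rw-r--r--
```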
| 292 | + |
| 293 | +### What is the most interesting number system created? |
| 294 | + |
| 295 | +That is a good thing to explore, but interesting is relative. |
| 296 | + |
| 297 | +### the `-w` option to `git hash-object` writes the hash number to a file? |
| 298 | + |
| 299 | +No, it writes the *content* to a file named based on the hash in the `.git/objects` folder. |
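To make the naming concrete: for a loose object, git uses the first two characters of the hash as a folder name and the remaining 38 as the file name, and stores the content zlib-compressed. The sketch below only computes the path and does not write anything; the function name `loose_object_path` is made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def loose_object_path(content: bytes) -> PurePosixPath:
    """Return where git would store a blob with this content as a loose object."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + content).hexdigest()
    # First 2 hex characters name the folder, the other 38 name the file.
    return PurePosixPath(".git/objects") / digest[:2] / digest[2:]


print(loose_object_path(b""))
# .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```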
| 300 | + |
| 301 | + |
| 302 | +### With the information given to us last class, is it possible to complete the work for gitplumbingreview.md and gitplumbingdetail.md? |
| 303 | + |
| 304 | +Do what you can and that task will come back after next class. |
| 305 | + |
| 306 | +```{important} |
| 307 | +I revised the text of the activities as posted here, relative to what I posted in class, to clarify them. |
| 308 | +``` |