Commit 2aa6d7d

committed
10/12 notes and toc link
1 parent 662130d commit 2aa6d7d

5 files changed

+333
-0
lines changed

_practice/2022-10-12.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
1. Learn more about how git is working on changing from SHA-1 to SHA-256 and answer the transition questions below.
2+
1. Research the use of non-base-10 number systems in another culture and contribute one to your group repo.
3+
1. (priority) In a language of your choice or pseudocode, write a short program to convert, without using libraries, between all pairs of (binary, decimal, hexadecimal) in `numbers.md`. Test your code, then include it in the markdown file enclosed in three backticks so that it is a "code block". Write the name of the language after the opening backticks, like:
4+
5+
````
6+
```python
7+
# python code
8+
```
9+
10+
````
11+
12+
```
13+
## transition questions
14+
1. Why make the switch?
15+
2. Learn more about one collision
16+
3. What impact will the switch have on how git works?
17+
4. If you have scripts that operate on git repos, what might you do to prepare for the switch?
18+
```

_prepare/2022-10-12.md

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
1. Bring to class a scenario where you think a small command line program or bash script could be useful. A command line program is a program that we execute on the command line. For example the courseutils kwlcheck is one I wrote.
2+
1. Bring one scenario in git that you have seen or anticipate that we have not seen the solution for

_review/2022-10-12.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
1. review the notes
2+
1. find 2 more real world examples of using other number systems (either different bases or different symbols and bases) that are current. Describe them in `numbers.md`
3+
1. Read about [hexspeak](https://en.wikipedia.org/wiki/Hexspeak) on Wikipedia for an overview, then find one additional source and take notes in `hexspeak.md` in your kwl repo. Come up with a word or two on your own.
4+
1. Create a single branch for all of the work related to today's class in your KWL

_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ parts:
2525
- file: notes/2022-09-28
2626
- file: notes/2022-10-03
2727
- file: notes/2022-10-05
28+
- file: notes/2022-10-12
2829
- caption: Activities
2930
chapters:
3031
- file: activities/kwl

notes/2022-10-12.md

Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
1+
# What are git hashes and why are they alphanumeric?
2+
3+
4+
## What is a hash?
5+
6+
- a hash is a fixed size value that can be used to represent data of arbitrary sizes
7+
- the *output* of a hashing function
8+
- often used as an index into a fixed-size table called a hash table
9+
10+
Common examples of hashing are lookup tables and security applications that use a cryptographic hash.
11+
12+
A cryptographic hash is additionally:
13+
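- unique in practice (two different inputs that produce the same hash, a collision, should be infeasible to find)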
14+
- not reversible
15+
- similar inputs hash to very different values so they appear uncorrelated
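
The last property can be observed directly. The following is a minimal sketch using Python's standard `hashlib` module (not something used in class); the two example strings and the helper name `sha1_hex` are made up for illustration.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = sha1_hex("learning hashes")
b = sha1_hex("learning hashed")  # only the last letter differs

# Count how many of the 40 hex positions agree between the two digests.
matching = sum(x == y for x, y in zip(a, b))

print(a)
print(b)
print(f"{matching} of {len(a)} positions match")
```

Even though the inputs differ by one letter, the two digests should look unrelated, with only a few positions matching by chance.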
16+
17+
18+
Hashes can then be used for a lot of purposes:
19+
- message integrity (when sending a message, the unhashed message and its hash are both sent; the message is real if the sent message can be hashed to produce the same hash)
20+
- password verification (password selected by the user is hashed and the hash is stored; when attempting to login, the input is hashed and the hashes are compared)
21+
- file or data identifier (e.g., in git)
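
As a rough illustration of the message integrity use, the sketch below hashes a message, then checks a received copy against the hash that was sent with it. The function names and messages are hypothetical, and SHA-256 is chosen here only as an example algorithm. A plain hash like this only detects changes if the hash itself arrives unmodified.

```python
import hashlib


def digest(message: bytes) -> str:
    """Hash a message with SHA-256 and return the hex digest."""
    return hashlib.sha256(message).hexdigest()


def is_intact(received: bytes, sent_digest: str) -> bool:
    """Re-hash the received message and compare it to the digest that was sent."""
    return digest(received) == sent_digest


sent = b"meet at noon"
sent_digest = digest(sent)

print(is_intact(b"meet at noon", sent_digest))  # True
print(is_intact(b"meet at moon", sent_digest))  # False
```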
22+
23+
24+
25+
## Hashing in Git
26+
27+
28+
In git, hashes are 40 characters that uniquely represent each object and are used as the key to retrieve the object as a value. Recall there are multiple types of objects: tree, commit, blob.
29+
30+
31+
32+
33+
Git was originally designed to use SHA-1.
34+
35+
SHA-1 is weak. Git switched to hardened SHA-1 in response to a collision. Hardened SHA-1 checks whether an input looks like a collision attack:
36+
37+
> In that case it adjusts the SHA-1 computation to result in a safe hash. This means that it will compute the regular SHA-1 hash for files without a collision attack, but produce a special hash for files with a collision attack, where both files will have a different unpredictable hash.
38+
[from](https://crypto.stackexchange.com/questions/44141/what-is-hardened-sha-1-how-does-it-work-and-how-much-protection-does-it-offer).
39+
40+
Learn more about the [SHA-1 collision attack](https://shattered.io/)
41+
42+
Git plans to [change hash functions again](https://git-scm.com/docs/hash-function-transition/), this time to SHA-256.
43+
44+
45+
For now, though, it's still SHA-1-like.
46+
47+
48+
```
49+
cd ../github-inclass-brownsarahm/
50+
```
51+
52+
53+
Mostly, a shorter version of the commit hash is sufficient to be unique, so we can refer to commits by just a few characters:
54+
- minimum 4
55+
- must be unique
56+
57+
For most projects, 7 characters is enough. By default, git will give you 7 characters if you use `--abbrev-commit`, and it will automatically use more if needed to keep them unique.
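
To make the "minimum 4, must be unique" rule concrete, here is a small sketch that finds the shortest prefix length at which a set of hashes can all be told apart. This is an illustration only, not how git itself chooses abbreviation lengths; the function name `shortest_unique_length` and the shortened strings in the second example are made up.

```python
def shortest_unique_length(hashes, minimum=4):
    """Smallest prefix length (at least `minimum`) that makes every hash distinct."""
    if not hashes:
        return minimum
    if len(set(hashes)) != len(hashes):
        raise ValueError("identical hashes can never be told apart")
    longest = max(len(h) for h in hashes)
    for length in range(minimum, longest + 1):
        prefixes = {h[:length] for h in hashes}
        if len(prefixes) == len(hashes):
            return length
    return longest


# Three full hashes that appear in these notes: they already differ in the first character.
real = [
    "a47d362409bc4ad0cb57e73b0e7a4c1a1a586f43",
    "94d236b459d5035ab3f2c2676a888b27cd77e80d",
    "47182af2099fc6075832c13798ee16b713ebb285",
]
print(shortest_unique_length(real))  # 4

# Two made-up strings that share their first 6 characters.
print(shortest_unique_length(["a47d362409bc", "a47d36f00000"]))  # 7
```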
58+
59+
````{margin}
60+
```{admonition} Further Reading
61+
[the pro git book](https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection#Short-SHA-1) has more details on this
62+
```
63+
````
64+
65+
```
66+
git log --abbrev-commit --pretty=oneline
67+
```
68+
69+
```
70+
313c1d7 (HEAD -> main) Revert "c4"
71+
a47d362 c5
72+
94d236b c4
73+
7d17fdc c3
74+
c1807f4 revert chang 1
75+
948eda1 change 2
76+
83690bc change 1
77+
a399978 (origin/new_feature, new_feature) new feature ready for PR
78+
3c9980a (origin/main, origin/HEAD) create about
79+
ec3dd02 Merge pull request #4 from introcompsys/create_readme
80+
db2e41d (origin/create_readme) Create README.md
81+
1613072 Initial commit
82+
```
83+
84+
git uses the SHA hash primarily for uniqueness, not privacy
85+
86+
87+
It does provide some *security* assurances, because we can check the content
88+
against the hash to make sure the two match.
89+
90+
91+
92+
93+
We can use git to hash things for us, without writing them:
94+
95+
```
96+
echo "learning hashes" | git hash-object --stdin
97+
```
98+
99+
```
100+
3ba345cea32208d6612e97830fdb0b1ae70ea8bd
101+
```
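
For a sense of what `git hash-object` is doing, the sketch below reproduces the blob hashing scheme in Python: git hashes a short header (`blob`, the content length in bytes, and a null byte) followed by the content. The function name `git_blob_hash` is made up for illustration. Note that `echo` appends a newline, so the input above would be `learning hashes\n`; under that assumption the second call should print the same hash as git did.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """SHA-1 of the header 'blob <size>' + null byte, followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The hash of an empty file is a well-known value.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# The same input that echo sent to git above, assuming a trailing newline.
print(git_blob_hash(b"learning hashes\n"))
```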
102+
103+
104+
105+
The SHA-1 digest is 20 bytes or 160 bits, which is 40 characters in hexadecimal. The number of randomly hashed objects needed to ensure a 50% probability of a single collision is about $2^{80}$ (the formula for determining collision probability is $p = \frac{n(n-1)}{2} \times \frac{1}{2^{160}}$). $2^{80}$ is about $1.2 \times 10^{24}$, or 1 million billion billion. That’s 1,200 times the number of grains of sand on the earth.
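
The arithmetic can be checked with a few lines of Python. This sketch simply plugs $n = 2^{80}$ into the formula quoted above, using exact fractions to avoid rounding in the intermediate steps; the formula is an approximation, so the result is an estimate and not an exact probability.

```python
from fractions import Fraction

BITS = 160             # size of a SHA-1 digest in bits
n = 2 ** (BITS // 2)   # number of hashed objects: 2^80

# p = (n(n-1)/2) * (1/2^160), computed exactly
p = Fraction(n * (n - 1), 2) / Fraction(2 ** BITS)

print(n)           # 1208925819614629174706176
print(f"{n:.1e}")  # 1.2e+24
print(float(p))    # 0.5
```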
106+
107+
## What is a Number?
108+
109+
110+
a mathematical object used to count, measure and label
111+
112+
## What is a number system?
113+
114+
While the quantities that numbers represent exist conceptually everywhere, the numbers themselves are a cultural artifact. For example, we all have a value representing a single item.
115+
116+
117+
In modern Western cultures, ours is the Hindu-Arabic system:
118+
119+
- invented by Hindu mathematicians in India in 600 CE or earlier
120+
- called "Arabic" numerals in the West because Arab merchants introduced them to Europeans
121+
- slow adoption
122+
123+
We use a **place based** system. That means that the position or place of the symbol changes its meaning. So 1, 10, and 100 are all different values. This system is also a decimal system, or base 10. So we refer to the places as the ones ($10^0$), the tens ($10^1$), the hundreds ($10^2$), etc. for all powers of 10.
124+
125+
### Roman Numerals
126+
127+
Not all systems are place based, for example Roman numerals. In this system a symbol is added if the symbol after it has the same or a smaller value, and subtracted if the symbol after it has a larger value. There are symbols for specific values: I=1, V=5, X=10, L=50, C=100, D=500, M=1000. Then III = 1+1+1 = 3, IV = -1 + 5 = 4, VI = 5+1 = 6, and XLIX = -10 + 50 - 1 + 10 = 49.
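
The add-or-subtract rule can be sketched in a few lines of Python. The names `ROMAN_VALUES` and `roman_to_int` are made up for illustration, and the sketch assumes the input is a well-formed numeral (it does not reject invalid ones).

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Add each symbol's value, or subtract it when a larger symbol follows."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Value of the symbol to the right, or 0 if this is the last one.
        next_value = ROMAN_VALUES[numeral[i + 1]] if i + 1 < len(numeral) else 0
        if value < next_value:
            total -= value
        else:
            total += value
    return total


for numeral in ["III", "IV", "VI", "XLIX"]:
    print(numeral, roman_to_int(numeral))
```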
128+
129+
### Binary
130+
131+
Binary is any base two system, and it can be represented using any two distinct characters.
132+
133+
Binary number systems have origins in ancient cultures:
134+
- Egypt (fractions) 1200 BC
135+
- China 9th century BC
136+
- India 2nd century BC
137+
138+
In computer science we use binary because mechanical computers began using relays (open/closed) to implement logical (boolean) operations and then digital computers use on and off in their circuits.
139+
140+
We represent binary using the same Hindu-Arabic symbols that we use for other numbers, but only the 0 and 1 (the first two). We also keep it as a place-based number system, so the places are the ones ($2^0$), twos ($2^1$), fours ($2^2$), eights ($2^3$), etc. for all powers of 2.
141+
142+
So in binary, the number of characters in the word "binary" (six) is 110.
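
The place values can be spelled out in code. This sketch (the function name `binary_to_decimal` is made up for illustration) multiplies each bit by the power of 2 for its place and adds up the results.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit * 2**place, where place 0 is the rightmost character."""
    total = 0
    for place, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** place
    return total


# 110 = 1 four + 1 two + 0 ones
print(binary_to_decimal("110"))  # 6
print(len("binary"))             # 6
```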
143+
144+
145+
### Octal
146+
147+
Octal is base 8. This too has history in other cultures, not only in computer science. It is rooted in cultures that counted using the spaces *between* fingers instead of counting using fingers.
148+
149+
150+
[used by Native Americans from present-day California](https://www.jstor.org/stable/2686959?origin=crossref&seq=1#metadata_info_tab_contents)
151+
152+
and
153+
154+
[ Pamean languages in Mexico ](http://linguistics.berkeley.edu/~avelino/Avelino_2006.pdf)
155+
156+
157+
As in binary, we use Hindu-Arabic symbols: 0,1,2,3,4,5,6,7 (the first eight). Then eight is 10 and nine is 11.
158+
159+
In computer science we use octal a lot because it reduces every 3 bits of a number in binary to a single character. So a large number in binary, say `101110001100`, can be written as `5614`, which is easier for a person to read.
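
The grouping of 3 bits per octal character can be sketched as follows. The function name `binary_to_octal` is made up for illustration; it pads the input with leading zeros so it splits evenly into groups of three.

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to octal, one octal digit per group of 3 bits."""
    # Pad on the left so the length is a multiple of 3.
    padded = bits.zfill((len(bits) + 2) // 3 * 3)
    digits = []
    for start in range(0, len(padded), 3):
        group = padded[start:start + 3]
        # Places within a group are the fours, the twos, and the ones.
        value = int(group[0]) * 4 + int(group[1]) * 2 + int(group[2])
        digits.append(str(value))
    return "".join(digits)


print(binary_to_octal("101110001100"))  # 5614
print(binary_to_octal("1001"))          # 11 (nine)
```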
160+
161+
### Hexadecimal
162+
163+
164+
Hexadecimal is base 16, common in CS because each character represents 4 bits. We use 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.
165+
166+
This is how the git hash is 160 bits, or 20 bytes (one byte is 8 bits) but we represent it as 40 characters. 160/4=40.
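
The bits, bytes, and characters can be counted directly with Python's standard `hashlib` module. This is a small sketch; the input bytes are an arbitrary example.

```python
import hashlib

h = hashlib.sha1(b"learning hashes\n")
raw = h.digest()       # the digest as raw bytes
text = h.hexdigest()   # the same digest written in hexadecimal

print(len(raw) * 8, "bits")          # 160 bits
print(len(raw), "bytes")             # 20 bytes
print(len(text), "hex characters")   # 40 hex characters

# Each byte (8 bits) is written as two hex characters (4 bits each).
first_byte = raw[0]
print(f"{first_byte:08b} -> {first_byte:02x}")
```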
167+
168+
169+
## Review today's class
170+
171+
```{include} ../_review/2022-10-12.md
172+
```
173+
174+
175+
176+
## Prepare for Next Class
177+
178+
```{include} ../_prepare/2022-10-12.md
179+
```
180+
181+
182+
183+
## More Practice
184+
185+
```{include} ../_practice/2022-10-12.md
186+
```
187+
188+
189+
## Questions After Class
190+
191+
192+
### Are git commit hashes predictable/significant in any way? Or is the risk of using a low-bit hash generator negligible for github usage?
193+
194+
195+
196+
### I am still confused on what TRULY a git hash is. Is it an object? Is it the string to the object?
197+
198+
This is a place where being casual with language makes it hard. A hash, in general, is the output of a hashing algorithm. In git, everything that is stored gets hashed and git uses the hashes as the unique values to refer to each stored object. For example, if I take the hash that is the *commit number* from the most recent commit above, I can use `git cat-file` to look at the file, or git object, for that commit.
199+
200+
201+
```
202+
git cat-file -p 313c1d7
203+
```
204+
205+
```
206+
tree 47182af2099fc6075832c13798ee16b713ebb285
207+
parent a47d362409bc4ad0cb57e73b0e7a4c1a1a586f43
208+
author Sarah M Brown <brownsarahm@uri.edu> 1664849752 -0400
209+
committer Sarah M Brown <brownsarahm@uri.edu> 1664849752 -0400
210+
211+
Revert "c4"
212+
213+
This reverts commit 94d236b459d5035ab3f2c2676a888b27cd77e80d.
214+
```
215+
The "contents" of the commit are several things:
216+
- the hash of the snapshot of the contents as a tree object
217+
- the hash of the previous commit
218+
- author information
219+
- commit message
220+
221+
We can also check the type:
222+
```
223+
git cat-file -t 313c1d7
224+
```
225+
226+
```
227+
commit
228+
```
229+
230+
We can check the type of the parent to verify it points to the previous commit.
231+
232+
```
233+
git cat-file -t a47d3624
234+
```
235+
236+
```
237+
commit
238+
```
239+
240+
241+
We can then look at the contents of the tree as well, using its hash as the name, first checking its type:
242+
```
243+
git cat-file -t 47182af
244+
```
245+
246+
```
247+
tree
248+
```
249+
then display:
250+
251+
```
252+
git cat-file -p 47182af
253+
```
254+
255+
```
256+
040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c .github
257+
100644 blob bcc1d9287b5cae329629cf3e0779b065b72dad7a README.md
258+
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 about.md
259+
100644 blob 96d35c05b19fc42bc44153b8a97512001c37a09e new_feature.md
260+
```
261+
This is all of the contents of the directory at the time of that commit: one blob object for each file and one tree object for the single directory.
262+
263+
We can also look at a file
264+
265+
```
266+
git cat-file -p bcc1d9
267+
```
268+
269+
```
270+
# github-inclass-brownsarahm
271+
github-inclass-brownsarahm created by GitHub Classroom
272+
```
273+
and verify it has the same contents as the file on disk:
274+
275+
```bash
276+
cat README.md
277+
```
278+
279+
```
280+
# github-inclass-brownsarahm
281+
github-inclass-brownsarahm created by GitHub Classroom
282+
(base) brownsarahm@github-inclass-brownsarahm $
283+
```
284+
285+
286+
287+
### What are uses of Octal
288+
289+
Octal is convenient *because* it converts to and from binary representations easily. 1 place in octal is 3 bits (and 1 place in hexadecimal is 4 bits).
290+
291+
We will see it for file permissions, because each set of permissions has 3 parts (read, write, and execute), each of which can be on or off (i.e., it's 3 bits of information).
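
As a sketch of that idea, assume the three on/off parts are the read, write, and execute permissions, so each octal digit describes one set of three bits. The last three digits of a mode such as `100644` in the tree listing above can then be read like this (the function name `describe_mode` is made up for illustration):

```python
def describe_mode(octal_digits: str) -> str:
    """Expand each octal digit into its three on/off flags: r, w, x."""
    flags = "rwx"
    parts = []
    for digit in octal_digits:
        value = int(digit, 8)
        # 4 is the read bit, 2 the write bit, 1 the execute bit.
        parts.append("".join(
            flag if value & (4 >> i) else "-" for i, flag in enumerate(flags)
        ))
    return "".join(parts)


print(describe_mode("644"))  # rw-r--r--
print(describe_mode("755"))  # rwxr-xr-x
```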
292+
293+
### What is the most interesting number system created?
294+
295+
That is a good thing to explore, but interesting is relative.
296+
297+
### the `-w` option to `git hash-object` writes the hash number to a file?
298+
299+
No, it writes the *content* to a file named based on the hash in the `.git/objects` folder.
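
For loose (not yet packed) objects, the file name comes from splitting the hash: the first two characters name a subdirectory and the remaining 38 name the file, whose content is stored compressed. A small sketch of that mapping, with the made-up function name `loose_object_path`:

```python
from pathlib import PurePosixPath


def loose_object_path(object_hash: str) -> PurePosixPath:
    """Where a loose object with this hash would be stored under .git/objects."""
    return PurePosixPath(".git/objects") / object_hash[:2] / object_hash[2:]


print(loose_object_path("3ba345cea32208d6612e97830fdb0b1ae70ea8bd"))
# .git/objects/3b/a345cea32208d6612e97830fdb0b1ae70ea8bd
```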
300+
301+
302+
### With the information given to us last class, is it possible to complete the work for gitplumbingreview.md and gitplumbingdetail.md?
303+
304+
Do what you can and that task will come back after next class.
305+
306+
```{important}
307+
I revised the text of the activities as posted here, relative to what I posted in class, to clarify them.
308+
```
