What are Git Snapshots

The more I use Git, the more I like it, but my understanding of its internals is neither systematic nor complete: for example, what a snapshot actually is and how Git stores one. This post documents those mechanisms.

Snapshot Recording

  • On each commit, Git records the state of every tracked file. If a file has changed, Git stores a new blob object (a compressed binary file) containing the complete content of the file at that point. If a file hasn’t changed, the commit simply points to the blob that was stored previously

  • Each commit points to its own index of the project (a tree object), which is used to locate both the changed and the unchanged files
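The recording strategy in the two points above can be sketched as a tiny in-memory model. This is only an illustration under simplifying assumptions: the names blob_id, take_snapshot and store are hypothetical, and the ID is a plain SHA-1 of the content rather than Git's real object format.

```python
import hashlib

def blob_id(content):
    """Content-derived ID (simplified): identical content always gets the same ID."""
    return hashlib.sha1(content).hexdigest()

def take_snapshot(files, store):
    """Record one snapshot as a mapping path -> blob ID.

    Only content that has not been seen before is added to the store;
    an unchanged file just reuses the ID of the existing blob.
    """
    snapshot = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # no-op when the content is already stored
        snapshot[path] = oid
    return snapshot

store = {}  # hypothetical stand-in for the object database
v1 = take_snapshot({"README.md": b"just a demo\n", "main.py": b"print('v1')\n"}, store)
v2 = take_snapshot({"README.md": b"just a demo\n", "main.py": b"print('v2')\n"}, store)

print(v1["README.md"] == v2["README.md"])  # True: both snapshots share one blob
print(len(store))                          # 3 blobs in total, not 4
```

Each snapshot still lists every file, but the unchanged README.md costs no extra storage in the second version.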

The diagram below helps understand the repository state for each snapshot (version)

Snapshot Storage

Now that we know the recording strategy, where are the snapshots stored? In the hidden .git folder.

There are many items in this folder; here we only care about the places that store historical snapshots, namely index and objects. For the other parts, I recommend consulting Pro Git

To understand how Git specifically stores data, let’s initialize an empty project

mkdir git-demo && cd git-demo && git init && echo "just a demo" > README.md

Start executing git operations

git add

Running git add . writes the list of files to be committed into the index file. To view the index, use the low-level command git ls-files -s

$ git ls-files -s
100644 a730a28e53d8defdda8fe953829afdfc906e463a 0	README.md

Note: The index is a binary file, so it can’t be read directly as text; it can only be inspected as shown above. Each entry records the file mode (100644), the name of the blob in Git’s object store (a730a28e53d8defdda8fe953829afdfc906e463a, a 40-character SHA-1 value), the stage number (0), and the file name README.md
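To make the fields of that output line explicit, here is a small sketch that splits one line of git ls-files -s into its parts. IndexEntry and parse_ls_files_line are hypothetical names for this illustration; the sketch only parses the text output and does not read the binary index file itself.

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    mode: str   # file mode, e.g. 100644 for a regular, non-executable file
    oid: str    # SHA-1 of the blob that holds the staged content
    stage: int  # 0 for a normal entry; 1-3 show up during merge conflicts
    path: str   # file path relative to the repository root

def parse_ls_files_line(line):
    """Parse '<mode> <object> <stage><TAB><path>' as printed by git ls-files -s."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, oid, stage = meta.split(" ")
    return IndexEntry(mode, oid, int(stage), path)

entry = parse_ls_files_line(
    "100644 a730a28e53d8defdda8fe953829afdfc906e463a 0\tREADME.md"
)
print(entry.path, entry.mode, entry.stage, len(entry.oid))
```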

The specific blob files are stored in .git/objects. Note that the first two characters a7 are the folder name, and the remaining 38 characters are the file name.
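Where the 40-character name comes from can be sketched in a few lines: Git hashes a short header ("blob", the content size, and a NUL byte) followed by the raw content. The function names blob_oid and loose_object_path are hypothetical, and the sketch assumes that echo wrote a trailing newline into README.md.

```python
import hashlib

def blob_oid(content):
    """SHA-1 over the header 'blob <size>' plus a NUL byte, then the raw content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def loose_object_path(oid):
    """First two hex characters name the folder, the remaining 38 the file."""
    return ".git/objects/{}/{}".format(oid[:2], oid[2:])

oid = blob_oid(b"just a demo\n")  # assumption: echo appended a newline
print(oid)
print(loose_object_path(oid))
```

Because the name depends only on the content, two files with identical content always map to the same blob.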

At this point, you can use git cat-file -p a730a2 to view the complete content of the staged file.

$ git cat-file -p a730a2
just a demo
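What git cat-file has to do for a loose blob can be approximated as follows: the file on disk is the zlib-compressed header plus content, so reading it back means decompressing and splitting at the first NUL byte. This is a sketch, not Git's implementation: write_loose_object and read_loose_object are hypothetical helpers, a temporary folder stands in for .git/objects, and the full 40-character ID is required, whereas Git also accepts abbreviations such as a730a2.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object(objects_dir, obj_type, content):
    """Store '<type> <size>' + NUL + content, zlib-compressed, under its SHA-1."""
    raw = obj_type.encode() + b" " + str(len(content)).encode() + b"\x00" + content
    oid = hashlib.sha1(raw).hexdigest()
    folder = os.path.join(objects_dir, oid[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, oid[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    return oid

def read_loose_object(objects_dir, oid):
    """Return (type, content), roughly what cat-file -t and -p report for a blob."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content

objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
oid = write_loose_object(objects_dir, "blob", b"just a demo\n")
obj_type, content = read_loose_object(objects_dir, oid)
print(obj_type, content)
```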

git commit

Running git commit -m 'init readme' commits the staged snapshot to the local repository:

$ git commit -m 'init readme'
[master (root-commit) 79821c6] init readme
 1 file changed, 1 insertion(+)
 create mode 100644 README.md

Looking again at the .git/objects directory, you’ll find two additional folders

$  ll .git/objects/
total 0
drwxr-xr-x  3 qhe  staff    96B Dec 20 22:22 4e
drwxr-xr-x  3 qhe  staff    96B Dec 20 22:22 79
drwxr-xr-x  3 qhe  staff    96B Dec 20 16:24 a7
drwxr-xr-x  2 qhe  staff    64B Dec 20 16:24 info
drwxr-xr-x  2 qhe  staff    64B Dec 20 16:24 pack

Among these, 79 holds the commit object for this commit (79821c6 in the output above), while 4e holds a tree object that stores the file names and other information related to the commit

$ git cat-file -t 4edb6d
tree
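How a tree and a commit relate to the blob can be sketched by building both objects by hand. A tree body is a sequence of "mode name", a NUL byte, and the 20-byte binary ID of the entry; a commit body is text that names the tree. The helper names are hypothetical, the author string and timestamp are made up, and the sorting is simplified (real Git orders subdirectories slightly differently), so the printed IDs are not expected to match the ones in the session above.

```python
import hashlib

def object_oid(obj_type, body):
    """SHA-1 over '<type> <size>' + NUL + body, the same scheme used for blobs."""
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """entries: iterable of (mode, name, oid_hex); files only in this sketch."""
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    return body

def commit_body(tree_oid, author, message):
    """A root commit: no parent line, just tree, author, committer and message."""
    lines = ["tree " + tree_oid, "author " + author, "committer " + author, "", message]
    return ("\n".join(lines) + "\n").encode()

readme_blob = "a730a28e53d8defdda8fe953829afdfc906e463a"  # blob ID from the index above
tree_oid = object_oid("tree", tree_body([("100644", "README.md", readme_blob)]))
commit_oid = object_oid(
    "commit",
    commit_body(tree_oid, "Demo User <demo@example.com> 1700000000 +0000", "init readme"),
)
print("tree  ", tree_oid)
print("commit", commit_oid)
```

The chain is therefore commit -> tree -> blob: the commit names one tree, and the tree maps each file name to the blob that holds its content.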

We now have a rough picture of how everyday Git operations record and store each snapshot.

Storage Usage

As mentioned above, changed files are stored in their entirety as binary files. In the long run, this would consume significant disk space, so Git has optimizations for this

Git trades time against space to optimize storage. When objects are packed into packfiles (for example by git gc), it typically keeps the complete file for the latest version and stores older, less frequently needed versions only as diffs (deltas) against it. This strikes a balance between storage space and read/load speed.
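The idea of keeping the latest version whole and an older version as a diff can be illustrated with a toy delta. This is not Git's packfile delta format: make_delta and apply_delta are hypothetical helpers that only handle a shared prefix and suffix, which is enough to show the trade-off between space and the extra work needed to rebuild the older version.

```python
import zlib

def make_delta(base, target):
    """Toy delta: describe target as base's shared prefix + new middle + shared suffix."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return p, s, target[p:len(target) - s]

def apply_delta(base, delta):
    """Rebuild the target from the base and the (prefix, suffix, middle) delta."""
    p, s, middle = delta
    return base[:p] + middle + base[len(base) - s:]

header = b"# demo\n" * 200
footer = b"end of file\n" * 200
latest = header + b"version 2\n" + footer  # stored whole
older = header + b"version 1\n" + footer   # stored only as a delta against latest

delta = make_delta(latest, older)
print(len(older), "bytes in full,", len(delta[2]), "byte(s) of new data in the delta")
print(apply_delta(latest, delta) == older)  # True: the older version is recoverable

# Whole objects are additionally zlib-compressed, which also shrinks the full copy.
print(len(latest), "->", len(zlib.compress(latest)))
```

Reading the latest version needs no reconstruction, while reading the older one costs an extra step, which mirrors the balance between space and read speed described above.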

Summary

The Git diagram for our basic daily operations:

  • During git add, the file content is written to objects as a blob and recorded in the staging area (the index file)
  • During git commit, the snapshot is stored in the local repository, i.e., as tree and commit objects in objects

Of course, git push sends these objects to the upstream server. Remember that Git is distributed, so after a push the upstream holds the same objects as our local repository.

Final Thoughts

  • Git feels simple yet very powerful, which is a hallmark of excellent software design.
  • Understanding these underlying principles helps you use Git more efficiently, and ideas such as the storage strategy above are a useful reference for problems encountered in daily development.
