What is a Git object file?

Published in Git Internals

A Git object file is Git's basic unit of storage: every piece of data in your repository, from file contents and directory structures to commit history, is stored as an object. Objects live in a content-addressable store under the .git/objects directory, meaning each object's name is a hash computed from its type, size, and content (by default a SHA-1 hash written as 40 hexadecimal characters). Because the name is derived from the content, identical content always gets the same name, which is what makes both integrity checking and deduplicated storage possible.
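As a rough illustration of content addressing, the Python sketch below computes the ID Git would give a blob and the path where a loose copy of it would be stored. The function names git_object_id and loose_object_path are made up for this example, and the sketch assumes a repository using the default SHA-1 object format.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for `content` stored as `obj_type`."""
    # Git hashes a short header ("<type> <size>\0") followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Return the path of a loose object relative to the repository root."""
    # The first two hex digits name a subdirectory; the other 38 name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


blob_id = git_object_id("blob", b"test content\n")
print(blob_id)
print(loose_object_path(blob_id))
```

In a SHA-1 repository, the printed ID should match what `echo 'test content' | git hash-object --stdin` reports, since both hash the same header and bytes.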

The Four Core Git Object Types

Git categorizes all data into four primary object types, each serving a distinct purpose in reconstructing your project's history and structure:

Object Type | Description
Blob | Stores the raw content of a file. The name is short for "binary large object", and a blob contains no metadata such as filename, path, or permissions. If two files have identical content, Git stores a single blob for both.
Tree | Represents a directory. It contains a list of entries, each recording a file mode, a name, and the hash of a blob (a file) or another tree (a subdirectory). git cat-file -p also displays each entry's object type. Together the entries map out a directory's contents at a given point in time.
Commit | A snapshot of your project at a specific time. It points to a single tree object (the project's top-level directory at that moment) and to zero or more parent commits (none for the first commit, two or more for a merge), and it records the author and committer information, timestamps, and the commit message.
Tag | An annotated tag marks a specific point in history as important, usually a commit. It is a full object that records the hash and type of the tagged object, the tag name, the tagger, a date, a message, and an optional GPG signature. Lightweight tags are not objects at all: they are plain references that point directly at a commit.
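To make the tree layout concrete, here is a minimal Python sketch that serializes and parses tree entries. It assumes SHA-1 object IDs (20 raw bytes per entry), and the helper names encode_tree, decode_tree, and tree_id are illustrative rather than anything Git provides. Note that the stored mode for a directory is 40000; the leading zero in 040000 is added when git cat-file -p prints it.

```python
import hashlib


def encode_tree(entries):
    """Serialize (mode, name, hex_id) entries into a tree object's body."""
    body = b""
    # Simplification: plain byte-wise sort by name. Git's real ordering
    # compares directory names as if they ended in "/".
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return body


def decode_tree(body):
    """Parse a tree object's body back into (mode, name, hex_id) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        hex_id = body[nul + 1:nul + 21].hex()  # 20 raw SHA-1 bytes
        entries.append((mode, name, hex_id))
        pos = nul + 21
    return entries


def tree_id(body):
    """Hash a tree body the same way as any other object: header + body."""
    return hashlib.sha1(f"tree {len(body)}\0".encode() + body).hexdigest()


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
entries = [("100644", "README.md", EMPTY_BLOB), ("40000", "src", EMPTY_TREE)]
body = encode_tree(entries)
print(decode_tree(body))
print(tree_id(body))
```

The two IDs used as sample data are the well-known hashes of the empty blob and the empty tree, so the example does not depend on any particular repository.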

How Git Uses Object Files

Git's unique object model provides several key benefits:

  • Data Integrity: Because object names are SHA-1 hashes of their content, any accidental or malicious alteration to an object leaves its content out of step with its name. Git detects the mismatch whenever it verifies the object, for example when you run git fsck.
  • Efficiency: Git avoids storing duplicate content. If the same file content appears in multiple commits or branches, Git stores one blob and references it from each tree. Packfiles save further space by delta-compressing similar objects against one another.
  • Immutability: Once an object is created, it cannot be changed. Any modification to a file results in a new blob object, and consequently, new tree and commit objects to reference it. This immutability is fundamental to Git's ability to track history reliably.
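The three properties above can be sketched with a toy in-memory object store. This is a hypothetical stand-in for .git/objects written for illustration, not Git's implementation; the class name ObjectStore and its methods are assumptions. It relies only on two facts: object IDs are hashes of a header plus the content, and loose objects are zlib-compressed.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects (illustrative only)."""

    def __init__(self):
        self.objects = {}  # object ID -> zlib-compressed "header + content"

    def write(self, obj_type, content):
        raw = f"{obj_type} {len(content)}\0".encode() + content
        object_id = hashlib.sha1(raw).hexdigest()
        # Identical content maps to the same ID, so it is stored only once.
        self.objects.setdefault(object_id, zlib.compress(raw))
        return object_id

    def read(self, object_id):
        raw = zlib.decompress(self.objects[object_id])
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha1(raw).hexdigest() != object_id:
            raise ValueError(f"object {object_id} is corrupt")
        header, _, content = raw.partition(b"\0")
        obj_type, _size = header.decode().split(" ")
        return obj_type, content


store = ObjectStore()
first = store.write("blob", b"hello\n")
second = store.write("blob", b"hello\n")   # same content: same ID, no new entry
edited = store.write("blob", b"hello!\n")  # changed content: a new object
print(first == second, first == edited, len(store.objects))
```

Editing a file never rewrites an existing entry here; it only adds a new one under a different ID, which mirrors how a changed file produces a new blob while the old one stays untouched.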

Practical Insight: Inspecting Git Objects

You can directly inspect Git objects using the git cat-file command. This is invaluable for understanding the internal workings of your repository.

Common git cat-file options:

  • -t: Shows the type of the object.
  • -p: Pretty-prints the content of the object.
  • -s: Shows the size of the object's content in bytes.
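To show where the answers to -t and -s come from, the sketch below mimics the three options for a single loose object. It is a simplified model, not how git cat-file is implemented: the function name cat_file is hypothetical, packed objects are ignored, and trees are left out because their entries are binary and would need decoding before printing.

```python
import zlib


def cat_file(option, loose_bytes):
    """Mimic `git cat-file -t/-s/-p` for one zlib-compressed loose object."""
    raw = zlib.decompress(loose_bytes)
    # The header "<type> <size>" is separated from the content by a NUL byte.
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if option == "-t":
        return obj_type
    if option == "-s":
        return int(size)
    if option == "-p":
        if obj_type == "tree":
            raise NotImplementedError("tree entries are binary; decode them first")
        # Blobs, commits, and tags are shown as stored.
        return body.decode(errors="replace")
    raise ValueError(f"unsupported option: {option}")


# Build the bytes a loose blob file would contain for "test content\n".
loose = zlib.compress(b"blob 13\0test content\n")
print(cat_file("-t", loose))
print(cat_file("-s", loose))
print(cat_file("-p", loose), end="")
```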

Example Scenario: Tracing a Commit's Contents

Let's assume you have a commit hash, for instance, f00b41c....

  1. Find the object type of the commit:

    git cat-file -t f00b41c

    Output: commit

  2. Inspect the content of the commit object:

    git cat-file -p f00b41c

    This will output something like the following (the hashes are illustrative placeholders, not real object IDs):

    tree a1b2c3d4e5f67890abcdef1234567890abcdef
    parent c9d8e7f6a5b4c3d2e1f0987654321fedcba98765
    author John Doe <john.doe@example.com> 1678886400 +0000
    committer John Doe <john.doe@example.com> 1678886400 +0000
    
    Initial project setup

    Notice the tree line, which points to the hash of the tree object representing the project's state at this commit.

  3. Inspect the content of the associated tree object:
    Using the tree hash from the previous step (e.g., a1b2c3d4e5f6...):

    git cat-file -p a1b2c3d4e5f6

    Output might look like this (the hashes are illustrative placeholders):

    100644 blob 0a1b2c3d4e5f67890abcdef1234567890abcd README.md
    040000 tree 1e2f3a4b5c6d7e8f90a1b2c3d4e5f67890123456 src

    This shows that the tree contains a file named README.md (a blob) and a directory named src (another tree).
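The commit output in step 2 follows a simple layout: header lines, a blank line, then the message. A minimal Python sketch of reading that layout is shown below. The function name parse_commit and the sample text (including the example.com address) are assumptions made for illustration, and the parser is deliberately simplified: it ignores other headers such as signatures or encoding.

```python
def parse_commit(text):
    """Split pretty-printed commit text into its main fields."""
    headers, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            # A commit may have no parent (first commit) or several (merge).
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


# Sample text in the shape printed by `git cat-file -p <commit>`.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent c9d8e7f6a5b4c3d2e1f0987654321fedcba98765\n"
    "author John Doe <john.doe@example.com> 1678886400 +0000\n"
    "committer John Doe <john.doe@example.com> 1678886400 +0000\n"
    "\n"
    "Initial project setup\n"
)

commit = parse_commit(SAMPLE)
print(commit["tree"])
print(commit["parents"])
print(commit["message"], end="")
```

The value under "tree" is the hash you would pass to git cat-file -p in step 3 to continue walking from the commit down to its files.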

By understanding Git object files, you gain a deeper appreciation for how Git efficiently manages version control and maintains the integrity of your codebase.