Ora

What do you mean by shadow paging?

Published in Database Recovery Techniques 6 mins read

Shadow paging is a robust technique used primarily in database management systems and file systems to ensure the atomicity and durability of data modifications. It operates on a copy-on-write principle, meaning that updates are never performed directly on the original data pages. Instead, a new, "shadow" copy of the modified data is created and written to a separate location on the disk.

Understanding the Core Concept

The fundamental idea behind shadow paging is to avoid in-place updates to data pages, which are units of physical storage (typically ranging from 1 to 64 KiB). This approach guarantees that the system always has a consistent view of the data, even if a failure occurs during an update.

Shadow paging relies on two main components:

  • Current Page Table: This table acts as the primary directory, holding pointers to all the physical disk locations of the currently valid data pages. It represents the last committed state of the database or file system.
  • Shadow Page Table: This is a temporary copy of the Current Page Table. When changes are made, new or modified pages are written, and their addresses are recorded in this Shadow Page Table.

How Shadow Paging Works: A Step-by-Step Process

The process ensures that a transaction, which is a sequence of operations, either completes entirely (commits) or has no effect (rolls back), maintaining data integrity even in the face of system crashes.

  1. Initial State: The system begins with a Current Page Table that points to all the existing, valid data pages on the storage.
  2. Modification Request: When a request is made to modify data on a specific page, the system initiates the shadow paging mechanism.
  3. Copy-on-Write Execution:
    • A duplicate of the Current Page Table is created, becoming the Shadow Page Table.
    • Instead of overwriting the original page, a new physical page (a "shadow page") is allocated on the disk.
    • The contents of the original page are copied to this newly allocated shadow page.
    • The desired modifications are then applied only to this new shadow page.
    • The entry in the Shadow Page Table corresponding to that logical page is updated to point to the physical location of this newly modified shadow page. Pages that are not modified continue to be referenced from the Shadow Page Table just as they were in the Current Page Table.
  4. Committing the Changes:
    • Once all modifications for a transaction are complete and written to their respective shadow pages, the system is ready to commit.
    • The commit operation involves a single, atomic action: the pointer that references the Current Page Table is simply updated to point to the newly constructed Shadow Page Table. This single pointer change effectively makes all the new pages and their changes visible and permanent simultaneously.
    • The old Current Page Table (and the original data pages it referenced) becomes obsolete and can be marked for garbage collection.
  5. Rollback (if necessary): If a transaction fails or is aborted before the commit step, the Shadow Page Table is simply discarded. Since the Current Page Table was never altered, the original, consistent state of the data remains intact, requiring no complex undo operations.

Key Components of Shadow Paging

Component Description
Page A fixed-size block of physical storage (e.g., 1-64 KiB) that holds data. This is the granular unit of modification in shadow paging.
Current Page Table A data structure, typically residing on disk, that maps logical page identifiers to their current physical disk addresses. It represents the live, committed state of the data.
Shadow Page Table A temporary, in-memory copy of the Current Page Table used during a transaction. It accumulates pointers to newly written or modified pages, preparing for the commit.
Copy-on-Write The core principle where data modifications are performed on newly allocated copies of pages, preserving the original data until a commit operation formally switches the pointers.
Atomic Commit The critical final step where the system's pointer to the "current" page table is atomically updated to point to the Shadow Page Table, making all transaction changes durable and visible in one fell swoop.

Advantages of Shadow Paging

Shadow paging provides several significant advantages, particularly for systems requiring high data integrity and simplified recovery:

  • Simplified Crash Recovery: In the event of a system crash, recovery is straightforward. If the crash occurs before the commit, the Current Page Table remains untouched, and the incomplete Shadow Page Table is simply discarded. No complex undo or redo operations are necessary.
  • Guaranteed Atomicity: All changes within a transaction are either fully applied or entirely discarded, ensuring data consistency. The atomic switch of the page table pointer is the key to this guarantee.
  • No Redo/Undo Logging: Unlike other recovery schemes that rely on extensive transaction logs for recovery, shadow paging eliminates the need for these complex logging mechanisms, which can reduce write overhead during normal operation.

Disadvantages of Shadow Paging

Despite its strengths, shadow paging also has some drawbacks:

  • Garbage Collection Overhead: After a successful commit, the pages that were part of the old Current Page Table become obsolete ("garbage"). These old pages must be identified and reclaimed, which introduces complexity and potential overhead for garbage collection.
  • Storage Fragmentation: The continuous allocation of new physical pages can lead to data being scattered across the disk, potentially causing storage fragmentation and impacting subsequent read performance.
  • Increased Disk Space Usage: Temporarily, more disk space is consumed because both the old versions of pages and their new shadow copies coexist until the old pages are garbage collected.
  • Loss of Data Locality: Related data that was physically contiguous might become dispersed across the disk due to the allocation of new pages, potentially affecting I/O efficiency.

Practical Applications

Shadow paging's underlying principles are applied in various computer science domains where robust transaction management and crash recovery are critical:

  • Database Management Systems (DBMS): Many transactional databases use shadow paging, sometimes in combination with logging, to ensure the ACID properties (Atomicity, Consistency, Isolation, Durability) of transactions.
  • File Systems: Some modern file systems, such as ZFS and Btrfs, employ copy-on-write (CoW) mechanisms that are conceptually very similar to shadow paging. This enables features like snapshots, data integrity checks, and improved crash resilience.
  • Virtual Machine Snapshots: The ability to create snapshots of virtual machine disk states often leverages copy-on-write techniques, allowing a base disk image to remain unchanged while subsequent writes go to a new, differential disk file.

In essence, shadow paging provides a simple yet powerful method for achieving data consistency and resilience in persistent storage systems by fundamentally changing how data updates are performed.