
What is a dirty table in Informatica MDM?

Published in MDM Data Concepts

A dirty table in Informatica MDM is a system table that holds the ROWID_OBJECT values of base object records for which match tokens need to be generated or regenerated. It serves as a work queue for the Informatica MDM Tokenize process.

Understanding the Dirty Table in Informatica MDM

The dirty table exists so that match tokens can be generated efficiently: it tells the Tokenize process exactly which base object records need new tokens. Match tokens, in turn, are essential for identifying duplicate records and linking related data within the Master Data Management (MDM) Hub.

Key Characteristics

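  • Transient Contents: The table is a system table associated with the base object, but its rows are transient. It is a work area, not a permanent data store, and each entry exists only until the record has been tokenized.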
  • ROWID_OBJECT Column: The dirty table contains a ROWID_OBJECT column. This column uniquely identifies the specific base object records that require match token generation.
  • Input for Tokenization: It acts as the primary input for the Informatica MDM Tokenize process.
  • Cleaned Up After Processing: For each unique ROWID_OBJECT entry, the Tokenize process generates match tokens, and then the dirty table entry is cleaned up (removed). This ensures that the table only contains records still pending tokenization.
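
The characteristics above can be pictured with a small model. The following is a minimal sketch, not the Hub's actual schema: it uses an in-memory SQLite database as a stand-in for the Hub database, and the table name C_CUSTOMER_DRTY and the helper functions are hypothetical names chosen for illustration. Only the ROWID_OBJECT column comes from the description above.

```python
import sqlite3


def create_dirty_table(conn):
    # Hypothetical dirty table for a customer base object.
    conn.execute("CREATE TABLE C_CUSTOMER_DRTY (ROWID_OBJECT TEXT NOT NULL)")


def mark_dirty(conn, rowid_object):
    # Queue one base object record for tokenization.
    conn.execute(
        "INSERT INTO C_CUSTOMER_DRTY (ROWID_OBJECT) VALUES (?)",
        (rowid_object,),
    )


def pending_rowids(conn):
    # The same record may be queued more than once; only the distinct
    # ROWID_OBJECT values matter to the consumer of the queue.
    rows = conn.execute(
        "SELECT DISTINCT ROWID_OBJECT FROM C_CUSTOMER_DRTY "
        "ORDER BY ROWID_OBJECT"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_dirty_table(conn)
for rid in ("1001", "1002", "1001"):
    mark_dirty(conn, rid)

print(pending_rowids(conn))  # ['1001', '1002']
```

Record 1001 was queued twice but appears once in the pending list, which mirrors the "for each unique ROWID_OBJECT" behaviour described above.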

How Data Enters the Dirty Table

Records are typically marked as "dirty", and their ROWID_OBJECT is inserted into the dirty table, when:

  • New Records are Loaded: When fresh data is onboarded into an MDM Base Object.
  • Existing Records are Updated: If an existing record in a Base Object is modified, especially if the changes affect attributes that are part of the matching rules (e.g., name, address, date of birth).
  • System Processes: Certain MDM batch jobs or integrations might flag records for re-tokenization.
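
As an illustration of the first two triggers, here is a minimal sketch in plain Python. It assumes that an update flags a record only when a match column changes; the column list, the function name load_record, and the dict/set stand-ins for the base object and dirty table are all hypothetical and are not Informatica APIs.

```python
# Hypothetical match columns; in a real Hub these come from the base
# object's match configuration.
MATCH_COLUMNS = ("name", "address", "date_of_birth")


def load_record(base_object, dirty, rowid_object, record):
    """Insert or update a record and flag it dirty when tokens may be stale.

    Returns True if the record was added to the dirty set.
    """
    existing = base_object.get(rowid_object)
    base_object[rowid_object] = dict(record)

    if existing is None:
        # New record: it has no match tokens yet.
        dirty.add(rowid_object)
        return True

    # Existing record: flag it only if a match column actually changed.
    if any(existing.get(col) != record.get(col) for col in MATCH_COLUMNS):
        dirty.add(rowid_object)
        return True
    return False


base_object = {}  # stand-in for the base object table, keyed by ROWID_OBJECT
dirty = set()     # stand-in for the dirty table

customer = {"name": "Ann Lee", "address": "1 Main St",
            "date_of_birth": "1980-01-01", "phone": "555-0100"}

new_flagged = load_record(base_object, dirty, "1001", customer)
dirty.clear()  # pretend Tokenize has already processed the new record

phone_flagged = load_record(base_object, dirty, "1001",
                            {**customer, "phone": "555-0199"})
address_flagged = load_record(base_object, dirty, "1001",
                              {**customer, "address": "9 Oak Ave"})

print(new_flagged, phone_flagged, address_flagged, sorted(dirty))
# True False True ['1001']
```

The phone change leaves the record clean because phone is not one of the assumed match columns, while the address change queues it again.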

The Tokenize Process and Dirty Tables

The dirty table is an integral part of the Tokenize process, one of the fundamental processes in Informatica MDM. This process involves:

  1. Identifying Dirty Records: When the Tokenize process runs, it reads the dirty table to find the records that are pending tokenization.
  2. Retrieving Base Object Data: For each ROWID_OBJECT found in the dirty table, the process fetches the corresponding record's data from the relevant base object.
  3. Generating Match Tokens: Based on the configured match rules for that base object, the system generates match tokens. These tokens are condensed, encoded representations of the record's matching attributes; similar records are meant to share tokens, so the tokens are not unique per record.
  4. Storing Match Tokens: The generated match tokens are then stored in the match key table associated with the base object. The match key table holds all match tokens for the base object, and the Match process later reads it to find potential duplicates.
  5. Cleaning the Dirty Table: Once the match tokens have been successfully generated and stored in the match key table, the ROWID_OBJECT entry is removed from the dirty table.

This lifecycle keeps the MDM Hub's match tokens in step with the latest data changes each time the Tokenize process runs, without re-processing records that have not changed.
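
The five steps can be sketched as a single loop. This is a simplified model under stated assumptions, not the Hub's implementation: toy_tokens is a deliberately naive stand-in for real match token generation, and the dict and set objects are hypothetical substitutes for the base object, dirty table, and match key table.

```python
def toy_tokens(record):
    """Toy stand-in for token generation: first four letters of the last name."""
    letters = "".join(ch for ch in record["last_name"].upper() if ch.isalpha())
    return {letters[:4]}


def tokenize(base_object, dirty, match_keys):
    """Process every pending ROWID_OBJECT and return the ones completed."""
    processed = []
    for rowid_object in sorted(dirty):          # 1. identify dirty records
        record = base_object.get(rowid_object)  # 2. fetch base object data
        if record is None:
            continue                            # nothing to tokenize; stays pending
        tokens = toy_tokens(record)             # 3. generate match tokens
        match_keys[rowid_object] = tokens       # 4. store, replacing old tokens
        processed.append(rowid_object)
    dirty.difference_update(processed)          # 5. clean the dirty table
    return processed


base_object = {
    "1001": {"last_name": "Smith"},
    "1002": {"last_name": "Smithe"},
}
dirty = {"1001", "1002"}
match_keys = {}

processed = tokenize(base_object, dirty, match_keys)
print(processed)   # ['1001', '1002']
print(match_keys)  # {'1001': {'SMIT'}, '1002': {'SMIT'}}
print(dirty)       # set()
```

Entries are removed only after their tokens have been stored, so a record that could not be processed stays in the queue for the next run.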

For more detailed information, refer to the Informatica documentation on the dirty table.

Practical Insight

Consider a customer master data scenario. If a new customer record arrives or an existing customer's address is updated, the ROWID_OBJECT for that customer is placed in the dirty table. The Tokenize process then uses this entry to generate or update the customer's match tokens. These tokens are crucial for finding other customers with similar names, addresses, or other identifying information to prevent duplicate entries and ensure a single, accurate view of the customer.
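
To show why the tokens matter downstream, the sketch below groups records that share a token into candidate sets for comparison. It is only an illustration of the idea: the token values, the match_keys dict, and the candidate_groups function are hypothetical, and the real Match process applies the configured match rules rather than a simple grouping.

```python
from collections import defaultdict


def candidate_groups(match_keys):
    """Return token -> sorted ROWID_OBJECTs for tokens shared by 2+ records."""
    by_token = defaultdict(set)
    for rowid_object, tokens in match_keys.items():
        for token in tokens:
            by_token[token].add(rowid_object)
    return {
        token: sorted(rowids)
        for token, rowids in by_token.items()
        if len(rowids) > 1
    }


# Hypothetical contents of a match key table after tokenization.
match_keys = {
    "1001": {"SMIT"},
    "1002": {"SMIT"},
    "1003": {"JONE"},
}

groups = candidate_groups(match_keys)
print(groups)  # {'SMIT': ['1001', '1002']}
```

Records 1001 and 1002 share a token and become candidates for a detailed comparison, while 1003 has no partner and is never compared.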

Summary Table: Dirty Table at a Glance

Feature             Description
------------------  ---------------------------------------------------------------------------
Purpose             To queue base object records that need match token generation/regeneration.
Key Column          ROWID_OBJECT (identifies the base object record).
Associated Process  Informatica MDM Tokenize Process.
Input Source        Data changes (inserts, updates) in Base Objects.
Output Destination  Match Key Table (stores the generated match tokens).
State               Temporary; entries are removed upon successful processing.