A data lake is a centralized repository that lets you store all your structured and unstructured data at any scale. A data transaction is a series of data exchanges carried out as a single operation. For example, when a customer withdraws money from a bank account, the bank performs several data exchanges within one transaction: verifying the customer's identity, verifying that the account has a sufficient balance, and debiting the withdrawal from the account. A transactional data lake is a type of data lake that not only stores data at scale but also supports transactional operations, keeps data accurate and consistent, and lets you track how data and data structures change over time. Its transactional guarantees are collectively known as atomicity, consistency, isolation, and durability (ACID):
- Atomicity guarantees that each transaction is a single event that either succeeds completely or fails completely; there is no halfway state.
- Consistency ensures that all data written is valid according to the defined rules of the data lake, so the data remains accurate and reliable.
- Isolation ensures that multiple transactions can run at the same time without interfering with each other, so each transaction executes independently.
- Durability means that data is not lost or corrupted once a transaction is committed. The data can be recovered in the event of a system failure, such as a power outage.
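
The bank withdrawal example and the atomicity and consistency properties above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a transactional store; it is not a data lake, and the `accounts` and `ledger` tables, the `open_demo_db` helper, and the `withdraw` function are hypothetical names chosen for illustration. Identity verification is omitted to keep the sketch short.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with one account holding a balance of 100."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is the "defined rule" that consistency enforces:
    # a balance may never become negative.
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE ledger (account_id INTEGER NOT NULL, delta INTEGER NOT NULL)"
    )
    with conn:
        conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    return conn


def withdraw(conn, account_id, amount):
    """Record and apply a withdrawal as one transaction.

    Returns True if the whole transaction committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back every statement in it if the block raises.
        with conn:
            # Step 1: record the withdrawal in the ledger.
            conn.execute(
                "INSERT INTO ledger (account_id, delta) VALUES (?, ?)",
                (account_id, -amount),
            )
            # Step 2: debit the account. If the balance would go negative,
            # the CHECK constraint raises sqlite3.IntegrityError.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Step 1 was rolled back together with the failed step 2.
        return False
```

Under these assumptions, calling `withdraw(conn, 1, 30)` on a fresh database commits both steps, while a following `withdraw(conn, 1, 500)` fails at the debit and leaves no ledger row behind: the ledger insert is undone along with the rejected update, so the store never exposes a half-finished withdrawal.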