Medallion Architecture
Medallion architecture is a data design pattern that organises data in a Lakehouse into three tiers: the bronze layer holds raw data as ingested, the silver layer holds cleaned and conformed data, and the gold layer holds data shaped for consumption or analysis.
These are just quick and dirty notes; I will fill this out with more substance another time.
Bronze
- Append only
- Retain everything
- Soft-deletes if necessary
- Hard-deletes if required by regulatory reasons
- Don’t parse the underlying data
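The bronze rules above can be sketched in plain Python. This is a rough illustration only, not how a Lakehouse engine stores data: `BronzeRecord` and `BronzeTable` are hypothetical names, and an in-memory list stands in for real storage. The point is that payloads are kept as opaque bytes, writes only ever append, a soft-delete hides a record without removing it, and a hard-delete physically drops it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass(frozen=True)
class BronzeRecord:
    record_id: int
    source: str
    payload: bytes  # raw bytes exactly as received; never parsed in this layer
    ingested_at: datetime


class BronzeTable:
    """Hypothetical in-memory stand-in for an append-only bronze table."""

    def __init__(self):
        self._rows = []
        self._soft_deleted = set()
        self._ids = count()

    def append(self, source, payload):
        """Store the payload untouched and return its record id."""
        record = BronzeRecord(next(self._ids), source, payload,
                              datetime.now(timezone.utc))
        self._rows.append(record)
        return record.record_id

    def soft_delete(self, record_id):
        """Hide a record from readers while keeping it stored."""
        self._soft_deleted.add(record_id)

    def hard_delete(self, record_id):
        """Physically drop a record, e.g. for a regulatory erasure request."""
        self._rows = [r for r in self._rows if r.record_id != record_id]
        self._soft_deleted.discard(record_id)

    def live_rows(self):
        """Records that have not been soft-deleted."""
        return [r for r in self._rows if r.record_id not in self._soft_deleted]

    def all_rows(self):
        """Everything still stored, including soft-deleted records."""
        return list(self._rows)


table = BronzeTable()
first = table.append("orders", b'{"order_id": 1}')
second = table.append("orders", b"not even valid JSON")  # still accepted
table.soft_delete(first)
print(len(table.all_rows()), len(table.live_rows()))  # prints "2 1"
```

Because nothing is parsed on the way in, malformed input never blocks ingestion; deciding what counts as valid is left to the silver layer.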
Silver
- Ease of query
- Clean data
- ACID transactions
- Enterprise data model
- 3rd normal form
- De-normalisation for performance reasons
- Uses Delta Lake tables
- Preserves grain of original data (no aggregation)
- Eliminates duplicate records
- Enforces production schema
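A minimal sketch of the silver step, again in plain Python rather than Delta Lake. Everything here is an assumption made for illustration: the `SCHEMA` columns, the `to_silver` function, and the choice of `order_id` as the de-duplication key. It parses raw payloads, casts each field to the declared type, sets aside anything that does not fit, and keeps one row per key without aggregating.

```python
import json

# Hypothetical production schema: column name -> type used to cast the value.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


def to_silver(raw_payloads):
    """Return (clean_rows, rejected_payloads) for a batch of raw payloads."""
    accepted = {}  # keyed by order_id so duplicates collapse to one row
    rejected = []
    for payload in raw_payloads:
        try:
            parsed = json.loads(payload)
            row = {name: cast(parsed[name]) for name, cast in SCHEMA.items()}
        except (ValueError, KeyError, TypeError):
            # Not JSON, a missing column, or a value that cannot be cast.
            rejected.append(payload)
            continue
        row["customer"] = row["customer"].strip()  # basic cleaning
        accepted[row["order_id"]] = row  # a later duplicate replaces the earlier one
    return list(accepted.values()), rejected


rows, rejected = to_silver([
    b'{"order_id": 1, "customer": " Ada ", "amount": "9.50"}',
    b'{"order_id": 2, "customer": "Bob", "amount": 3}',
    b'{"order_id": 1, "customer": " Ada ", "amount": "9.50"}',  # duplicate
    b"not json",
])
print(rows)      # two rows, one per order
print(rejected)  # [b'not json']
```

The output still has one row per order, so the grain of the source data is preserved. Extra fields in a payload are dropped silently here; a stricter variant could reject them instead.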
Gold
- Designed for a particular user community
- Reduces costs associated with ad hoc queries
- Allows fine-grained permissions
- Powers ML applications, reporting, dashboards, and ad hoc analytics
- Shifts query computation from ad hoc requests to scheduled production workloads
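As a rough illustration of a gold table, the sketch below pre-computes a per-customer summary from silver-style rows. The function name `build_revenue_by_customer`, the column names, and the sample rows are all hypothetical. The idea is that a production job runs the aggregation once, and consumers read the small result instead of re-scanning detailed data with ad hoc queries.

```python
from collections import defaultdict


def build_revenue_by_customer(silver_rows):
    """Aggregate order-level rows into one summary row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
        counts[row["customer"]] += 1
    return {
        customer: {
            "order_count": counts[customer],
            "total_amount": round(totals[customer], 2),
        }
        for customer in sorted(totals)
    }


# Hypothetical silver rows at order grain.
silver_rows = [
    {"order_id": 1, "customer": "Ada", "amount": 9.5},
    {"order_id": 2, "customer": "Bob", "amount": 3.0},
    {"order_id": 3, "customer": "Ada", "amount": 0.5},
]
gold = build_revenue_by_customer(silver_rows)
print(gold["Ada"])  # {'order_count': 2, 'total_amount': 10.0}
```

A different user community would get its own gold table built from the same silver rows, which is also where per-audience permissions can be applied.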
Last modified January 4, 2023: More scaffolding... (bb549c8)