ETL vs ELT

Extract, Transform and Load (where the transform phase and load phase can be swapped) refers to processes for organising a data pipeline into conceptual or actual phases or stages.

Categories:

Data

2 minute read

WIP: Incomplete.

We worked on this for the Thinkathon in 2023 Aginic Bootcamp and while it was an entertaining exercise I felt unsatisfied with the brief overview we were able to cover. I personally would like a more thorough analysis of the differences, reasons and best practices regarding the topic.

What do you think? Leave a comment below if you feel this would be worthwhile.

Extract Transform Load vs Extract Load Transform

Extract Transform Load (ETL)

Has been used by data engineers since the 1970s. ¹
Primarily used for relational data warehouses where the sources may be varied but the destination only supports structured data.

Pros

Cheaper storage
The extract and transform stages are typically combined, allowing for faster access to insights because the data is immediately available once it is loaded

Cons / Challenges

Potential loss of data, when transformation fails on highly volatile data
If the source data is changing, it can be impossible to re-run the process on the original data when a problem is discovered changes
Multiple downstream sources might need same source data, increasing pressure on source system

Extract Load Transform (ELT)

It’s probably more like ELTLTLTL

Pros

Can be more robust as replicating data is less complex than transforming and so if the pipeline has problems in the transformation stage, it can be rerun from the raw data without disturbing
Can re-run pipelines on the same data if anything goes wrong
Process is more staged

Cons / Challenges

Increased storage cost, keeping raw data
Governance (GDPR / HIPAA / CCPA) can be more difficult to do right:
- TODO: Re-find the article that made this argument ² ³ ⁴ ⁵ ⁶

References

Last modified January 10, 2023: whoops (6fdf6e0)