What is the difference between ETL and ELT?
- In traditional ETL (Extract, Transform, Load) processing, data is extracted from the sources and loaded into a centralized staging repository, typically located on the ETL server itself.
- The ETL server process then transforms the data, after which it is moved yet again to its final target.
- This is processor intensive, because the ETL servers need to be very powerful to do this work. It is also network intensive, because the data crosses the network twice for each load: once from the source to the ETL server, and once from the ETL server to the target.
- ELT taps into the vast processing potential of MPP (massively parallel processing) database engines and distributed frameworks such as Hadoop by moving the transformation step to where the data lives. Exploiting this processing power is the primary goal of the Extract, Load, Transform (ELT) methodology.
- The data is first extracted and loaded into the MPP platform. These loads can use bulk-loading utilities so that they complete as quickly as possible.
- Once the data has been loaded, the transformations run natively inside these engines using their built-in function libraries.
- This removes the need for separate transformation servers and reduces network I/O, since the data no longer travels to an intermediate server and back.
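The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline. It assumes Python's built-in `sqlite3` module as a stand-in for the target engine: SQLite is not an MPP platform, and the plain `executemany` insert stands in for a platform's bulk-loading utility. The source rows, table names, and column names are all hypothetical. In the ETL function, the transformation runs in the client process, playing the role of the ETL server, before loading. In the ELT function, the raw rows are loaded first and then transformed with SQL inside the database.

```python
import sqlite3

# Hypothetical raw source rows: (order_id, amount_in_cents, country_code)
SOURCE_ROWS = [
    (1, 1250, "us"),
    (2, 899, "de"),
    (3, 4000, "us"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the client process, then load only the finished rows."""
    # Transform step runs here, outside the database (stand-in for the ETL server).
    transformed = [
        (order_id, cents / 100.0, country.upper())
        for order_id, cents, country in SOURCE_ROWS
    ]
    conn.execute("CREATE TABLE etl_orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO etl_orders VALUES (?, ?, ?)", transformed)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows first, then transform inside the database with SQL."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)"
    )
    # Load step: raw data goes straight into the target engine, untransformed.
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    # Transform step: runs where the data lives, using the engine's own functions.
    conn.execute(
        """
        CREATE TABLE elt_orders AS
        SELECT order_id,
               amount_cents / 100.0 AS amount,
               UPPER(country) AS country
        FROM raw_orders
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    etl_rows = conn.execute("SELECT * FROM etl_orders ORDER BY order_id").fetchall()
    elt_rows = conn.execute("SELECT * FROM elt_orders ORDER BY order_id").fetchall()
    print(etl_rows == elt_rows)  # True: both paths yield the same final rows
```

Both paths produce the same final table. The difference is where the transformation work happens: the client process for ETL, and the database engine for ELT.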