What is data transformation in an ETL process?

Data transformation is part of an ETL process and refers to preparing data for analysis. This involves cleaning (removing duplicates, filling in missing values), reshaping (converting currencies, pivoting tables), and computing new dimensions and metrics. This process requires some technical knowledge and is usually done by data engineers or data scientists.

In a typical ETL process, data transformation follows data extraction, where raw data is extracted to the staging area (an intermediate, often in-memory storage). After data is transformed, it is then loaded to its data store: a target database (such as the relational databases MySQL or PostgreSQL), a data warehouse, a data lake, or even multiple destinations. Usually, cleaned data is loaded into business intelligence (BI) tools, where it is ready for visualization and analytics by business users.

ETL vs ELT: The two ways to architect transformations

ETL is an idealized form of data architecture that portrays data pipelines as sequentially linear processes. In practice, the steps of the ETL process overlap and are done in parallel wherever possible, to get the freshest data available as soon as possible. That is why, when it comes to data engineering architecture, there are two distinct ways of incorporating transformations into data pipelines:

ETL: The traditional process of extracting data to the staging area, where it is transformed before being loaded into its final destination storage.

ELT (notice the L before the T): The data loading happens before the transformations. Raw data is extracted from the source system and loaded into a target data warehouse; only afterward is the data transformed.

Both paradigms have advantages and shortcomings and are better thought of as two strategies for different DataOps challenges. ETL is usually better suited for smaller data pipelines, while ELT is the go-to design pattern for big data.

Irrespective of the architecture you choose, all ETL transformations can be categorized into a handful of prototypical types, which we are going to break down in this blog.

Recommended read: Complete ETL process overview.

8 Types of data transformations

Depending on how you need to clean your data, you will typically reach for one (or more) of the following prototypical transformations.

Data Filtering

Data filtering is one of the simplest transformations: certain columns or rows are filtered out before the data is saved or inserted into the data storage.

Example: For the table sales_2021, you filter out all data from orders that were placed before 2021.

Data Mapping

Data mapping is one of the most common types of data transformations across all operations and industries. Data mapping (also called translation or conversion) takes one data input and changes it to its equivalent in another format. It takes on many forms, such as unifying capitalizations, converting strings to the same encoding standard, synchronizing date and time values, aligning units across different granularities, etc.

Example 1: Take product names from your eCommerce shop across all language variants and translate them into English before inserting them into the product_details table.

Example 2: You are running an app and are interested in usage analytics. The data you collect is time-stamped with the local timezone of each user. Before extracting and saving events to your event_raw table, you convert local time zones to UTC timestamps. For example, Mario and Jessica are playing each other in your app's game, but Mario plays in Italy at 5 pm, while Jessica plays in New York at 11 am (both local time). To avoid confusing timestamps, you convert events from both Mario and Jessica to UTC (3 pm).

Example 3: You are building a machine learning classifier that helps you determine which customers are more likely to repurchase. The data you extract from the CRM says "female" or "male", which your algorithm would not understand, so you map these strings to numeric values.
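The cleaning steps mentioned above (removing duplicates, filling in missing values) can be sketched with pandas. The table and column names here are illustrative, not from the article:

```python
import pandas as pd

# Toy customer table with one duplicate row and one missing value.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "country": ["IT", "US", "US", None],
})

clean = (
    df.drop_duplicates()               # remove duplicate rows
      .fillna({"country": "unknown"})  # fill in missing values
)
print(clean["country"].tolist())  # ['IT', 'US', 'unknown']
```

In a real pipeline these steps would run in the staging area (ETL) or inside the warehouse (ELT), but the logic is the same.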
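A minimal sketch of the data-filtering example, assuming the hypothetical sales_2021 table has been loaded into a pandas DataFrame; the cutoff date is illustrative:

```python
import pandas as pd

# Illustrative stand-in for the sales_2021 table.
sales_2021 = pd.DataFrame({
    "order_id": [101, 102, 103],
    "order_date": pd.to_datetime(["2020-12-30", "2021-01-05", "2021-03-17"]),
})

# Keep only orders placed on or after the cutoff; earlier rows are dropped
# before the data is inserted into storage.
cutoff = pd.Timestamp("2021-01-01")
filtered = sales_2021[sales_2021["order_date"] >= cutoff]
print(filtered["order_id"].tolist())  # [102, 103]
```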
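The timezone conversion from the Mario-and-Jessica example can be sketched with Python's standard zoneinfo module. The calendar date below is an assumed summer date, so Rome is on CEST (UTC+2) and New York on EDT (UTC-4), which matches the 3 pm UTC figure in the example:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def to_utc(local_dt: datetime, tz_name: str) -> datetime:
    """Attach the user's local timezone, then convert to UTC."""
    return local_dt.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

# Mario plays at 5 pm in Italy; Jessica at 11 am in New York (assumed date).
mario = to_utc(datetime(2021, 7, 1, 17, 0), "Europe/Rome")
jessica = to_utc(datetime(2021, 7, 1, 11, 0), "America/New_York")

print(mario.hour, jessica.hour)  # 15 15 -> both events land at 3 pm UTC
```

Storing the UTC result in event_raw means every downstream query compares events on a single clock.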
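For the CRM example, one common way to map the text labels to numbers a classifier can consume is one-hot encoding; the DataFrame below is illustrative:

```python
import pandas as pd

# Illustrative CRM extract with a text-valued gender column.
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "gender": ["female", "male", "female"],
})

# One-hot encode so the model sees numeric indicator columns
# (gender_female, gender_male) instead of raw strings.
features = pd.get_dummies(crm, columns=["gender"])
print(features.columns.tolist())
```

Whether you one-hot encode or map to a single 0/1 column is a modeling choice; both are instances of the data-mapping transformation.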