Data Loading

progress bar

The final step in the ETL process is to load the transformed data into the target datastore. For this example, I have created a landing database on the same SQL Server, but this final target could be anywhere data can be stored. Given that much of the data is already in the ideal final shape, much of this step is to create the new schema and move the data.

The first tables to be loaded were those with no foreign key dependencies, as they can just be moved without additional work. One of these tables was a new structure to normalize nationality across the data set. With the small subset of tables loaded, it was time to move on to tables containing dependencies. Not only was the schema added to the target database, but as part of the schema creation, the foreign key relationships were added. Adding the relationship can be done inline when creating the table. To add the foreign key relationship in the column definition passed into create table, the column must first be defined with the appropriate type. Then append references {table}({foreign key identifying column}) after the column's type definition. String it all together, the result is nationality_id int references dbo.Nationality(id). Here is a complete example of defining the table schema:

create table dbo.Driver (
    id int not null,
    reference_name varchar(25) not null, 
    number int, 
    code varchar(5) null,
    forename nvarchar(25) not null,
    surname nvarchar(50) not null,
    date_of_birth varchar(10) not null,
    nationality_id int references dbo.Nationality(id),
    url varchar(100)

    constraint pk_driver primary key clustered (
        id asc
    ) with (
        pad_index = off, 
        statistics_norecompute = off,
        ignore_dup_key = off,
        allow_row_locks = on,
        allow_page_locks = on,
        optimize_for_sequential_key = off
        ) on [primary]
) on [primary]

Loading the tables in this order made loading this data easy. As the related tables were already in the target database, the foreign key column could be populated with all the other data in the table. I was able to repeat this pattern until all the tables were loaded into the target database. With all the data loaded into the target, there are only 2 things left to complete. First, a quick retrospective on the decisions made and what changes would be made in processing the next ETL. Second, playing with the data and putting it to use!