SQL Functions
As discussed in previous posts (ETL and Extracting with Bulk Insert), the first step in the ETL process is to land the data. Once landed in SQL Server, it is best practice to never assume the landing schema will be the final target schema. Since this data set does not appear to have a gauntlet of data types, I wanted to see if I could write something to clue me on to the data types.
The CSV files contained columns that were clearly labeled as ids. With a quick handful of queries, I was able to confirm the integer columns were well enough. The remainder of the columns were varchar
. Just by scanning the tables, it was easy to see that the varchar
data type would not be sufficient. The problem became, how do I check what the data type should be for one value?
. Then, how do I perform this data type check for all varchar values?
.
Performing the check for one column turned out to be using a built-in SQL Server function ISNUMERIC(expression)
. With this and a check for a decimal place within the varchar
to determine if the value is a decimal or an integer. It would be silly to copy and paste this conditional all over the type checking scripts, a stored procedure is too big of a tool, the middle of the road option is SQL Functions. There are hundreds of functions that come out of the box, as developers we can also define our own. For example:
use FormulaOneAnalysis
drop function if exists dbo.TypeCheck
go
create function dbo.TypeCheck(@param varchar(max))
returns varchar(max)
begin
declare @response varchar(max)
set @response = ''
-- check if the parameter passed is a numeric value
if (isnumeric(@param) = 1)
begin
set @response = concat(@response, 'numeric')
-- check if the value is a decimal or an int
if (charindex('.', @param) > 0)
set @response = concat(@response, ':decimal')
else
set @response = concat(@response, ':int')
end
else
begin
set @response = concat(@response, 'varchar:', len(@param))
end
-- return the concatenated string response
return @response
end;
With the function defined we are just a query away from using it:
use FormulaOneAnalysis
select dbo.TypeCheck(Name)
from RacesTypeAnalysis
With this knowledge, there is no need to copy-paste from script to script building a maintenance nightmare to maintain.