BookmarkSubscribeRSS Feed
jarno
SAS Employee

Preparing to write this Juletip I went back to the basics of Xmas. Where is Santa Claus from? Of course, Lapland in the north of Finland. He even has his own Santa Claus village in Rovaniemi.

 

Then I was pondering, come Christmas Eve, Santa will hop on his sleigh and even though it might be a flying sled, he would still need some snow to get it safely off the ground. So, I started to research, will there be enough snow in Rovaniemi on Christmas Eve. For this experiment I used SAS Studio on SAS Viya 2024.10 release. Also, because I’m dealing with snow, I just had to use Snowflake cloud database to store my finished analytical base table data! In this example, I’m doing some simple data management in SAS Studio, to prove the point that anyone can do data prep and analytics on freely available open data.

To answer the research question, I visited the download observations site of the Finnish Meteorological Institute: https://en.ilmatieteenlaitos.fi/download-observations They offer superb selection of open data and offer many parameters to customize what you are downloading.  I also noticed they provide open APIs as well, a word about those in the conclusion.

00_fmi_obseravations.png

 

I then proceeded to download the weather data for Rovaniemi Airport observation station, as that is the station closest to Santa’s Village. I chose the data from years 2013-2023. This I received in consumer friendly xlsx format, it was 4018 rows of data.

 

01_data_in_excel.png

 

My tool of choice was SAS Studio and its flow designer. First step was to import the xlsx into a SAS table called SNOW_RAW:

 

03_import_xlsx.png

I used the Import data step which is capable in importing CSV and XLS/X files. Import ran successfully but looking at the imported columns,  I unfortunately noticed a data quality problem.

The most interesting column that tells the depth of snow used -1 almost consistently to indicate no snow at all – except for year 2014 where they had on some rows used the ‘-‘ (dash) character. Also, I found the convention of using -1 to indicate no snow a bit problematic for analysis, as I’d rather have zero to symbolize no snow.

Fortunately there is a tool to investigate the string patterns in data columns – SAS Information Catalog! So I analyzed the table there and found out the following:

 

012b_dq_issue_in_infocat.png

There are indeed 7 rows that use the dash ‘-‘ to symbolize no snow, so that was the culprit!

Having a visual look and the actual data proved my suspicions. This quite well proves that discovering data quality problems with SAS Information Catalog is easy for any data user.

 

02_dq_issue.png

This caused SAS to interpret the column as character instead of numeric and that is a problem when doing analyses on the data. Thus, I wanted to remedy this and started to work on improving the quality of data for the snow depth column. First I use the Recode Values step to clean out the -1 and ‘-‘ values in the SNOW column.

04_cleanse.png

It’s super simple to use, just state the old value and new value that will be used as replacement. As you can see in the below pic, I’m recoding both -1 and dash values to zero. Also you can define whether you want to create a new column or change the existing one. I’m creating a new column called SNOW_RECODED.

041_recoded.png

As my snow column is still in character format, next I want to convert that to a new numeric column called SNOW_CLEAN. For that I use the Calculate Columns step. Of course, the versatile Query step would work just as well, but calculate columns has a few perks that make it easy to use. For example, it provides a ready-made tool for converting from char to numeric:

042_convert_column.png

The created code can be seen in the expression builder and it’s very straightforward as expected. It uses the SAS input function and requires a SAS informat as parameter.

05_expression builder.png

To create a DATE column, I can simply combine the Day, Month and Year columns with SAS cats function to concatenate them together. Then I use the SAS input function to read them in as dates using the good old DDMMYY10. informat.

input ((cats('Day'n,'-','Month'n,'-','Year'n)),DDMMYY10.)

SAS date formats are always a bit cryptic, but they are well documented. See here for DDMMYYx. Now that I’m happy with my data I can proceed to loading the data into Snowflake and create some basic analyses to see if Santa will find snow on Xmas eve. Notice the ‘S’ symbol on data to indicate that the data has now been stored in Snowflake. A use case for this could be that the next consumer of data down the data chain needs their data stored in Snowflake, but also showcases the seamless integration between SAS and Snowflake. Also it has the word SNOW, so it’s appropriate for my theme!

 

06_load_to_snow_analyze.png

Using Characterize Data step I can see that the mean for snow depth for the last 11 years on Xmas eve has been about 36cm, so according to near historical data, Santa’s sled should get a safe start.

07_desc_stat.png

Then I use the Bar Chart step to find out corroborating evidence for a swift sleighride for Santa, as I’m filtering the graph for Month=12 and Day=24 to find out historical snow depths for Xmas eve.

08_bar_chart.png

Application of the Series plot step is why I needed to create the continuous DATE variable as it shows the whole time period in one graph. Looking at the all year round snow depths, this also predicts smooth sledding for Santa.

09_series_plot.png

This concludes the example, and I hope it shows that data management can be simple also with real data. Also, it worth to notice that SAS Studio has grown wings and is now a powerful tool for many types of data manipulation and analysis. SAS Studio is being developed at amazing pace with new features released monthly with each release and a great place to find out about the latest developments is here: What’s New in SAS Studio

Next step to take this further would be to utilize the Finnish Meteorological Instute’s open APIs for fetching data and they also provide ready-made Python, R and JavaScript libraries for this. How about creating a SAS Studio Custom Step for extracting the weather data through one of these APIs? Perhaps something for the next exercise… until then, happy holidays!

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

Discussion stats
  • 0 replies
  • 644 views
  • 6 likes
  • 1 in conversation