
SAS Data Preparation 2.2: Profile – Content Analysis and Tagging

Started 08-20-2018, modified 08-20-2018

In an earlier YouTube video, I introduced the Data Profiling feature of SAS Data Preparation powered by SAS Viya.  With the release of SAS Viya 3.4, the profiling feature in SAS Data Preparation gains additional functionality.  In this article, I explore two of the additions: Identification Analysis and Tagging.

 

To identify what type of data a column contains, profiling uses an Identification Analysis definition from the SAS Quality Knowledge Base.  The definition makes a best guess by assigning an ID analysis score to each of several identity designations.  The column is then tagged with the designation that has the highest score.  This tagging can help determine what data preparation to perform, or help identify personally identifiable information.

Note:  The analysis is based on the locale (country/language combination) specified in your "Default locale for Quality Knowledge Base" setting.  In my case, it is set to "Use the default server setting", which is English-United States.
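The "tag with the highest score" rule can be sketched in a few lines of Python.  This is only an illustration of the idea, not how the Quality Knowledge Base definition is implemented: the function name pick_tag, the score values, and every designation name other than Delivery Address are made up for the example.

```python
def pick_tag(scores):
    """Return the designation with the highest score, or None if there are no scores."""
    if not scores:
        return None
    # max() over the dict keys, ranked by their score values
    return max(scores, key=scores.get)


# Hypothetical relative scores for one column (values and most names are invented)
address_1_scores = {
    "Delivery Address": 82,
    "Name": 10,
    "Organization": 8,
}

tag = pick_tag(address_1_scores)
print(tag)
```

Because the scores are relative, only their ranking matters here; the sketch never treats them as record counts.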

 

1_DataExplorerGeneralSettings.png

 

Also, keep in mind that the SAS Quality Knowledge Base (QKB) for Contact Information (CI) version 29 or later is required, because it contains the identification analysis definition called Field Content that is used to perform this analysis during profiling.

 

First, I need to turn on the option to perform identification analysis of columns when profiling a data set.  This is done in the application settings for SAS Data Explorer, in the Profile section, by checking the box for the option "Analyze column contents while running a profile".  Note:  Selecting this option slows profiling, so the profile report results take longer to arrive.

 

2_DataExplorerProfileSettings.png

 

Next, I navigate to the data set I want to profile and select the button to run the profile.

 

3_RunProfile.png

 

After the profile has finished executing, I review the profile report on the Profile tab.

 

4_ProfileReport.png

 

I can drill down into one of the columns, such as ADDRESS_1, and view its column-specific profile information.  The column's properties now display an ID analysis score result based on the identification analysis.  The results are relative scores and do not correspond to actual record counts.  In this case, the value with the highest score is Delivery Address.

 

5_ProfileColumnIDAnalysis.png

 

That means that the ADDRESS_1 column is automatically tagged with the Delivery Address value.  To view a column's tag, I select the Details tab and then click on the tag button for the column.

 

6_ViewColumnTag.png

 

I confirm that the ADDRESS_1 column is auto-tagged with the value Delivery Address based on the identification analysis performed on the column.

 

7_ChooseTags.png

 

I can choose to remove this tag or add others to the column.  This tagging can be used to help determine what kind of data preparation I want to perform on the column.

 

As a final step, you may want to uncheck the option "Analyze column contents while running a profile" in the Profile section, since it slows profiling, and turn it back on only when you need identification analysis for another data set.

 

8_DataExplorerProfileSettings.png

 

 

For more information on SAS Data Preparation 2.2 powered by SAS Viya 3.4, you can refer to the documentation for SAS Viya 3.4: Data Preparation.

