By now I'm sure you have heard of the new SAS product – SAS Visual Investigator. If not, SAS Visual Investigator is a highly configurable application that empowers intelligence analysts to:
SAS Visual Investigator is a very powerful tool, but it is only as good as the data that is imported into it.
Therefore, data preparation is an essential component of any SAS Visual Investigator implementation project. SAS Visual Investigator supports importing data from: delimited text files, CSVs, Excel spreadsheets, PostgreSQL, and Oracle. Note: Excel spreadsheets are a new format that is supported with the release of SAS Visual Investigator 10.2 in 17w13. Data that this imported from delimited text files, CSVs, or Excel spreadsheets are uploaded (copied) to a PostgreSQL database internal to SAS Visual Investigator; therefore, importing data from these formats is best suited for one-off investigations or demonstration purposes. When data is imported from PostgreSQL or Oracle, SAS Visual Investigator simply stores a connection to the source data which means if there are updates to these connected sources SAS Visual Investigator will pick up those updates during its re-indexing process. Hence, preparing your SAS Visual Investigator data to either PostgreSQL or Oracle is recommended.
There are lots of data preparation tools you can use and of course SAS has some – SAS Data Integration Studio, DataFlux Data Management Studio as well as SAS Code. When preparing your data to either PostgreSQL or Oracle for import into SAS Visual Investigator some items you will want to consider are: data quality, data enrichment, and the incremental update/deletion process.
When preparing your data for SAS Visual investigator you may want to address data quality issues by performing actions such as parsing, casing, standardization, and fuzzy matching. The SAS Quality Knowledge Base (QKB) has out-of-the-box definitions to help you perform these data quality functions on many different data types (Name, Address, Phone, etc.) and the QKB definitions can be called from the SAS data preparation tools mentioned above. You also have the ability to customize the out-of-the-box definitions and/or create new definitions for other data types for use in your data preparation efforts. Refer to the DataFlux Data Management Studio: Customize the QKB course for more information on creating your own QKB definitions. Below are some examples of data quality issues you may want to address as part of your data preparation efforts:
Parsing: Prior to data preparation your data set may have people's names all in one field, but in SAS Visual Investigator you may want to parse the data in separate fields (e.g., First Name, Middle Name, and Last Name).
|Data Value Prior to Parsing||Data Value After Parsing|
|First Name||Middle Name||Last Name|
|Donald Robert Williams||Donald||Robert||Williams|
|Susan Jane Smith||Susan||Jane||Smith|
Casing: Prior to data preparation your data set may have email addresses in various case formats, but in SAS Visual Investigator you want your email addresses displayed in a consistent lower case format.
|Data Value Prior to Casing||Data Value After Casing|
Standardization: Prior to data preparation your data set may have phone numbers in various formats, but in SAS Visual Investigator you want your phone numbers displayed in a consistent format.
|Data Value Prior to Standardization||Data Value After Standardization|
|1 (919) 531-0008||+1 919 531 0008|
|1-919-531-7588||+1 919 531 7588|
Fuzzy Matching: Prior to data preparation you may have several records for Donald Williams – Donald Williams, Donny Williams, and Don Williams. By preparing your data using fuzzy matching techniques you can combine those records into one master record and link them to their contributing records, thus allowing you to create a network in SAS Visual Investigator like the one pictured below.
As part of your data preparation process for SAS Visual Investigator you may also want to consider enriching your data sets by performing actions such as address verification and geocoding. Geocoding provides latitude and longitude information for a supplied address and if that address has been verified the geocode results will be more accurate. SAS Visual Investigator uses latitude and longitude information to plot a point on a map that you can display in the application.
Incremental Update/Deletion Process
SAS Visual Investigator 10.2 has the ability to support incremental updates and record deletions from your source data; however, you must prepare your SAS Visual Investigator source tables properly in order to take advantage of these new features. For incremental updates, you must designate a date/time field on your source table as the "last updated at time field". In order for SAS Visual Investigator to be aware of record deletions in your source table, you must create a delete table with specific table design requirements for each entity and relationship in SAS Visual Investigator. The "About Incremental Updates and Deletions" section of the SAS Visual Investigator Administrator's Guide discusses the specific requirements for using these new features.
In conclusion, SAS Visual Investigator is a very powerful tool, but it is only as good as its data.