BookmarkSubscribeRSS Feed
11427qvolaf
Calcite | Level 5

A research analyst is given an Excel file and is asked to import it into SAS or SQL and provide this new dataset to your superior in addition to summary statistics. How would you handle this import? Also, once the SAS dataset is complete how would you check that the import worked? Please also list summary statistics or tables that you would provide.

2 REPLIES 2
japelin
Rhodochrosite | Level 12

What kind of Excel file is it? Usually use proc import.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acpcref/p0jf3o1i67m044n1j0kz51ifhpvs.htm 

 

Also, it is difficult to give a general answer for summary statistics, as it depends on the situation and what kind of information is needed.

You can use proc univariate to calculate a variety of statistics.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_univariate_syntax01.htm  

 

ballardw
Super User

@11427qvolaf wrote:

A research analyst is given an Excel file and is asked to import it into SAS or SQL and provide this new dataset to your superior in addition to summary statistics. How would you handle this import? Also, once the SAS dataset is complete how would you check that the import worked? Please also list summary statistics or tables that you would provide.


Unless there is a good reason, such as dozens of sheets with exactly the same layout on each sheet I usually convert Excel files to CSV and write a data step to read the data. Why? Because then I control the properties of every variable and don't have to deal with guesses as to which variables should be character, numeric; dates, currency or identifiers; lengths of character variables. If this will be an on going process with multiple files this is particularly important as then I will be able to combine data as needed. You will find 100's of threads on this forum about attempts to combine data with problems where "variable is defined as both character and numeric". All too often these have one or more Proc Import and/or use of wizards that use Import at heart to read data. Every single file read that way the code behind the procs makes decisions on how to use the data and depending on the values of a few rows they often vary.

 

How do I know it worked? Really depends on the complexity and size of the file and what the source documentation (you do have something that describes what the file is supposed to contain don't you? If not, that is something to get from the source. If they can't provide that how do they know what should be in the file.)

 

First is READ the log. A data step may well provide "invalid data" messages when reading the values. Such as attempting to read 30FEB2021 as a date. Not my code problem in this case but errors in the source data. Or a value that is supposed to be numeric that sometimes has a word like "NULL" "Missing" or "N/A". Different approaches again depending on what this is used for. I often use custom informats to address known values (like when the source says "Missing values will be represented by N/A" , again documentation) and behave as expected.

 

Proc freq is one tool to quickly look at individual variables. It will show things like 1) how many missing values (if the source documentation says the variable should never be missing you know there is an issue), 2) if unique identifier variables have counts of 2 or more then they are not unique , 3) all values for each variable; if a variable is supposed to have values of 1,2,3 and 4 and you see values like 27 something is off, 4) you get a feel for distribution of values.

 

Other checks depend on what is supposed to be in the data and can be quite complex.

 

Any statistics or summary reports would depend upon need. I would start by asking the person that will get the summaries what they want/expect. Often in the course of providing that something may come up that you discuss with the user about additional needs depending on the purpose.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 674 views
  • 0 likes
  • 3 in conversation