BookmarkSubscribeRSS Feed

The Lost Art of Reading and Loading Raw Data Into SAS® Q&A, Slides, and On-Demand Recording

Started ‎09-26-2023 by
Modified ‎09-27-2023 by
Views 735

Watch this Ask the Expert session to learn the critical step of uploading your data into SAS. 

 

Watch the Webinar

 

You will learn:

  • The importance of raw data.
  • How to load data directly into SAS tables, bypassing the PROC IMPORT “black box.”
  • How to audit data from the original point of collection and identify possible irregularities before analysis.
  • Three of the four methods for reading and loading raw data.
  • How SAS parses data, converting character and numeric streams into meaningful SAS variables.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Q&A

What if you're missing the leading zero for 01/01/11?

It will still work without the leading zero.

 

Can you use infile/input with a csv/xlsx file?

You can only use infile/input to read a text file. NOT XLSX. If it is a CSV file then yes you can, it is considered a delimited file with a delimiter of a comma.

 

You do have a couple of options: 1) you can save a .xlsx file as a text file, .csv, thus allowing you to read using INFILE/INPUT. But the only reason I'd do that is if i did not license the "SAS ACCESS to PC FILE FORMATS" module in SAS; so I think of this options as a work-around. Option 2) is better: Assuming you have the above license, write a libname statement pointing to a particular Excel workbook, like this: libname myxls xlss "s:\workshop\data\excel_workbook.xlsx;

With this method, you may then access any worksheet in the above workbook (excel_workbook.xls) as if it were an existing SAS table, via a SET statement! so, like this:

data new;

set myxlsx.sheet1;

some logic here....;

a bit more logic....;

run;

 

 

What do you do to determine the correct positions/how many positions there are for each variable? Like what if row 5000 has 8 characters for var1 but row 1 has 3 characters? Do we just eyeball it and guesstimate?

You can use Notepad to help determine field positions. If you open your raw data file in notepad, on the bottom right it shows field positions. Some companies provide a data dictionary with field positions.

 

Can forward slash (/) stand in its own line? Or does there need to follow something?

Do you mean (/) used for reading a record? It can stand on its own, which means read a record. But without any details following, it will not transfer to PDV.

 

Follow-up question: is there a similar way to specify what informat with PROC import?

No, you cannot with PROC import. It is not as flexible as the infile/input in the Data step. Proc import is out of the box and does it's best to interpret. But if it doesn't do what you want, that's when you use infile/input.

 

You need a trailing @ sign to hold the input.

Correct!

 

Could you please point me to code examples (SAS Documentation or SAS conference proceedings paper) of reading unstructured HTTP response (HTML) data meeting certain conditions for the rows?

https://support.sas.com/resources/papers/proceedings12/121-2012.pdf

http://support.sas.com/resources/papers/proceedings11/062-2011.pdf

 

Do these input examples work the same when using a direct connection to Teradata?

Teradata is a database; SAS can connect to it directly using the SAS/ACCESS engine to Teradata. Teradata tables are NOT text files, so this technique is not applicable. Hope that answers your question.

 

If job title in second line ends in 60 instead of 55, will it read the salary correctly?

Great question. What will happen is your job title will look odd and be something like this: Sales Manager108,25. Because the column point for salary is at 56, the point will in fact move back and successfully read salary.

 

Is a subsetting WHERE more efficient than a subsetting IF in a data step that reads in a SAS data set, for instance, from a SET statement?

Yes, you are correct. When reading a SAS dataset using the SET statement, it can "access" the columns and their values directly before it gets to the PDV. This allows the use of the WHERE statement for those existing columns, and, as you mentioned, is more efficient.

 

Great content, thanks. I agree it's a lost art. I would appreciate hearing your opinion why this is still relevant today when many data analysts access data using some version of GUI and data import wizards.

This topic is relevant today for at least three reasons:   1) Data Import wizards don’t - and often can’t - deal with complex hierarchical files.   Ultimately reading raw data requires an understanding of the data structure and the subtle nuances involved necessary to read accurately; 2) When auditing, confirming, or doing a peer-review, the reviewer must be able to verify that the data coming off the data collection system was read correctly.   It’s NOT safe or advisable to ignore this critical first step, just assuming that the newly-created SAS file is spot on correct – because is often is not correct; and 3) Proc IMPORT writes only the most basic data step, omitting all the necessary logic, variable creation, data manipulation, and validation that ultimately has to be done.   From an efficiency perspective, it makes incredible sense for the analyst to build ONE data step that both imports and performs necessary computations in the same step.

 

 

Recommended Resources

On Line SAS Documentation

Moving from SAS®9 to SAS® Viya®

Move to Viya Board

Please see additional resources in the attached slide deck.

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

Version history
Last update:
‎09-27-2023 12:31 PM
Updated by:
Contributors

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Article Labels
Article Tags