amyk
Fluorite | Level 6

I am importing multiple CSV files, but a few variables come in with different data types from file to file, and I am trying to make them consistent. I am having a hard time working out how to set the types and formats with PROC IMPORT so that I don't need redundant input logic for each file (the file names all differ). Can someone suggest the best way to do this?

Tom
Super User

Don't use PROC IMPORT to read CSV files. Write your own data step. You could run PROC IMPORT on one of them and recall the generated code (or copy it from the SAS log), but it writes really ugly code, and you could write much clearer code yourself.
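For readers unsure what "your own data step" looks like, here is a minimal sketch. The file path, the data set name, and the four columns (`account_id`, `region`, `sale_date`, `amount`) are hypothetical placeholders, not taken from the original poster's files; the lengths and informats would need to match the real layout.

```sas
/* Hypothetical path and column layout; adjust to the real CSV. */
data work.sales;
  /* DSD reads comma-separated values and handles quoted fields;  */
  /* TRUNCOVER tolerates short lines; FIRSTOBS=2 skips the header. */
  infile '/path/to/sales_2024_01.csv' dsd truncover firstobs=2;

  /* Define every variable explicitly so the types never depend on the data. */
  length account_id $12 region $20 sale_date 8 amount 8;
  informat sale_date yymmdd10.;
  format sale_date yymmdd10. amount comma12.2;

  input account_id region sale_date amount;
run;
```

Because the variable definitions live in the program rather than being guessed from the file, every file read with this step ends up with the same types and lengths.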

amyk
Fluorite | Level 6

Sorry, but I am trying to understand what you mean by writing my own data step. Do you mean I should use an INFILE statement instead? I was trying to avoid that because I will be running this process multiple times and wanted to keep it somewhat static.

Kurt_Bremser
Super User

Then proc import is clearly NOT what you want; its results are unpredictable, depending on the contents of the current file. If you want a consistent result, write a data step (you can take one of those that were created by proc import as a blueprint; find them in the log) and run that same data step on all similarly structured csv files.

Tom
Super User

If the data file structure is static, then you definitely will want to write your own data step. The problem with PROC IMPORT is that it must guess how to define each variable based on just the current sample of the data.

If instead the data structure is dynamic, but you have multiple files that are all in the new (current) structure, then you could first concatenate the raw files and then point PROC IMPORT at the combined file. At least then PROC IMPORT would make variable definitions that work for all of the data. Of course, they might be totally different from what the rest of your program expects.
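A rough sketch of the concatenate-then-import idea follows. It assumes every file has a single header row, that the files match a hypothetical pattern `sales_*.csv` in a hypothetical folder, and that your SAS host accepts a wildcard in the FILENAME statement. `GUESSINGROWS=MAX` is accepted in recent SAS 9.4 releases; on older releases a large number would have to stand in for it.

```sas
/* Hypothetical folder and file pattern. */
filename allcsv '/path/to/data/sales_*.csv';
filename combined temp;

/* Copy all files into one temporary file, keeping only the first header. */
data _null_;
  length fname prev $260;
  retain prev;
  infile allcsv filename=fname lrecl=32767;
  file combined lrecl=32767;
  input;
  if fname ne prev then do;      /* first record of a new file */
    prev = fname;
    if _n_ > 1 then delete;      /* drop repeated header lines */
  end;
  put _infile_;
run;

/* Let PROC IMPORT guess from all of the data at once. */
proc import datafile=combined dbms=csv out=work.all_sales replace;
  getnames=yes;
  guessingrows=max;
run;
```

The guesses now cover every row of every file, but they are still guesses, which is the caveat raised above.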

ballardw
Super User

Once you have a working data step for one file, you only change the source on the INFILE statement and the output data set name. Those are two very small changes, and you won't have problems with variable types changing from one data set to the next.
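One way to keep those two changes in a single place is to wrap the data step in a macro. This is only a sketch: the macro name `read_sales`, its parameters, the paths, and the column layout are all hypothetical.

```sas
/* Hypothetical macro; the INPUT layout must match the real files. */
%macro read_sales(infile=, out=);
  data &out;
    infile "&infile" dsd truncover firstobs=2;
    length account_id $12 region $20 sale_date 8 amount 8;
    informat sale_date yymmdd10.;
    format sale_date yymmdd10.;
    input account_id region sale_date amount;
  run;
%mend read_sales;

/* Only the source file and the output data set name change per call. */
%read_sales(infile=/path/to/sales_2024_01.csv, out=work.sales_jan)
%read_sales(infile=/path/to/sales_2024_02.csv, out=work.sales_feb)
```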

Hopefully you have a description somewhere of the expected data, such as: the length of character fields, so you can set variable lengths in the data step and all the values read will actually fit; the layout of date, time, or datetime fields; and which values should be character even if they look numeric. (Hint: account or person identifiers almost always should be character. I have never been asked to determine the "average account number" or "standard deviation of Zip code".)

A data step program will ensure compliance with your description document.
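A small illustration of why identifiers belong in character variables: the same zip code read both ways. The data set and variable names are made up for the demonstration.

```sas
data work.zip_demo;
  infile datalines dsd truncover;
  length zip_char $5 zip_num 8;
  input zip_char zip_num;
  datalines;
02134,02134
;

proc print data=work.zip_demo;
run;
/* zip_char keeps the text 02134; zip_num is stored as the number 2134, */
/* so the leading zero is lost unless a Z5. format is applied.          */
```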

 

And with a little experience, you can actually read multiple files at one time. For example, one of my projects sends me data files each month. I run some quality checks and send reports back to the source. Sometimes they update the values of the data, and I get a replacement source file for one or more months. So I read all of the source files for the current year at one time, and the corrected data is incorporated into the new year-to-date data set.

Guess what: that is one data step program with an INFILE statement that incorporates a wildcard to read all the CSV files matching a specific naming pattern in a folder.
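A sketch of what such a wildcard read might look like. This is not the poster's actual program: the folder, the naming pattern, and the columns are hypothetical, and it assumes every file starts with one header row.

```sas
/* Hypothetical folder, file pattern, and column layout. */
data work.sales_ytd;
  length fname prev_fname source_file $260;
  length account_id $12 region $20 sale_date 8 amount 8;
  informat sale_date yymmdd10.;
  format sale_date yymmdd10.;
  retain prev_fname;

  /* The wildcard reads every matching file as one stream;   */
  /* FILENAME= reports which file the current record is from. */
  infile '/path/to/data/sales_2024_*.csv' dsd truncover filename=fname;

  input @;                        /* load the record and hold it */
  if fname ne prev_fname then do; /* first record of each file   */
    prev_fname = fname;
    delete;                       /* it is the header, so skip it */
  end;

  source_file = fname;            /* keep the origin of each row */
  input account_id region sale_date amount;
  drop prev_fname;
run;
```

The FILENAME= variable itself is not written to the output data set, which is why it is copied into `source_file`.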
