DATA Step, Macro, Functions and more

Start to finish how do I import a CSV file with the DATA step?

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Start to finish how do I import a CSV file with the DATA step?

This process is a constant struggle for me. I would like to import a delimited file. Some character fields have a variable number of characters that I do not know in advance. In SQL these would be of type VARYING CHARACTER or TEXT. What INFORMAT do I need to import a character field with a length of 0-1024? How do I ensure SAS doesn't move past the delimiter into another field?

Accepted Solutions
Solution
‎02-01-2018 03:05 PM
Super User
Super User
Posts: 7,845

Re: Start to finish how do I import a CSV file with the DATA step?

Note that if you really don't know what is in the file then look at it first. The LIST statement is really good for this as it will show you the hexcode for invisible characters, like tabs.

So you might look at the first 10 lines to see what is there,  Does it have a header line? Is it using comma, semi-colon, pipe, tab, something else as the delimiter?

data _null_;
  infile 'mytextfile.txt' obs=10;
  input;
  list;
run;

Also then you can just copy the first line from the SAS LOG and paste it into the program editor so that you can use it to begin making names for the variables.

 

If you want to get a better idea of what is in the file then read it into a bunch character variables and examine it.  Find the maximum length. How many different values.  Do they look like numbers? dates? time? 

So you might start by reading the first 9,999 observations into 40 columns of character variables.

data  sample;
  infile 'mytextfile.txt' obs=10000 firstobs=2 dsd truncover;
  length x1-x40 $50 ;
  input x1-x40;
run;

View solution in original post


All Replies
Super User
Posts: 9,557

Re: Start to finish how do I import a CSV file with the DATA step?

Posted in reply to tomcmacdonald

Use the correct delimiter.

Assign a sufficient length to the variable.

Do not use a format in the input statement.

 

If the dataset has a sufficient number of observations, store it with the compress=yes option, so you don't waste disk space with blanks.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 7,845

Re: Start to finish how do I import a CSV file with the DATA step?

Posted in reply to tomcmacdonald

Just use the DSD option and a list mode INPUT statement.

1) Define you variables. Use a LENGTH or ATTRIB statement.  

2) Add a FORMAT statement for any value that requires it. Most variables do NOT need to have a format attached. Main exception is DATE, TIME or DATETIME values that where it would be hard for humans to understand the unformatted values.

3) Either add an INFORMAT statement for variables where the text strings in the CSV file need to be translated to store the right value in the SAS dataset.  For most values SAS does not need to use an INFORMAT to read the value.  Again the main exception is for DATE, TIME and DATETIME values.  Also numeric values that have been formatted in the text with $ or commas will require an informat.  Note you could do this by adding an informat specification to the INPUT statement, but make sure to prefixed the informat specification with : to make sure that input is still list mode.  Also note that you do not need to supply a width value for the informat specification since SAS will ignore it when reading data in list mode.  In list mode SAS adjusts the informat to the width of the data.

4) Add an INPUT statement that just lists the fields to read. 

 

data mydata;
  infile 'mytextfile.txt'  dsd dlm=',' truncover firstobs=2;
  length name $20 age 8 date 8 salary 8;
  informat date mmyydd. salary comma. ;
  format date yymmdd10. ;
  input name age date salary ;
run;

Note that if you have defined the variables in the order that they appear in the CSV file the INPUT statement itself could use a positional variable list instead of listing all of the variable names again.

input name -- salary;

 

Note that SAS only uses fixed length character variables. It just means that when SAS stores the values in the dataset it pads them with spaces. It really doesn't matter at all with a CSV file since standard practice for processing CSV files it to ignore trailing (and leading) spaces anyway.  If you are reading a file that you don't know a lot about set them to something much longer than you need. You can always adjust them later.  I frequently will round character variables of multiples of 10, 25, 50 or 100 bytes just so I don't have to worry too much about the next example of this type of file having slightly longer values.

Solution
‎02-01-2018 03:05 PM
Super User
Super User
Posts: 7,845

Re: Start to finish how do I import a CSV file with the DATA step?

Note that if you really don't know what is in the file then look at it first. The LIST statement is really good for this as it will show you the hexcode for invisible characters, like tabs.

So you might look at the first 10 lines to see what is there,  Does it have a header line? Is it using comma, semi-colon, pipe, tab, something else as the delimiter?

data _null_;
  infile 'mytextfile.txt' obs=10;
  input;
  list;
run;

Also then you can just copy the first line from the SAS LOG and paste it into the program editor so that you can use it to begin making names for the variables.

 

If you want to get a better idea of what is in the file then read it into a bunch character variables and examine it.  Find the maximum length. How many different values.  Do they look like numbers? dates? time? 

So you might start by reading the first 9,999 observations into 40 columns of character variables.

data  sample;
  infile 'mytextfile.txt' obs=10000 firstobs=2 dsd truncover;
  length x1-x40 $50 ;
  input x1-x40;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 152 views
  • 2 likes
  • 3 in conversation