BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
This process is a constant struggle for me. I would like to import a delimited file. Some character fields have a variable number of characters that I do not know in advance. In SQL these would be of type VARYING CHARACTER or TEXT. What INFORMAT do I need to import a character field with a length of 0-1024? How do I ensure SAS doesn't move past the delimiter into another field?
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Note that if you really don't know what is in the file then look at it first. The LIST statement is really good for this as it will show you the hexcode for invisible characters, like tabs.

So you might look at the first 10 lines to see what is there,  Does it have a header line? Is it using comma, semi-colon, pipe, tab, something else as the delimiter?

data _null_;
  infile 'mytextfile.txt' obs=10;
  input;
  list;
run;

Also then you can just copy the first line from the SAS LOG and paste it into the program editor so that you can use it to begin making names for the variables.

 

If you want to get a better idea of what is in the file then read it into a bunch character variables and examine it.  Find the maximum length. How many different values.  Do they look like numbers? dates? time? 

So you might start by reading the first 9,999 observations into 40 columns of character variables.

data  sample;
  infile 'mytextfile.txt' obs=10000 firstobs=2 dsd truncover;
  length x1-x40 $50 ;
  input x1-x40;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

Use the correct delimiter.

Assign a sufficient length to the variable.

Do not use a format in the input statement.

 

If the dataset has a sufficient number of observations, store it with the compress=yes option, so you don't waste disk space with blanks.

Tom
Super User Tom
Super User

Just use the DSD option and a list mode INPUT statement.

1) Define you variables. Use a LENGTH or ATTRIB statement.  

2) Add a FORMAT statement for any value that requires it. Most variables do NOT need to have a format attached. Main exception is DATE, TIME or DATETIME values that where it would be hard for humans to understand the unformatted values.

3) Either add an INFORMAT statement for variables where the text strings in the CSV file need to be translated to store the right value in the SAS dataset.  For most values SAS does not need to use an INFORMAT to read the value.  Again the main exception is for DATE, TIME and DATETIME values.  Also numeric values that have been formatted in the text with $ or commas will require an informat.  Note you could do this by adding an informat specification to the INPUT statement, but make sure to prefixed the informat specification with : to make sure that input is still list mode.  Also note that you do not need to supply a width value for the informat specification since SAS will ignore it when reading data in list mode.  In list mode SAS adjusts the informat to the width of the data.

4) Add an INPUT statement that just lists the fields to read. 

 

data mydata;
  infile 'mytextfile.txt'  dsd dlm=',' truncover firstobs=2;
  length name $20 age 8 date 8 salary 8;
  informat date mmyydd. salary comma. ;
  format date yymmdd10. ;
  input name age date salary ;
run;

Note that if you have defined the variables in the order that they appear in the CSV file the INPUT statement itself could use a positional variable list instead of listing all of the variable names again.

input name -- salary;

 

Note that SAS only uses fixed length character variables. It just means that when SAS stores the values in the dataset it pads them with spaces. It really doesn't matter at all with a CSV file since standard practice for processing CSV files it to ignore trailing (and leading) spaces anyway.  If you are reading a file that you don't know a lot about set them to something much longer than you need. You can always adjust them later.  I frequently will round character variables of multiples of 10, 25, 50 or 100 bytes just so I don't have to worry too much about the next example of this type of file having slightly longer values.

Tom
Super User Tom
Super User

Note that if you really don't know what is in the file then look at it first. The LIST statement is really good for this as it will show you the hexcode for invisible characters, like tabs.

So you might look at the first 10 lines to see what is there,  Does it have a header line? Is it using comma, semi-colon, pipe, tab, something else as the delimiter?

data _null_;
  infile 'mytextfile.txt' obs=10;
  input;
  list;
run;

Also then you can just copy the first line from the SAS LOG and paste it into the program editor so that you can use it to begin making names for the variables.

 

If you want to get a better idea of what is in the file then read it into a bunch character variables and examine it.  Find the maximum length. How many different values.  Do they look like numbers? dates? time? 

So you might start by reading the first 9,999 observations into 40 columns of character variables.

data  sample;
  infile 'mytextfile.txt' obs=10000 firstobs=2 dsd truncover;
  length x1-x40 $50 ;
  input x1-x40;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 6724 views
  • 2 likes
  • 3 in conversation