Beanpot
Fluorite | Level 6

Hi, I'm working on a dataset that was given to me in .txt format. The codebook gives me the names, positions and lengths of the variables, but the variables themselves are not labeled in the text file. When I import the data file, the contents look like this:

 

PROC IMPORT OUT=<data>
DATAFILE = "location\file.txt"
DBMS = dlm REPLACE;

run;

[Screenshot of the imported data: Beanpot_0-1674835718758.png]

 

How do I tell SAS where the variables are? In other words the name of variables 1, 2, 3, etc., their locations (e.g. position 20), and lengths (e.g. 1 character or 3 characters)?

 

Accepted Solutions
mtnbikerjoshua
Obsidian | Level 7

Hi @Beanpot,

 

PROC IMPORT gives you fairly limited control over how a file is read. You get a lot more flexibility if you read the data with a DATA step. You can check out the documentation for the INPUT and INFILE statements to see how to do this. Also, here's a blog article that explains it pretty simply: 2 Ways to Import a Text File into SAS (Examples!) - SAS Example Code

 

In the simplest case, your code might look something like this:

data mydata;
  /* INFILE must come before INPUT so SAS knows which file to read */
  infile "/my/file/path.txt";
  length var1 $20 var2 $10;
  input var1 $ var2 $;
run;

 

You can also look in the log for the DATA step code that PROC IMPORT generates, then copy and modify that.

 

Hope that helps!

 

Joshua


4 REPLIES
Tom
Super User

If the codebook gives you the position and length, then it is probably a FIXED length file.  PROC IMPORT is for DELIMITED files.

 

The way to tell the difference is to LOOK at the text file.  You can use any simple text editor.  Or just use a simple data step like this one, which writes the first five lines of the file to the log:

data _null_;
  infile "myfile.txt" obs=5;
  input;
  list;
run;

A fixed length file will look like:

Alfred  14M    69 112.5
Alice   13F  56.5    84
Barbara 13F  65.3    98
Carol   14F  62.8 102.5
Henry   14M  63.5 102.5

And a delimited file will look like:

Alfred,14,M,69,112.5
Alice,13,F,56.5,84
Barbara,13,F,65.3,98
Carol,14,F,62.8,102.5
Henry,14,M,63.5,102.5

You might also be able to tell from the description you have. If POSITION is just 1,2,3,4, etc., then you have a delimited file.  But if the position skips based on the width, such as 1,9,12,13, ..., then you have a fixed length (fixed position) file.

 

To read a fixed position file you can use column mode:

input name $ 1-8 age 9-10 sex $ 11 height 12-17 weight 18-23;

Or formatted mode:

input name $8. age 2. sex $1. height 6. weight 6. ;

Or perhaps some combination, using the @ pointer control.  This is useful when only some of the values are dates or other values that need informats.

input name $ 1-8 age 9-10 sex $ 11 @12 height 6. weight 18-23;
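
For a picture of how one of these INPUT statements sits inside a complete step, here is a minimal sketch that reads the five sample lines shown above in column mode. It uses in-line DATALINES so it can be run as is; the data set name class and the TRUNCOVER option are my additions, not something from the original file. For a real file, replace datalines in the INFILE statement with the quoted path and drop the DATALINES block.

```sas
data class;
  /* truncover: a short line yields missing values instead of
     pulling data from the next line */
  infile datalines truncover;
  input name   $ 1-8
        age      9-10
        sex    $ 11
        height   12-17
        weight   18-23;
datalines;
Alfred  14M    69 112.5
Alice   13F  56.5    84
Barbara 13F  65.3    98
Carol   14F  62.8 102.5
Henry   14M  63.5 102.5
;

/* Check that each value landed in the right variable */
proc print data=class;
run;
```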

If you have DATA with the NAME, TYPE, POSITION and LENGTH you can use it to generate the INPUT statement for you.
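
One way this generation step could look is sketched below. Everything here is an assumption for illustration: the codebook data set, its column names (width stands in for the codebook's length column, to avoid confusion with the LENGTH statement), and the type codes char and num are all hypothetical. The idea is to write the INPUT statement to a temporary file and pull it into the reading step with %INCLUDE.

```sas
/* Hypothetical codebook: one row per variable */
data codebook;
  length name $32 type $4;
  input name $ type $ position width;
datalines;
name   char 1  8
age    num  9  2
sex    char 11 1
height num  12 6
weight num  18 6
;

/* Write a column-mode INPUT statement to a temporary file */
filename code temp;
data _null_;
  set codebook end=eof;
  file code;
  length spec $60;
  if _n_ = 1 then put 'input';
  /* e.g. "name $ 1-8" for character, "age 9-10" for numeric */
  spec = catx(' ', name, ifc(type = 'char', '$', ' '),
              cats(position, '-', position + width - 1));
  put '  ' spec;
  if eof then put ';';
run;

/* Use the generated statement; SOURCE2 echoes it to the log */
data want;
  infile datalines truncover;
  %include code / source2;
datalines;
Alfred  14M    69 112.5
Alice   13F  56.5    84
;
```

With a real file, the INFILE statement would name the file instead of datalines. For a codebook with hundreds of variables this avoids typing each column range by hand.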

 

For delimited files just define the variables in the order they appear and then the INPUT statement can just use a positional variable list:

data want;
  infile "class.txt" dsd truncover ;
  length name $8 age 8 sex $1 height weight 8 ;
  input name -- weight ;
run;

If any variables REQUIRE an informat or a format, add INFORMAT and FORMAT statements for them.  This is normally only needed for values like dates, times, or datetimes, where the human-readable strings need special formats and informats.
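
As a rough sketch of the date case, assuming a comma-delimited file with a date written as 2023-01-27 (the data set, variable names, and values below are made up for illustration):

```sas
data visits;
  infile datalines dsd truncover;
  length id 8 visit_date 8 site $10;
  informat visit_date yymmdd10.;  /* how to read text like 2023-01-27 */
  format   visit_date date9.;     /* how to display the stored date   */
  input id -- site;
datalines;
1,2023-01-27,Boston
2,2023-02-03,Denver
;

proc print data=visits;
run;
```

The INFORMAT statement tells the list-style INPUT how to convert the text into a SAS date value, and the FORMAT statement only controls how that value is printed.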

ballardw
Super User

@Beanpot wrote:

Hi, I'm working on a dataset that was given to me in .txt format. The codebook gives me the names, positions and lengths of the variables. But the variables themselves are not labeled in the text file. When I import the datafile the contents looks like this:

 

How do I tell SAS where the variables are? In other words the name of variables 1, 2, 3, etc., their locations (e.g. position 20), and lengths (e.g. 1 character or 3 characters)?

 


Is the codebook too sensitive to share some of the details? If not, share some of the book and we can probably get you started.

You may well have to modify the variable names, depending on what you were given. SAS variable names are limited to 32 characters, may contain only letters, digits and the _ character, and cannot start with a digit. So if the names don't fit those rules, modify them to fit and use a LABEL for each variable to get a nicer description in the data.
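
A small sketch of what that looks like, with made-up variable names, column positions, and label text standing in for whatever the codebook actually says:

```sas
data survey;
  infile datalines truncover;
  /* Short, SAS-legal names for the columns */
  input hh_income 1-6 resp_sex $ 7;
  /* Longer codebook descriptions go in labels */
  label hh_income = "Total household income (last 12 months)"
        resp_sex  = "Sex of respondent";
datalines;
 52000F
 38500M
;

/* Labels appear next to the variable names in the listing */
proc contents data=survey;
run;
```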

 

If the codebook is in a nice enough format, such as word processor tables or spreadsheets, it can do much of the work for you. I have even used spreadsheets to generate the informat and input statements, variable names and labels to reduce typing. Considering that some of the sets I have had to deal with have upwards of 500 variables, that can be quite a time saver.
