BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi,

 

How can I bring a few variables into SAS using infile statement please?

 

Thanks

S

15 REPLIES 15
hollandnumerics
Pyrite | Level 9
sks521,
Do you know where in the input records these variables are located?
INPUT @aa var1 @bb var2 @cc var3;
will input variables at columns aa, bb and cc.
Philip R Holland
Holland Numerics: Blog and Forums
http://blog.hollandnumerics.org.uk
sks521
Quartz | Level 8

Hi, How do I find that out? It's a CSV file which I need to import the data from but I don't want to import all variables into SAS, but only a selection of these variables.

andreas_lds
Jade | Level 19

The infile-statement reads text-files, not datasets.

 

Please describe what you have and what you need.

 

 

sks521
Quartz | Level 8

The data is in CSV format and I want to bring in a selected set of variables into SAS using infile statement. How do I do that please?

hollandnumerics
Pyrite | Level 9
sks521,
Unfortunately CSV files have to be imported as whole records, but you can then just keep your required set of the variables.
Use a KEEP statement in your DATA step, or use (KEEP= ) on the DATA statement.
Philip R Holland
Holland Numerics: Blog and Forums
http://blog.hollandnumerics.org.uk
sks521
Quartz | Level 8

Thanks, I tried using 'KEEP' but I assume SAS reads all data before importing in so it is giving me errors when it reads 'dirty variables' which I was trying to avoid so the whole procedure fails.

ballardw
Super User

As said, you can't skip variables in CSV. However you can read "junk" variables with only a single character and drop them.

Example

data want;
 infile "yourfile.csv" dlm=',' ;
 informat  dummy $1.;
 informat var1 var2 var3 best16.;
 /* suppose var1 that I want to keep is the third column in the csv,
 var2 is the 7th and var3 is the 9th
 */
 input 
   dummy
   dummy
   var1       
   dummy
   dummy
   dummy
   var2
   dummy
   var3
 ;
 drop dummy;
run;

The Informat assignment for the variable Dummy only reads one character, so hard to get into trouble if the CSV is properly formatted.

You can read multiple columns into the same variable, only the last assignment is kept. You do not need to read any values past the last one that you want.

 

Warning: You may need the DSD option if you have commas inside quoted values.

 

I have files like this where I discard 30+ variables routinely.

Tom
Super User Tom
Super User

The Informat assignment for the variable Dummy only reads one character,

Not really. Because the INFORMAT statement is the first place where DUMMY is referenced it will cause SAS to define DUMMY as a one byte character variable.

Because the INPUT statement is using list mode to read the data it will read the whole word from the CSV file, but since DUMMY is only one character long just the first character is stored.

 

Note you can simplify the INPUT statement by use <n>*varname syntax.

input 2*dummy var1 3*dummy var2 dummy var3 ;

 

andreas_lds
Jade | Level 19

Variables to be read are listed in the input-statement, unfortunately in SAS you have to list all variables when reading a csv file - as @hollandnumerics  already said.

 

Fortunately the input-statement can be used without naming any variable. If used that way the automatic-variable _infile_ contains the whole line and then scan-function can be used to extract only the required variables.

 

Example using sashelp.class and reading only name and age:

proc export data=sashelp.class file="class.csv" dbms=csv replace;
run;

data work.partial_class;
   length Name $ 10 Age 8;
   infile "class.csv" delimiter="," firstobs=2;
   input;

   Name = scan(_infile_, 1, ",");
   Age = input(scan(_infile_, 3, ","), best.);
run;

If your char-variables are enclosed in quotes, see the docs of scan-function, there is an option to handle such data.

sks521
Quartz | Level 8

Thanks but I don't understand what, 1 and 3 mean in the below syntax;

Name = scan(_infile_, 1, ",");
   Age = input(scan(_infile_, 3, ","), best.);

 and how would it change if we have more thank 2 variables to read in?

 

Thanks

S

hollandnumerics
Pyrite | Level 9
sks521,
1 means 1st variable, and 3 means 3rd variable, so you just need to know the index of each variable you need to keep. The example also shows how to cope differently with character and numeric values.

Have a look at class.csv created by the code from andreas_lds, and you should see what it is doing.
Philip R Holland
Holland Numerics: Blog and Forums
http://blog.hollandnumerics.org.uk
ballardw
Super User

Unfortunately parsing _infile_ may require a bit more work depending on the actual content of the data.

 

Consider:

data junk;
   infile datalines dlm=',';
   input;
   put _infile_;
   Name = scan(_infile_, 1, ",");
   Age = input(scan(_infile_, 3, ","), best.);
datalines;
"Jones, Smith",23
;
run;

The put was included so we can see that _infile_ behaves as expected with datalines input. The kicker is that the name is truncated but does read the age for this one.

 

@sks521 really needs to post example data if one of the posted solutions hasn't worked.

 

hollandnumerics
Pyrite | Level 9
@ballardw,

I think it is fair to say that any delimited text file can be a real challenge to parse. I'm often asked about problems reading records ending with numeric values in UNIX/Linux from text files saved on Windows platforms. My advice is generally to read in EVERYTHING as character data, then convert it to numeric, if needed, later. That way you capture everything, because nothing gets converted to missing!

..............Phil
Philip R Holland
Holland Numerics: Blog and Forums
http://blog.hollandnumerics.org.uk
ballardw
Super User

@hollandnumerics wrote:
@ballardw,

I think it is fair to say that any delimited text file can be a real challenge to parse. I'm often asked about problems reading records ending with numeric values in UNIX/Linux from text files saved on Windows platforms. My advice is generally to read in EVERYTHING as character data, then convert it to numeric, if needed, later. That way you capture everything, because nothing gets converted to missing!

..............Phil

Yep, some worse than others.

I have parsed text report files to recover data and such having parse different lines to find the "reset" between groups of data and dealing with mixes of delimited and fixed columns inside a single file.

Which is where I learned to be very careful with the actual content of _infile_ and the behavior of delimiters with the functions like Scan.  

 

And still the OP should post an actual example of his file.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 3952 views
  • 3 likes
  • 5 in conversation