Hi,
How can I bring a few variables into SAS using infile statement please?
Thanks
S
Hi, How do I find that out? It's a CSV file which I need to import the data from but I don't want to import all variables into SAS, but only a selection of these variables.
The infile-statement reads text-files, not datasets.
Please describe what you have and what you need.
The data is in CSV format and I want to bring in a selected set of variables into SAS using infile statement. How do I do that please?
Thanks, I tried using 'KEEP' but I assume SAS reads all data before importing in so it is giving me errors when it reads 'dirty variables' which I was trying to avoid so the whole procedure fails.
As said, you can't skip variables in CSV. However you can read "junk" variables with only a single character and drop them.
Example
data want; infile "yourfile.csv" dlm=',' ; informat dummy $1.; informat var1 var2 var3 best16.; /* suppose var1 that I want to keep is the third column in the csv, var2 is the 7th and var3 is the 9th */ input dummy dummy var1 dummy dummy dummy var2 dummy var3 ; drop dummy; run;
The Informat assignment for the variable Dummy only reads one character, so hard to get into trouble if the CSV is properly formatted.
You can read multiple columns into the same variable, only the last assignment is kept. You do not need to read any values past the last one that you want.
Warning: You may need the DSD option if you have commas inside quoted values.
I have files like this where I discard 30+ variables routinely.
The Informat assignment for the variable Dummy only reads one character,
Not really. Because the INFORMAT statement is the first place where DUMMY is referenced it will cause SAS to define DUMMY as a one byte character variable.
Because the INPUT statement is using list mode to read the data it will read the whole word from the CSV file, but since DUMMY is only one character long just the first character is stored.
Note you can simplify the INPUT statement by use <n>*varname syntax.
input 2*dummy var1 3*dummy var2 dummy var3 ;
Variables to be read are listed in the input-statement, unfortunately in SAS you have to list all variables when reading a csv file - as @hollandnumerics already said.
Fortunately the input-statement can be used without naming any variable. If used that way the automatic-variable _infile_ contains the whole line and then scan-function can be used to extract only the required variables.
Example using sashelp.class and reading only name and age:
proc export data=sashelp.class file="class.csv" dbms=csv replace;
run;
data work.partial_class;
length Name $ 10 Age 8;
infile "class.csv" delimiter="," firstobs=2;
input;
Name = scan(_infile_, 1, ",");
Age = input(scan(_infile_, 3, ","), best.);
run;
If your char-variables are enclosed in quotes, see the docs of scan-function, there is an option to handle such data.
Thanks but I don't understand what, 1 and 3 mean in the below syntax;
Name = scan(_infile_, 1, ",");
Age = input(scan(_infile_, 3, ","), best.);
and how would it change if we have more thank 2 variables to read in?
Thanks
S
Unfortunately parsing _infile_ may require a bit more work depending on the actual content of the data.
Consider:
data junk; infile datalines dlm=','; input; put _infile_; Name = scan(_infile_, 1, ","); Age = input(scan(_infile_, 3, ","), best.); datalines; "Jones, Smith",23 ; run;
The put was included so we can see that _infile_ behaves as expected with datalines input. The kicker is that the name is truncated but does read the age for this one.
@sks521 really needs to post example data if one of the posted solutions hasn't worked.
@hollandnumerics wrote:
@ballardw,
I think it is fair to say that any delimited text file can be a real challenge to parse. I'm often asked about problems reading records ending with numeric values in UNIX/Linux from text files saved on Windows platforms. My advice is generally to read in EVERYTHING as character data, then convert it to numeric, if needed, later. That way you capture everything, because nothing gets converted to missing!
..............Phil
Yep, some worse than others.
I have parsed text report files to recover data and such having parse different lines to find the "reset" between groups of data and dealing with mixes of delimited and fixed columns inside a single file.
Which is where I learned to be very careful with the actual content of _infile_ and the behavior of delimiters with the functions like Scan.
And still the OP should post an actual example of his file.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.