Help using Base SAS procedures

reading datafiles with different variables

Reply
N/A
Posts: 0

reading datafiles with different variables

Hi All,

Is there a way to read and vertically stack several datafiles that have some common variables and some variables that are different? This is without using several data statements and input statements or sas macros. Any ideas!! Thank you.
Trusted Advisor
Posts: 2,116

Re: reading datafiles with different variables

Posted in reply to deleted_user
Just use the SET statement in a DATA STEP. It will keep the common variable names aligned in the resultant dataset.
N/A
Posts: 0

Re: reading datafiles with different variables

Posted in reply to deleted_user
Thanks Doc@Duke for the response. The datafiles that I have are in .csv format in excel. Therefore, before I can use a SET statement in a data step, I'll need to import the datafiles into SAS. For example, if I have 15 datafiles with different variable structure in .csv format, I need to write 15 DATA steps and 15 input statements for all those datafiles first, and then use the SET statement in another datastep that will align the common variables and give missing values for a variable not present. I was wondering if there is some way to automate (ex. do loops or similar) the process of reading the datafiles with non-uniform variables from excel into SAS and vertically stack them up.
Respected Advisor
Posts: 3,799

Re: reading datafiles with different variables

Posted in reply to deleted_user
You can read any number of files in a single data step. There are two INFILE statement options that facilitate that activity. FILENAME and FILEVAR. When the value of FILEVAR change SAS will close the current infile and open the file specified in the FILEVAR. FILENAME is a variable that contains the name of the file being read. Usually FILEVAR is used when you have a list of from a meta data source, and FILENAME is used when you are reading from a concatenated FILEREF or with wildcards.

If the files require different INPUT statements that can be accommodated with IF statements that test value returned from FILENAME or perhaps you can "dip" into each file to determine the way it should be read.

With only 15 files and an unknown(to me) number of structures you may be just as well off using 15 PROC IMPORTS.

Post some small sample data files and I will show you the code.
N/A
Posts: 0

Re: reading datafiles with different variables

Posted in reply to data_null__
Suppose I have 3 example data sets in .csv format:

1) abc.csv

id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
..........;

2) def.csv

v1, x, id, v2, v3, v5, day, v4
VA, S, 1, 120, 35, 0.22, 2, 0.44
VA, S, 2, 110, 23, 0.01, 2, 0.33
VA, M, 3, 30, 21, 0.55, 2, 0.11
.......................;

3)xyz.csv

v2, v3, x, day, id
100, 23, M, 3, 1
120, 32, D, 3, 2
110, 43, D, 3, 3
....;

Can I stack them up without using 3 DATA/input statements? Thanks.
Super Contributor
Super Contributor
Posts: 3,174

Re: reading datafiles with different variables

Posted in reply to deleted_user
Nearly anything is possible with SAS (and the requisite SAS programming experience to support said application).

I have an application that uses the first row to determine the layout of the remaining rows in the CSV-format file, based on a common set of INFORMAT and LENGTH statements for the data-source referenced.

Explore using a DATA step to read the first row as a character string and use this information to generate your INPUT statement. Write your SAS-generated SAS code to a temporary file and then use %INCLUDE to invoke that code (your INPUT statement) in a subsequent DATA step.

The code below provides an oversimplified example using instream "cooked" data.



Scott Barry
SBBWorks, Inc.

filename sasdata temp;
* generate some data to read. ;
data _null_;
input;
file sasdata;
put _infile_;
datalines;
"a",b,c,d,e
a a,1,2,3,4
a,,2,3,4
b,,4,3,
run;
filename sastemp temp;
* read the header row to get var list. ;
data _null_;
infile sasdata ;
input ;
length varlist $500;
varlist = compbl(translate(_infile_,' ',',"'));
file sastemp;
put 'input ' varlist ';' ;
putlog '>info> echo input list: ' varlist=;
stop;
run;
* now read the data rows only, based on the header row as var list. ;
data _null_;
* standardized SAS variable attributes follow. ;
length a $3 b c d e 5;
informat a $char3.;
* ;
infile sasdata dlm=',' dsd missover firstobs=2;
%include sastemp;
putlog '>info> diag info: ' / _all_;
run;
Respected Advisor
Posts: 3,799

Re: reading datafiles with different variables

Posted in reply to deleted_user
This is similar to the code posted by SBB, applied to your data example data. There is one important meta data missing "variable type". If your data included that information it would be more or less completly dynamic. That is the kind if think that PROC IMPORT does.

Anyway here if the code. I added a few files to further test the program for files with zero records.

[pre]
dm 'clear log; clear output;';
filename ft15f001 "%sysfunc(pathname(work))\abc.csv";
parmcards;
id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
;;;;
run;

filename ft15f001 "%sysfunc(pathname(work))\def.csv";
parmcards;
v1, x, id, v2, v3, v5, day, v4
VA, S, 1, 120, 35, 0.22, 2, 0.44
VA, S, 2, 110, 23, 0.01, 2, 0.33
VA, M, 3, 30, 21, 0.55, 2, 0.11
;;;;
run;
filename ft15f001 "%sysfunc(pathname(work))\ghi.csv";
parmcards;
id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
;;;;
run;
filename ft15f001 "%sysfunc(pathname(work))\jkl.csv";
parmcards;
;;;;
run;

filename ft15f001 "%sysfunc(pathname(work))\xyz.csv";
parmcards;
v2, v3, x, day, id
100, 23, M, 3, 1
120, 32, D, 3, 2
110, 43, D, 3, 3
;;;;
run;

data Fields;
length command filevar $128;
command = catx(' ','dir /s /b',quote(catx('\',pathname('WORK'),'*.csv')));
do while(not eof);
infile dummy1 pipe filevar=command end=eof;
input;
filevar = _infile_;
do while(1);
infile dummy2 filevar=filevar obs=1 truncover eof=eof;
input fields $128.;
output;
end;
eof:
end;
stop;
run;
proc sort data=work.fields nodupkey;
by fields;
run;

filename FT57F001 temp;
data _null_;
file FT57F001;
put +6 'select(fields);';
do until(eof);
set fields end=eof;
input = translate(fields,' ',',');
put +9 'When(' fields:$quote130. ') input ' input ';';
end;
put +9 'Otherwise;';
put +9 'End;';
stop;
run;

data all;
length command filevar filename $128;
command = catx(' ','dir /s /b',quote(catx('\',pathname('WORK'),'*.csv')));
length v1 x $8;
do while(not eof);
infile dummy1 pipe filevar=command end=eof;
input;
filevar = _infile_;
filename = scan(_infile_,-1,'\');
do while(1);
infile dummy2 filevar=filevar truncover eof=eof dsd;
input fields $128.;
do while(1);
%inc FT57F001;
output;
end;
end;
eof:
end;
stop;
run;

proc print;
run;
[/pre]
N/A
Posts: 0

Re: reading datafiles with different variables

Posted in reply to deleted_user
Thank you guys for providing alternative (but complicated) solutions to my problem!!
Ask a Question
Discussion stats
  • 7 replies
  • 119 views
  • 0 likes
  • 4 in conversation