BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi All,

Is there a way to read and vertically stack several datafiles that have some common variables and some variables that are different? This is without using several data statements and input statements or sas macros. Any ideas!! Thank you.
7 REPLIES 7
Doc_Duke
Rhodochrosite | Level 12
Just use the SET statement in a DATA STEP. It will keep the common variable names aligned in the resultant dataset.
deleted_user
Not applicable
Thanks Doc@Duke for the response. The datafiles that I have are in .csv format in excel. Therefore, before I can use a SET statement in a data step, I'll need to import the datafiles into SAS. For example, if I have 15 datafiles with different variable structure in .csv format, I need to write 15 DATA steps and 15 input statements for all those datafiles first, and then use the SET statement in another datastep that will align the common variables and give missing values for a variable not present. I was wondering if there is some way to automate (ex. do loops or similar) the process of reading the datafiles with non-uniform variables from excel into SAS and vertically stack them up.
data_null__
Jade | Level 19
You can read any number of files in a single data step. There are two INFILE statement options that facilitate that activity. FILENAME and FILEVAR. When the value of FILEVAR change SAS will close the current infile and open the file specified in the FILEVAR. FILENAME is a variable that contains the name of the file being read. Usually FILEVAR is used when you have a list of from a meta data source, and FILENAME is used when you are reading from a concatenated FILEREF or with wildcards.

If the files require different INPUT statements that can be accommodated with IF statements that test value returned from FILENAME or perhaps you can "dip" into each file to determine the way it should be read.

With only 15 files and an unknown(to me) number of structures you may be just as well off using 15 PROC IMPORTS.

Post some small sample data files and I will show you the code.
deleted_user
Not applicable
Suppose I have 3 example data sets in .csv format:

1) abc.csv

id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
..........;

2) def.csv

v1, x, id, v2, v3, v5, day, v4
VA, S, 1, 120, 35, 0.22, 2, 0.44
VA, S, 2, 110, 23, 0.01, 2, 0.33
VA, M, 3, 30, 21, 0.55, 2, 0.11
.......................;

3)xyz.csv

v2, v3, x, day, id
100, 23, M, 3, 1
120, 32, D, 3, 2
110, 43, D, 3, 3
....;

Can I stack them up without using 3 DATA/input statements? Thanks.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Nearly anything is possible with SAS (and the requisite SAS programming experience to support said application).

I have an application that uses the first row to determine the layout of the remaining rows in the CSV-format file, based on a common set of INFORMAT and LENGTH statements for the data-source referenced.

Explore using a DATA step to read the first row as a character string and use this information to generate your INPUT statement. Write your SAS-generated SAS code to a temporary file and then use %INCLUDE to invoke that code (your INPUT statement) in a subsequent DATA step.

The code below provides an oversimplified example using instream "cooked" data.



Scott Barry
SBBWorks, Inc.

filename sasdata temp;
* generate some data to read. ;
data _null_;
input;
file sasdata;
put _infile_;
datalines;
"a",b,c,d,e
a a,1,2,3,4
a,,2,3,4
b,,4,3,
run;
filename sastemp temp;
* read the header row to get var list. ;
data _null_;
infile sasdata ;
input ;
length varlist $500;
varlist = compbl(translate(_infile_,' ',',"'));
file sastemp;
put 'input ' varlist ';' ;
putlog '>info> echo input list: ' varlist=;
stop;
run;
* now read the data rows only, based on the header row as var list. ;
data _null_;
* standardized SAS variable attributes follow. ;
length a $3 b c d e 5;
informat a $char3.;
* ;
infile sasdata dlm=',' dsd missover firstobs=2;
%include sastemp;
putlog '>info> diag info: ' / _all_;
run;
data_null__
Jade | Level 19
This is similar to the code posted by SBB, applied to your data example data. There is one important meta data missing "variable type". If your data included that information it would be more or less completly dynamic. That is the kind if think that PROC IMPORT does.

Anyway here if the code. I added a few files to further test the program for files with zero records.

[pre]
dm 'clear log; clear output;';
filename ft15f001 "%sysfunc(pathname(work))\abc.csv";
parmcards;
id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
;;;;
run;

filename ft15f001 "%sysfunc(pathname(work))\def.csv";
parmcards;
v1, x, id, v2, v3, v5, day, v4
VA, S, 1, 120, 35, 0.22, 2, 0.44
VA, S, 2, 110, 23, 0.01, 2, 0.33
VA, M, 3, 30, 21, 0.55, 2, 0.11
;;;;
run;
filename ft15f001 "%sysfunc(pathname(work))\ghi.csv";
parmcards;
id, day, v1, v2, v3, v4
1, 1, VA, 100, 77, 0.23
2, 1, VA, 120, 44, 0.11
3, 1, VA, 135, 23, 0.45
;;;;
run;
filename ft15f001 "%sysfunc(pathname(work))\jkl.csv";
parmcards;
;;;;
run;

filename ft15f001 "%sysfunc(pathname(work))\xyz.csv";
parmcards;
v2, v3, x, day, id
100, 23, M, 3, 1
120, 32, D, 3, 2
110, 43, D, 3, 3
;;;;
run;

data Fields;
length command filevar $128;
command = catx(' ','dir /s /b',quote(catx('\',pathname('WORK'),'*.csv')));
do while(not eof);
infile dummy1 pipe filevar=command end=eof;
input;
filevar = _infile_;
do while(1);
infile dummy2 filevar=filevar obs=1 truncover eof=eof;
input fields $128.;
output;
end;
eof:
end;
stop;
run;
proc sort data=work.fields nodupkey;
by fields;
run;

filename FT57F001 temp;
data _null_;
file FT57F001;
put +6 'select(fields);';
do until(eof);
set fields end=eof;
input = translate(fields,' ',',');
put +9 'When(' fields:$quote130. ') input ' input ';';
end;
put +9 'Otherwise;';
put +9 'End;';
stop;
run;

data all;
length command filevar filename $128;
command = catx(' ','dir /s /b',quote(catx('\',pathname('WORK'),'*.csv')));
length v1 x $8;
do while(not eof);
infile dummy1 pipe filevar=command end=eof;
input;
filevar = _infile_;
filename = scan(_infile_,-1,'\');
do while(1);
infile dummy2 filevar=filevar truncover eof=eof dsd;
input fields $128.;
do while(1);
%inc FT57F001;
output;
end;
end;
eof:
end;
stop;
run;

proc print;
run;
[/pre]
deleted_user
Not applicable
Thank you guys for providing alternative (but complicated) solutions to my problem!!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 768 views
  • 0 likes
  • 4 in conversation