Is there a way to read and vertically stack several data files that have some common variables and some variables that differ, without using several DATA steps with separate INPUT statements, or SAS macros? Any ideas? Thank you.
Thanks Doc@Duke for the response. The data files that I have are in .csv format (exported from Excel). Therefore, before I can use a SET statement in a DATA step, I'll need to import the files into SAS. For example, if I have 15 data files with different variable structures in .csv format, I need to write 15 DATA steps with 15 INPUT statements for those files first, and then use the SET statement in another DATA step to align the common variables and assign missing values for any variable not present in a given file. I was wondering if there is some way to automate (e.g., DO loops or similar) the process of reading the data files with non-uniform variables from the CSV files into SAS and vertically stacking them.
You can read any number of files in a single DATA step. Two INFILE statement options facilitate that: FILEVAR= and FILENAME=. When the value of the FILEVAR= variable changes, SAS closes the current input file and opens the file that the variable names. FILENAME= names a variable that SAS fills with the name of the file currently being read. Usually FILEVAR= is used when you have a list of files from a metadata source, and FILENAME= is used when you are reading from a concatenated fileref or with wildcards.
If the files require different INPUT statements, that can be accommodated with IF statements that test the value returned by FILENAME=, or perhaps you can "dip" into each file to determine how it should be read.
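To make the pattern concrete, here is a minimal sketch of the FILEVAR=/FILENAME= technique combined with IF tests on the current file name. The file paths, layouts, and variable names below are all hypothetical; adapt them to your own files.

```sas
* List of files to read; paths are hypothetical. ;
data filelist;
   input fname $40.;
   datalines;
c:\temp\cost.csv
c:\temp\sales.csv
;
run;

* One DATA step reads every file in the list. ;
data all;
   set filelist;                                  * one row per file      ;
   length source $200;
   infile dummy filevar=fname filename=current
          dlm=',' dsd truncover firstobs=2 end=done;
   do while (not done);
      source = current;                           * file being read now  ;
      if index(source,'cost') then input id cost; * layout for file 1    ;
      else                         input id sales;* layout for file 2    ;
      output;
   end;
run;
```

Because END=DONE is reset each time FILEVAR= switches files, the DO WHILE loop reads one whole file per observation of FILELIST; variables absent from a given file's INPUT statement are simply left missing, which vertically stacks the files in one pass.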
With only 15 files and a number of structures unknown (to me), you may be just as well off using 15 PROC IMPORT steps.
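For reference, one such PROC IMPORT step looks like the sketch below; the path and dataset names are hypothetical, and you would repeat the step for each file before stacking.

```sas
* Import one CSV; repeat for each of the 15 files.     ;
* Path and dataset name are hypothetical.              ;
proc import datafile='c:\temp\file1.csv'
            out=work.file1
            dbms=csv
            replace;
   getnames=yes;        * use the header row for variable names ;
   guessingrows=max;    * scan all rows when guessing types      ;
run;

* After importing file1-file15, stack them: SET aligns  ;
* common variables and leaves the rest missing.         ;
data stacked;
   set work.file1 - work.file15;   * numbered range of datasets ;
run;
```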
Post some small sample data files and I will show you the code.
Nearly anything is possible with SAS (and the requisite SAS programming experience to support said application).
I have an application that uses the first row to determine the layout of the remaining rows in the CSV-format file, based on a common set of INFORMAT and LENGTH statements for the data-source referenced.
Explore using a DATA step to read the first row as a character string and use this information to generate your INPUT statement. Write your SAS-generated SAS code to a temporary file and then use %INCLUDE to invoke that code (your INPUT statement) in a subsequent DATA step.
The code below provides an oversimplified example using instream "cooked" data.
filename sasdata temp;

* generate some data to read. ;
data _null_;
   file sasdata;
   put 'a,b,c,d,e';
   put 'xyz,1,2,3,4';
run;

filename sastemp temp;

* read the header row to get var list. ;
data _null_;
   infile sasdata obs=1;
   input;
   length varlist $500;
   varlist = compbl(translate(_infile_,' ',',"'));
   file sastemp;
   put 'input ' varlist ';' ;
   putlog '>info> echo input list: ' varlist=;
run;

* now read the data rows only, based on the header row as var list. ;
data want;
   * standardized SAS variable attributes follow. ;
   length a $3 b c d e 5;
   informat a $char3.;
   infile sasdata dlm=',' dsd missover firstobs=2;
   %include sastemp;
   putlog '>info> diag info: ' / _all_;
run;
This is similar to the code posted by SBB, applied to your example data. There is one important piece of metadata missing: the variable type. If your data included that information, the program would be more or less completely dynamic. That is the kind of thing that PROC IMPORT does.
Anyway, here is the code. I added a few files to further test the program against files with zero records.