I have excel files with single observations (one row + the row with the column titles) come in. The excels all have the same number of columns and variable names - i.e. exact same structure.
Each excel's file name is an ID code.
After importing said excels, I will stack them and build a database. The problem is that the ID code is not contained within the excel itself...it only appears in the file name.
I would like to extract the ID code from the file title and add it as a variable entry.
Let say I have an excel that is called 20210123. I would like to grab this number and enter it into the database as the IDCode variable.
How would you go about coding this?
1. Get list of excel files
Now for each file:
2. Import Excel file
3. Add column to Excel file with file name
4. Append to master table
Or
1. Get list of Excel files
2. Import each Excel file and add name (2/3)
3. Append all data sets together.
First two macros here can get you started.
@K_S wrote:
I have excel files with single observations (one row + the row with the column titles) come in. The excels all have the same number of columns and variable names - i.e. exact same structure.
Each excel's file name is an ID code.
After importing said excels, I will stack them and build a database. The problem is that the ID code is not contained within the excel itself...it only appears in the file name.
I would like to extract the ID code from the file title and add it as a variable entry.
Let say I have an excel that is called 20210123. I would like to grab this number and enter it into the database as the IDCode variable.
How would you go about coding this?
is there a way to use the name of the file in the set statement as a variable entry
new_var=name of datasetname in set statement..
Thanks for the documentation you sent. A bit too advanced for me so it is pretty intimidating.
Then go through these ones first.
Yes, it's not a particularly difficult task you're trying to do but it automating something which is a bit of a different skill set.
You could also search on here, I know this question has been asked and answered many times. It depends on if you're trying to solve a problem or if you care if you understand how you solve it to be able to reuse the code in the future.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
Do you want to gel something like this:
/* test data */
%let folder = %sysfunc(pathname(WORK))/test;
options dlcreatedir;
libname test "&folder.";
%macro genData();
%do i = 1 %to 5;
proc export
data = sashelp.class(firstobs=&i. obs = &i.)
outfile = "&folder./&i..xlsx"
dbms=xlsx replace
;
run;
%end;
%mend;
%genData()
/* get Excels names */
data listOfFiles;
base = "&folder.";
length file $ 256;
folderRef = "_%sysfunc(datetime(), hex6.)0";
rc = filename(folderRef, base);
folderid = dopen(folderRef);
do i=1 to dnum(folderId); drop i;
file = dread(folderId, i);
if upcase(scan(file, -1, ".")) = "XLSX" then output;
end;
rc = filename(folderRef);
keep file;
run;
proc sort data = listOfFiles;
by file;
run;
proc print data = listOfFiles;
run;
?
Bart
What types of files are they? Modern XLSX files? Older XLS files?
Or are they not Excel files at all. Perhaps they are instead CSV files and it is just that your computer has been instructed to open them in Excel by default if not told to do something different.
If they are CSV files then it is MUCH easier.
So let's think about how to deal with a set of actual Excel files, either XLS or XLSX. One way to make a consistent data structure might be to first convert the files into a single CSV file and then read that.
First get a list of the files. Easiest way is to use your operating system command for listing files.
data files;
infile 'ls /my-directory/*.xlsx' pipe truncover ;
input filename $256.;
run;
Now that you have the list of files you can use it to generate code to import each file and write its line(s) of data to a CSV file.
So first generate the code.
filename code temp;
data _null_;
set files ;
file code;
length idcode $30 ;
idcode = scan(filename,-2,'./\');
put 'proc import dbms=xlsx out=next replace datafile=' filename :$quote. ';'
/ 'run;'
/ 'data _null_;'
/ ' idcode=' idcode :$quote. ';'
/ ' file csv dsd mod;'
/ ' set next;'
/ ' put (_all_) (+0);'
/ 'run;'
;
run;
Then point CSV at the file you want to use to store the new text file with all of the data and run the generated code.
filename csv temp;
%include code / source2;
Now that you have all of the data in the a single CSV file you can read in the data with a data step. That way you can control how each of the variables is defined.
data want;
infile csv dsd truncover ;
* Define the length and order of all of your variables ;
* remember numeric variables use length 8 because they are 64 bit floating point values;
length idcode $30 ..... all of your other variables .... ;
input (_all_) (+0);
* add any FORMAT or INFORMAT statements you might need ;
* add any LABEL statements you might need ;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.