- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi! Using SAS 9.4 here. I have large data sets from a recent experiment: my subjects have ~18 hours of minute-to-minute measurements summarized in excel spreadsheets. 36 subjects and 4 treatments. I came across this nice macro in the user guide that allows me to read all the data into SAS:
%macro drive(dir,ext);
%local cnt filrf rc did memcnt name;
%let cnt=0;
%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
%let memcnt=%sysfunc(dnum(&did));
%do i=1 %to &memcnt;
%let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);
%if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
%if %superq(ext) = %superq(name) %then %do;
%let cnt=%eval(&cnt+1);
%put %qsysfunc(dread(&did,&i));
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt (drop = date
time_elapsed)
dbms=xlsx replace;
run;
%end;
%end;
%end;
%end;
%else %put &dir cannot be open.;
%let rc=%sysfunc(dclose(&did));
%mend drive;
%drive(C:\Users\mydirectory\,xlsx)
the problem is it renames all the spreadsheets that it reads into SAS - in proc import, the output for each spreadsheet is assigned the name "dsn&cnt" - Will be confusing moving forward to lose my unique identifier. I was hoping someone would know how to rewrite this code so that it keeps the name of the original filename that was read into SAS.
Further, I've figured out how to drop the variables I don't want in the final SAS spreadsheet, but I'm still figuring out how to merge all 144 files into one spreadsheet, while somehow creating a subject and treatment ID for each merged file worth of data so I can begin performing summary statistics.
Any help would be greatly appreciated!
Thanks,
Patrick
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try something like this. Untested.
I added logic to split the filename into BASE_NAME and NAME_EXT parts. I hard coded the test to just look for XLSX files.
I added input parameter for the name of the dataset you want to create.
It will read in each sheet and add a variable SUBJECT to hold the name of the xlsx file. Then it will use PROC APPEND to make one output dataset. I remove the DROP= option you had. You could add that back if you wanted . Personally I would use a normal DROP statement instead of the DROP= dataset option.
I added a PROC FREQ to show how many records were loaded from each sheet.
%macro drive(dir,out);
%local cnt filrf rc did name name_ext base_name;
%let cnt=0;
%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir\));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
%if %sysfunc(exist(&out)) %then %do;
proc delete data=&out; run;
%end;
%do i=1 %to %sysfunc(dnum(&did));
%let name=%qsysfunc(dread(&did,&i));
%if %index(&name,.) > 1 %then %do;
%let name_ext=%qupcase(%qscan(&name,-1,.));
%let base_name=%qsubstr(&name,1,%index(&name,.)-1);
%end;
%else %do;
%let name_ext=;
%let base_name=&name ;
%end;
%if XLSX = &name_ext %then %do;
%let cnt=%eval(&cnt+1);
libname in xlsx "&dir\&name" ;
data for_loading;
length subject $32 ;
subject="&base_name";
set in.sheet1 ;
run;
proc append base=&out data=for_loading force;
run;
%end;
%end;
%put NOTE: Read &cnt worksheets from &dir into &out.. ;
proc freq data=&out;
tables subject ;
run;
%let rc=%sysfunc(dclose(&did));
%end;
%else %put ERROR: &dir cannot be opened.;
%let rc=%sysfunc(filename(filrf));
%mend drive;
%drive(dir=C:\Users\mydirectory,out=mydata)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You need to provide more information to get good help.
Do all of the Excel files have just one sheet? Or do they have multiple sheets?
Do all of the Excel sheets have the same variables (columns)?
Is all of the data in contents of the sheets or is some of it hiding in the filename, sheetname or column name? For example is the subject id and date exist as columns in the sheet like this:
Subject Date Var1 Var2 Var3 101 2018/09/20 1 2 3
Or do they only exist in the filename or sheetname?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tom, thanks for the speedy reply:
Do all of the Excel files have just one sheet? Or do they have multiple sheets?
-Just one sheet per file.
Do all of the Excel sheets have the same variables (columns)?
-Yes, the columns are identical. However, the number of rows are different between subject files, accounting for differences in time (minute to minute measures varied by a few hours; somewhere between 900 to 1200 rows of data). The macro didn't appear to have a problem with this difference.
Layout of output
Is all of the data in contents of the sheets or is some of it hiding in the filename, sheetname or column name?
-subject ID is the filename (unique run number, to be more specific). Would be nice to add a column for subject ID if there's a quick way to do that, gleaning from file name.
Hope this helps hone in on my problem!
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do all of them name the sheet as SHEET1, like in your photograph?
Can you use the XSLX libname engine to read one of the files? If so it is generally easier to use than PROC IMPORT. But your column headers might cause trouble. Try it on one of the files.
libname in xlsx 'filename.xlsx';
data test;
set in.sheet1;
run;
Your example macro is a little more complex than it needs to be. For example why does it keep calling DREAD() over and over again instead of just remembering the value? It should be possible to pull out the filename and use that for the name of the dataset. As long as the filenames are valid SAS names. 32 characters or less. Contain letters, digits and underscores, Starts with letter or underscore.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Tom,
Do all of them name the sheet as SHEET1, like in your photograph?
-yes, they all say sheet1. This was output recorded from a matlab program into an excel spreadsheet. Only difference across the files will be the length (rows) of the data.
Can you use the XSLX libname engine to read one of the files?
-Yes, that worked just fine. Do you have a do loop in mind?
Regarding your macro questions, you'll have to take that up with someone more enlightened!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try something like this. Untested.
I added logic to split the filename into BASE_NAME and NAME_EXT parts. I hard coded the test to just look for XLSX files.
I added input parameter for the name of the dataset you want to create.
It will read in each sheet and add a variable SUBJECT to hold the name of the xlsx file. Then it will use PROC APPEND to make one output dataset. I remove the DROP= option you had. You could add that back if you wanted . Personally I would use a normal DROP statement instead of the DROP= dataset option.
I added a PROC FREQ to show how many records were loaded from each sheet.
%macro drive(dir,out);
%local cnt filrf rc did name name_ext base_name;
%let cnt=0;
%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir\));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
%if %sysfunc(exist(&out)) %then %do;
proc delete data=&out; run;
%end;
%do i=1 %to %sysfunc(dnum(&did));
%let name=%qsysfunc(dread(&did,&i));
%if %index(&name,.) > 1 %then %do;
%let name_ext=%qupcase(%qscan(&name,-1,.));
%let base_name=%qsubstr(&name,1,%index(&name,.)-1);
%end;
%else %do;
%let name_ext=;
%let base_name=&name ;
%end;
%if XLSX = &name_ext %then %do;
%let cnt=%eval(&cnt+1);
libname in xlsx "&dir\&name" ;
data for_loading;
length subject $32 ;
subject="&base_name";
set in.sheet1 ;
run;
proc append base=&out data=for_loading force;
run;
%end;
%end;
%put NOTE: Read &cnt worksheets from &dir into &out.. ;
proc freq data=&out;
tables subject ;
run;
%let rc=%sysfunc(dclose(&did));
%end;
%else %put ERROR: &dir cannot be opened.;
%let rc=%sysfunc(filename(filrf));
%mend drive;
%drive(dir=C:\Users\mydirectory,out=mydata)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Tom,
It worked so well I'm borderline emotional (a biologist writing SAS code is a tormented creature).
As mentioned, each run had a varying length of time, and it will make life easier to clip off the ends so everyone has uniform data (for example, only moving forward with 5:00 PM to 9:48 AM data) - truncated to the shortest dataset, instead of extrapolating into the unknown. Is there a way to add this to your macro?
Since we've moved away from proc import, how do I drop unwanted columns? For example, I don't wan't to include Date, time elapsed, etc. There's really only 4 columns of data I want to move forward with.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to subset the data then just modify the data step.
data for_loading;
length subject $32 ;
subject="&base_name";
set in.sheet1 ;
run;
You could add DROP and/or KEEP statement to tell it which variables to keep.
You could add an IF or WHERE statement to tell it which observations to keep.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Tom,
Thank you for being so kind and patient. I am extremely grateful!
All the best,
Patrick