BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DrSolverson
Calcite | Level 5

Hi! Using SAS 9.4 here. I have large data sets from a recent experiment: my subjects have ~18 hours of minute-to-minute measurements summarized in excel spreadsheets. 36 subjects and 4 treatments. I came across this nice macro in the user guide that allows me to read all the data into SAS:

 

 

%macro drive(dir,ext); 
   %local cnt filrf rc did memcnt name; 
   %let cnt=0;          

   %let filrf=mydir;    
   %let rc=%sysfunc(filename(filrf,&dir)); 
   %let did=%sysfunc(dopen(&filrf));
    %if &did ne 0 %then %do;   
   %let memcnt=%sysfunc(dnum(&did));    

    %do i=1 %to &memcnt;              
                       
      %let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);                    
                    
      %if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
       %if %superq(ext) = %superq(name) %then %do;                         
          %let cnt=%eval(&cnt+1);       
          %put %qsysfunc(dread(&did,&i));  
          proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt  (drop = date
		  time_elapsed)
           dbms=xlsx replace;            
          run;          
       %end; 
      %end;  

    %end;
      %end;
  %else %put &dir cannot be open.;
  %let rc=%sysfunc(dclose(&did));      
             
 %mend drive;
 
%drive(C:\Users\mydirectory\,xlsx) 

 

the problem is it renames all the spreadsheets that it reads into SAS - in proc import, the output for each spreadsheet is assigned the name "dsn&cnt" - Will be confusing moving forward to lose my unique identifier. I was hoping someone would know how to rewrite this code so that it keeps the name of the original filename that was read into SAS. 

 

Further, I've figured out how to drop the variables I don't want in the final SAS spreadsheet, but I'm still figuring out how to merge all 144 files into one spreadsheet, while somehow creating a subject and treatment ID for each merged file worth of data so I can begin performing summary statistics.  

 

Any help would be greatly appreciated!

Thanks,

Patrick

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Try something like this.  Untested.

I added logic to split the filename into BASE_NAME and NAME_EXT parts.  I hard coded the test to just look for XLSX files.

I added input parameter for the name of the dataset you want to create.

It will read in each sheet and add a variable SUBJECT to hold the name of the xlsx file. Then it will use PROC APPEND to make one output dataset.  I remove the DROP= option you had. You could add that back if you wanted .  Personally I would use a normal DROP statement instead of the DROP= dataset option.

I added a PROC FREQ to show how many records were loaded from each sheet.

%macro drive(dir,out);
%local cnt filrf rc did name name_ext base_name;
%let cnt=0;

%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir\));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
  %if %sysfunc(exist(&out)) %then %do;
    proc delete data=&out; run;
  %end;
  %do i=1 %to %sysfunc(dnum(&did));
    %let name=%qsysfunc(dread(&did,&i));
    %if %index(&name,.) > 1 %then %do;
      %let name_ext=%qupcase(%qscan(&name,-1,.));
      %let base_name=%qsubstr(&name,1,%index(&name,.)-1);
    %end;
    %else %do;
      %let name_ext=;
      %let base_name=&name ;
    %end;
    %if XLSX = &name_ext %then %do;
      %let cnt=%eval(&cnt+1);
      libname in xlsx "&dir\&name" ;
      data for_loading;
        length subject $32 ;
        subject="&base_name";
        set in.sheet1 ;
      run;
      proc append base=&out data=for_loading force;
      run;
    %end;
  %end;
  %put NOTE: Read &cnt worksheets from &dir into &out.. ;
   proc freq data=&out;
      tables subject ;
   run;
  %let rc=%sysfunc(dclose(&did));
%end;
%else %put ERROR: &dir cannot be opened.;
%let rc=%sysfunc(filename(filrf));
%mend drive;

%drive(dir=C:\Users\mydirectory,out=mydata)

 

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

You need to provide more information to get good help.

Do all of the Excel files have just one sheet?  Or do they have multiple sheets?

Do all of the Excel sheets have the same variables (columns)?

Is all of the data in contents of the sheets  or is some of it hiding in the filename, sheetname or column name?  For example is the subject id and date exist as columns in the sheet like this:

 

Subject Date Var1 Var2 Var3
101 2018/09/20 1 2 3

Or do they only exist in the filename or sheetname?

 

DrSolverson
Calcite | Level 5

Hi Tom, thanks for the speedy reply:

Do all of the Excel files have just one sheet?  Or do they have multiple sheets?

-Just one sheet per file.

 

Do all of the Excel sheets have the same variables (columns)?

-Yes, the columns are identical. However, the number of rows are different between subject files, accounting for differences in time (minute to minute measures varied by a few hours; somewhere between 900 to 1200 rows of data). The macro didn't appear to have a problem with this difference.

Layout of outputLayout of output

 

Is all of the data in contents of the sheets  or is some of it hiding in the filename, sheetname or column name?

-subject ID is the filename (unique run number, to be more specific). Would be nice to add a column for subject ID if there's a quick way to do that, gleaning from file name. 

 

Hope this helps hone in on my problem!

Patrick

 

Tom
Super User Tom
Super User

Do all of them name the sheet as SHEET1, like in your photograph?

Can you use the XSLX libname engine to read one of the files?  If so it is generally easier to use than PROC IMPORT.  But your column headers might cause trouble. Try it on one of the files.

libname in xlsx 'filename.xlsx';
data test;
 set in.sheet1;
run;

Your example macro is a little more complex than it needs to be.  For example why does it keep calling DREAD() over and over again instead of just remembering the value?  It should be possible to pull out the filename and use that for the name of the dataset. As long as the filenames are valid SAS names. 32 characters or less. Contain letters, digits and underscores, Starts with letter or underscore.

DrSolverson
Calcite | Level 5

Tom,

Do all of them name the sheet as SHEET1, like in your photograph?

-yes, they all say sheet1. This was output recorded from a matlab program into an excel spreadsheet. Only difference across the files will be the length (rows) of the data. 

 

Can you use the XSLX libname engine to read one of the files?

-Yes, that worked just fine. Do you have a do loop in mind? 

 

xlsx libname engine.png

 

Regarding your macro questions, you'll have to take that up with someone more enlightened!

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.3&docsetId=mcrolref&docsetTarget=n0c...

Tom
Super User Tom
Super User

Try something like this.  Untested.

I added logic to split the filename into BASE_NAME and NAME_EXT parts.  I hard coded the test to just look for XLSX files.

I added input parameter for the name of the dataset you want to create.

It will read in each sheet and add a variable SUBJECT to hold the name of the xlsx file. Then it will use PROC APPEND to make one output dataset.  I remove the DROP= option you had. You could add that back if you wanted .  Personally I would use a normal DROP statement instead of the DROP= dataset option.

I added a PROC FREQ to show how many records were loaded from each sheet.

%macro drive(dir,out);
%local cnt filrf rc did name name_ext base_name;
%let cnt=0;

%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir\));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
  %if %sysfunc(exist(&out)) %then %do;
    proc delete data=&out; run;
  %end;
  %do i=1 %to %sysfunc(dnum(&did));
    %let name=%qsysfunc(dread(&did,&i));
    %if %index(&name,.) > 1 %then %do;
      %let name_ext=%qupcase(%qscan(&name,-1,.));
      %let base_name=%qsubstr(&name,1,%index(&name,.)-1);
    %end;
    %else %do;
      %let name_ext=;
      %let base_name=&name ;
    %end;
    %if XLSX = &name_ext %then %do;
      %let cnt=%eval(&cnt+1);
      libname in xlsx "&dir\&name" ;
      data for_loading;
        length subject $32 ;
        subject="&base_name";
        set in.sheet1 ;
      run;
      proc append base=&out data=for_loading force;
      run;
    %end;
  %end;
  %put NOTE: Read &cnt worksheets from &dir into &out.. ;
   proc freq data=&out;
      tables subject ;
   run;
  %let rc=%sysfunc(dclose(&did));
%end;
%else %put ERROR: &dir cannot be opened.;
%let rc=%sysfunc(filename(filrf));
%mend drive;

%drive(dir=C:\Users\mydirectory,out=mydata)

 

DrSolverson
Calcite | Level 5

Tom,

It worked so well I'm borderline emotional (a biologist writing SAS code is a tormented creature). 

 

As mentioned, each run had a varying length of time, and it will make life easier to clip off the ends so everyone has uniform data (for example, only moving forward with 5:00 PM to 9:48 AM data) - truncated to the shortest dataset, instead of extrapolating into the unknown. Is there a way to add this to your macro?

 

Since we've moved away from proc import, how do I drop unwanted columns? For example, I don't wan't to include Date, time elapsed, etc. There's really only 4 columns of data I want to move forward with. 

Tom
Super User Tom
Super User

If you want to subset the data then just modify the data step.

     data for_loading;
        length subject $32 ;
        subject="&base_name";
        set in.sheet1 ;
      run;

You could add DROP and/or KEEP statement to tell it which variables to keep.

You could add an IF or WHERE statement to tell it which observations to keep.

DrSolverson
Calcite | Level 5

Tom,

Thank you for being so kind and patient. I am extremely grateful! 

All the best,

Patrick

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1306 views
  • 1 like
  • 2 in conversation