BookmarkSubscribeRSS Feed
jmono
Calcite | Level 5

I've been using SAS for only 2 years and my next task is rather a complicated one.

 

Summary of the task: I need to export two columns from multiple excel files (xlsx; ~120 files) into one dataset. Each excel file consititute a record or a case. When imported, I would like for the columns from each record to be a row so that in the new dataset, I'll have each record on a row. I understand that for the latter step, I'll need to transpose.

 

Steps taken: I know how to import one entire excel file into a SAS dataset. I've looked at articles but they only discussed importing select columns or rows from one excel file -- NOT from multiple files. 

 

What I think needs to be done

  1. Write a code to reference the folder where the excel files are stored.
  2. Write a code to extract the two columns of interest from the first excel file and store into a new dataset.
  3. Then reiterate this code to continue extracting from the next files until it gets to the last file.
  4. Make sure to flag those files that have been imported.
  5. The new dataset will now contain the following columns: ID-1, Code-1, ID-2, Code-2, ID-3, Code-3, ... Therefore, next step is to transpose.

What I'm avoiding (marked with *):

  1. Extract both columns of interest from each excel file into a dataset. This will create ~120 datasets*.
  2. Transpose each of the new datasets. 
  3. Merge all new datasets. 
  4. Delete the ~120 datasets.

Any coding tips will be helpful, and necessary tweaks to my thinking processes will be appreciated. 

 

I'm currently using SAS Studio at home, the free version, which makes things a little bit more challenging. 

 

Thanks!

6 REPLIES 6
Reeza
Super User

Essentially you're looking for a macro.

 

For importing the data from Excel, are the values always in the same location? You can specify the range in Proc Import. Do you have a list of the files or do you need to generate it?

 

Get it working for one file and then move on. I disagree with your last set of arguments, its probably easier to import each file, transpose and append to the final file. Make sure to clean up between loops. 

 

Here's a rough sketch to get you started. Once you get it working, wrap it in %macro/%mend and then you can use CALL EXECUTE to iterate over all your files. 

 

 

%let file2imp=myfile.xlsx;

proc import out=temp1 datafile="/folders/myfolders/&file2imp" dbms=excel replace; range='Sheet1$A4:G4'; run;

*Transpose here;
proc transpose data=temp1 out=temp2;
by id;
run;

*Append to table;
proc append base=want data=temp2;
run;

*clean up;
proc sql;
drop table temp1, temp2;
quit;

*repeat;

 

DCMarks25
Calcite | Level 5

Can you please explian how this code would be used for the same cells in multiple spreadsheets with data in the same location in each?

Thank you!

ballardw
Super User

To make sure that we know exactly what you refer to you can get a copy of the post by clicking on the left double quote " that appears in the icons above the message window. So you would get something like

 


@DCMarks25 wrote:

Can you please explian how this code would be used for the same cells in multiple spreadsheets with data in the same location in each?

Thank you!


Then you can edit the quoted text to remove parts you aren't asking about or even highlight selected text.

 

If your question is about "multiple spreadsheets" in different file you need to import each one into a separate SAS data set. If "multiple spreadsheets" means different sheets in one file best would possibly be to use a LIBNAME statement to reference the file which would look like

libname myref xlsx "<path>\file.xlsx";

which may let you reference each sheet in the library as a data set. However, because of the way so many people work with spreadsheets this may not reference "same location" in a meaningful way as SAS uses VARIABLES which would be set by the first row value for each column(the variables).

 

If you question is more about the "same location in each" that is the bit around the RANGE statement in the Proc Import code.

range='Sheet1$A4:G4';

 Sheet1 would be the name the sheet or tab using default names. If there is something assigned it gets more fun. the A4:G4 describes a rectangular area the letters are columns the numbers are rows, to import.

DCMarks25
Calcite | Level 5
Hello,
Thank you for your response. I did find the import code to import by cell ranges. I worked nicely, but I have 45 identical workbooks in the same file folder and I need the same 12 cells from each. You are right the sheets are not set up with column headers alone, there are a few rows of information at the top of the sheet that aren't data. I haven't SASsed in a while and I am having a hard time thinking this one through.
Thank you for your time,
Dominique
Tom
Super User Tom
Super User

@DCMarks25 wrote:
Hello,
Thank you for your response. I did find the import code to import by cell ranges. I worked nicely, but I have 45 identical workbooks in the same file folder and I need the same 12 cells from each. You are right the sheets are not set up with column headers alone, there are a few rows of information at the top of the sheet that aren't data. I haven't SASsed in a while and I am having a hard time thinking this one through.
Thank you for your time,
Dominique

Take the code that works to read from one workbook.  Now change it to work for another.  What part of the code did you have to change?  Replace those part(s) with a references to macro variable(s).  Now set values to the macro variable(s) and try the code again.

Once that works define a macro using those macro variable(s) as the parameters to the macro.   Test it again.

Now take your list of filenames and use it to generate one call to the macro for each file.

 

Something like:

%macro readone(filename);
* Read range from worksheet ;
proc import dbms=xlsx file="&filename" out=from_xlsx replace;
  range='A5:B16' ;
  getnames=no;
run;
* Standardize dataset structure, add filename ;
data from_xlsx ;
  length filename $200 A $20 B $8 ;
  set from_xlsx;
  filename="&filename";
  rename a=id b=code ;
  format _all_;
run;
* Append to aggregate dataset ;
proc append data=from_xlsx base=all_results force;
run;
%mend readone;
* Get list of files ;
%dirtree(~/mydir);
* Read all of the XLSX files ;
data _null_;
  set dirtree ;
  where scan(filename,-1,'.')='xlsx';
  call execute(cats('%nrstr(%readone)(',catx('/',dname,filename),')'));
run;
Tom
Super User Tom
Super User

Write a code to reference the folder where the excel files are stored.

To get the list of files you can use the DOPEN() and DREAD() functions.  Or take advantage of some existing code that does that such as

 https://github.com/sasutils/macros/blob/master/dirtree.sas

 

Write a code to extract the two columns of interest from the first excel file and store into a new dataset.

That sounds very strange.  What do the excel files look like?  Normally when you IMPORT data you import the whole worksheet.  You can always ignore the other variables if you don't need them. 

 

Are they really EXCEL files?  It is much easier if they are instead CSV files, which your PC might think of as being owned by Excel but are really just plain text files.  Since you can write your own data step to read from a text file.

 

If they are Excel files (XLSX files) then each file could have multiple worksheets in it.  Do your files have multiple worksheets?  Do the worksheets always have the same name(s)?

 

Then reiterate this code to continue extracting from the next files until it gets to the last file.

So figure out how to do one file.  Then generate that same type of code for each file.  This can be made easier by creating a macro that processes one file.  Then use the list of files to generate one call to the macro for each file.  That way if the logic for how to process a file changes you can just update the macro. 

 

Make sure to flag those files that have been imported.

If you just process the files in your list there is no need to "keep track of which files have been read".   But if you expect to need to re-do this process again in the future when there may be some NEW files that need to be processed then save the list into a SAS dataset.  So that next time you want to look for and process new files you can exclude the files you processed already.

 

The new dataset will now contain the following columns: ID-1, Code-1, ID-2, Code-2, ID-3, Code-3, ... Therefore, next step is to transpose.

I can perhaps guess what you mean by this.  For example you might mean that the spreadsheets have two variables named ID and CODE and your want to somehow convert it from that useful organizational structure of having each ID/CODE pair on its own observation into some type of wide structure with many ID variables and many  CODE variables.  Why would that help anything at all?

 

I would expect it would be better to instead just add another variable (or more) that can be used to tie those observations together.  So perhaps the name of the file that they came from.  Or perhaps a value like a DATE or COUNTRY or DISEASE that is derived from the filename.

 

Extract both columns of interest from each excel file into a dataset. This will create ~120 datasets*.

Not really.  More like it will re-create the same dataset 120 times.  Each new one replacing the old one.

 

Transpose each of the new datasets.

Don't do this.  Or if you need to do it wait until you have all of the data so you know how many variables your will need.  I suspect each spreadsheet will have a different number of observations, hence a different number of variables when transposed.

 

Merge all new datasets.

More like keep appending each new dataset as it is created.

 

Delete the ~120 datasets.

Probably not really needed. First if you process them one by one there is only one temporary dataset to be deleted.  But also SAS work datasets are already deleted as soon as your SAS session ends.

 

So your coding process should be:

 

  • Locate one or two example EXCEL files.
  • Use PROC IMPORT to (or use LIBNAME with XLSX engine) to convert them into SAS datasets.
  • Figure out what variables they have.  Are the ID and CODE variables really in the spreadsheet?  Or are the spreadsheets some complex layoff that poorly constructed for being used as a data source?
  • Figure how to convert the INCONSISTENT mess you will get by using PROC IMPORT to GUESS what variable names, variable types and lengths for characters into a consistent structure that you can append.

Once you have that mainly sorted try wrapping that into a macro that takes as input the name of the file to read and possibly other things like the sheet within the file to read and the name of the dataset to create.  Test that to see if it works for the couple of files you explored already.  Then try it on a couple more.

 

Once the macro is mainly working you can add a PROC APPEND step to aggregate the new file into a consolidated file.  This is where making sure the variable names and types are consistent is very important so that each new dataset created from each new file has the same variables so they can be combined.

 

Now look into the process of getting the list of files.  How automatic does it need to be?  Is the list static?  Then just use normal operating system commands like DIR (or ls on unix) to make a text list of names and paste it into your program.  If the list will vary then look into using a tool like the %DIRTREE() macro to build a dataset with the list of filenames.

 

Once you have the list of filenames you can use it to generate one call to the macro for each file.

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2626 views
  • 1 like
  • 5 in conversation