BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Satori
Quartz | Level 8
I have about 100 excel files that I would like to join in one file
Ideally I would like to join this files and get one SAS file as output
The files have two sheets each and I just want to get the second sheet of each file
The variables of the sheet I want are the same for all the 100 files
Thanks
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@Satori wrote:

What I am doing now is saving all the excel files as CSV files. Could you then tell me what code should I use to import this files to SAS and then stach them together and save it as a SAS file.

The file names are Data1, Data2, ..., Data104

Each file has now only one sheet since I deleted the other one to be able to save it as CSV.


Make sure you know the names of the files and that they are visible to SAS UE.  I doubt that a CSV file is name Data1 it is probably named Data1,csv. Perhaps whatever tool you are using to look at filenames is hiding the extension? Windows likes to do that. With SAS UE since it is running in a virtual machine you also need to find the full name for the file as seen by SAS.  You can use the files and folders explored in the left hand panel of the SAS/Studio interface.  If you right click and select properties on one of the files you can see the full filename.

 

Start by trying to read one of the files into a SAS dataset.  You could try using PROC IMPORT but it would be better to write your own data step to read it. Then you can control the variable names and the variables' attributes (type, length, formats attached, label etc) to make the individual dataset compatible. With PROC IMPORT each dataset's variables will be based on the particular data that SAS sees when it reads that file.  This can lead to some variables with different lengths or worse different types (number vs character).

 

Here is how I like to code a program to read a CSV with a data step.  

data data1 ;
  infile "/folders/myfolder/Data1.csv" dsd firstobs=2 truncover ;
  length Var1 8 Var2 $20 ;
  input Var1 Var2 ;
run;

So the first statement starts a data step. The value 'data1' is the name of the dataset to be created.

The second statement tells where to find the CSV file and how to parse it.  The physical name of the CSV file is the value in quotes (remember that SAS UE is running in a Linux virtual machine so file names as case sensitive.) The DSD option tells it that it is delimited file and also set DLM=','. Firstobs=2 tells it to skip the header row.  The TRUNCOVER option tells SAS to treat short lines as meaning the rest of the fields are missing.

The third statement defines your variables. Set all numeric variables to use length of 8 (SAS stores all numbers as floating point).  The $ before the length is what tells SAS that you want to define a character variable.  Set the length long enough to hold the longest string you expect.  Remember that dates and times are numeric variables.

The fourth statement is what actually reads the values from the line.  If you decide to add informat specifications in the INPUT statement remember to prefix them with the colon modifier so that SAS will continue to read the variables in list mode. 

The last statement ends the data step.

 

If you have dates or times then you will also want to add those to an INFORMAT and FORMAT statement. The first will tell SAS what informat to use to read the data and the second will tell it how to format the data so that humans can read it.  For example Excel might have written dates that look like '8-3-18'.  So you need to tell SAS whether to interpret that as MDY or DMY.   Personally I try to always display dates using YMD order to prevent that confusion.  So if you had a date variable named DATE you might add these two lines to your data step.

  informat date mmddyy. ;
  format date yymmdd10.;

Once you have been able to read one file then there are two methods to reading multiple files. One is to modify the data step to read them all into one dataset to start with.  If you can use a single wildcard to list all of the files then that is really easy. You just change the INFILE statement and add a little code to skip the multiple header rows.

data all_data;
  length fname $256;
  infile "/folders/myfolder/Data*.csv" dsd firstobs=1 truncover filename=fname;
  input @;
  if fname ne lag(fname) then input;
  length Var1 8 Var2 $20 ;
  input Var1 Var2 ;
run;

@so the extra lines of code (and modification to the INFILE statement) will create a variable with the name of the current file. The extra INPUT with the trailing @ will read a line but keep the line current for the real input statement below.  This will allow SAS to set the FNAME variable to name of the file it is currently reading. Then IF condition will be true for the first line of each file so that then the conditional INPUT statement will cause SAS to skip that header row.

 

The other method is read each CSV into its own SAS dataset.  You could then add a final data step with a SET statement that lists all of the individual files to combine them.  

View solution in original post

17 REPLIES 17
RW9
Diamond | Level 26 RW9
Diamond | Level 26

It would be simpler to write a vba script to load each file and drop the second sheet's data out to csv, then use SAS to read the csv file.  You can get a list of files and then proc import each one, however Excel is a bad data format, and proc import is guessing procedure so the data you will get out of that will be very likely to be incompatible - i.e. you wouldn't be able to stack it.  Its pretty simple in VBA:
https://stackoverflow.com/questions/10551353/saving-excel-worksheet-to-csv-files-with-filenameworksh...

Just do a loop over that from a list of files in a directory:

https://stackoverflow.com/questions/31414106/get-list-of-excel-files-in-a-folder-using-vba

 

Then write a datastep in SAS to import the CSV file.

Satori
Quartz | Level 8

I had a look at the link you sent and I do not understand what I have to do. I am quite new using SAS and I really don't know how to do what you said. Could you give me an example of how to do it?

Many thanks

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its not SAS.  It is Visual Basic for Applications.  That is the programming language embedded in the Office Suite.  You code in Excel to load and process the Excel files and write out to plain text comma separated variable file.  Then in SAS you only write a datastep to import the csv file.

As you are new to SAS perhaps a less optimal more long winded approach.  Write one proc import statement in a macro an call it for each file:

%macro Imp (file=,ds=);
  proc import datafile="c:/somewhere/&file..xlsx" out=&ds.;
    sheet="Your Sheet";
  run;
%mend Imp;

%Imp (file=File1,ds=File1);
%Imp (file=File2,ds=File2);
%Imp (file=File3,ds=File3);
...

You will then have all the datasets in work, and can proceed to process them (as they will likely need manipulation) before combining them in one datastep, which if they are all called the same, prefix, could be as simple as:

data want;
  set file:;
run;

Note the colon, that means any dataset with a prefix of file will be added, its a shortcut.

If you are quite new to SAS, have you imported any files?  I would start with one, or two and see how it goes.

Satori
Quartz | Level 8

Actually I'm quite new to programming. And although I would love to start slowly and progress gradually, I have to do this at work so I don't have an option. I will try to do the VBA code in excel that you first suggested and if that does not work, the last suggestion in SAS

Satori
Quartz | Level 8

Nope, I am not able to make any of the options work.

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

New to programming, ok, I would just start by writing a proc import for each file:

proc import datafile="c:/somewhere/myfile.xlsx" out=want;
  sheet="Your Sheet";
run;

You can find a lot of help at:

https://video.sas.com/category/videos/how-to-tutorials

 

And of course on this forum.  Also, what SAS are you using, some have a button for import of files.

Satori
Quartz | Level 8

I'm using SAS university edition. I found some import snippets and I managed to import files individually already. But for now I have a task in which I need to stack these 100 files in one file...

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Right, so you have the datasets in the system.  If you look at the structure of each of those files, are they all exactly the same, i.e. character ones are character, number are number (its very unlikely).  If they are, then what are the names of them.  Remember I cannot see your screen so I don't know.  Do they all have the same prefix, or are all they all called differently?  

If they are all the same structure and prefix:

data want;
  set <prefix>:;
run;

Will work, replacing <prefix> with whatever your prefix is.  If they are not all the same prefix then you would need to supply the list:

data want;
  set adataset something anotherone;
run;

If you try to do this with datasets with different structures you will get errors and warnings, which you will need to fix in the original datasets before setting together.

Satori
Quartz | Level 8

What I am doing now is saving all the excel files as CSV files. Could you then tell me what code should I use to import this files to SAS and then stach them together and save it as a SAS file.

The file names are Data1, Data2, ..., Data104

Each file has now only one sheet since I deleted the other one to be able to save it as CSV.

Satori
Quartz | Level 8

Thanks, but I really don't understand.

Satori
Quartz | Level 8

Could someone tell me exactly what I need to type in SAS in order to get the desired result? I really have no idea how to do this

andreas_lds
Jade | Level 19

@Satori wrote:

Could someone tell me exactly what I need to type in SAS in order to get the desired result? I really have no idea how to do this


What have you tried so far?

 

Start by reading one file using a data-step, there are many papers and posts explaining the details, so i won't repeat them. You need a least the following statements (all explained in the online help at sas.com):

  • data
  • run
  • infile
  • input

And maybe format and informat, depending on the data you have.

 

 

Satori
Quartz | Level 8

I searched online and I am able to open one but when I try to open more than one it does not work.

The issue is that I don't know what I am doing; I am just copying the codes I find online and then I do not know how to adapt them to my situation

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 6628 views
  • 2 likes
  • 4 in conversation