Help using Base SAS procedures

Combining multiple Excel files while maintaining their identities

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

Combining multiple Excel files while maintaining their identities

Hi everyone,

I want to ask something that I've been struggling with it for the last day or so. I've seen other posts similar to mine on this forum but for some reason I haven't been able to utilize them.

Alright, so I have multiple CSV files (with identical variables) that I want to merge while maintaining their identity. I should add, I only want to merge some of the files.

An example will make it clear. Suppose I have the following files;

2014_1.csv

2014_2.csv

2014_3.csv

aa_2014_1.csv

aa_2014_2.csv

aa_2014_3.csv

etc.

When I merge (say the top 3 files), I want to have another variable with the original file name attached to it.

So the final table would look like the following;

Var1 Var2 Period

2      323   2014_1

1      432   2014_1

4      432   2014_2

etc.

Can anyone help with this?

Thank you


Accepted Solutions
Solution
‎09-01-2014 12:26 AM
Occasional Contributor
Posts: 11

Re: Combining multiple Excel files while maintaining their identities

I've managed to write the code in R, and I guess wil stick with that. I'm not sure about the etiquette of posting code for other packages, so I'll wait for a while to see if anyone objects me posting R code here.

View solution in original post


All Replies
Super Contributor
Posts: 376

Re: Combining multiple Excel files while maintaining their identities

1) You could use Powershell to add ",<filename>" (end of line) or "<filename>," (beginning of line) to your files, while merging them into a single combined file.

2) You could write a VBA module to add this column as it reads multiple files into a single worksheet.

3) You could easily do this using Perl or Python or .Net or ...

4) You could import multiple CSV files into SAS using an aggregated fileref, use the FILEVAR option of the INFILE statement to capture the current filename, parse the full path to get the basename, and write back out the merged CSV file.

What is your *final* "landing zone" for your data?  Is it a merged CSV file?  A SAS dataset?  Excel?  Another database?  Depending on your answer, you may be able to skip the merged CSV file part.

Occasional Contributor
Posts: 11

Re: Combining multiple Excel files while maintaining their identities

Hi Scott,

Thank you for your reply. The data comes in CSV format and all the processing/analysis will be done in SAS. I have a strong preference to keep all processing/analysis in a single platform to maintain version control and easy reproducibility. So I'm planning to keep the CSV files as is.

Solution
‎09-01-2014 12:26 AM
Occasional Contributor
Posts: 11

Re: Combining multiple Excel files while maintaining their identities

I've managed to write the code in R, and I guess wil stick with that. I'm not sure about the etiquette of posting code for other packages, so I'll wait for a while to see if anyone objects me posting R code here.

Super Contributor
Posts: 376

Re: Combining multiple Excel files while maintaining their identities

"So I'm planning to keep the CSV files as is.

I'm not sure what this means???  Are you planning to keep your source CSV files as is, your target CSV files as is (which doesn't make sense, since they don't exist yet), or your CSV code generation program as is?

You say you "have a strong preference to keep all processing/analysis in a single platform", yet ask a question in a SAS forum, then say you've got the code in R?

Anyway, given your problem statement above, and with lots of assumptions, here is what I think you want to do, using SAS:

* create some dummy CSV files. ;

* assumption: the files have the same structure, just different data ;

proc export data=sashelp.stocks (where=(stock="IBM")) file="c:\temp\_ibm.csv" dbms=csv;

run;

proc export data=sashelp.stocks (where=(stock="Intel")) file="c:\temp\_intel.csv" dbms=csv;

run;

proc export data=sashelp.stocks (where=(stock="Microsoft")) file="c:\temp\_microsoft.csv" dbms=csv;

run;

* merge (append) the CSV files, setting the source of the file as a variable ;

filename csvs "c:\temp\_*.csv"* note the wildcard in the filename statement ;

data test;

  length fullpath $200 source $20;

  infile csvs filename=fullpath eov=eov dsd dlm=",";

  * skip the header record on the first row of each file ;

  input @@; * input and hold the record pointer, used to set eov ;

  if _n_=1 or eov then do* eov is not set on _n_=1 ;

    eov=0* need to reset eov, SAS doesn't do this automatically ;

    input* release the record pointer for the next read ;

    delete; * throw away the header record ;

  end;

  else do;

    input stock $ date open high low close volume adjClose;

  end;

  * parse the fullpath to get just the source basename ;

source=scan(fullpath,-1,"\");

source=scan(source,1,".");

  * input informats ;

  informat date date. open high low close adjClose dollar. volume comma.;

  * output formats ;

  format date date. open high low close adjClose dollar12.2 volume comma12.;

run;

Occasional Contributor
Posts: 11

Re: Combining multiple Excel files while maintaining their identities

Hi Scott,

Thank you for the detailed reply and appreciate the sample code as well.

Yes, I was referring to the input files. I'm planning to keep them unedited/untouched and carry on with data preparation/analysis on a single platform. If this point wasn't clear, I apologize.

Introducing a step utilizing VBA, Powershell etc, adds an extra step to this process (from my point of view at least) that I'd rather avoid. On top of that, I've never used those systems before.

Finally, since I was able to solve my problem in R and given that I can complete my analysis in R, I've accomplished what I was after, that is, completing this process in a single platform.

Valued Guide
Posts: 3,208

Re: Combining multiple Excel files while maintaining their identities

@erickbernard There is really no need to use an other language as SAS.  I do not understand why you are posting something here for SAS and switch into R programming.
Probably you are not educated with using SAS. Form the old time very advanced data-manipulation until modern time advanced analytics.

One major difference to a lot of other tools,  Excel probably also R a up to some level, it is possible to have traceable auditable secured environments and processes. 

An other sample for processing all files using wildcarding is:

(o boy you should imagine how many soluitions are possible with SAS):

filename impcsv '/folders/myfolders' ;


filename impcsv ' /folders/myshortcuts/ikke';

data orderlst (keep= x y z q order part eval) ;
  length filename filenmcr $250 ;    retain _nfl -1 eovfl  ;
  length z $3 q $1 order part eval $16   ; retain order part eval ; /* the targeted lays out */
  infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=tailer end=endfl missover dsd dlm=";"  ;
  input @ ;
   if ( eovfl or _nfl <0 ) then link header;
   _nfl=_nfl+1;
   input x y z q ; 
   output ;
   if ( endfl ) then link allfil ;
  
return;

header:
    filename=filenmcr;
    put "-- " filename " --"; _nfl=1 ; eovfl=0;
    input  ; /* advance one line, ignoring the header */
    part=scan(filename,-6,"_");
    order=scan(filename,-4,"_"); 
    eval=scan(filename,-3,"_"); 
return;

tailer: 
    put "total records" part _nfl / ;
return; 
allfil: 
    put "processing done" / ;
return; 
run;


---->-- ja karman --<-----
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 960 views
  • 0 likes
  • 3 in conversation