BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
erickbernard
Calcite | Level 5

Hi everyone,

I want to ask something that I've been struggling with it for the last day or so. I've seen other posts similar to mine on this forum but for some reason I haven't been able to utilize them.

Alright, so I have multiple CSV files (with identical variables) that I want to merge while maintaining their identity. I should add, I only want to merge some of the files.

An example will make it clear. Suppose I have the following files;

2014_1.csv

2014_2.csv

2014_3.csv

aa_2014_1.csv

aa_2014_2.csv

aa_2014_3.csv

etc.

When I merge (say the top 3 files), I want to have another variable with the original file name attached to it.

So the final table would look like the following;

Var1 Var2 Period

2      323   2014_1

1      432   2014_1

4      432   2014_2

etc.

Can anyone help with this?

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
erickbernard
Calcite | Level 5

I've managed to write the code in R, and I guess wil stick with that. I'm not sure about the etiquette of posting code for other packages, so I'll wait for a while to see if anyone objects me posting R code here.

View solution in original post

6 REPLIES 6
ScottBass
Rhodochrosite | Level 12

1) You could use Powershell to add ",<filename>" (end of line) or "<filename>," (beginning of line) to your files, while merging them into a single combined file.

2) You could write a VBA module to add this column as it reads multiple files into a single worksheet.

3) You could easily do this using Perl or Python or .Net or ...

4) You could import multiple CSV files into SAS using an aggregated fileref, use the FILEVAR option of the INFILE statement to capture the current filename, parse the full path to get the basename, and write back out the merged CSV file.

What is your *final* "landing zone" for your data?  Is it a merged CSV file?  A SAS dataset?  Excel?  Another database?  Depending on your answer, you may be able to skip the merged CSV file part.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
erickbernard
Calcite | Level 5

Hi Scott,

Thank you for your reply. The data comes in CSV format and all the processing/analysis will be done in SAS. I have a strong preference to keep all processing/analysis in a single platform to maintain version control and easy reproducibility. So I'm planning to keep the CSV files as is.

erickbernard
Calcite | Level 5

I've managed to write the code in R, and I guess wil stick with that. I'm not sure about the etiquette of posting code for other packages, so I'll wait for a while to see if anyone objects me posting R code here.

ScottBass
Rhodochrosite | Level 12

"So I'm planning to keep the CSV files as is.

I'm not sure what this means???  Are you planning to keep your source CSV files as is, your target CSV files as is (which doesn't make sense, since they don't exist yet), or your CSV code generation program as is?

You say you "have a strong preference to keep all processing/analysis in a single platform", yet ask a question in a SAS forum, then say you've got the code in R?

Anyway, given your problem statement above, and with lots of assumptions, here is what I think you want to do, using SAS:

* create some dummy CSV files. ;

* assumption: the files have the same structure, just different data ;

proc export data=sashelp.stocks (where=(stock="IBM")) file="c:\temp\_ibm.csv" dbms=csv;

run;

proc export data=sashelp.stocks (where=(stock="Intel")) file="c:\temp\_intel.csv" dbms=csv;

run;

proc export data=sashelp.stocks (where=(stock="Microsoft")) file="c:\temp\_microsoft.csv" dbms=csv;

run;

* merge (append) the CSV files, setting the source of the file as a variable ;

filename csvs "c:\temp\_*.csv"* note the wildcard in the filename statement ;

data test;

  length fullpath $200 source $20;

  infile csvs filename=fullpath eov=eov dsd dlm=",";

  * skip the header record on the first row of each file ;

  input @@; * input and hold the record pointer, used to set eov ;

  if _n_=1 or eov then do* eov is not set on _n_=1 ;

    eov=0* need to reset eov, SAS doesn't do this automatically ;

    input* release the record pointer for the next read ;

    delete; * throw away the header record ;

  end;

  else do;

    input stock $ date open high low close volume adjClose;

  end;

  * parse the fullpath to get just the source basename ;

source=scan(fullpath,-1,"\");

source=scan(source,1,".");

  * input informats ;

  informat date date. open high low close adjClose dollar. volume comma.;

  * output formats ;

  format date date. open high low close adjClose dollar12.2 volume comma12.;

run;


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
erickbernard
Calcite | Level 5

Hi Scott,

Thank you for the detailed reply and appreciate the sample code as well.

Yes, I was referring to the input files. I'm planning to keep them unedited/untouched and carry on with data preparation/analysis on a single platform. If this point wasn't clear, I apologize.

Introducing a step utilizing VBA, Powershell etc, adds an extra step to this process (from my point of view at least) that I'd rather avoid. On top of that, I've never used those systems before.

Finally, since I was able to solve my problem in R and given that I can complete my analysis in R, I've accomplished what I was after, that is, completing this process in a single platform.

jakarman
Barite | Level 11

@erickbernard There is really no need to use an other language as SAS.  I do not understand why you are posting something here for SAS and switch into R programming.
Probably you are not educated with using SAS. Form the old time very advanced data-manipulation until modern time advanced analytics.

One major difference to a lot of other tools,  Excel probably also R a up to some level, it is possible to have traceable auditable secured environments and processes. 

An other sample for processing all files using wildcarding is:

(o boy you should imagine how many soluitions are possible with SAS):

filename impcsv '/folders/myfolders' ;


filename impcsv ' /folders/myshortcuts/ikke';

data orderlst (keep= x y z q order part eval) ;
  length filename filenmcr $250 ;    retain _nfl -1 eovfl  ;
  length z $3 q $1 order part eval $16   ; retain order part eval ; /* the targeted lays out */
  infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=tailer end=endfl missover dsd dlm=";"  ;
  input @ ;
   if ( eovfl or _nfl <0 ) then link header;
   _nfl=_nfl+1;
   input x y z q ; 
   output ;
   if ( endfl ) then link allfil ;
  
return;

header:
    filename=filenmcr;
    put "-- " filename " --"; _nfl=1 ; eovfl=0;
    input  ; /* advance one line, ignoring the header */
    part=scan(filename,-6,"_");
    order=scan(filename,-4,"_"); 
    eval=scan(filename,-3,"_"); 
return;

tailer: 
    put "total records" part _nfl / ;
return; 
allfil: 
    put "processing done" / ;
return; 
run;


---->-- ja karman --<-----

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3366 views
  • 0 likes
  • 3 in conversation