Help using Base SAS procedures

Append all csv file into one dataset

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 145
Accepted Solution

Append all csv file into one dataset

Hi,

I have folder with N number of CSV files with same structural format but with different names for the csv file.

I would like to append all csv files into one final dataset.

The folder is dynamic folder and new csv files arrive on day to day…So would like to put in a macro such that every time program runs it would append all the existing csv files in one dataset.

Thanks


Accepted Solutions
Solution
‎08-28-2014 02:10 PM
Valued Guide
Posts: 3,208

Re: Append all csv file into one dataset

I don not like the proc import usage that way. lt is scanning first the data and than creating a datastep/input of data.
The result can be that with some creative input you are getting the sas-datasets in different structures.

In you datasetnaming you have session_1 batch_1 etc and they are separated by a _   Is the _ a variable separation or a level-naming separation.
As assuming it is replaced by real values I hope they will avoid the _ and the _ is being reliable as level indication.

The following code, is tested with two csv files. 

filename impcsv ' /folders/myshortcuts/ikke';

data orderlst (keep= x y z q order part eval) ;
  length filename filenmcr $250 ;    retain _nfl -1 eovfl  ;
  length z $3 q $1 order part eval $16   ; retain order part eval ; /* the targeted lays out */
  infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=allfil end=endfl missover dsd dlm=";"  ;
  input @ ;
   if ( eovfl or _nfl <0 ) then link header;
   _nfl=_nfl+1;
   input x y z q ; 
   output ;
   if ( end ) then link tailer ;
  
return;

header:
    filename=filenmcr;
    put "-- " filename " --"; _nfl=1 ; eovfl=0;
    input  ; /* advance one line, ignoring the header */
    part=scan(filename,-6,"_");
    order=scan(filename,-4,"_"); 
    eval=scan(filename,-3,"_"); 
return;

tailer: 
    put "total records" part _nfl / ;
return; 
allfil: 
    put "processing done"l / ;
return; 
run;

proc print;
run;

NOTE: The infile library IMPCSV is:

NOTE: The infile IMPCSV(12345*.csv) is:

-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-1_reader-1_03aug12_dummy.csv  --

NOTE: The infile IMPCSV(12345*.csv) is:

-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-2_reader-2_03aug12_dummy.csv  --

processing done


       z    q                                        x   y

1UNENsession-1part-2reader-11560010101
2ENHNsession-1part-2reader-12220010102
3UNENsession-2part-2reader-21560010101
4ENHNsession-2part-2reader-22220010102
---->-- ja karman --<-----

View solution in original post


All Replies
Super User
Posts: 5,254

Re: Append all csv file into one dataset

Ok, but what is your specific question?

You seems already be on the right path using macros. Give it a shot and get back when have some specific questions.

Data never sleeps
Frequent Contributor
Posts: 145

Re: Append all csv file into one dataset

I have CSV file with following name convention 12345_a_b_c_part_2_batch_1_session_1_reader_1_03aug12_dummy.csv

x;y;z;q----first row in csv---should be made column name

1;560010101;UNE;N----records or values for the above x,y,z,q variables

2;220010102;ENH;N----- records or values for the above x,y,z,q variables

I have to create dataset from CSV FILE…..part  of the name of the csv file should go variables like

batch_1= name of the dataset

ord=1(coming from session in the name of csv)

part=2(coming from part_2 in the name of csv)

eval=1((coming from reader_1 in the name of csv)

the dataset batch_ 1 should be like this

x        y                z           q     ord      part    eval

1   560010101       une       n       1         2          1

Frequent Contributor
Posts: 145

Re: Append all csv file into one dataset

I have written parial code and would like to know

I would like to know how to run this macro for all the csv file files located in the folder. All csv file are of same structural format.

libname x 'library location;

option spool;

%let x=/by/data/ad/14607_randlist_2nd_read_part_2_batch_1_session_1_reader_1_03aug12_dummy.csv;

(location of folder)

%macro test(a=);

PROC IMPORT OUT=x.&a

DATAFILE="&x"

    DBMS=CSV REPLACE;

    delimiter = ";";

     GETNAMES=YES;

RUN;

data x.&a;

set x.&a;

a="&x";

part=substr(a,89,1);

order=substr(a,107,1);

eval=substr(a,116,1);

drop a;

run;

%mend;

%test(a=x);

Super User
Super User
Posts: 7,392

Re: Append all csv file into one dataset

Does it have headings?  If not its straightforward:

x 'copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"';

Then you program can import just one file (note I would put the above in a batch file and have a delete the total.csv before re-qriting it otherwise it will also get included).

So use the dos copy to create a complete file.

If it does have headings then its the same thing, you just need to either do a bit of scripting in the batch file, or post-process in your dataset.

Frequent Contributor
Posts: 145

Re: Append all csv file into one dataset

Hi RW9!

Can you give syntax how it work in sas....I coudnt do it in mine......

x 'copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"';

Super User
Super User
Posts: 7,392

Re: Append all csv file into one dataset

What couldn't you do?  The X syntax is base SAS.  It sends the command within quotes to the system for processing.  If you don't have admin rights to do that or X is locked out then just create a batch file outside of sas.

E.g.

In Windows explorer where your csv files are, right click create new text document.  Type in:

copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"

Changing the paths obviously.

Now right click and rename it Job.bat.  Yes, extension of .bat.

When you double click on it a black screen will pop up - that is DOS - it will process each file and then close or wait to be closed.

Valued Guide
Posts: 3,208

Re: Append all csv file into one dataset

No need for os commands sas is supporting wildcards  

filename impdir '/folders/myfolders' ;    /* define you input location */ *

data _null ;
  length filename filenmcr $250 ;
  retain _nfl 1 eovfl 1 ;
  infile impdir(*.csv) filename=filenmcr eov=eovfl eof=tailer end=endfl  ;   /* read all csv files */
  input   ............                                                      /*  your input statments for the csv file(s)  */
   ;  
   _nfl=_nfl+1;
   if ( eovfl ) then link header; eovfl=0;
return;

header:  /* just some processing to show wich csv files have been processed */
    filename=filenmcr;
    put "-- " filename " --"; _nfl=1 ;
return;
tailer: 
    put "total records" _nfl / ;
return; 
run;

---->-- ja karman --<-----
Solution
‎08-28-2014 02:10 PM
Valued Guide
Posts: 3,208

Re: Append all csv file into one dataset

I don not like the proc import usage that way. lt is scanning first the data and than creating a datastep/input of data.
The result can be that with some creative input you are getting the sas-datasets in different structures.

In you datasetnaming you have session_1 batch_1 etc and they are separated by a _   Is the _ a variable separation or a level-naming separation.
As assuming it is replaced by real values I hope they will avoid the _ and the _ is being reliable as level indication.

The following code, is tested with two csv files. 

filename impcsv ' /folders/myshortcuts/ikke';

data orderlst (keep= x y z q order part eval) ;
  length filename filenmcr $250 ;    retain _nfl -1 eovfl  ;
  length z $3 q $1 order part eval $16   ; retain order part eval ; /* the targeted lays out */
  infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=allfil end=endfl missover dsd dlm=";"  ;
  input @ ;
   if ( eovfl or _nfl <0 ) then link header;
   _nfl=_nfl+1;
   input x y z q ; 
   output ;
   if ( end ) then link tailer ;
  
return;

header:
    filename=filenmcr;
    put "-- " filename " --"; _nfl=1 ; eovfl=0;
    input  ; /* advance one line, ignoring the header */
    part=scan(filename,-6,"_");
    order=scan(filename,-4,"_"); 
    eval=scan(filename,-3,"_"); 
return;

tailer: 
    put "total records" part _nfl / ;
return; 
allfil: 
    put "processing done"l / ;
return; 
run;

proc print;
run;

NOTE: The infile library IMPCSV is:

NOTE: The infile IMPCSV(12345*.csv) is:

-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-1_reader-1_03aug12_dummy.csv  --

NOTE: The infile IMPCSV(12345*.csv) is:

-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-2_reader-2_03aug12_dummy.csv  --

processing done


       z    q                                        x   y

1UNENsession-1part-2reader-11560010101
2ENHNsession-1part-2reader-12220010102
3UNENsession-2part-2reader-21560010101
4ENHNsession-2part-2reader-22220010102
---->-- ja karman --<-----
Frequent Contributor
Posts: 145

Re: Append all csv file into one dataset

Perfect Sir...that worked perfect for me.

Thanks

Raghav

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 2973 views
  • 2 likes
  • 4 in conversation