Hi,
I have folder with N number of CSV files with same structural format but with different names for the csv file.
I would like to append all csv files into one final dataset.
The folder is dynamic folder and new csv files arrive on day to day…So would like to put in a macro such that every time program runs it would append all the existing csv files in one dataset.
Thanks
I don not like the proc import usage that way. lt is scanning first the data and than creating a datastep/input of data.
The result can be that with some creative input you are getting the sas-datasets in different structures.
In you datasetnaming you have session_1 batch_1 etc and they are separated by a _ Is the _ a variable separation or a level-naming separation.
As assuming it is replaced by real values I hope they will avoid the _ and the _ is being reliable as level indication.
The following code, is tested with two csv files.
filename impcsv ' /folders/myshortcuts/ikke';
data orderlst (keep= x y z q order part eval) ;
length filename filenmcr $250 ; retain _nfl -1 eovfl ;
length z $3 q $1 order part eval $16 ; retain order part eval ; /* the targeted lays out */
infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=allfil end=endfl missover dsd dlm=";" ;
input @ ;
if ( eovfl or _nfl <0 ) then link header;
_nfl=_nfl+1;
input x y z q ;
output ;
if ( end ) then link tailer ;
return;
header:
filename=filenmcr;
put "-- " filename " --"; _nfl=1 ; eovfl=0;
input ; /* advance one line, ignoring the header */
part=scan(filename,-6,"_");
order=scan(filename,-4,"_");
eval=scan(filename,-3,"_");
return;
tailer:
put "total records" part _nfl / ;
return;
allfil:
put "processing done"l / ;
return;
run;
proc print;
run;
NOTE: The infile library IMPCSV is:
NOTE: The infile IMPCSV(12345*.csv) is:
-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-1_reader-1_03aug12_dummy.csv --
NOTE: The infile IMPCSV(12345*.csv) is:
-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-2_reader-2_03aug12_dummy.csv --
processing done
z q x y
1 | UNE | N | session-1 | part-2 | reader-1 | 1 | 560010101 |
---|---|---|---|---|---|---|---|
2 | ENH | N | session-1 | part-2 | reader-1 | 2 | 220010102 |
3 | UNE | N | session-2 | part-2 | reader-2 | 1 | 560010101 |
4 | ENH | N | session-2 | part-2 | reader-2 | 2 | 220010102 |
Ok, but what is your specific question?
You seems already be on the right path using macros. Give it a shot and get back when have some specific questions.
I have CSV file with following name convention 12345_a_b_c_part_2_batch_1_session_1_reader_1_03aug12_dummy.csv
x;y;z;q----first row in csv---should be made column name
1;560010101;UNE;N----records or values for the above x,y,z,q variables
2;220010102;ENH;N----- records or values for the above x,y,z,q variables
I have to create dataset from CSV FILE…..part of the name of the csv file should go variables like
batch_1= name of the dataset
ord=1(coming from session in the name of csv)
part=2(coming from part_2 in the name of csv)
eval=1((coming from reader_1 in the name of csv)
the dataset batch_ 1 should be like this
x y z q ord part eval
1 560010101 une n 1 2 1
I have written parial code and would like to know
I would like to know how to run this macro for all the csv file files located in the folder. All csv file are of same structural format.
libname x 'library location;
option spool;
%let x=/by/data/ad/14607_randlist_2nd_read_part_2_batch_1_session_1_reader_1_03aug12_dummy.csv;
(location of folder)
%macro test(a=);
PROC IMPORT OUT=x.&a
DATAFILE="&x"
DBMS=CSV REPLACE;
delimiter = ";";
GETNAMES=YES;
RUN;
data x.&a;
set x.&a;
a="&x";
part=substr(a,89,1);
order=substr(a,107,1);
eval=substr(a,116,1);
drop a;
run;
%mend;
%test(a=x);
Does it have headings? If not its straightforward:
x 'copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"';
Then you program can import just one file (note I would put the above in a batch file and have a delete the total.csv before re-qriting it otherwise it will also get included).
So use the dos copy to create a complete file.
If it does have headings then its the same thing, you just need to either do a bit of scripting in the batch file, or post-process in your dataset.
Hi RW9!
Can you give syntax how it work in sas....I coudnt do it in mine......
x 'copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"';
What couldn't you do? The X syntax is base SAS. It sends the command within quotes to the system for processing. If you don't have admin rights to do that or X is locked out then just create a batch file outside of sas.
E.g.
In Windows explorer where your csv files are, right click create new text document. Type in:
copy "s:\temp\rob\*.csv" "s:\temp\rob\total.csv"
Changing the paths obviously.
Now right click and rename it Job.bat. Yes, extension of .bat.
When you double click on it a black screen will pop up - that is DOS - it will process each file and then close or wait to be closed.
No need for os commands sas is supporting wildcards
filename impdir '/folders/myfolders' ; /* define you input location */ *
data _null ;
length filename filenmcr $250 ;
retain _nfl 1 eovfl 1 ;
infile impdir(*.csv) filename=filenmcr eov=eovfl eof=tailer end=endfl ; /* read all csv files */
input ............ /* your input statments for the csv file(s) */
;
_nfl=_nfl+1;
if ( eovfl ) then link header; eovfl=0;
return;
header: /* just some processing to show wich csv files have been processed */
filename=filenmcr;
put "-- " filename " --"; _nfl=1 ;
return;
tailer:
put "total records" _nfl / ;
return;
run;
I don not like the proc import usage that way. lt is scanning first the data and than creating a datastep/input of data.
The result can be that with some creative input you are getting the sas-datasets in different structures.
In you datasetnaming you have session_1 batch_1 etc and they are separated by a _ Is the _ a variable separation or a level-naming separation.
As assuming it is replaced by real values I hope they will avoid the _ and the _ is being reliable as level indication.
The following code, is tested with two csv files.
filename impcsv ' /folders/myshortcuts/ikke';
data orderlst (keep= x y z q order part eval) ;
length filename filenmcr $250 ; retain _nfl -1 eovfl ;
length z $3 q $1 order part eval $16 ; retain order part eval ; /* the targeted lays out */
infile impcsv(12345*.csv) filename=filenmcr eov=eovfl eof=allfil end=endfl missover dsd dlm=";" ;
input @ ;
if ( eovfl or _nfl <0 ) then link header;
_nfl=_nfl+1;
input x y z q ;
output ;
if ( end ) then link tailer ;
return;
header:
filename=filenmcr;
put "-- " filename " --"; _nfl=1 ; eovfl=0;
input ; /* advance one line, ignoring the header */
part=scan(filename,-6,"_");
order=scan(filename,-4,"_");
eval=scan(filename,-3,"_");
return;
tailer:
put "total records" part _nfl / ;
return;
allfil:
put "processing done"l / ;
return;
run;
proc print;
run;
NOTE: The infile library IMPCSV is:
NOTE: The infile IMPCSV(12345*.csv) is:
-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-1_reader-1_03aug12_dummy.csv --
NOTE: The infile IMPCSV(12345*.csv) is:
-- /folders/myshortcuts/ikke/12345_a_b_c_part-2_batch-1_session-2_reader-2_03aug12_dummy.csv --
processing done
z q x y
1 | UNE | N | session-1 | part-2 | reader-1 | 1 | 560010101 |
---|---|---|---|---|---|---|---|
2 | ENH | N | session-1 | part-2 | reader-1 | 2 | 220010102 |
3 | UNE | N | session-2 | part-2 | reader-2 | 1 | 560010101 |
4 | ENH | N | session-2 | part-2 | reader-2 | 2 | 220010102 |
Perfect Sir...that worked perfect for me.
Thanks
Raghav
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.