Hello All,
I am fairly new to SAS and am trying to export a dataset to multiple CSVs.
My dataset has 5 variables:
Var1 represents a line identifier (can be either header, volume or cost),
Var2 represents a bloc identifier
The format of the data for Var3 Var4 Var5 is dependent on the value of Var1.
My goal is to export my dataset into different csv (grouping by Var2).
I sorted my table based on Var2 and ran the following code:
data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;
Doing so gives me the correct content for my CSV files
Here comes my issue: I need a specific naming convention for the files.
The first line of each csv is the header and I need to have this information in the filename as well.
Any idea how I can rename the files as they are being exported, or in a second step, using information from the first line of each csv ?
Thanks a lot !
Nathan
Not sure I am following, but it sounds like you want to include on the FIRST value of VAR1 in the name of the file?
If that is the case then make this change so that FV is updated only once per value of VAR2.
data _null_;
set Mydataset ;
by VAR2;
if first.var2 then fv=cats("/MyDirectory/",catx('_',of Var1--Var5),'.csv');
retain fv;
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;
The naming convention for your filenames must be implemented here:
fv = cats("MyDictory\",Var2,'.csv');
in whichever way you can derive the wanted filename from the content of var2.
Header lines can be inserted with
if first.var2 then put .....;
If the "header" observation is in fact the first of each var2 group, the information for building the filename will be available at the same time.
Thanks for the reply.
The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)
@NathanB wrote:
Thanks for the reply.
The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)
?????
As long as you do not include the fv variable in the PUT statement, changing it will have NO effect on the file content. None at all.
@NathanB wrote:
Thanks for the reply.
The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)
So you are writing the value of FV into the file?
If so I would recommend making that variable first, so it is just one of the variables in the file.
And then use a different variable as the one you use with FILEVAR= option of the FILE statement.
For example here is a sketch based on the last method of my previous answer:
proc transpose data=mydataset(obs=0) out=names;
var _all_;
run;
data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
if first.var2 then link header;
put (_all_) (+0);
return;
header:
otherfv= < some other name >;
file csv filevar=otherfv dsd dlm=';' lrecl=32000 ;
do _n_=1 to _nobs;
set names point=_n_ nobs=_nobs;
put _name_ @;
end;
put 'header for FV variable';
return;
run;
So this is the line that is setting the filename.
fv=cats("MyDictory\",Var2,'.csv');
So just change that to generate the filenames you actually want to create.
Since you are processing the data in file order you can just use BY group processing to know when to write the header line.
If you know the variable names (and what header line you want for those variables) you can just list them instead of using the _ALL_ variable list.
data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
if first.var2 then put 'var1;var2;var3;var4;var5';
put var1 var2 var3 var4 var5 ;
run;
If you don't know the variable names then you will need to either first build the list of column headers into a macro variable.
proc sql noprint;
select nliteral(name),quote(trim(coalesce(label,name)))
into :names separated by ' '
, :headers separated by ';'
from dictionary.columns
where libname='WORK'
and memname='MYDATSET'
order by varnum
;
quit;
data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
if first.var2 then put %sysfunc(quote(%superq(headers),%str(%'))) ;
put &names ;
run;
Or put the list of names into a dataset and use that. You can use PROC CONTENTS or query of DICTIONARY.COLUMNS (like above) to get the list of names. But you can also use PROC TRANSPOSE and the dataset option OBS=0 and VAR _ALL_ to make a dataset with names in order. Then you can loop over that dataset to write the header row.
proc transpose data=mydataset(obs=0) out=names;
var _all_;
run;
data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
if first.var2 then link header;
put (_all_) (+0);
return;
header:
do _n_=1 to _nobs;
set names point=_n_ nobs=_nobs;
put _name_ @;
end;
put;
return;
run;
Thank you both.
Unfortunately I can get the desired result.
I'll try to explain my issue better !
Using the data set below:
Var1 | Var2 | Var3 | Var4 | Var5 |
HEADER | ID_A | 20230509 | 20231205 | correction |
VOLLINES | ID_A | 20230509 | 08MAY23:22:00:00 | 0.0000 |
VOLLINES | ID_A | 20230509 | 08MAY23:22:15:00 | 0.0000 |
FIXLINE | ID_A | 20230509 | 20231205 | 50 |
HEADER | ID_B | 20220131 | 20221231 | correction |
VOLLINES | ID_B | 20220131 | 01JAN22:22:00:00 | 0.0000 |
VOLLINES | ID_B | 20220131 | 01JAN22:22:15:00 | 0.0000 |
FIXLINE | ID_B | 20220131 | 20221231 | 150 |
If I run my initial code
data _null_;
set Mydataset ;
by column2 ;
fv=cats("MyDirectory",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;
I get two csv's:
Filename: ID_A
HEADER | ID_A | 20230509 | 20231205 | correction |
VOLLINES | ID_A | 20230509 | 08MAY23:22:00:00 | 0.0000 |
VOLLINES | ID_A | 20230509 | 08MAY23:22:15:00 | 0.0000 |
FIXLINE | ID_A | 20230509 | 20231205 | 50 |
Filename: ID_B
HEADER | ID_B | 20220131 | 20221231 | correction |
VOLLINES | ID_B | 20220131 | 01JAN22:22:00:00 | 0.0000 |
VOLLINES | ID_B | 20220131 | 01JAN22:22:15:00 | 0.0000 |
FIXLINE | ID_B | 20220131 | 20221231 | 150 |
Content wise, this is exactly the output I want, but my issue is in the naming of the file:
The first one should be named ID_A_Header (or whatever other concatenation of the first line of the csv eg: ID_A_20230509_20231205_correction)
Same goes for the second file.
Any idea how I can achieve this result ?
I tried changing the following line:
fv=cats("MyDirectory",Var2,'.csv');
==> fv=cats("MyDirectory",Var2,Var1,'.csv');
My csv the get split differently content wise. In this case I would have 6 files, one for each combination of Header, Vollines, Fixline and ID_A, ID_B.
At this point I am a bit lost.
Thank you again for your help,
Nathan
Not sure I am following, but it sounds like you want to include on the FIRST value of VAR1 in the name of the file?
If that is the case then make this change so that FV is updated only once per value of VAR2.
data _null_;
set Mydataset ;
by VAR2;
if first.var2 then fv=cats("/MyDirectory/",catx('_',of Var1--Var5),'.csv');
retain fv;
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;
This is it !!
Thank you so much !
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.