Solved: Exporting dataset to csv: filename issues

NathanB · Posted 02-14-2024 05:01 AM

Hello All,

I am fairly new to SAS and am trying to export a dataset to multiple CSVs.

My dataset has 5 variables:

Var1 represents a line identifier (can be either header, volume or cost),

Var2 represents a bloc identifier

The format of the data for Var3 Var4 Var5 is dependent on the value of Var1.

My goal is to export my dataset into different csv (grouping by Var2).
I sorted my table based on Var2 and ran the following code:

data _null_;
set MyDataSet;
by Var2 ;
fv=cats("MyDictory\",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;

Doing so gives me the correct content for my CSV files

Here comes my issue: I need a specific naming convention for the files.
The first line of each csv is the header and I need to have this information in the filename as well.

Any idea how I can rename the files as they are being exported, or in a second step, using information from the first line of each csv ?

Thanks a lot !

Nathan

Tom · Posted 02-15-2024 09:54 AM

Not sure I am following, but it sounds like you want to include on the FIRST value of VAR1 in the name of the file?

If that is the case then make this change so that FV is updated only once per value of VAR2.

data _null_;
  set Mydataset ;
  by VAR2;
  if first.var2 then fv=cats("/MyDirectory/",catx('_',of Var1--Var5),'.csv');
  retain fv;
  file csv filevar=fv dsd dlm=';' lrecl=32000 ;
  put (_all_) (+0);
run;

View solution in original post

Kurt_Bremser · Posted 02-14-2024 08:21 AM

The naming convention for your filenames must be implemented here:

fv = cats("MyDictory\",Var2,'.csv');

in whichever way you can derive the wanted filename from the content of var2.

Header lines can be inserted with

if first.var2 then put .....;

If the "header" observation is in fact the first of each var2 group, the information for building the filename will be available at the same time.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

NathanB · Posted 02-14-2024 08:27 AM

Thanks for the reply.

The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)

Kurt_Bremser · Posted 02-14-2024 08:34 AM

@NathanB wrote:

Thanks for the reply.

The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)

?????

As long as you do not include the fv variable in the PUT statement, changing it will have NO effect on the file content. None at all.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 02-14-2024 08:36 AM

@NathanB wrote:

Thanks for the reply.

The issue is that if I modify the filename in that location, then the content of the file changes as well.
As the code is, the content is perfectly as wanted, I just need more flexibility in the naming of the file and was wondering if there was an easy way to separate the two (i.e. select the content of the file as it is now, and then define then file name based on the first line of the file)

So you are writing the value of FV into the file?

If so I would recommend making that variable first, so it is just one of the variables in the file.

And then use a different variable as the one you use with FILEVAR= option of the FILE statement.

For example here is a sketch based on the last method of my previous answer:

proc transpose data=mydataset(obs=0) out=names;
  var _all_; 
run;
data _null_;
  set MyDataSet;
  by Var2 ;
  fv=cats("MyDictory\",Var2,'.csv');
  if first.var2 then link header;
  put (_all_) (+0);
return;
header:
  otherfv= < some other name >;
  file csv filevar=otherfv dsd dlm=';' lrecl=32000 ;
  do _n_=1 to _nobs;
    set names point=_n_ nobs=_nobs;
    put _name_ @;
  end;
  put 'header for FV variable';
return;
run;

Tom · Posted 02-14-2024 08:31 AM

So this is the line that is setting the filename.

fv=cats("MyDictory\",Var2,'.csv');

So just change that to generate the filenames you actually want to create.

Since you are processing the data in file order you can just use BY group processing to know when to write the header line.

If you know the variable names (and what header line you want for those variables) you can just list them instead of using the _ALL_ variable list.

data _null_;
  set MyDataSet;
  by Var2 ;
  fv=cats("MyDictory\",Var2,'.csv');
  file csv filevar=fv dsd dlm=';' lrecl=32000 ;
  if first.var2 then put 'var1;var2;var3;var4;var5';
  put var1 var2 var3 var4 var5 ;
run;

If you don't know the variable names then you will need to either first build the list of column headers into a macro variable.

proc sql noprint;
select nliteral(name),quote(trim(coalesce(label,name))) 
  into :names separated by ' '
     , :headers separated by ';'
  from dictionary.columns
  where libname='WORK'
    and memname='MYDATSET'
  order by varnum
;
quit;
data _null_;
  set MyDataSet;
  by Var2 ;
  fv=cats("MyDictory\",Var2,'.csv');
  file csv filevar=fv dsd dlm=';' lrecl=32000 ;
  if first.var2 then put %sysfunc(quote(%superq(headers),%str(%'))) ;
  put &names ;
run;

Or put the list of names into a dataset and use that. You can use PROC CONTENTS or query of DICTIONARY.COLUMNS (like above) to get the list of names. But you can also use PROC TRANSPOSE and the dataset option OBS=0 and VAR _ALL_ to make a dataset with names in order. Then you can loop over that dataset to write the header row.

proc transpose data=mydataset(obs=0) out=names;
  var _all_; 
run;
data _null_;
  set MyDataSet;
  by Var2 ;
  fv=cats("MyDictory\",Var2,'.csv');
  file csv filevar=fv dsd dlm=';' lrecl=32000 ;
  if first.var2 then link header;
  put (_all_) (+0);
return;
header:
  do _n_=1 to _nobs;
    set names point=_n_ nobs=_nobs;
    put _name_ @;
  end;
  put;
return;
run;

NathanB · Posted 02-15-2024 09:47 AM

Thank you both.

Unfortunately I can get the desired result.

I'll try to explain my issue better !
Using the data set below:

Var1	Var2	Var3	Var4	Var5
HEADER	ID_A	20230509	20231205	correction
VOLLINES	ID_A	20230509	08MAY23:22:00:00	0.0000
VOLLINES	ID_A	20230509	08MAY23:22:15:00	0.0000
FIXLINE	ID_A	20230509	20231205	50
HEADER	ID_B	20220131	20221231	correction
VOLLINES	ID_B	20220131	01JAN22:22:00:00	0.0000
VOLLINES	ID_B	20220131	01JAN22:22:15:00	0.0000
FIXLINE	ID_B	20220131	20221231	150

If I run my initial code

data _null_;
set Mydataset ;
by column2 ;
fv=cats("MyDirectory",Var2,'.csv');
file csv filevar=fv dsd dlm=';' lrecl=32000 ;
put (_all_) (+0);
run;

I get two csv's:

Filename: ID_A

HEADER	ID_A	20230509	20231205	correction
VOLLINES	ID_A	20230509	08MAY23:22:00:00	0.0000
VOLLINES	ID_A	20230509	08MAY23:22:15:00	0.0000
FIXLINE	ID_A	20230509	20231205	50

Filename: ID_B

HEADER	ID_B	20220131	20221231	correction
VOLLINES	ID_B	20220131	01JAN22:22:00:00	0.0000
VOLLINES	ID_B	20220131	01JAN22:22:15:00	0.0000
FIXLINE	ID_B	20220131	20221231	150

Content wise, this is exactly the output I want, but my issue is in the naming of the file:

The first one should be named ID_A_Header (or whatever other concatenation of the first line of the csv eg: ID_A_20230509_20231205_correction)

Same goes for the second file.

Any idea how I can achieve this result ?
I tried changing the following line:
fv=cats("MyDirectory",Var2,'.csv');

==> fv=cats("MyDirectory",Var2,Var1,'.csv');

My csv the get split differently content wise. In this case I would have 6 files, one for each combination of Header, Vollines, Fixline and ID_A, ID_B.

At this point I am a bit lost.

Thank you again for your help,
Nathan

Tom · Posted 02-15-2024 09:54 AM

Not sure I am following, but it sounds like you want to include on the FIRST value of VAR1 in the name of the file?

If that is the case then make this change so that FV is updated only once per value of VAR2.

data _null_;
  set Mydataset ;
  by VAR2;
  if first.var2 then fv=cats("/MyDirectory/",catx('_',of Var1--Var5),'.csv');
  retain fv;
  file csv filevar=fv dsd dlm=';' lrecl=32000 ;
  put (_all_) (+0);
run;

NathanB · Posted 02-15-2024 10:01 AM

This is it !!
Thank you so much !

Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Re: Exporting dataset to csv: filename issues

Registration is open

SAS Training: Just a Click Away