%Let rptout = /myshare; data have; set sashelp.cars; run; %macro eof (Make); Data _Null_; Set have; FILE "&rptout/&Make._report1.csv" DLM="," dsd; RUN; %mend; %eof(Acura,'Acura'); %eof(Audi,'Audi'); %eof(BMW,'BMW');
1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)
2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export
Creating multiple text files is simple. Use the FILEVAR= option on the FILE statement.
data _null_;
set have;
length filevar $256 ;
filevar=cats("&rptout/",make,'.csv');
file csv dsd filevar=filevar ;
put (_all_) (+0);
run;
Creating separate datasets is harder than creating separated text files. For that you would need to use code generation.
@Tom wrote:
...
Creating separate datasets is harder than creating separated text files. For that you would need to use code generation.
But i
f you are willing to use a set hashes (and a hash of hashes), then you don't have to use code generation to create one dataset per make:
data have;
set sashelp.cars;
run;
data _null_;
set have end=end_of_have;
if _n_=1 then do;
declare hash h;
declare hash hoh (ordered:'a');
hoh.definekey('make');
hoh.definedata('make','h');
hoh.definedone();
declare hiter hohi ('hoh');
end;
if hoh.find()^=0 then do;
h=_new_ hash(dataset:'have (obs=0)');
* or (see note) h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
h.definekey(all:'Y');
*or (see note) h.definekey('make');
h.definedata(all:'Y');
h.definedone();
hoh_add=hoh.add();
end;
h.add();
if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
h.output(dataset:translate(trim(make),'___',' -&'));
end;
run;
The program above creates one hash object per make. Note that the hash object is named h for every instance - i.e. for each make. It is use of the hash-of-hashes object hoh that manages those individual hash objects such that each incoming observation becomes a data item in the associated hash h.
Now the hash objects h are keyed on all variables found in HAVE. Assuming no observations are completely identical, this will avoid the problem of dealing with "collisions" in the hash objects, whose default is to keep only one dataitem (think row) per set of key values. That would be a problem if the h objects were keyed only on make.
The primary downside of this is that you need enough memory to hold all the data from HAVE (and all the memory to record the complete set of keys). That's not a problem here, but it could be a problem for a large data set.
There would be many ways to mitigate this problem. One way to partially reduce memory is to substitute the commented lines above for their respective preceding statements. This allows the variable MAKE to be the sole key, reducing memory demand. But it also obligates the programmer to accommodate duplicate key values - hence the multidata:'Y' option.
And if memory is still a problem you could list the anticipated make's as explicit dataset names in the DATA statement, then use the hash objects for unanticipated makes. Let's say you know you will have datasets Chevrolet, Ford, and Toyota: All others are unanticipated. Then this would use a lot less memory:
data Chevrolet (where=(make='Chevrolet'))
Ford (where=(make='Ford'))
Toyota (where=(make='Toyota'));
set have end=end_of_have;
if _n_=1 then do;
declare hash h;
declare hash hoh (ordered:'a');
hoh.definekey('make');
hoh.definedata('make','h');
hoh.definedone();
declare hiter hohi ('hoh');
end;
if findw('Chevrolet Ford Toyota',trim(make),' ','e') then do;
output;
return;
end;
if hoh.find()^=0 then do;
h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
h.definekey('make');
h.definedata(all:'Y');
h.definedone();
hoh_add=hoh.add();
end;
h.add();
if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
h.output(dataset:translate(trim(make),'___',' -&'));
end;
run;
Hi @Q1983
You ask for two things:
1) Segment into separate datasets by make, and
2) Proc export with mMake as the name of each dataset at export.
I think the simplest solution to both problems is to use code generation as mentioned by @Tom . The idea is to make a list of Makes and execute a series of Data- or Proc Export steps, each using a subset of data that contains a single make. So the principle is:
* Get list of makes;
proc sql;
create table mlist as
select distinct make
from sashelp.cars;
quit;
data _null_;
set mlist;
code = 'Program step to do something based on current observarion in input data';
call execute(code);
run;
The code for a Data Step or Proc Export to handle a single Make is (example):
data Acura;
set sashelp.cars where=(make="Acura"));
run;
proc export data=sashelp.cars where=(make="Acura"))
outfile="c:\temp\Acura.csv"
dbms=csv;
run;
So this code should be inculded in the Data Step with Call Execute and modified, so the hardcoded "Acura" in the examples is substituted with the current Data Step variable Make. Note that it is necessary to remove hyphens and embedded blanks from Make to get a valid SAS Data Set name.
The full working code:
* Get list of makes;
proc sql;
create table mlist as
select distinct make
from sashelp.cars;
quit;
* Split in named data sets;
data _null_; set mlist;
code = cat('data ',compress(make,'- '),'; set sashelp.cars; where make="',make,'"; run;');
call execute(code);
run;
* export to named files;
data _null_;
set mlist;
code = cat('proc export data=sashelp.cars(where=(make="',make,'"))outfile="c:\temp\',make,'.csv" dbms=csv; run;');
call execute(code);
run;
The question I have for you is: why? Why do you want to do such a thing.
Normally, using BY statements or WHERE statements suffice for any analysis in SAS that you might want to do. There is rarely a need to take a dataset and split it into piece by the value of a variable.
The split based on Make is needed because I need to report to different entities (customers) based on Make. Actually I am just using the sashelp.cars as an example. The true split is based on csv files inputted from different states and processed through a sas program The results need to be split based on the state.
You do not need to split the dataset, you only export in one step to different files, as @Tom has already shown.
Saying the results need to be in a certain form does not explain WHY they need to be in a certain form.
Try this, it skips the step of creating subsets and writes the datasets directly to a CSV file for each name in the CLASS data set with a header row.
This approach will not work if you have a large number of variables - the macro variable lists have a limit of 64k characters.
*name of the data set with the original data;
%let lib_name = sashelp;
%let dsn_name = class;
*Variable to split on;
%let var_split = NAME;
*path to folder to save text files;
%let path_folder= /home/fkhurshed/;
*if you are exporting each line this is not required
but should not cause any issues unless your data set is large. In that case rename your data set to _temp and skip this step;
PROC SORT DATA=&lib_name..&dsn_name OUT=_temp;
BY &var_split;
RUN;
*make variable lists;
*for header row;
proc sql noprint;
select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
quit;
DATA _NULL_;
SET _temp; *Dataset to be exported;
BY &var_split.; *Variable that file is to be split on;
*Create path to file that is to be exported;
if first.&var_split. then out_file=cats("&path_folder.", &var_split., ".csv");
file temp filevar=out_file dlm=',' dsd;
*If first value of make then output column names;
if first.&var_split. then
put "&var_list_csv.";
*Output variables;
put &var_list.;
run;
@Q1983 wrote:
%Let rptout = /myshare; data have; set sashelp.cars; run; %macro eof (Make); Data _Null_; Set have; FILE "&rptout/&Make._report1.csv" DLM="," dsd; RUN; %mend; %eof(Acura,'Acura'); %eof(Audi,'Audi'); %eof(BMW,'BMW');1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)
2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export
Why are you runing the same SQL query twice?
Instead of two queries like:
select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
Just use one:
select name
, name
into :var_list_csv separated by ", "
, :var_list separated by " "
from sashelp.vcolumn
where libname = upper("&lib_name") and memname = upper("&dsn_name")
;
proc sql noprint;
select distinct make into :carlist separated by '|'
from sashelp.cars
where make is not null;
;
quit;
%put &carlist .;
%macro each;
%local i next_car;
%let i=1;
%do %while (%scan(&carlist, &i) ne );
%let next_car=%scan(&carlist,&i, |);
data %sysfunc(compress(&next_car)); /* Removes space from Land Rover */
set sashelp.cars;
where make eq "&next_car";
run;
proc export
outfile="~/csv/%sysfunc(compress(&next_car)).csv"
dbms=csv
replace;
run;
%let i = %eval(&i + 1);
%end;
%mend;
%each;
If you just need individual csv files and not generate individual datasets, the following will do it.
Note, names with embedded spaces will have the blanks removed
proc sql noprint;
select distinct make into :carlist separated by '|'
from sashelp.cars
where make is not null;
;
quit;
%macro each;
%local i next_car;
%let i=1;
%do %while (%scan(&carlist, &i) ne );
%let next_car=%scan(&carlist,&i, |);
proc export
data=sashelp.cars(where=(make="&next_car"))
outfile="~/csv/%sysfunc(compress(&next_car)).csv"
dbms=csv
replace;
run;
%let i = %eval(&i + 1);
%end;
%mend;
%each;
Hi @Q1983
If the problem is just about creating a bunch of csv files, the simple solution is to do it in one step as suggested. But it pays to seperate the looping logic and the task code, if the task code is something more complicated. which I think would be the case in "real life".
We have lots of running jobs that exports data to individual csv files or spreadsheets for each department, each file with department name and a date suffix. But the task doesn't stop there. The files must be stored in a documentation folder, and mails with a text body specifying actions to take based on actual content + the file must be sent to individual receivers.
The code could easily become very complicated and difficult to maintain, if everything is done in one step with lots of if-first and if-last constructs to handle department boundaries. It is much simpler in my opinion to write a macro that handles one department whith the department name and meil receiver as arguments, and use some control logic to call the macro.
I have rewritten my previous example to use a macro + control logic. It is not as effecient as one step, because input data must be processed for each make, but I think it is a small price to pay, because it is much simpler to expand the macro to do whatever is wanted, if it works with one object only.
* Task code for a given make;
%let outfolder = c:\temp;
%macro writefile(make);
proc export data=sashelp.cars(where=(make="&make"))
outfile="&outfolder\&make..csv" dbms=csv replace;
run;
%mend;
* Get list of makes;
proc sql;
create table mlist as
select distinct make
from sashelp.cars;
quit;
* export to named files;
data _null_;
set mlist;
call execute('%writefile(' || make || ');');
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.