BookmarkSubscribeRSS Feed
Q1983
Lapis Lazuli | Level 10
%Let rptout     = /myshare; 

data have;
set sashelp.cars;
run;



%macro eof (Make);
Data _Null_;
   Set have;
   FILE "&rptout/&Make._report1.csv" DLM="," dsd;
      RUN;
	  %mend;
	  %eof(Acura,'Acura');
	  %eof(Audi,'Audi');
	  %eof(BMW,'BMW');

1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)

2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export

 

12 REPLIES 12
Tom
Super User Tom
Super User

Creating multiple text files is simple. Use the FILEVAR= option on the FILE statement.

 

data _null_;
  set have;
  length filevar $256 ;
  filevar=cats("&rptout/",make,'.csv');
  file csv dsd filevar=filevar ;
  put (_all_) (+0);
run;

 

Creating separate datasets is harder than creating separated text files.  For that you would need to use code generation.

mkeintz
PROC Star

@Tom wrote:

 

...

 

Creating separate datasets is harder than creating separated text files.  For that you would need to use code generation.

But i

f you are willing to use a set hashes (and a hash of hashes), then you don't have to use code generation to create one dataset per make:

 

 

data have;
  set sashelp.cars;
run;

data _null_; 
  set have end=end_of_have;

  if _n_=1 then do;
    declare hash h;

    declare hash hoh (ordered:'a');
      hoh.definekey('make');
      hoh.definedata('make','h');
      hoh.definedone(); 
    declare hiter hohi ('hoh');
  end;

  if hoh.find()^=0 then do;
    h=_new_ hash(dataset:'have (obs=0)');   
    * or (see note) h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
      h.definekey(all:'Y');
      *or (see note) h.definekey('make');
      h.definedata(all:'Y');
      h.definedone();
    hoh_add=hoh.add();
  end;
  h.add();

  if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
    h.output(dataset:translate(trim(make),'___',' -&'));
  end;
run;

 

 

The program above creates one hash object per make.  Note that the hash object is named h for every instance - i.e. for each make.  It is use of the hash-of-hashes object hoh that manages those individual hash objects such that each incoming observation becomes a data item in the associated hash h.

Now the hash objects h are keyed on all variables found in HAVE.  Assuming no observations are completely identical, this will avoid the problem of dealing with "collisions" in the hash objects, whose default is to keep only one dataitem (think row) per set of key values.  That would be a problem if the h objects were keyed only on make.

 

The primary downside of this is that you need enough memory to hold all the data from HAVE (and all the memory to record the complete set of keys).  That's not a problem here, but it could be a problem for a large data set.

 

There would be many ways to mitigate this problem.  One way to partially reduce memory is to substitute the commented lines above for their respective preceding statements.  This allows the variable MAKE to be the sole key, reducing memory demand.  But it also obligates the programmer to accommodate duplicate key values - hence the multidata:'Y' option.

And if memory is still a problem you could list the anticipated make's as explicit dataset names in the DATA statement, then use the hash objects for unanticipated makes.  Let's say you know you will have datasets Chevrolet, Ford, and Toyota:  All others are unanticipated. Then this would use a lot less memory:

 

 

data Chevrolet (where=(make='Chevrolet'))
     Ford      (where=(make='Ford'))
     Toyota    (where=(make='Toyota'));
  set have end=end_of_have;

  if _n_=1 then do;
    declare hash h;
    declare hash hoh (ordered:'a');
      hoh.definekey('make');
      hoh.definedata('make','h');
      hoh.definedone(); 
    declare hiter hohi ('hoh');
  end;
  
  if findw('Chevrolet Ford Toyota',trim(make),' ','e') then do;
     output;
     return;
  end;

  if hoh.find()^=0 then do;
    h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
      h.definekey('make');
      h.definedata(all:'Y');
      h.definedone();
    hoh_add=hoh.add();
  end;
  h.add();

  if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
    h.output(dataset:translate(trim(make),'___',' -&'));
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Q1983 

You ask for two things:

1) Segment into separate datasets by make, and

2) Proc export with mMake as the name of each dataset at export.

 

I think the simplest solution to both problems is to use code generation as mentioned by @Tom . The idea is to make a list of Makes and execute a series of Data- or Proc Export steps, each using a subset of data that contains a single make. So the principle is:

 

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

data _null_; 
  set mlist;
  code = 'Program step to do something based on current observarion in input data';
  call execute(code);
run;
  

The code for a Data Step or Proc Export to handle a single Make is (example):

 

data Acura;
  set sashelp.cars where=(make="Acura"));
run;

proc export data=sashelp.cars where=(make="Acura"))
  outfile="c:\temp\Acura.csv"
  dbms=csv; 
run;

So this code should be inculded in the Data Step with Call Execute and modified, so the hardcoded "Acura" in the examples is substituted with the current Data Step variable Make. Note that it is necessary to remove hyphens and embedded blanks from Make to get a valid SAS Data Set name.

 

The full working code:

 

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

* Split in named data sets;
data _null_; set mlist;
  code = cat('data ',compress(make,'- '),'; set sashelp.cars; where make="',make,'"; run;');
  call execute(code);
run;

* export to named files;
data _null_; 
  set mlist;
  code = cat('proc export data=sashelp.cars(where=(make="',make,'"))outfile="c:\temp\',make,'.csv" dbms=csv; run;');
  call execute(code);
run;

 

 

 

 

PaigeMiller
Diamond | Level 26

The question I have for you is: why? Why do you want to do such a thing.

 

Normally, using BY statements or WHERE statements suffice for any analysis in SAS that you might want to do. There is rarely a need to take a dataset and split it into piece by the value of a variable.

--
Paige Miller
Q1983
Lapis Lazuli | Level 10

The split based on Make is needed because I need to report to different entities (customers) based on Make.  Actually I am just using the sashelp.cars as an example.  The true split is based on csv files inputted from different states and processed through a sas program  The results need to be split based on the state.

PaigeMiller
Diamond | Level 26

Saying the results need to be in a certain form does not explain WHY they need to be in a certain form.

--
Paige Miller
Reeza
Super User

Try this, it skips the step of creating subsets and writes the datasets directly to a CSV file for each name in the CLASS data set with a header row. 

This approach will not work if you have a large number of variables - the macro variable lists have a limit of 64k characters. 

 

*name of the data set with the original data;
%let lib_name = sashelp;
%let dsn_name = class;
*Variable to split on;
%let var_split = NAME;
*path to folder to save text files;
%let path_folder= /home/fkhurshed/;




*if you are exporting each line this is not required 
but should not  cause any issues unless your data set is large. In that case rename your data set to _temp and skip this step;
PROC SORT DATA=&lib_name..&dsn_name OUT=_temp;
BY &var_split;
RUN;



*make variable lists;
*for header row;
proc sql noprint;
select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
quit;



DATA _NULL_;

SET _temp; *Dataset to be exported;
BY &var_split.; *Variable that file is to be split on;

*Create path to file that is to be exported;
if first.&var_split. then out_file=cats("&path_folder.", &var_split., ".csv");

file temp filevar=out_file dlm=',' dsd;

*If first value of make then output column names;
if first.&var_split. then 
put "&var_list_csv.";

*Output variables;
put &var_list.;

run;

 


@Q1983 wrote:
%Let rptout     = /myshare; 

data have;
set sashelp.cars;
run;



%macro eof (Make);
Data _Null_;
   Set have;
   FILE "&rptout/&Make._report1.csv" DLM="," dsd;
      RUN;
	  %mend;
	  %eof(Acura,'Acura');
	  %eof(Audi,'Audi');
	  %eof(BMW,'BMW');

1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)

2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export

 


 

Tom
Super User Tom
Super User

Why are you runing the same SQL query twice?
Instead of two queries like:

select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");

Just use one:

select name 
     , name
  into :var_list_csv separated by ", " 
     , :var_list separated by " " 
  from sashelp.vcolumn 
  where libname = upper("&lib_name") and memname = upper("&dsn_name")
;
ghosh
Barite | Level 11
proc sql noprint;
  select distinct make into :carlist separated by '|'
  from sashelp.cars
  where make is not null;
;
quit;

%put &carlist .;
%macro each;
	%local i next_car;
	%let i=1;
	%do %while (%scan(&carlist, &i) ne );
		%let next_car=%scan(&carlist,&i, |);
		
		data %sysfunc(compress(&next_car)); /* Removes space from Land Rover */
		 set sashelp.cars;
		    where make eq "&next_car";
		run;
				
		proc export  
		     outfile="~/csv/%sysfunc(compress(&next_car)).csv"
		     dbms=csv 
		     replace;
		run;	
		
		  %let i = %eval(&i + 1);
	%end;
%mend;

%each;

ghosh_0-1645986730862.png

 

ghosh
Barite | Level 11

If you just need individual csv files and not generate individual datasets, the following will do it.

Note, names with embedded spaces will have the blanks removed  

proc sql noprint;
  select distinct make into :carlist separated by '|'
  from sashelp.cars
  where make is not null;
;
quit;

%macro each;
	%local i next_car;
	%let i=1;
	%do %while (%scan(&carlist, &i) ne );
		%let next_car=%scan(&carlist,&i, |);		

		proc export  
		   data=sashelp.cars(where=(make="&next_car"))	       
		     outfile="~/csv/%sysfunc(compress(&next_car)).csv"
		     dbms=csv 
		     replace;		  
		run;			
		  %let i = %eval(&i + 1);
	%end;
%mend;

%each;

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Q1983 

 

If the problem is just about creating a bunch of csv files, the simple solution is to do it in one step as suggested. But it pays to seperate the looping logic and the task code, if the task code is something more complicated. which I think would be the case in "real life".

 

We have lots of running jobs that exports data to individual csv files or spreadsheets for each department, each file with department name and a date suffix. But the task doesn't stop there. The files must be stored in a documentation folder, and mails with a text body specifying actions to take based on actual content + the file must be sent to individual receivers.

 

The code could easily become very complicated and difficult to maintain, if everything is done in one step with lots of if-first and if-last constructs to handle department boundaries. It is much simpler in my opinion to write a macro that handles one department whith the department name and meil receiver as arguments, and use some control logic to call the macro.

 

I have rewritten my previous example to use a macro + control logic. It is not as effecient as one step, because input data must be processed for each make, but I think it is a small price to pay, because it is much simpler to expand the macro to do whatever is wanted, if it works with one object only.

 

* Task code for a given make;
%let outfolder = c:\temp;
%macro writefile(make);
  proc export data=sashelp.cars(where=(make="&make"))
    outfile="&outfolder\&make..csv" dbms=csv replace;
  run;
%mend;

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

* export to named files;
data _null_; 
  set mlist;
  call execute('%writefile(' || make || ');');
run;


 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2047 views
  • 1 like
  • 8 in conversation