output based on variable in dataset

Q1983 · Posted 02-26-2022 08:41 PM

%Let rptout     = /myshare; 

data have;
set sashelp.cars;
run;



%macro eof (Make);
Data _Null_;
   Set have;
   FILE "&rptout/&Make._report1.csv" DLM="," dsd;
      RUN;
	  %mend;
	  %eof(Acura,'Acura');
	  %eof(Audi,'Audi');
	  %eof(BMW,'BMW');

1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)

2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export

Tom · Posted 02-26-2022 09:21 PM

Creating multiple text files is simple. Use the FILEVAR= option on the FILE statement.

data _null_;
  set have;
  length filevar $256 ;
  filevar=cats("&rptout/",make,'.csv');
  file csv dsd filevar=filevar ;
  put (_all_) (+0);
run;

Creating separate datasets is harder than creating separated text files. For that you would need to use code generation.

mkeintz · Posted 02-27-2022 01:12 PM

@Tom wrote:

...

Creating separate datasets is harder than creating separated text files. For that you would need to use code generation.

But i

f you are willing to use a set hashes (and a hash of hashes), then you don't have to use code generation to create one dataset per make:

data have;
  set sashelp.cars;
run;

data _null_; 
  set have end=end_of_have;

  if _n_=1 then do;
    declare hash h;

    declare hash hoh (ordered:'a');
      hoh.definekey('make');
      hoh.definedata('make','h');
      hoh.definedone(); 
    declare hiter hohi ('hoh');
  end;

  if hoh.find()^=0 then do;
    h=_new_ hash(dataset:'have (obs=0)');   
    * or (see note) h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
      h.definekey(all:'Y');
      *or (see note) h.definekey('make');
      h.definedata(all:'Y');
      h.definedone();
    hoh_add=hoh.add();
  end;
  h.add();

  if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
    h.output(dataset:translate(trim(make),'___',' -&'));
  end;
run;

The program above creates one hash object per make. Note that the hash object is named h for every instance - i.e. for each make. It is use of the hash-of-hashes object hoh that manages those individual hash objects such that each incoming observation becomes a data item in the associated hash h.

Now the hash objects h are keyed on all variables found in HAVE. Assuming no observations are completely identical, this will avoid the problem of dealing with "collisions" in the hash objects, whose default is to keep only one dataitem (think row) per set of key values. That would be a problem if the h objects were keyed only on make.

The primary downside of this is that you need enough memory to hold all the data from HAVE (and all the memory to record the complete set of keys). That's not a problem here, but it could be a problem for a large data set.

There would be many ways to mitigate this problem. One way to partially reduce memory is to substitute the commented lines above for their respective preceding statements. This allows the variable MAKE to be the sole key, reducing memory demand. But it also obligates the programmer to accommodate duplicate key values - hence the multidata:'Y' option.

And if memory is still a problem you could list the anticipated make's as explicit dataset names in the DATA statement, then use the hash objects for unanticipated makes. Let's say you know you will have datasets Chevrolet, Ford, and Toyota: All others are unanticipated. Then this would use a lot less memory:

data Chevrolet (where=(make='Chevrolet'))
     Ford      (where=(make='Ford'))
     Toyota    (where=(make='Toyota'));
  set have end=end_of_have;

  if _n_=1 then do;
    declare hash h;
    declare hash hoh (ordered:'a');
      hoh.definekey('make');
      hoh.definedata('make','h');
      hoh.definedone(); 
    declare hiter hohi ('hoh');
  end;
  
  if findw('Chevrolet Ford Toyota',trim(make),' ','e') then do;
     output;
     return;
  end;

  if hoh.find()^=0 then do;
    h=_new_ hash(dataset:'have (obs=0)',multidata:'Y');
      h.definekey('make');
      h.definedata(all:'Y');
      h.definedone();
    hoh_add=hoh.add();
  end;
  h.add();

  if end_of_have then do rc=hohi.first() by 0 until (hohi.next()^=0);
    h.output(dataset:translate(trim(make),'___',' -&'));
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ErikLund_Jensen · Posted 02-27-2022 10:06 AM

Hi @Q1983

You ask for two things:

1) Segment into separate datasets by make, and

2) Proc export with mMake as the name of each dataset at export.

I think the simplest solution to both problems is to use code generation as mentioned by @Tom . The idea is to make a list of Makes and execute a series of Data- or Proc Export steps, each using a subset of data that contains a single make. So the principle is:

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

data _null_; 
  set mlist;
  code = 'Program step to do something based on current observarion in input data';
  call execute(code);
run;

The code for a Data Step or Proc Export to handle a single Make is (example):

data Acura;
  set sashelp.cars where=(make="Acura"));
run;

proc export data=sashelp.cars where=(make="Acura"))
  outfile="c:\temp\Acura.csv"
  dbms=csv; 
run;

So this code should be inculded in the Data Step with Call Execute and modified, so the hardcoded "Acura" in the examples is substituted with the current Data Step variable Make. Note that it is necessary to remove hyphens and embedded blanks from Make to get a valid SAS Data Set name.

The full working code:

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

* Split in named data sets;
data _null_; set mlist;
  code = cat('data ',compress(make,'- '),'; set sashelp.cars; where make="',make,'"; run;');
  call execute(code);
run;

* export to named files;
data _null_; 
  set mlist;
  code = cat('proc export data=sashelp.cars(where=(make="',make,'"))outfile="c:\temp\',make,'.csv" dbms=csv; run;');
  call execute(code);
run;

PaigeMiller · Posted 02-27-2022 11:53 AM

The question I have for you is: why? Why do you want to do such a thing.

Normally, using BY statements or WHERE statements suffice for any analysis in SAS that you might want to do. There is rarely a need to take a dataset and split it into piece by the value of a variable.

--
Paige Miller

Q1983 · Posted 02-27-2022 12:19 PM

The split based on Make is needed because I need to report to different entities (customers) based on Make. Actually I am just using the sashelp.cars as an example. The true split is based on csv files inputted from different states and processed through a sas program The results need to be split based on the state.

Kurt_Bremser · Posted 02-27-2022 12:49 PM

You do not need to split the dataset, you only export in one step to different files, as @Tom has already shown.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 02-27-2022 12:57 PM

Saying the results need to be in a certain form does not explain WHY they need to be in a certain form.

--
Paige Miller

Reeza · Posted 02-27-2022 01:05 PM

Try this, it skips the step of creating subsets and writes the datasets directly to a CSV file for each name in the CLASS data set with a header row.

This approach will not work if you have a large number of variables - the macro variable lists have a limit of 64k characters.

*name of the data set with the original data;
%let lib_name = sashelp;
%let dsn_name = class;
*Variable to split on;
%let var_split = NAME;
*path to folder to save text files;
%let path_folder= /home/fkhurshed/;




*if you are exporting each line this is not required 
but should not  cause any issues unless your data set is large. In that case rename your data set to _temp and skip this step;
PROC SORT DATA=&lib_name..&dsn_name OUT=_temp;
BY &var_split;
RUN;



*make variable lists;
*for header row;
proc sql noprint;
select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
quit;



DATA _NULL_;

SET _temp; *Dataset to be exported;
BY &var_split.; *Variable that file is to be split on;

*Create path to file that is to be exported;
if first.&var_split. then out_file=cats("&path_folder.", &var_split., ".csv");

file temp filevar=out_file dlm=',' dsd;

*If first value of make then output column names;
if first.&var_split. then 
put "&var_list_csv.";

*Output variables;
put &var_list.;

run;

@Q1983 wrote:
%Let rptout     = /myshare; 

data have;
set sashelp.cars;
run;



%macro eof (Make);
Data _Null_;
   Set have;
   FILE "&rptout/&Make._report1.csv" DLM="," dsd;
      RUN;
	  %mend;
	  %eof(Acura,'Acura');
	  %eof(Audi,'Audi');
	  %eof(BMW,'BMW');
1. Is there a way to automate programmatically and segment into separate datasets by make? (ie if make = 'Acura' output Acura)

2. After executing step 1 is there a way to proc export based on the number of datasets created. Objective is to assign the Make as the name of each dataset at export

Tom · Posted 02-27-2022 01:14 PM

Why are you runing the same SQL query twice?
Instead of two queries like:

select name into :var_list_csv separated by ", " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");
select name into :var_list separated by " " from sashelp.vcolumn where libname = upper("&lib_name") and memname = upper("&dsn_name");

Just use one:

select name 
     , name
  into :var_list_csv separated by ", " 
     , :var_list separated by " " 
  from sashelp.vcolumn 
  where libname = upper("&lib_name") and memname = upper("&dsn_name")
;

ghosh · Posted 02-27-2022 01:34 PM

proc sql noprint;
  select distinct make into :carlist separated by '|'
  from sashelp.cars
  where make is not null;
;
quit;

%put &carlist .;
%macro each;
	%local i next_car;
	%let i=1;
	%do %while (%scan(&carlist, &i) ne );
		%let next_car=%scan(&carlist,&i, |);
		
		data %sysfunc(compress(&next_car)); /* Removes space from Land Rover */
		 set sashelp.cars;
		    where make eq "&next_car";
		run;
				
		proc export  
		     outfile="~/csv/%sysfunc(compress(&next_car)).csv"
		     dbms=csv 
		     replace;
		run;	
		
		  %let i = %eval(&i + 1);
	%end;
%mend;

%each;

ghosh · Posted 02-27-2022 03:31 PM

If you just need individual csv files and not generate individual datasets, the following will do it.

Note, names with embedded spaces will have the blanks removed

proc sql noprint;
  select distinct make into :carlist separated by '|'
  from sashelp.cars
  where make is not null;
;
quit;

%macro each;
	%local i next_car;
	%let i=1;
	%do %while (%scan(&carlist, &i) ne );
		%let next_car=%scan(&carlist,&i, |);		

		proc export  
		   data=sashelp.cars(where=(make="&next_car"))	       
		     outfile="~/csv/%sysfunc(compress(&next_car)).csv"
		     dbms=csv 
		     replace;		  
		run;			
		  %let i = %eval(&i + 1);
	%end;
%mend;

%each;

ErikLund_Jensen · Posted 02-27-2022 02:09 PM

Hi @Q1983

If the problem is just about creating a bunch of csv files, the simple solution is to do it in one step as suggested. But it pays to seperate the looping logic and the task code, if the task code is something more complicated. which I think would be the case in "real life".

We have lots of running jobs that exports data to individual csv files or spreadsheets for each department, each file with department name and a date suffix. But the task doesn't stop there. The files must be stored in a documentation folder, and mails with a text body specifying actions to take based on actual content + the file must be sent to individual receivers.

The code could easily become very complicated and difficult to maintain, if everything is done in one step with lots of if-first and if-last constructs to handle department boundaries. It is much simpler in my opinion to write a macro that handles one department whith the department name and meil receiver as arguments, and use some control logic to call the macro.

I have rewritten my previous example to use a macro + control logic. It is not as effecient as one step, because input data must be processed for each make, but I think it is a small price to pay, because it is much simpler to expand the macro to do whatever is wanted, if it works with one object only.

* Task code for a given make;
%let outfolder = c:\temp;
%macro writefile(make);
  proc export data=sashelp.cars(where=(make="&make"))
    outfile="&outfolder\&make..csv" dbms=csv replace;
  run;
%mend;

* Get list of makes;
proc sql;
  create table mlist as
    select distinct make
    from sashelp.cars;
quit;

* export to named files;
data _null_; 
  set mlist;
  call execute('%writefile(' || make || ');');
run;

output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

Re: output based on variable in dataset

SAS Innovate 2025: Call for Content

Classroom Training Available!