@Ronein: Beating a dead horse, eh?
If your input is sorted by your categorical variable, cool. If not, index it to facilitate BY processing. Then use hash extrusion to get what you want. Example (the categorical variable below being ORIGIN):
data have (index = (origin)) ;
  set sashelp.cars ;
run ;
data _null_ ;
  if _n_ = 1 then do ;
    dcl hash h (dataset: "have (obs=0)", multidata: "y") ;
    h.definekey ("origin") ;
    h.definedata (all: "y") ;
    h.definedone () ;
  end ;
  do until (last.origin) ;      /* load one BY group into the hash table      */
    set have ;
    by origin ;
    h.add () ;
  end ;
  h.output (dataset: catx ("_", "ds", origin)) ;  /* write it out as DS_<value> */
  h.clear () ;                  /* empty the table before the next BY group   */
run ;
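Run against SASHELP.CARS, this creates three data sets, ds_Asia, ds_Europe, and ds_USA: one per distinct value of ORIGIN, each holding all the variables from HAVE.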
The alternative is to write code that pre-reads the input and assembles a DATA statement with the requisite output file names, plus a SELECT block (or a series of IF-THEN-ELSE statements) with a branch for each distinct value of the categorical variable, and then executes the generated code. The generation can be done by a macro, SQL, CALL EXECUTE, PUT/%INC, etc. Example (SQL):
proc sql noprint ;
  select distinct
         catx ("_", "ds", origin)                 /* output data set names       */
       , catx (" ", "when (", quote (trim (origin)), ") output", catx ("_", "ds", origin))
    into :filelist separated by ' '               /* list for the DATA statement */
       , :whenlist separated by ';'               /* WHEN clauses, one per value */
  from have
  ;
quit ;
data &filelist ;
  set have ;
  select (origin) ;
    &whenlist ;
    otherwise ;
  end ;
run ;
Both methods have their pros and cons:
In the hash case, the largest BY group with all its data must fit into the hash table memory. If the source file is being split into a large number of small files, i.e. the BY groups are small, that is not a problem.
But it can become a problem with the other method, as too many data sets in the DATA statement may run the step out of buffer memory. Besides, the assembled code pieces may not fit into the FILELIST and WHENLIST macro variables (a macro variable cannot exceed 65,534 characters), in which case a different way of assembling the code free of this limitation, such as PUT/%INC, can be used, as sketched below.
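For illustration, here is a minimal sketch of the PUT/%INC route against the same HAVE data set and ORIGIN variable (the fileref GENCODE, the data set ORIGINS, and the EOF1/EOF2 flags are arbitrary names of my choosing). The generated code is written to a temporary file and then %INCLUDEd, so the macro variable length limit never comes into play:
proc sort data = have (keep = origin) out = origins nodupkey ;
  by origin ;
run ;
filename gencode temp ;
data _null_ ;
  file gencode ;
  /* first pass over the distinct values: the DATA statement */
  put "data" @ ;
  do until (eof1) ;
    set origins end = eof1 ;
    put " ds_" origin +(-1) @ ;
  end ;
  put " ;" / "  set have ;" / "  select (origin) ;" ;
  /* second pass: one WHEN clause per distinct value */
  do until (eof2) ;
    set origins end = eof2 ;
    put '    when ("' origin +(-1) '") output ds_' origin +(-1) ' ;' ;
  end ;
  put "    otherwise ;" / "  end ;" / "run ;" ;
  stop ;
run ;
%include gencode / source2 ;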
Note that in the hash case, the splitting can be done using hash-of-hashes without the source file having to be either sorted or indexed (the method invented by @RichardDeVen in 2004 in the wake of SUGI in Montreal). However, in this case, memory must be plentiful enough to fit the entire file in the hash table memory.
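For completeness, below is a minimal sketch of that hash-of-hashes idea for the same HAVE/ORIGIN setup (the names HOH, IH, H, and LR are arbitrary, and this is just the gist of the technique, not the original code): the outer hash is keyed by ORIGIN and stores, for each distinct value, an inner hash holding all the records with that value; at the end, each inner hash is written out with the OUTPUT method. No sorting or indexing of HAVE is needed.
data _null_ ;
  dcl hash hoh (ordered: "a") ;              /* hash of hashes keyed by ORIGIN   */
  hoh.definekey ("origin") ;
  hoh.definedata ("origin", "h") ;           /* data portion holds an inner hash */
  hoh.definedone () ;
  dcl hiter ih ("hoh") ;
  dcl hash h ;                               /* inner hash, one per ORIGIN value */
  do until (lr) ;
    set have end = lr ;
    if hoh.find () ne 0 then do ;            /* first record with this ORIGIN    */
      h = _new_ hash (multidata: "y") ;
      h.definekey ("origin") ;
      h.definedata (all: "y") ;
      h.definedone () ;
      hoh.add () ;
    end ;
    h.add () ;                               /* file the record under its ORIGIN */
  end ;
  do while (ih.next() = 0) ;                 /* one output data set per ORIGIN   */
    h.output (dataset: catx ("_", "ds", origin)) ;
  end ;
  stop ;
run ;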
Kind regards
Paul D.