Hello
What is the way to split a data set to multiple data sets by value of a categorical variable?
Let's say that my data set called RawTbl has a categorical field called group.
LEt's say that the number of the values that group field get is not fixed (It means that each day that RawTbl data set is published there are different number of categories).
The target is to split data set RawTbl into multiple data sets (for each value of firld group there will be a seprate data set).
The name of each data set will be new_name_of_the_group
@novinosrin wrote:
http://www.sascommunity.org/wiki/Split_Data_into_Subsets
Hey, @Ronein , please read the above wiki article, where it says
The reason it says "Don't Do it" is that there are rarely good reasons to do this, and plenty of drawbacks. Don't make your life harder than it has to be.
@Ronein: Beating a dead horse, eh?
If your input is sorted by your categorical variable, cool. If not, index it to facilitate BY processing. Then use hash extrusion to get what you want. Example (the categorical variable below being ORIGIN):
data have (index = (origin)) ;
set sashelp.cars ;
run ;
data _null_ ;
if _n_ = 1 then do ;
dcl hash h (dataset: "have (obs=0)", multidata:"y") ;
h.definekey ("origin") ;
h.definedata (all: "y") ;
h.definedone () ;
end ;
do until (last.origin) ;
set have ;
by origin ;
h.add() ;
end ;
h.output (dataset: catx ("_", "ds", origin)) ;
h.clear() ;
run ;
The alternative is to write code to pre-read the input and assemble a DATA statement with the requisite output file names plus a SELECT block or bunch of IF-THEN-ELSE statements for each distinct value of the categorical variable; then execute the generated code. The generation can be done by a macro, SQL, CALL EXECUTE, PUT/%INC, etc. Example (SQL):
proc sql noprint ; select distinct catx ("_", "ds", origin) , catx (" ", "when (", quote (trim (origin)), ") output", catx ("_", "ds", origin)) into :filelist separated by ' ' , :whenlist separated by ';' from have ; quit ; data &filelist ; set have ; select (origin) ; &whenlist ; otherwise ; end ; run ;
Both methods have their pros and cons:
Note that in the hash case, the splitting can be done using hash-of-hashes without the source file having to be either sorted or indexed (the method invented by @RichardDeVen in 2004 in the wake of SUGI in Montreal). However, in this case, memory must be plentiful enough to fit the entire file in the hash table memory.
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.