Been searching around the internet a while for a solution to this one.. Lets say I have a dataset with the variable ID as
data have;
input ID $ var1 var2;
datalines;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;
Now, what if I want to dynamically split up this dataset into three datasets called one two and three? And I want to input the unique variable names of ID into a macro variable such as macrovar = one two three 🙂
data have;
input ID $ var1 var2;
datalines;
one 100 200
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;
proc sort data=have;by id;run;
data _null_;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have(obs=0)',multidata:'y');
h.definekey(all:'y');
h.definedata(all:'y');
h.definedone();
end;
do until(last.id);
set have;
by id;
h.add();
end;
h.output(dataset:id);
h.clear();
run;
@lakshmi_74 thank you but I am looking for a dynamic solution, which would also be applicable if I had a fourth variable 🙂
@PeterClemmensen wrote:
Been searching around the internet a while for a solution to this one.. Lets say I have a dataset with the variable ID as
data have; input ID $ var1 var2; datalines; one 100 200 three 300 400 three 500 200 three 200 100 two 400 200 two 300 166 one 300 100 three 400 400 ;
Now, what if I want to dynamically split up this dataset into three datasets called one two and three? And I want to input the unique variable names of ID into a macro variable such as macrovar = one two three 🙂
Don't store in a macro variable, store in a dataset:
proc sort
data=have (keep=id)
out=lookup
nodupkey
;
by id;
run;
Now you can split in a dynamically created data step:
data _null_;
call execute("data ");
do until (eof1);
set lookup end=eof1;
call execute(trim(id)!!" ");
end;
call execute("; set have;");
do until (eof2);
set lookup end=eof2;
call execute('if id = "'!!trim(id)!!'" then output '!!trim(id)!!';');
end;
call execute("run;")
stop;
run;
But you only should do this if by-group processing is not a viable option.
Also keep in mind that the values in id need to be valid SAS names.
Edit: added "set have;" in the call execute.
I've seen this done with hashing, but I can't point you toward a sample solution.
Of course your ID values all have to be legitimate variable names.
One approach:
proc sql noprint;
select distinct id into : id_list separated by ' ' from have;
quit;
Then inside a macro you can process the list. For example:
data &id_list;
set have;
select (id);
%do i=1 %to %sysfunc(countw(&id_list));
%let next_dataset = %scan(&id_list, &i, %str( ));
when ("&next_dataset") output &next_dataset;
%end;
end;
run;
Exactly the same way as the output to Excel we were talking about in the thread earlier today, just change the proc export part to a datastep. Also, its really not recommend to put "data" in macro variables, especially as lists. It will add layers of complexity to your code.
data have;
input ID $ var1 var2;
datalines;
one 100 200
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;
proc sort data=have;by id;run;
data _null_;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have(obs=0)',multidata:'y');
h.definekey(all:'y');
h.definedata(all:'y');
h.definedone();
end;
do until(last.id);
set have;
by id;
h.add();
end;
h.output(dataset:id);
h.clear();
run;
WPS/SAS: DOSUBL solution for splitting a SAS dataset
WPS does not support DOSUBL
SOAPBOX ON
You may want to use this in non-production jobs.
If many of us use it, there is less of a chance
SAS and/or Corporate IT will discourage, weaken or remove
this powerful 'subroutine' function in 9.4M4.
We do not want this to go the route of
the store command, window, %window, pmenu, command macros, libanme,
passthru to excel/access , command line, function keys...
Note you almost never see SAS supporting the lost functionality
above but you see SAS often supporting SAS studio and EG.
It would help if SAS will unequivocally state that it will
support and enhance all current DOSUBL functionality on all
interfaces (editors not just EG and SAS Studio) and all platforms.
I don't understand why SAS employees are not promoting DOSUBL by
using it to respond to posts like this one. I have experienced push back.
SOAPBOX OFF
Unlike the HASH this does not require a sort.
Unlike 'call execute' and macro solutions we can stop the generation of
code as soon as a 'user defined' or system warning/error occurs.
This post
https://goo.gl/aHXBoZ
https://communities.sas.com/t5/Base-SAS-Programming/Splitting-up-dataset-based-on-unique-variable/m-p/347689
HAVE
====
Up to 40 obs WORK.HAVE total obs=9
Obs ID VAR1 VAR2
1 one 100 200
2 one 100 200
3 three 300 400
4 three 500 200
5 three 200 100
6 two 400 200
7 two 300 166
8 one 300 100
9 three 400 400
WANT
====
WORK.ONE has 3 obs
WORK.TWO has 2 obs
WORK.THREE has 4 obs
Up to 40 obs from WORK.ONE total obs=3
Obs ID VAR1 VAR2
1 one 300 100
2 one 100 200
3 one 100 200
Up to 40 obs from WORK.TWO total obs=2
Obs ID VAR1 VAR2
1 two 400 200
2 two 300 166
Up to 40 obs from WORK.THREE total obs=4
Obs ID VAR1 VAR2
1 three 500 200
2 three 200 100
3 three 300 400
4 three 400 400
WORKING CODE
============
WPS/SAS
declare hash h(dataset:'have(obs=0)',multidata:'y', ordered: "A");
h.output(dataset:id);
* _ _ _
_ __ ___ __ _| | _____ __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
| | | | | | (_| | < __/_____| (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_|
;
data have;
input ID $ var1 var2;
datalines;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;;;;
run;quit;
%symdel dsn sqlobs id / nowarn;
data _null_;
If _n_=0 then do;
rc=%sysfunc(dosubl('
proc sql; select distinct quote(trim(id)) into :id separated by "," from have;quit;
'));
end;
length id $5;
do id=&id;
call symputx('dsn',id);
rc=dosubl('
proc sql;
create
table &dsn as
select
*
from
have
where
id="&dsn"
;quit;
');
if &sqlobs=0 then
put "dataset " id " failed with 0 observations stoppin code generation";
end;
run;quit;
NOTE: Table WORK.ONE created, with 2 rows and 3 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
NOTE: Table WORK.THREE created, with 4 rows and 3 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Table WORK.TWO created, with 2 rows and 3 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: DATA statement used (Total process time):
real time 2.21 seconds
cpu time 0.28 seconds
To be totally fair when noting pros and cons, the DOSUBL solution makes a separate pass through the entire data set for each subset being created.
Thanks for the review and insight.
You cannot check completion codes for each interaction with one passthru.
If you want one passthru then use this code.
%symdel dsns ifout id / nowarn;
data _null_;
If _n_=0 then do;
rc=%sysfunc(dosubl('
proc sql; select distinct quote(trim(id)) into :id separated by "," from have;quit;
'));
end;
length id $5;
length dsns $2000;
length ifout $2000;
do id=&id;
dsns = catx(' ',dsns,id);
ifout = catx(' ',ifout,cats('when ("',id,'") output'),id,';');
end;
call symputx('dsns',dsns);
call symputx('ifout',ifout);
rc=dosubl('
data &dsns;
set have;
select (id);
&ifout
end;
run;quit;
');
stop;
run;quit;
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.ONE has 2 observations and 3 variables.
NOTE: The data set WORK.THREE has 4 observations and 3 variables.
NOTE: The data set WORK.TWO has 2 observations and 3 variables.
It amazes me how flexible DOSUBL is
10 lines of code
data have;
input ID $ var1 var2;
datalines;
one 100 200
one 300 100
three 300 400
three 500 200
three 200 100
three 400 400
two 400 200
two 300 166
;;;;
run;quit;
data _null_;
set have(keep=id);
by id;
if last.id then do;
call symputx('id',id);
rc=dosubl('
data &id;
set have(where=(id="&id"));
run;quit;
');
end;
run;quit;
Even more flexibility, unsorted have
data have;
input ID $ var1 var2;
cards4;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;;;;
run;quit;
proc datasets lib=work;
delete one two three;
run;quit;
data _null_;
set have;
by id notsorted;
call symputx('id',id);
if first.id then call symputx('obsbeg',put(_n_,3.));
if last.id then call symputx('obsend',put(_n_,3.));
if last.id then
rc=dosubl('
proc append base=&id data=have(firstobs=&obsbeg obs=&obsend);
run;quit;
');
run;quit;
Thank you all 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.