DATA Step, Macro, Functions and more

Splitting up dataset based on unique variable

Accepted Solution Solved
Reply
PROC Star
Posts: 551
Accepted Solution

Splitting up dataset based on unique variable

Been searching around the internet a while for a solution to this one.. Lets say I have a dataset with the variable ID as

 

data have;
	input ID $ var1 var2;
datalines;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;

Now, what if I want to dynamically split up this dataset into three datasets called one two and three? And I want to input the unique variable names of ID into a macro variable such as macrovar = one two three Smiley Happy

 


Accepted Solutions
Solution
‎04-07-2017 09:51 AM
Super User
Posts: 9,681

Re: Splitting up dataset based on unique variable

data have;
	input ID $ var1 var2;
datalines;
one 100 200
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;

proc sort data=have;by id;run;
data _null_;
 if _n_=1 then do;
   if 0 then set have;
   declare hash h(dataset:'have(obs=0)',multidata:'y');
   h.definekey(all:'y');
   h.definedata(all:'y');
   h.definedone();
 end;
do until(last.id);
 set have;
 by id;
 h.add();
end;
h.output(dataset:id);
h.clear();
run;

View solution in original post


All Replies
Contributor
Posts: 56

Re: Splitting up dataset based on unique variable

data one two three;
set have;
select(ID);
when('one') output one;
when('two') output two;
when('three') output three;
end;
run;
PROC Star
Posts: 551

Re: Splitting up dataset based on unique variable

@lakshmi_74 thank you but I am looking for a dynamic solution, which would also be applicable if I had a fourth variable Smiley Happy

Super User
Posts: 6,938

Re: Splitting up dataset based on unique variable

[ Edited ]

draycut wrote:

Been searching around the internet a while for a solution to this one.. Lets say I have a dataset with the variable ID as

 

data have;
	input ID $ var1 var2;
datalines;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;

Now, what if I want to dynamically split up this dataset into three datasets called one two and three? And I want to input the unique variable names of ID into a macro variable such as macrovar = one two three Smiley Happy

 


Don't store in a macro variable, store in a dataset:

proc sort
  data=have (keep=id)
  out=lookup
  nodupkey
;
by id;
run;

Now you can split in a dynamically created data step:

data _null_;
call execute("data ");
do until (eof1);
  set lookup end=eof1;
  call execute(trim(id)!!" ");
end;
call execute("; set have;");
do until (eof2);
  set lookup end=eof2;
  call execute('if id = "'!!trim(id)!!'" then output '!!trim(id)!!';');
end;
call execute("run;")
stop;
run;

But you only should do this if by-group processing is not a viable option.

Also keep in mind that the values in id need to be valid SAS names.

 

Edit: added "set have;" in the call execute.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,083

Re: Splitting up dataset based on unique variable

I've seen this done with hashing, but I can't point you toward a sample solution.

 

Of course your ID values all have to be legitimate variable names.

 

One approach:

 

proc sql noprint;

select distinct id into : id_list separated by ' ' from have;

quit;

 

Then inside a macro you can process the list.  For example:

 

data &id_list;

set have;

select (id);

   %do i=1 %to %sysfunc(countw(&id_list));

        %let next_dataset = %scan(&id_list, &i, %str( ));

         when ("&next_dataset") output &next_dataset;

   %end;

end;

run;

Super User
Posts: 17,829
Super User
Super User
Posts: 7,401

Re: Splitting up dataset based on unique variable

Exactly the same way as the output to Excel we were talking about in the thread earlier today, just change the proc export part to a datastep.  Also, its really not recommend to put "data" in macro variables, especially as lists.  It will add layers of complexity to your code.

Solution
‎04-07-2017 09:51 AM
Super User
Posts: 9,681

Re: Splitting up dataset based on unique variable

data have;
	input ID $ var1 var2;
datalines;
one 100 200
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;

proc sort data=have;by id;run;
data _null_;
 if _n_=1 then do;
   if 0 then set have;
   declare hash h(dataset:'have(obs=0)',multidata:'y');
   h.definekey(all:'y');
   h.definedata(all:'y');
   h.definedone();
 end;
do until(last.id);
 set have;
 by id;
 h.add();
end;
h.output(dataset:id);
h.clear();
run;
Valued Guide
Posts: 505

Re: Splitting up dataset based on unique variable

WPS/SAS: DOSUBL solution for splitting a SAS dataset

WPS does not support DOSUBL

SOAPBOX ON

You may want to use this in non-production jobs.
If many of us use it, there is less of a chance
SAS and/or Corporate IT will  discourage, weaken or remove
this powerful  'subroutine' function  in 9.4M4.
We do not want this to go the route of
the store command, window, %window, pmenu, command macros,  libanme, 
passthru to excel/access , command line, function keys... Note you almost never see SAS supporting the lost functionality
above but you see SAS often supporting SAS studio and EG. It would help if SAS will unequivocally state that it will support and enhance all current DOSUBL functionality on all
interfaces (editors not just EG and SAS Studio) and all platforms. I don't understand why SAS employees are not promoting DOSUBL by using it to respond to posts like this one. I have experienced push back. SOAPBOX OFF Unlike the HASH this does not require a sort. Unlike 'call execute' and macro solutions we can stop the generation of code as soon as a 'user defined' or system warning/error occurs. This post https://goo.gl/aHXBoZ https://communities.sas.com/t5/Base-SAS-Programming/Splitting-up-dataset-based-on-unique-variable/m-p/347689 HAVE ==== Up to 40 obs WORK.HAVE total obs=9 Obs ID VAR1 VAR2 1 one 100 200 2 one 100 200 3 three 300 400 4 three 500 200 5 three 200 100 6 two 400 200 7 two 300 166 8 one 300 100 9 three 400 400 WANT ==== WORK.ONE has 3 obs WORK.TWO has 2 obs WORK.THREE has 4 obs Up to 40 obs from WORK.ONE total obs=3 Obs ID VAR1 VAR2 1 one 300 100 2 one 100 200 3 one 100 200 Up to 40 obs from WORK.TWO total obs=2 Obs ID VAR1 VAR2 1 two 400 200 2 two 300 166 Up to 40 obs from WORK.THREE total obs=4 Obs ID VAR1 VAR2 1 three 500 200 2 three 200 100 3 three 300 400 4 three 400 400 WORKING CODE ============ WPS/SAS declare hash h(dataset:'have(obs=0)',multidata:'y', ordered: "A"); h.output(dataset:id); * _ _ _ _ __ ___ __ _| | _____ __| | __ _| |_ __ _ | '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` | | | | | | | (_| | < __/_____| (_| | (_| | || (_| | |_| |_| |_|\__,_|_|\_\___| \__,_|\__,_|\__\__,_| ; data have; input ID $ var1 var2; datalines; one 100 200 three 300 400 three 500 200 three 200 100 two 400 200 two 300 166 one 300 100 three 400 400 ;;;; run;quit; %symdel dsn sqlobs id / nowarn; data _null_; If _n_=0 then do; rc=%sysfunc(dosubl(' proc sql; select distinct quote(trim(id)) into :id separated by "," from have;quit; ')); end; length id $5; do id=&id; call symputx('dsn',id); rc=dosubl(' proc sql; create table &dsn as select * from have where id="&dsn" ;quit; '); if &sqlobs=0 then put "dataset " id " failed with 0 observations stoppin code generation"; end; run;quit; NOTE: Table WORK.ONE created, with 2 rows and 3 columns. NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.00 seconds NOTE: Table WORK.THREE created, with 4 rows and 3 columns. NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: Table WORK.TWO created, with 2 rows and 3 columns. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: DATA statement used (Total process time): real time 2.21 seconds cpu time 0.28 seconds
Super User
Posts: 5,083

Re: Splitting up dataset based on unique variable

To be totally fair when noting pros and cons, the DOSUBL solution makes a separate pass through the entire data set for each subset being created.

Valued Guide
Posts: 505

Re: Splitting up dataset based on unique variable

Thanks for the review and insight.

You cannot check completion codes for each interaction with one passthru.
If you want one passthru then use this code.

%symdel dsns ifout id / nowarn;
data _null_;

  If _n_=0 then do;
      rc=%sysfunc(dosubl('
      proc sql; select distinct quote(trim(id)) into :id separated by ","  from have;quit;
    '));
  end;

  length id $5;
  length dsns $2000;
  length ifout $2000;

  do id=&id;

    dsns  =  catx(' ',dsns,id);
    ifout =  catx(' ',ifout,cats('when ("',id,'") output'),id,';');

  end;

  call symputx('dsns',dsns);
  call symputx('ifout',ifout);

  rc=dosubl('
    data &dsns;
      set have;
      select (id);
        &ifout
      end;
    run;quit;

  ');
   stop;

run;quit;

NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.ONE has 2 observations and 3 variables.
NOTE: The data set WORK.THREE has 4 observations and 3 variables.
NOTE: The data set WORK.TWO has 2 observations and 3 variables.



Valued Guide
Posts: 505

Re: Splitting up dataset based on unique variable

It amazes me how flexible DOSUBL is

10 lines of code

data have;
      input ID $ var1 var2;
datalines;
one 100 200
one 300 100
three 300 400
three 500 200
three 200 100
three 400 400
two 400 200
two 300 166
;;;;
run;quit;


data _null_;
  set have(keep=id);
  by id;
  if last.id then do;
     call symputx('id',id);
     rc=dosubl('
      data &id;
         set have(where=(id="&id"));
      run;quit;
     ');
  end;
run;quit;

Valued Guide
Posts: 505

Re: Splitting up dataset based on unique variable

Even more flexibility, unsorted have

data have;
  input ID $ var1 var2;
cards4;
one 100 200
three 300 400
three 500 200
three 200 100
two 400 200
two 300 166
one 300 100
three 400 400
;;;;
run;quit;

proc datasets lib=work;
delete one two three;
run;quit;

data _null_;
  set have;
  by id notsorted;
  call symputx('id',id);
  if first.id then call symputx('obsbeg',put(_n_,3.));
  if last.id then  call symputx('obsend',put(_n_,3.));
  if last.id then
  rc=dosubl('
     proc append base=&id data=have(firstobs=&obsbeg obs=&obsend);
     run;quit;
  ');
run;quit;

PROC Star
Posts: 551

Re: Splitting up dataset based on unique variable

Thank you all Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 585 views
  • 4 likes
  • 8 in conversation