BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
panda
Quartz | Level 8

Hi -,

I want to create sub data sets from main data sets using loop, my data looks like this:

id  group       subgroup 

1   patient care   physical contact

1   patient care   family contact

1   patient care   other

2   education      conference

2   education      read

2   education      teaching 

I want to sub set the data by each group and each subgroup:

data patient_care;
    set data;
     if group = 'patient care';
run;

data edu;
   set data;
   if group = 'Education';
run;

data patient_care1;
    set data;
    if subgroup = 'physical contact';
run;

data patient_care2;
   set data;
   if subgroup = 'family contact';
run;

data patient_care3;
   set data;
   if subgroup = 'other';
run;

data edu1;
   set data;
   if subgroup = 'conference';
run;

data edu2;
   set data;
   if subgroup = 'read';
run;

data edu3;
   set data;
   if subgroup  = 'teaching';
run;

The code is very redundant, I think looping will be helpful but not sure exactly what to do, any idea?

Thanks!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You can do hierarchical sums in a data step:

data have;
infile cards dlm=',';
input id :$1. group :$12. subgroup :$16. value;
cards;
1,patient care,physical contact,2
1,patient care,physical contact,3
1,patient care,family contact,1
1,patient care,family contact,6
1,patient care,other,3
1,patient care,other,3
2,education,conference,2
2,education,conference,4
2,education,read,1
2,education,read,5
2,education,teaching,7
2,education,teaching,2
;
run;

data want (keep=id group subgroup value);
set have;
by group notsorted subgroup notsorted;
retain groupsum sum;
if first.group then groupsum = 0;
if first.subgroup then sum = 0;
groupsum + value;
sum + value;
if last.subgroup
then do;
  value = sum;
  output;
end;
if last.group
then do;
  subgroup = '';
  value = groupsum;
  output;
end;
run;

If you just want a report, proc report is perfectly capable of doing that.

View solution in original post

7 REPLIES 7
Reeza
Super User

This is rarely a good way to work with your data. Why do you want to split it in the first place?

 

FYI - the best answer to this question is usually don't do it. Because many people insist anyways see these posts:

 

http://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/

http://www.sascommunity.org/wiki/Split_Data_into_Subsets

panda
Quartz | Level 8

Hi - 

I want to split the data because I am not sure how to sum all observations across rows by group and subgroup and generate new observations as sums of those groups, right now I am using proc print and copy and paste the output in excel:

proc print data = data;
var inter0-inter23;
run;

The original data is like this, each id has one unique group:

id  group    inter0  inter1  inter2  inter3   ...   inter23

1   patient care

1   education

2   patinet care

2    education

each id has one unique subgroup:

id subgroup       inter0 inter1 inter2 inter3 ... inter23

1   patient interaction

1   family interaction

1   conference

1    read

1    teaching

2    patient interaction

2    family interaction

2    conference

2    read

2   teaching 

 

 

I want to sum all ids by group and subgroup, the ideal ouput looks like this

group            inter0   inter1 inter2 ..... inter23

patient care

education

subgroup

patient interaction

family interaction

conference

read 

teaching

Not sure how to do that, any idea?

 

Kurt_Bremser
Super User

You can do hierarchical sums in a data step:

data have;
infile cards dlm=',';
input id :$1. group :$12. subgroup :$16. value;
cards;
1,patient care,physical contact,2
1,patient care,physical contact,3
1,patient care,family contact,1
1,patient care,family contact,6
1,patient care,other,3
1,patient care,other,3
2,education,conference,2
2,education,conference,4
2,education,read,1
2,education,read,5
2,education,teaching,7
2,education,teaching,2
;
run;

data want (keep=id group subgroup value);
set have;
by group notsorted subgroup notsorted;
retain groupsum sum;
if first.group then groupsum = 0;
if first.subgroup then sum = 0;
groupsum + value;
sum + value;
if last.subgroup
then do;
  value = sum;
  output;
end;
if last.group
then do;
  subgroup = '';
  value = groupsum;
  output;
end;
run;

If you just want a report, proc report is perfectly capable of doing that.

panda
Quartz | Level 8

Don't know why but notsorted did not work well, after I remove notsorted command, I can get the result I want (need to sort by group and subgroup first):

proc sort have;
by group subgroup;
run;
data want (keep=id group subgroup value); set have; by group subgroup; retain groupsum sum; if first.group then groupsum = 0; if first.subgroup then sum = 0; groupsum + value; sum + value; if last.subgroup then do; value = sum; output; end; if last.group then do; subgroup = ''; value = groupsum; output; end; run;
Kurt_Bremser
Super User

I used the notsorted option so that the code works when there is no ascending/descending order. One has to make sure that the groups come in blocks, though (no identical by values scattered across the dataset). If sorting does not interfere with other logic, then it is to be preferred.

ballardw
Super User

First would be to explain what you are doing that requires all of those sets. The request for many sets is usually immediately followed by questions on how to use them to do the same processing.

 

You could extend this logic:

data patient_care
     edu
;
    set data;
     if group = 'patient care' then output Patient_care;
     if group = 'Education' then output edu;
run;

each condition can send output to the specified output set.

 

Kurt_Bremser
Super User

Put your control data in a dataset, and let a data step create the split code with call execute:

data lookup;
infile cards dlm=',';
input id :$1. group :$12. subgroup :$16.;
cards;
1,patient care,physical contact
1,patient care,family contact
1,patient care,other
2,education,conference
2,education,read
2,education,teaching
;
run;

data _null_;
call execute('data');
do until (eof1);
  set lookup end=eof1;
  by id;
  if first.id then call execute(' ' !! translate(trim(group),'_',' '));
  call execute(' ' !! translate(trim(subgroup),'_',' '));
end;
call execute('; set data;');
do until (eof2);
  set lookup end=eof2;
  by id;
  if first.id then call execute('if group = "' !! trim(group) !! '" then output ' !! translate(trim(group),'_',' ') !! ';');
  call execute('if subgroup = "' !! trim(subgroup) !! '" then output ' !! translate(trim(subgroup),'_',' ') !! ';');
end;
call execute('run;');
stop;
run;

Note that the created code will make just one pass through the original dataset.

 

The question remains, what will you do with the resulting datasets? If you perform the same analytics/reports on all of them, you're better off doing it in one pass with by-group processing.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2847 views
  • 5 likes
  • 4 in conversation