I feel like this should be a fairly simple problem, but I can't get it right unfortunately.
Say I have got some schools with a few students in each, and some students belong to group A and others belong to group B. The first school ID is 12345, and all three students are in the A group, so I want this school and all three of its students to go into the a_students output.
The next school, with school ID 99888, has three students also, but Jane is in group A and the other two are in group B. Since not every student is in A group, that school doesn't meet the criteria and so the school and its three students get put in the 'other' output. The last school has on B group students so goes in the other output.
For my first part of code, I've tried to flag it if first.SchoolID is group A. But for the second part I just can't get it right so I have put ****** in the part that I can't fix. Thanks for your help.
data have; infile datalines dlm=',' dsd truncover; input SchoolID $ StudentID :$4. Name :$7. Group :$10. ; datalines; 12345,1336,Li,A 12345,2337,Amy,A 12345,3338,Rahul,A 99888,2222,Tim,B 99888,2222,Jane,A 99888,2222,Mary,B 11777,1123,George,B 11777,1124,Xi,B ; DATA A_students other; do until (last.SchoolID); set have; by SchoolID notsorted; if first.SchoolID then _check_A=Group; if Group='A' then _flag=1; end; do until (last.SchoolID); set have; by SchoolID notsorted; if ******** then output A_students; else output other; end; run;
You only need a single flag variable that holds the group and can be used to determine 'other' status:
data
a_students
b_students
other
;
do until (last.schoolid);
set have;
by schoolid notsorted;
if first.schoolid then _group = group;
if _group ne group then _group = 'O';
end;
do until (last.schoolid);
set have;
by schoolid notsorted;
select (_group);
when ('A') output a_students;
when ('B') output b_students;
when ('O') output other;
end;
end;
drop _group;
run;
Do you know all the possible values for group beforehand, or do they have to be determined dynamically?
Hi Kurt,
Any technique at all that solves it would make me happy. When I did the SAS programming course a while back, I thought the first and last variables looked so handy, and yet some problems I just can't get right using them. When I tried this, it outputted any occurrence of a student in group A into the output. I just can't solve it for the case of every student must be in group A or that school and its students don't go into the A_students output.
But any quick and dirty technique is welcome. It's a simplified version of something I'm trying to do at work, but due to lockdown I'm working alone with no experts to ask.
Sorry - I don't think I answered your question properly. Yes - they'll always either be flagged as A or B. There won't be any other options in there and there is no chance of a group C or D etc in the future. Group A means the student met a certain criteria, so B is just the 'else' condition for students that didn't fit into group A.
Then you can simplify the code I posted to:
data
a_students
other
;
_group = 'A';
do until (last.schoolid);
set have;
by schoolid notsorted;
if group ne 'A' then _group = 'O';
end;
do until (last.schoolid);
set have;
by schoolid notsorted;
if _group = 'A'
then output a_students;
else output other;
end;
drop _group;
run;
And if you only need to find the schoolid, it's even simpler:
data
a_students
other
;
_group = 'A';
do until (last.schoolid);
set have;
by schoolid notsorted;
if group ne 'A' then _group = 'O';
end;
if _group = 'A'
then output a_students;
else output other;
keep schoolid;
run;
Are you trying to create a separate dataset per group for those rows that fit other condition(s) too?
Something like:
data stud_a
stud_b
other;
set have;
if group="A" and <other condition> then output stud_a; else
if group="B" and <other condition> then output stud_b;
else output other;
run;
Hi Shmuel,
No, I don't need an output for 'b_students'. I'm only wanting to create two output groups in this case. It's basically if every student has met the A group condition, then I'll output them as A_students and those students will considered students who doing okay at school. Then you can think of the B students as those who need more attention or tuition. So if a school has only A students then they'll go into a list of schools/students who don't need further improvement so I can forget about them. The schools with B's would be reviewed.
Thanks for your help.
Depending on the complexity of your condindition, it seems to me that the most simple code
will be:
data Stud_a Other;
set have;
if <condition> then output stud_a;
else output stud_b;
run;
Without knowing the <condition> - I cannot understand why to read twice the group.
You only need a single flag variable that holds the group and can be used to determine 'other' status:
data
a_students
b_students
other
;
do until (last.schoolid);
set have;
by schoolid notsorted;
if first.schoolid then _group = group;
if _group ne group then _group = 'O';
end;
do until (last.schoolid);
set have;
by schoolid notsorted;
select (_group);
when ('A') output a_students;
when ('B') output b_students;
when ('O') output other;
end;
end;
drop _group;
run;
Thanks so much for writing that. Now I can see where I was going wrong - I should have used _check not equal to in that first do until loop.
If I rewrite mine now using your logic, but so I just get the A_students and other group that I'm needing, this below works perfectly. I do need to retain all of the variables in both outputs, not just schoolID.
DATA A_students other; do until (last.SchoolID); set have; by SchoolID notsorted; if first.SchoolID then _check=Group; if _check NE Group then _check='B'; end; do until (last.SchoolID); set have; by SchoolID notsorted; select (_check); when ('A') output a_students; when ('B') output other; end; end; drop _:; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.