Hi All,
I have the following table. How do I select all ID's of By Group where 'Math' or 'Bio' occured only after 01 Jan 2000.
data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
RUN;
Expected Output:
Grade11 104 Arts 13 18Jun2000
Grade11 104 Che 12 14Jul2009
Grade11 104 Eng 18 09Mar2008
Grade11 104 Bio 21 25Nov2013
I'm guessing at least some of the other solutions here are fine, but here's my approach:
proc sql;
CREATE TABLE want AS
SELECT * FROM have
GROUP BY st_id
HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0
ORDER BY grade, st_id;
quit;
Something like:
proc sql; create table WANT as select * from HAVE where GRADE in (select distinct GRADE
from HAVE where COURSE in ("Math","Eng") and SCORE > "01JAN2000"d); quit;
The op's request was "where 'Math' or 'Bio' occured only after 01 Jan 2000".
If it's not ambiguously put, and means exactly what it says, then you probably want to exclude by group's in which any 'Math' or 'Bio" occurred on or before 01jan2000 - maybe through a limit on the minimum score for 'Math' or 'Bio' within a group.
Yep, I got that. My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000. Then it selects the data with ID in that list of IDs. Seems to match what the OP asked?
My point is that if a by-group had a 'Math' or 'Bio' BOTH before and after the cut date, it is not clear from the OP's description that such a by group is wanted.
I.e. in the data set below do you want grade='Math before and After'? The phrase "where 'Math' or 'Bio' occured only after 01 Jan 2000" suggests an intention to exclude that by group (assuming that grade is the "by group" variable). That's why I suggested a within-group minimum criterion.
data have;
grade='Math before and after';
do score='01jan1990'd, '01jan2001'd;
do course='Math','Hist'; output; end;
end;
grade='Math before only';
score='01jan1990'd;
do course='Math','Hist'; output; end;
grade='Math After Only';
score='01jan2001'd;
do course='Math','Hist'; output; end;
format score date9.;
run;
@RW9 wrote:
Yep, I got that. My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000. Then it selects the data with ID in that list of IDs. Seems to match what the OP asked?
Actually, when I ran your code with the slight change from "score" to "date":
data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;
proc sql;
create table WANT as
select *
from HAVE
where GRADE in (select distinct GRADE
from HAVE
where COURSE in ("Math","Eng")
and date > "01JAN2000"d);
quit;
I got 18 observations. So the where condition and the variables used for matching need to be adapted. Since this got quite complicated quite quickly (at least for me), I chose the data step approach.
I think I see clearer now. My suggestion is to create a lookup table for groups and merge back with that:
proc sort data=have;
by grade st_id;
run;
data lookup (keep=grade st_id);
set have;
by grade st_id ;
retain flag;
if first.st_id
then flag = .;
if course in ('Math','Bio')
then do;
if flag lt 1
then if date lt '01jan2000'd
then flag = 1;
else flag = 0;
end;
if last.st_id and missing(flag) or flag then output;
run;
data want;
merge
have (in=a)
lookup (in=b)
;
by grade st_id;
if a and not b;
run;
proc print data=want noobs;
run;
The result looks like the expected one:
Grade ST_ID Course Score date Grade11 104 Arts 13 18JUN2000 Grade11 104 Che 12 14JUL2009 Grade11 104 Eng 18 09MAR2008 Grade11 104 Bio 21 25NOV2013
What constitutes a by group (which variables)?
Good case for using a double dow:
data want (drop=keep); do until (last.ST_ID); set have; by ST_ID notsorted; if first.st_id then keep=0; if keep ne -1 then do; if strip(Course) in ('Bio', 'Math') and date lt '01jan2000'd then keep=-1; else if strip(Course) in ('Bio', 'Math') and date gt '01jan2000'd then keep=1; end; end; do until (last.ST_ID); set have; by ST_ID notsorted; if keep eq 1 then output; end; run;
Interesting that the in operator wouldn't work as expected unless I stripped the Course variable.
Art, CEO, AnalystFinder.com
I'm guessing at least some of the other solutions here are fine, but here's my approach:
proc sql;
CREATE TABLE want AS
SELECT * FROM have
GROUP BY st_id
HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0
ORDER BY grade, st_id;
quit;
Mine like Arthur's code.
data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;
data want;
has_key=0;after_date=1;
do until (last.ST_ID);
set have;
by grade ST_ID notsorted;
if strip(Course) in ('Bio', 'Math') then do;
has_key=1;
if date lt '01jan2000'd then after_date=0;
end;
end;
do until (last.ST_ID);
set have;
by grade ST_ID notsorted;
if has_key and after_date then output;
end;
drop has_key after_date;
run;
If a proc sort is acceptable, then you can have a VERY simple program:
proc sort data=have;
by grade st_id;
run;
data want;
merge have (in=in1 where=(course in ('Math','Bio') and date<'01jan2000'd))
have (in=in2 where=(course in ('Math','Bio') and date>='01jan2000'd))
have ;
by grade st_id;
if in2>in1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.