DATA Step, Macro, Functions and more

By Group occurances

Accepted Solution Solved
Reply
Regular Contributor
Posts: 218
Accepted Solution

By Group occurances

Hi All,

I have the following table. How do I select all ID's of By Group where 'Math' or 'Bio' occured only after 01 Jan 2000. 

 

 

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
RUN;

 

Expected Output:

 

Grade11 104 Arts 13 18Jun2000
Grade11 104 Che 12 14Jul2009
Grade11 104 Eng 18 09Mar2008
Grade11 104 Bio 21 25Nov2013


Accepted Solutions
Solution
‎02-24-2017 09:17 PM
PROC Star
Posts: 307

Re: By Group occurances

 

I'm guessing at least some of the other solutions here are fine, but here's my approach:

 

proc sql;
    CREATE TABLE want AS
    SELECT * FROM have
    GROUP BY st_id
    HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
           max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0 
    ORDER BY grade, st_id;
quit;

View solution in original post


All Replies
Super User
Super User
Posts: 7,977

Re: By Group occurances

Something like:

proc sql;
  create table WANT as 
  select  *
  from    HAVE
  where GRADE in (select distinct GRADE 
from HAVE where COURSE in ("Math","Eng") and SCORE > "01JAN2000"d); quit;
Trusted Advisor
Posts: 1,022

Re: By Group occurances

@RW9

 

The op's request was "where 'Math' or 'Bio' occured only after 01 Jan 2000".

 

If it's not ambiguously put, and means exactly what it says, then you probably want to exclude by group's in which any 'Math' or 'Bio" occurred on or before 01jan2000 - maybe through a limit on the minimum score for 'Math' or 'Bio' within a group.

Super User
Super User
Posts: 7,977

Re: By Group occurances

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?

Trusted Advisor
Posts: 1,022

Re: By Group occurances

[ Edited ]

My point is that if a by-group had a 'Math' or 'Bio' BOTH before and after the cut date, it is not clear from the OP's description that such a by group is wanted. 

 

I.e. in the data set below do you want grade='Math before and After'?  The phrase "where 'Math' or 'Bio' occured only after 01 Jan 2000" suggests an intention to exclude that by group (assuming that grade is the "by group" variable).  That's why I suggested a within-group minimum criterion.

 

 

data have;

  grade='Math before and after';

  do score='01jan1990'd, '01jan2001'd;

    do course='Math','Hist'; output; end;

  end;

  grade='Math before only';

  score='01jan1990'd;

  do course='Math','Hist'; output; end;

 

  grade='Math After Only';

  score='01jan2001'd;

  do course='Math','Hist'; output; end;

  format score date9.;

run;

 

 

Super User
Posts: 7,832

Re: By Group occurances


RW9 wrote:

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?


Actually, when I ran your code with the slight change from "score" to "date":

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;

proc sql;
  create table WANT as 
  select  *
  from    HAVE
  where GRADE in (select distinct GRADE 
                  from   HAVE 
                  where  COURSE in ("Math","Eng")
                    and  date > "01JAN2000"d);
quit;

I got 18 observations. So the where condition and the variables used for matching need to be adapted. Since this got quite complicated quite quickly (at least for me), I chose the data step approach.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,832

Re: By Group occurances

I think I see clearer now. My suggestion is to create a lookup table for groups and merge back with that:

proc sort data=have;
by grade st_id;
run;

data lookup (keep=grade st_id);
set have;
by grade st_id ;
retain flag;
if first.st_id
then flag = .;
if course in ('Math','Bio')
then do;
  if flag lt 1
  then if date lt '01jan2000'd
    then flag = 1;
    else flag = 0;
end;
if last.st_id and missing(flag) or flag then output;
run;

data want;
merge
  have (in=a)
  lookup (in=b)
;
by grade st_id;
if a and not b;
run;

proc print data=want noobs;
run;

The result looks like the expected one:

 Grade     ST_ID    Course    Score         date

Grade11     104      Arts       13     18JUN2000
Grade11     104      Che        12     14JUL2009
Grade11     104      Eng        18     09MAR2008
Grade11     104      Bio        21     25NOV2013
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,832

Re: By Group occurances

What constitutes a by group (which variables)?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
PROC Star
Posts: 7,485

Re: By Group occurances

Good case for using a double dow:

 

data want (drop=keep);
  do until (last.ST_ID);
    set have;
    by ST_ID notsorted;
    if first.st_id then keep=0;
    if keep ne -1 then do;
      if strip(Course) in ('Bio', 'Math') and date lt '01jan2000'd then keep=-1;
      else if strip(Course) in ('Bio', 'Math') and date gt '01jan2000'd then keep=1;
    end;
  end;
  do until (last.ST_ID);
    set have;
    by ST_ID notsorted;
    if keep eq 1 then output;
  end;
run;

Interesting that the in operator wouldn't work as expected unless I stripped the Course variable.

 

Art, CEO, AnalystFinder.com

Solution
‎02-24-2017 09:17 PM
PROC Star
Posts: 307

Re: By Group occurances

 

I'm guessing at least some of the other solutions here are fine, but here's my approach:

 

proc sql;
    CREATE TABLE want AS
    SELECT * FROM have
    GROUP BY st_id
    HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
           max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0 
    ORDER BY grade, st_id;
quit;
Super User
Posts: 10,041

Re: By Group occurances

Mine like Arthur's code.

 

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;
data want;
has_key=0;after_date=1;
  do until (last.ST_ID);
    set have;
    by grade ST_ID notsorted;
    if strip(Course) in ('Bio', 'Math') then do;
     has_key=1;
     if date lt '01jan2000'd then after_date=0;
    end;
  end;
  do until (last.ST_ID);
    set have;
    by grade ST_ID notsorted;
    if has_key and after_date then output;
  end;
drop has_key after_date;
run;
Trusted Advisor
Posts: 1,022

Re: By Group occurances

If a proc sort is acceptable, then you can have a VERY simple program:

 

proc sort data=have;
  by grade st_id;
run;

data want;
  merge have (in=in1 where=(course in ('Math','Bio') and date<'01jan2000'd))
        have (in=in2 where=(course in ('Math','Bio') and date>='01jan2000'd))
        have ;
  by grade st_id;
  if in2>in1;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 322 views
  • 9 likes
  • 7 in conversation