BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mlogan
Lapis Lazuli | Level 10

Hi All,

I have the following table. How do I select all ID's of By Group where 'Math' or 'Bio' occured only after 01 Jan 2000. 

 

 

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
RUN;

 

Expected Output:

 

Grade11 104 Arts 13 18Jun2000
Grade11 104 Che 12 14Jul2009
Grade11 104 Eng 18 09Mar2008
Grade11 104 Bio 21 25Nov2013

1 ACCEPTED SOLUTION

Accepted Solutions
collinelliot
Barite | Level 11

 

I'm guessing at least some of the other solutions here are fine, but here's my approach:

 

proc sql;
    CREATE TABLE want AS
    SELECT * FROM have
    GROUP BY st_id
    HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
           max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0 
    ORDER BY grade, st_id;
quit;

View solution in original post

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

proc sql;
  create table WANT as 
  select  *
  from    HAVE
  where GRADE in (select distinct GRADE 
from HAVE where COURSE in ("Math","Eng") and SCORE > "01JAN2000"d); quit;
mkeintz
PROC Star

@RW9

 

The op's request was "where 'Math' or 'Bio' occured only after 01 Jan 2000".

 

If it's not ambiguously put, and means exactly what it says, then you probably want to exclude by group's in which any 'Math' or 'Bio" occurred on or before 01jan2000 - maybe through a limit on the minimum score for 'Math' or 'Bio' within a group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?

mkeintz
PROC Star

My point is that if a by-group had a 'Math' or 'Bio' BOTH before and after the cut date, it is not clear from the OP's description that such a by group is wanted. 

 

I.e. in the data set below do you want grade='Math before and After'?  The phrase "where 'Math' or 'Bio' occured only after 01 Jan 2000" suggests an intention to exclude that by group (assuming that grade is the "by group" variable).  That's why I suggested a within-group minimum criterion.

 

 

data have;

  grade='Math before and after';

  do score='01jan1990'd, '01jan2001'd;

    do course='Math','Hist'; output; end;

  end;

  grade='Math before only';

  score='01jan1990'd;

  do course='Math','Hist'; output; end;

 

  grade='Math After Only';

  score='01jan2001'd;

  do course='Math','Hist'; output; end;

  format score date9.;

run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

@RW9 wrote:

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?


Actually, when I ran your code with the slight change from "score" to "date":

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;

proc sql;
  create table WANT as 
  select  *
  from    HAVE
  where GRADE in (select distinct GRADE 
                  from   HAVE 
                  where  COURSE in ("Math","Eng")
                    and  date > "01JAN2000"d);
quit;

I got 18 observations. So the where condition and the variables used for matching need to be adapted. Since this got quite complicated quite quickly (at least for me), I chose the data step approach.

Kurt_Bremser
Super User

I think I see clearer now. My suggestion is to create a lookup table for groups and merge back with that:

proc sort data=have;
by grade st_id;
run;

data lookup (keep=grade st_id);
set have;
by grade st_id ;
retain flag;
if first.st_id
then flag = .;
if course in ('Math','Bio')
then do;
  if flag lt 1
  then if date lt '01jan2000'd
    then flag = 1;
    else flag = 0;
end;
if last.st_id and missing(flag) or flag then output;
run;

data want;
merge
  have (in=a)
  lookup (in=b)
;
by grade st_id;
if a and not b;
run;

proc print data=want noobs;
run;

The result looks like the expected one:

 Grade     ST_ID    Course    Score         date

Grade11     104      Arts       13     18JUN2000
Grade11     104      Che        12     14JUL2009
Grade11     104      Eng        18     09MAR2008
Grade11     104      Bio        21     25NOV2013
art297
Opal | Level 21

Good case for using a double dow:

 

data want (drop=keep);
  do until (last.ST_ID);
    set have;
    by ST_ID notsorted;
    if first.st_id then keep=0;
    if keep ne -1 then do;
      if strip(Course) in ('Bio', 'Math') and date lt '01jan2000'd then keep=-1;
      else if strip(Course) in ('Bio', 'Math') and date gt '01jan2000'd then keep=1;
    end;
  end;
  do until (last.ST_ID);
    set have;
    by ST_ID notsorted;
    if keep eq 1 then output;
  end;
run;

Interesting that the in operator wouldn't work as expected unless I stripped the Course variable.

 

Art, CEO, AnalystFinder.com

collinelliot
Barite | Level 11

 

I'm guessing at least some of the other solutions here are fine, but here's my approach:

 

proc sql;
    CREATE TABLE want AS
    SELECT * FROM have
    GROUP BY st_id
    HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
           max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0 
    ORDER BY grade, st_id;
quit;
Ksharp
Super User

Mine like Arthur's code.

 

data have;
input Grade $ ST_ID $ Course $ Score date;
informat date ddmmyy8.;
format date date9.;
cards; 
Grade10 101 Eng 20 01/01/82
Grade11 105 Bio 25 25/11/99
Grade11 105 Math 18 23/09/00
Grade11 105 Eng 15 11/12/01
Grade11 105 Math 15 14/12/05
Grade11 105 Phy 15 21/03/08
Grade11 102 Eng 15 13/07/02
Grade11 102 Bio 20 22/11/07
Grade11 102 Phy 16 19/04/88
Grade11 102 Math 21 27/08/99
Grade11 104 Arts 13 18/06/00
Grade11 104 Che 12 14/07/09
Grade11 104 Eng 18 09/03/08
Grade11 104 Bio 21 25/11/13
Grade12 103 Bio 10 13/12/99
Grade12 103 Math 20 25/05/01
Grade12 103 Eng 16 04/04/06
Grade12 103 Che 15 09/11/02
Grade12 103 Phy 17 16/03/15
;
run;
data want;
has_key=0;after_date=1;
  do until (last.ST_ID);
    set have;
    by grade ST_ID notsorted;
    if strip(Course) in ('Bio', 'Math') then do;
     has_key=1;
     if date lt '01jan2000'd then after_date=0;
    end;
  end;
  do until (last.ST_ID);
    set have;
    by grade ST_ID notsorted;
    if has_key and after_date then output;
  end;
drop has_key after_date;
run;
mkeintz
PROC Star

If a proc sort is acceptable, then you can have a VERY simple program:

 

proc sort data=have;
  by grade st_id;
run;

data want;
  merge have (in=in1 where=(course in ('Math','Bio') and date<'01jan2000'd))
        have (in=in2 where=(course in ('Math','Bio') and date>='01jan2000'd))
        have ;
  by grade st_id;
  if in2>in1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1868 views
  • 9 likes
  • 7 in conversation