## By Group occurances

Solved
Regular Contributor
Posts: 244

# By Group occurances

Hi All,

I have the following table. How do I select all ID's of By Group where 'Math' or 'Bio' occured only after 01 Jan 2000.

``````data have;
input Grade \$ ST_ID \$ Course \$ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
;
RUN;``````

Expected Output:

Accepted Solutions
Solution
‎02-24-2017 09:17 PM
PROC Star
Posts: 311

## Re: By Group occurances

I'm guessing at least some of the other solutions here are fine, but here's my approach:

``````proc sql;
CREATE TABLE want AS
SELECT * FROM have
GROUP BY st_id
HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0
quit;``````

All Replies
Super User
Posts: 9,599

## Re: By Group occurances

Something like:

```proc sql;
create table WANT as
select  *
from    HAVE
where  COURSE in ("Math","Eng")
and  SCORE > "01JAN2000"d);
quit;```
Posts: 1,337

## Re: By Group occurances

@RW9

The op's request was "where 'Math' or 'Bio' occured only after 01 Jan 2000".

If it's not ambiguously put, and means exactly what it says, then you probably want to exclude by group's in which any 'Math' or 'Bio" occurred on or before 01jan2000 - maybe through a limit on the minimum score for 'Math' or 'Bio' within a group.

Super User
Posts: 9,599

## Re: By Group occurances

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?

Posts: 1,337

## Re: By Group occurances

[ Edited ]

My point is that if a by-group had a 'Math' or 'Bio' BOTH before and after the cut date, it is not clear from the OP's description that such a by group is wanted.

I.e. in the data set below do you want grade='Math before and After'?  The phrase "where 'Math' or 'Bio' occured only after 01 Jan 2000" suggests an intention to exclude that by group (assuming that grade is the "by group" variable).  That's why I suggested a within-group minimum criterion.

data have;

do score='01jan1990'd, '01jan2001'd;

do course='Math','Hist'; output; end;

end;

score='01jan1990'd;

do course='Math','Hist'; output; end;

score='01jan2001'd;

do course='Math','Hist'; output; end;

format score date9.;

run;

Super User
Posts: 10,244

## Re: By Group occurances

RW9 wrote:

Yep, I got that.  My code's sub-query creates a list of ID's only where they have a record for either Math or Eng and the date being after 01Jan2000.  Then it selects the data with ID in that list of IDs.  Seems to match what the OP asked?

Actually, when I ran your code with the slight change from "score" to "date":

``````data have;
input Grade \$ ST_ID \$ Course \$ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
;
run;

proc sql;
create table WANT as
select  *
from    HAVE
from   HAVE
where  COURSE in ("Math","Eng")
and  date > "01JAN2000"d);
quit;``````

I got 18 observations. So the where condition and the variables used for matching need to be adapted. Since this got quite complicated quite quickly (at least for me), I chose the data step approach.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,244

## Re: By Group occurances

I think I see clearer now. My suggestion is to create a lookup table for groups and merge back with that:

``````proc sort data=have;
run;

set have;
retain flag;
if first.st_id
then flag = .;
if course in ('Math','Bio')
then do;
if flag lt 1
then if date lt '01jan2000'd
then flag = 1;
else flag = 0;
end;
if last.st_id and missing(flag) or flag then output;
run;

data want;
merge
have (in=a)
lookup (in=b)
;
if a and not b;
run;

proc print data=want noobs;
run;``````

The result looks like the expected one:

``` Grade     ST_ID    Course    Score         date

```
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,244

## Re: By Group occurances

What constitutes a by group (which variables)?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
PROC Star
Posts: 8,164

## Re: By Group occurances

Good case for using a double dow:

```data want (drop=keep);
do until (last.ST_ID);
set have;
by ST_ID notsorted;
if first.st_id then keep=0;
if keep ne -1 then do;
if strip(Course) in ('Bio', 'Math') and date lt '01jan2000'd then keep=-1;
else if strip(Course) in ('Bio', 'Math') and date gt '01jan2000'd then keep=1;
end;
end;
do until (last.ST_ID);
set have;
by ST_ID notsorted;
if keep eq 1 then output;
end;
run;
```

Interesting that the in operator wouldn't work as expected unless I stripped the Course variable.

Art, CEO, AnalystFinder.com

Solution
‎02-24-2017 09:17 PM
PROC Star
Posts: 311

## Re: By Group occurances

I'm guessing at least some of the other solutions here are fine, but here's my approach:

``````proc sql;
CREATE TABLE want AS
SELECT * FROM have
GROUP BY st_id
HAVING max((course in ('Bio', 'Math')) and (date > '01JAN2000'd)) = 1 AND
max((course in ('Bio', 'Math')) and (date <= '01JAN2000'd)) = 0
quit;``````
Super User
Posts: 10,778

## Re: By Group occurances

Mine like Arthur's code.

``````data have;
input Grade \$ ST_ID \$ Course \$ Score date;
informat date ddmmyy8.;
format date date9.;
cards;
;
run;
data want;
has_key=0;after_date=1;
do until (last.ST_ID);
set have;
if strip(Course) in ('Bio', 'Math') then do;
has_key=1;
if date lt '01jan2000'd then after_date=0;
end;
end;
do until (last.ST_ID);
set have;
if has_key and after_date then output;
end;
drop has_key after_date;
run;``````
Posts: 1,337

## Re: By Group occurances

If a proc sort is acceptable, then you can have a VERY simple program:

``````proc sort data=have;
run;

data want;
merge have (in=in1 where=(course in ('Math','Bio') and date<'01jan2000'd))
have (in=in2 where=(course in ('Math','Bio') and date>='01jan2000'd))
have ;