Hi everyone, I come across this problem and I hope you could help me with:
I am trying to count "How many people received both PCV13 and PPSV23 after they turned 60 years of age, and the days between the two vaccine is at least 60 days apart".
I don't care which vaccine was taken the first after they turn 60, only the length of the period in between; and a person may have got the same type of vaccine more than once after they turn 60, so I care the max gap- the earliest date of one vaccine and the latest date of the other
I create a sample of what the input data looks like:
person_id | birth_date | vaccine | vaccine_date |
1 | 5/1/1940 | PCV13 | 3/1/2013 |
1 | 5/1/1940 | PCV13 | 5/8/1982 |
1 | 5/1/1940 | PPSV23 | 9/7/1989 |
1 | 5/1/1940 | PPSV23 | 4/9/2010 |
1 | 5/1/1940 | PPSV23 | 7/20/2016 |
2 | 11/6/1950 | PCV13 | 6/1/1968 |
2 | 11/6/1950 | PCV13 | 5/2/1986 |
2 | 11/6/1950 | PPSV23 | 8/1/2015 |
3 | 10/6/1955 | PCV13 | 1/19/2013 |
3 | 10/6/1955 | PPSV23 | 2/1/2013 |
3 | 10/6/1955 | PPSV23 | 12/20/2015 |
4 | 3/12/1990 | PCV13 | 1/6/2005 |
4 | 3/12/1990 | PPSV23 | 2/6/2005 |
My thinking:
- first create a flag if the vaccine is received after they turned 60, then exclude those rows that vaccine_date is earlier than their 60 years birth date anniversary, for example, person 4 in the above example will be filtered out.
- But then I haven't figure out how to use INTCK and other conditions to let SAS calculate the max gap (because I don't care which vaccine they receive first, some people receive PCV13 first, some received PPSV23 first, and some received 2 PPSV23, but only 1 is after 60 days they received PCV13 - all kinds of possibilities)
My vision of the output is something like:
person_id | birth_date | vaccine | vaccine_date | 60Days_Flag |
1 | 5/1/1940 | PCV13 | 3/1/2013 | 1 |
1 | 5/1/1940 | PCV13 | 5/8/1982 | 1 |
1 | 5/1/1940 | PPSV23 | 9/7/1989 | 1 |
1 | 5/1/1940 | PPSV23 | 4/9/2010 | 1 |
1 | 5/1/1940 | PPSV23 | 7/20/2016 | 1 |
2 | 11/6/1950 | PCV13 | 6/1/1968 | 0 |
2 | 11/6/1950 | PCV13 | 5/2/1986 | 0 |
2 | 11/6/1950 | PPSV23 | 8/1/2015 | 0 |
3 | 10/6/1955 | PCV13 | 1/19/2013 | 1 |
3 | 10/6/1955 | PPSV23 | 2/1/2013 | 1 |
3 | 10/6/1955 | PPSV23 | 12/20/2015 | 1 |
4 | 3/12/1990 | PCV13 | 1/6/2005 | 0 |
4 | 3/12/1990 | PPSV23 | 2/6/2005 | 0 |
Any idea would be greatly appreciated!
How are you calculating age? My initial poke at this says that ID 3 only has one vaccine after 60, so should not be flagged?
So as long as they are 60, it doesn't matter how many vaccines they have had in total, how many of each kind, or in what order they have had them, all you really care about is did they have one of each type more than 60 days apart?
Edit: I posted an answer but it was wrong. I neglected to account for only vaccines after ago 60. Not just 60 year old people but vaccinated after 60. Try this now.
Data vaccine_records;
input person_id birth_date:mmddyy10. vaccine:$6. vaccine_date:mmddyy10.;
format birth_date vaccine_date yymmdd10.;
datalines;
1 5/1/1940 PCV13 3/1/2013
1 5/1/1940 PCV13 5/8/1982
1 5/1/1940 PPSV23 9/7/1989
1 5/1/1940 PPSV23 4/9/2010
1 5/1/1940 PPSV23 7/20/2016
2 11/6/1950 PCV13 6/1/1968
2 11/6/1950 PCV13 5/2/1986
2 11/6/1950 PPSV23 8/1/2015
3 10/6/1955 PCV13 1/19/2013
3 10/6/1955 PPSV23 2/1/2013
3 10/6/1955 PPSV23 12/20/2015
4 3/12/1990 PCV13 1/6/2005
4 3/12/1990 PPSV23 2/6/2005
5 5/2/1940 PCV13 3/2/2013
5 5/2/1940 PPSV23 4/1/2011
6 3/12/1940 PCV13 2/6/2005
6 3/12/1940 PPSV23 1/6/2005
;
run;
*get only the records where the recipient was over 60 at the time of vaccination;
proc sql;
create table validvaccination as
select *
from vaccine_records
where floor((vaccine_date - birth_date)/365) > 60;
quit;
*find min and max dates for pcv13;
proc sql;
create table minmaxpcv13 as
select person_id, min(vaccine_date) as minpcv13, max(vaccine_date) as maxpcv13
from validvaccination
where vaccine = 'PCV13'
group by person_id;
quit;
*find min and max dates for ppsv23;
proc sql;
create table minmaxppsv23 as
select person_id, min(vaccine_date) as minppsv23, max(vaccine_date) as maxppsv23
from validvaccination
where vaccine = 'PPSV23'
group by person_id;
quit;
*find everybody with a 60 day gap between (minpcv13 and min or maxppsv23) or (maxpcv13 and min or maxppsv23);
proc sql;
create table sixtydaygap as
select a.person_id
from minmaxpcv13 a inner join minmaxppsv23 b on a.person_id = b.person_id
where abs(a.minpcv13 - b.minppsv23) > 60 or abs(a.minpcv13 - b.maxppsv23) > 60
or abs(a.maxpcv13 - b.minppsv23) > 60 or abs(a.maxpcv13 - b.maxppsv23) > 60;
quit;
That picks up 1 and 5. One and five are old enough and have a gap. Two and six don't have a gap and three and four aren't old enough.
So the issue with the calculation of age aside, which might suggest that I don't fully get what you want, this is one possible approach (see notes in code):
data have;
input person_id birth_date :mmddyy10. vaccine $ vaccine_date :mmddyy10.;
format birth_date vaccine_date mmddyy10.;
age_at_vaccination = yrdif(birth_date, vaccine_date, 'AGE');
if age_at_vaccination >= 60; /* Toss out any where person was < 60 at vaccination. */
datalines;
1 5/1/1940 PCV13 3/1/2013
1 5/1/1940 PCV13 5/8/1982
1 5/1/1940 PPSV23 9/7/1989
1 5/1/1940 PPSV23 4/9/2010
1 5/1/1940 PPSV23 7/20/2016
2 11/6/1950 PCV13 6/1/1968
2 11/6/1950 PCV13 5/2/1986
2 11/6/1950 PPSV23 8/1/2015
3 10/6/1955 PCV13 1/19/2013
3 10/6/1955 PPSV23 2/1/2013
3 10/6/1955 PPSV23 12/20/2015
4 3/12/1990 PCV13 1/6/2005
4 3/12/1990 PPSV23 2/6/2005
;
proc sql;
/* General idea: Join the data with itself by person_id
where the vaccine is different and the date of the right
table came after the date on the left table.
This just shows all cases where a different vaccine came
later and how many days were between them. */
SELECT DISTINCT aa.person_id, aa.vaccine, aa.vaccine_date,
bb.vaccine AS later_type,
bb.vaccine_date AS later_date,
bb.vaccine_date - aa.vaccine_date AS days
FROM have AS aa
LEFT JOIN have AS bb
ON aa.person_id = bb.person_id AND
aa.vaccine ne bb.vaccine AND
bb.vaccine_date > aa.vaccine_date
ORDER BY aa.person_id, aa.vaccine_date;
quit;
proc sql;
/* To produce just the requested flag, we can
merge the above on as a subquery that only produces
the IDs that match the criterion of 60 days. */
CREATE TABLE want AS
SELECT a.*, not missing(b.person_id) AS flag
FROM have AS a
LEFT JOIN (SELECT DISTINCT aa.person_id
FROM have AS aa
INNER JOIN have AS bb
ON aa.person_id = bb.person_id AND
aa.vaccine ne bb.vaccine AND
bb.vaccine_date - aa.vaccine_date > 60) AS b
ON a.person_id = b.person_id;
quit;
You apparently want to examine maximum cross-vaccine date ranges for vaccinations taken at age 60 or more, correct?
There is only one vaccination in that age range for id 3 (born 10/6/1955, so 60th birthday is on 10/6/2015). So why have you assigned flag=1 for that id? After all, only one vaccine was taken on or after that date.
The task: For each id, see if the maximum cross-vaccine date range, for vaccinations taken at age 60+, is at least sixty days. If true then all vaccination records for that id get flag=1. Other id's get flag=0.
This program does that. It assumes only that the data are sorted by ID (date order within id is not required):
data have;
input person_id birth_date :mmddyy10. vaccine :$6. vaccine_date :mmddyy10.;
format birth_date vaccine_date mmddyy10.;
datalines;
1 5/1/1940 PCV13 3/1/2013
1 5/1/1940 PCV13 5/8/1982
1 5/1/1940 PPSV23 9/7/1989
1 5/1/1940 PPSV23 4/9/2010
1 5/1/1940 PPSV23 7/20/2016
2 11/6/1950 PCV13 6/1/1968
2 11/6/1950 PCV13 5/2/1986
2 11/6/1950 PPSV23 8/1/2015
3 10/6/1955 PCV13 1/19/2013
3 10/6/1955 PPSV23 2/1/2013
3 10/6/1955 PPSV23 12/20/2015
4 3/12/1990 PCV13 1/6/2005
4 3/12/1990 PPSV23 2/6/2005
run;
data want (drop=v);
set have (in=sixty_years_old
where=(intck('year',birth_date,vaccine_date,'continuous')>=60))
have (in=keepin);
by person_id;
array mindate_ {2} _temporary_;
array maxdate_ {2} _temporary_;
retain flag ;
if first.person_id then do;
call missing(of mindate_{*}, of maxdate_{*});
flag=0;
end;
if sixty_years_old and flag=0 then do;
v=findw('PCV13 PPSV23',trim(vaccine),' ','E'); /*PCV13:v=1, PPSV23:v=2*/
mindate_{v}=min(mindate_{v},vaccine_date);
maxdate_{v}=max(maxdate_{v},vaccine_date);
if range(mindate_{1},maxdate_{2})>=60 or
range(mindate_{2},maxdate_{1})>=60 then
flag=1;
end;
if keepin;
run;
This works because I use SET with a BY id statement, in which the SET statement has two targets: (1) all age 60+ records for a given ID, followed by (2) records for ALL ages for the same id. This allows preliminary examination of the age 60+ records to determine maximum cross-vaccine date range, establishing the FLAG value. The flag value is then retained when rereading and outputting all records for the same id.
The other "trick" here is to determine the vaccine index (1 for PCV13, 2 for PPSV23) using the findw function (the 'E' modifier tells sas to return the position based on word count, not letter count).
Also I use the RANGE function instead of subtracting mindate_{1} from maxdate_{2} (and vice versa), because that would generate notes when only 1 vaccine has been taken. The range function returns a missing value in such cases, but no notes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.