BookmarkSubscribeRSS Feed
LisaYIN9309
Obsidian | Level 7

Hi everyone, I come across this problem and I hope you could help me with:

 

I am trying to count "How many people received both PCV13 and PPSV23 after they turned 60 years of age, and the days between the two vaccine is at least 60 days apart".

 

I don't care which vaccine was taken the first after they turn 60, only the length of the period in between; and a person may have got the same type of vaccine more than once after they turn 60, so I care the max gap- the earliest date of one vaccine and the latest date of the other

 

I create a sample of what the input data looks like:

person_idbirth_datevaccinevaccine_date
15/1/1940PCV133/1/2013
15/1/1940PCV135/8/1982
15/1/1940PPSV239/7/1989
15/1/1940PPSV234/9/2010
15/1/1940PPSV237/20/2016
211/6/1950PCV136/1/1968
211/6/1950PCV135/2/1986
211/6/1950PPSV238/1/2015
310/6/1955PCV131/19/2013
310/6/1955PPSV232/1/2013
310/6/1955PPSV2312/20/2015
43/12/1990PCV131/6/2005
43/12/1990PPSV232/6/2005

 

My thinking:

- first create a flag if the vaccine is received after they turned 60, then exclude those rows that vaccine_date is earlier than their 60 years birth date anniversary, for example, person 4 in the above example will be filtered out.

- But then I haven't figure out how to use INTCK and other conditions to let SAS calculate the max gap (because I don't care which vaccine they receive first, some people receive PCV13 first, some received PPSV23 first, and some received 2 PPSV23, but only 1 is after 60 days they received PCV13 - all kinds of possibilities)

 

My vision of the output is something like:

person_idbirth_datevaccinevaccine_date60Days_Flag
15/1/1940PCV133/1/20131
15/1/1940PCV135/8/19821
15/1/1940PPSV239/7/19891
15/1/1940PPSV234/9/20101
15/1/1940PPSV237/20/20161
211/6/1950PCV136/1/19680
211/6/1950PCV135/2/19860
211/6/1950PPSV238/1/20150
310/6/1955PCV131/19/20131
310/6/1955PPSV232/1/20131
310/6/1955PPSV2312/20/20151
43/12/1990PCV131/6/20050
43/12/1990PPSV232/6/20050

 

Any idea would be greatly appreciated!

5 REPLIES 5
collinelliot
Barite | Level 11

How are you calculating age? My initial poke at this says that ID 3 only has one vaccine after 60, so should not be flagged?

HB
Barite | Level 11 HB
Barite | Level 11

So as long as they are 60, it doesn't matter how many vaccines they have had in total, how many of each kind, or in what order they have had them, all you really care about is did they have one of each type more than 60 days apart?

 

Edit:  I posted an answer but it was wrong. I neglected to account for only vaccines after ago 60. Not just 60 year old people but vaccinated after 60.   Try this now.

 

Data vaccine_records;
  input person_id birth_date:mmddyy10. vaccine:$6.	vaccine_date:mmddyy10.;
  format birth_date vaccine_date yymmdd10.;
  datalines;
1	5/1/1940	PCV13	3/1/2013
1	5/1/1940	PCV13	5/8/1982
1	5/1/1940	PPSV23	9/7/1989
1	5/1/1940	PPSV23	4/9/2010
1	5/1/1940	PPSV23	7/20/2016
2	11/6/1950	PCV13	6/1/1968
2	11/6/1950	PCV13	5/2/1986
2	11/6/1950	PPSV23	8/1/2015
3	10/6/1955	PCV13	1/19/2013
3	10/6/1955	PPSV23	2/1/2013
3	10/6/1955	PPSV23	12/20/2015
4	3/12/1990	PCV13	1/6/2005
4	3/12/1990	PPSV23	2/6/2005
5	5/2/1940	PCV13	3/2/2013
5	5/2/1940	PPSV23	4/1/2011
6	3/12/1940	PCV13	2/6/2005
6	3/12/1940	PPSV23	1/6/2005
;
run;

*get only the records where the recipient was over 60 at the time of vaccination;
proc sql;
	create table validvaccination as 
	select *   
	from vaccine_records
	where floor((vaccine_date - birth_date)/365) > 60;
quit;


*find min and max dates for pcv13;
proc sql;
	create table minmaxpcv13 as 
	select person_id, min(vaccine_date) as minpcv13, max(vaccine_date) as maxpcv13  
	from validvaccination
	where vaccine = 'PCV13'
	group by person_id;
quit;

*find min and max dates for ppsv23;
proc sql;
	create table minmaxppsv23 as 
	select person_id, min(vaccine_date) as minppsv23, max(vaccine_date) as maxppsv23  
	from validvaccination
	where vaccine = 'PPSV23'
	group by person_id;
quit;

*find everybody with a 60 day gap between (minpcv13 and min or maxppsv23) or (maxpcv13 and min or maxppsv23); 
proc sql;
	create table sixtydaygap as 
	select a.person_id 
	from minmaxpcv13 a inner join minmaxppsv23 b on a.person_id = b.person_id
	where abs(a.minpcv13 - b.minppsv23) > 60 or abs(a.minpcv13 - b.maxppsv23) > 60  
	or abs(a.maxpcv13 - b.minppsv23) > 60 or abs(a.maxpcv13 - b.maxppsv23) > 60;
quit;

 

 

 

 That picks up 1 and 5.  One and five are old enough and have a gap.  Two and six don't have a gap and three and four aren't old enough. 

collinelliot
Barite | Level 11

So the issue with the calculation of age aside, which might suggest that I don't fully get what you want, this is one possible approach (see notes in code):

 

 

data have;
    input person_id birth_date :mmddyy10. vaccine $ vaccine_date :mmddyy10.;
    format birth_date vaccine_date mmddyy10.;
    age_at_vaccination = yrdif(birth_date, vaccine_date, 'AGE');

    if age_at_vaccination >= 60; /* Toss out any where person was < 60 at vaccination. */

datalines;
1 5/1/1940 PCV13 3/1/2013
1 5/1/1940 PCV13 5/8/1982
1 5/1/1940 PPSV23 9/7/1989
1 5/1/1940 PPSV23 4/9/2010
1 5/1/1940 PPSV23 7/20/2016
2 11/6/1950 PCV13 6/1/1968
2 11/6/1950 PCV13 5/2/1986
2 11/6/1950 PPSV23 8/1/2015
3 10/6/1955 PCV13 1/19/2013
3 10/6/1955 PPSV23 2/1/2013
3 10/6/1955 PPSV23 12/20/2015
4 3/12/1990 PCV13 1/6/2005
4 3/12/1990 PPSV23 2/6/2005
;

proc sql;
    /* General idea: Join the data with itself by person_id
       where the vaccine is different and the date of the right
       table came after the date on the left table.

       This just shows all cases where a different vaccine came
       later and how many days were between them. */
    SELECT DISTINCT aa.person_id, aa.vaccine, aa.vaccine_date, 
          bb.vaccine AS later_type, 
          bb.vaccine_date AS later_date,
          bb.vaccine_date - aa.vaccine_date AS days
    FROM have AS aa
    LEFT JOIN have AS bb
    ON aa.person_id = bb.person_id AND
        aa.vaccine ne bb.vaccine AND
        bb.vaccine_date > aa.vaccine_date
    ORDER BY aa.person_id, aa.vaccine_date;
quit;

proc sql;
    /* To produce just the requested flag, we can
       merge the above on as a subquery that only produces
       the IDs that match the criterion of 60 days. */
    CREATE TABLE want AS
    SELECT a.*, not missing(b.person_id) AS flag
    FROM have AS a
    LEFT JOIN (SELECT DISTINCT aa.person_id
               FROM have AS aa
               INNER JOIN have AS bb
               ON aa.person_id = bb.person_id AND
                  aa.vaccine ne bb.vaccine AND
                  bb.vaccine_date - aa.vaccine_date > 60) AS b
    ON a.person_id = b.person_id;
quit;
mkeintz
PROC Star

You apparently want to examine maximum cross-vaccine date ranges for vaccinations taken at age 60 or more, correct?

 

There is only one vaccination in that age range for id 3 (born 10/6/1955, so 60th birthday is on 10/6/2015).  So why have you assigned flag=1 for that id?   After all, only one vaccine was taken on or after that date.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

The task: For each id, see if the maximum cross-vaccine date range, for vaccinations taken at age 60+, is at least sixty days.  If true then all vaccination records for that id get flag=1.  Other id's get flag=0.

 

This program does that.  It assumes only that the data are sorted by ID (date order within id is not required):

 

data have;
  input person_id birth_date :mmddyy10. vaccine :$6. vaccine_date :mmddyy10.;
  format birth_date vaccine_date mmddyy10.;
datalines;
1 5/1/1940 PCV13 3/1/2013 
1 5/1/1940 PCV13 5/8/1982 
1 5/1/1940 PPSV23 9/7/1989 
1 5/1/1940 PPSV23 4/9/2010 
1 5/1/1940 PPSV23 7/20/2016 
2 11/6/1950 PCV13 6/1/1968 
2 11/6/1950 PCV13 5/2/1986 
2 11/6/1950 PPSV23 8/1/2015 
3 10/6/1955 PCV13 1/19/2013 
3 10/6/1955 PPSV23 2/1/2013 
3 10/6/1955 PPSV23 12/20/2015 
4 3/12/1990 PCV13 1/6/2005 
4 3/12/1990 PPSV23 2/6/2005 
run;

data want (drop=v);
  set have (in=sixty_years_old
            where=(intck('year',birth_date,vaccine_date,'continuous')>=60))
      have (in=keepin);
  by person_id;

  array mindate_ {2} _temporary_;
  array maxdate_ {2} _temporary_;
  retain flag ;


  if first.person_id then do;
    call missing(of mindate_{*}, of maxdate_{*});
    flag=0;
  end;

  if sixty_years_old and flag=0 then do;
    v=findw('PCV13 PPSV23',trim(vaccine),' ','E'); /*PCV13:v=1, PPSV23:v=2*/
    mindate_{v}=min(mindate_{v},vaccine_date);
    maxdate_{v}=max(maxdate_{v},vaccine_date);
    if range(mindate_{1},maxdate_{2})>=60 or 
    range(mindate_{2},maxdate_{1})>=60 then
    flag=1;
  end;

  if keepin;
run;

 

This works because I use SET with a BY id statement, in which the SET statement has two targets: (1) all age 60+ records for a given ID, followed by (2) records for ALL ages for the same id.  This allows preliminary examination of the age 60+ records to determine maximum cross-vaccine date range, establishing the FLAG value.  The flag value is then retained when rereading and outputting all records for the same id.

 

The other "trick" here is to determine the vaccine index (1 for PCV13, 2 for PPSV23) using the findw function (the 'E' modifier tells sas to return the position based on word count, not letter count).

 

Also I use the RANGE function instead of  subtracting mindate_{1} from maxdate_{2} (and vice versa), because that would generate notes when only 1 vaccine has been taken.  The range function returns a missing value in such cases, but no notes.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1296 views
  • 1 like
  • 4 in conversation