I am working with a matched case-control study in which multiple controls (case = 0) are matched to one case (case = 1) by the variable "trimgroupid".
Each observation has its own ID (uniqueid).
What I want to do is identify the controls whose date of birth (dob) is more than 1 year away (before or after) from that of the matched case.
Here is what I have:
data have;
infile datalines delimiter=',';
length uniqueid $5 trimgroupid $7;
format dob mmddyy10.;
input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X0131,XX01491,03/10/2000,1
X1831,XX01491,12/05/2002,0
X3691,XX01491,06/02/2000,0
X1971,XX04997,02/05/2010,1
X2611,XX04997,03/13/2011,0
X4371,XX04997,06/25/2009,0
X4621,XX04997,01/01/2009,0
;
run;
And here is what I want:
data want;
infile datalines delimiter=',';
length uniqueid $5 trimgroupid $7;
format dob mmddyy10.;
input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X1831,XX01491,12/05/2002,0
X2611,XX04997,03/13/2011,0
X4621,XX04997,01/01/2009,0
;
run;
For the trimgroupid value of XX01491, there is 1 control whose dob is more than 1 year away from that of its matched case.
For the trimgroupid value of XX04997, there are 2 controls whose dob is more than 1 year away from that of their matched case.
If the data are already sorted by trimgroupid, then a MERGE statement (with a rename), accompanied by a BY statement will do what you want:
data have;
infile datalines delimiter=',';
length uniqueid $5 trimgroupid $7;
format dob mmddyy10.;
input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X0131,XX01491,03/10/2000,1
X1831,XX01491,12/05/2002,0
X3691,XX01491,06/02/2000,0
X1971,XX04997,02/05/2010,1
X2611,XX04997,03/13/2011,0
X4371,XX04997,06/25/2009,0
X4621,XX04997,01/01/2009,0
;
run;
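data want;
  /* Pair the single case row per trimgroupid (dob renamed to case_dob) with that group's controls */
  merge have (where=(case=1) rename=(dob=case_dob))
        have (where=(case=0));
  by trimgroupid;
  /* Keep controls at least one full year before or after the case dob */
  if intck('year',case_dob,dob,'continuous')>0 or
     intck('year',dob,case_dob,'continuous')>0;
run;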
Since a control's dob can fall either before or after the matching case's dob, the subsetting IF checks the interval from case_dob to dob and also from dob to case_dob. Note that this relies on there being exactly one case per trimgroupid: in the match-merge, the single case row supplies case_dob, and that value is carried across every control in the same BY group. The program also has the advantage of storing the case_dob value in each of the qualifying controls.
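Because the merge depends on exactly one case per trimgroupid, it may be worth checking that assumption first. The following is only a minimal sketch; the data set names have_sorted and case_check and the variable n_cases are made up for illustration and are not part of the original program.

```sas
/* Sort a copy so BY-group processing is valid (have_sorted is a hypothetical name) */
proc sort data=have out=have_sorted;
  by trimgroupid;
run;

/* List every trimgroupid that does not have exactly one case */
data case_check;
  set have_sorted;
  by trimgroupid;
  if first.trimgroupid then n_cases = 0;  /* reset the counter for each group */
  n_cases + (case = 1);                   /* sum statement: n_cases is retained */
  if last.trimgroupid and n_cases ne 1 then output;
  keep trimgroupid n_cases;
run;
```

If case_check comes out empty, every group has a single case and the merge should behave as described. Any rows that do appear point to groups that need attention before merging.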
To simplify the code, I treated a year as 365 days, ignoring leap years:
data have;
infile datalines delimiter=',';
length uniqueid $5 trimgroupid $7;
format dob mmddyy10.;
input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X0131,XX01491,03/10/2000,1
X1831,XX01491,12/05/2002,0
X3691,XX01491,06/02/2000,0
X1971,XX04997,02/05/2010,1
X2611,XX04997,03/13/2011,0
X4371,XX04997,06/25/2009,0
X4621,XX04997,01/01/2009,0
;
run;
proc sort data=have; by trimgroupid dob; run;
data want;
set have;
by trimgroupid dob;
if dif(dob) ge 365;
run;
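One thing to keep in mind with this approach: dif(dob) compares each row with the previous row in sort order (including the last row of the preceding trimgroupid), not with the matched case. A variant that keeps the 365-day simplification but measures every control against its own case might look like the sketch below. It assumes exactly one case per trimgroupid, and the names have_sorted, want_days, and case_dob are introduced here for illustration only.

```sas
/* Put the case (case = 1) first within each trimgroupid */
proc sort data=have out=have_sorted;
  by trimgroupid descending case;
run;

data want_days;
  set have_sorted;
  by trimgroupid;
  retain case_dob;
  format case_dob mmddyy10.;
  /* Assumption: the first row of each group is its single case */
  if first.trimgroupid then case_dob = dob;
  /* Keep controls more than 365 days before or after the case dob */
  if case = 0 and abs(dob - case_dob) > 365;
run;
```

With the sample data above, this keeps X1831, X2611, and X4621, the same three controls as in the desired output.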
This solution also works, but for my particular dataset the other approach fit better.
Thank you!
This worked great, thank you!