BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JJ_83
Obsidian | Level 7

I am working with a matched case control study, where there are multiple controls (case = 0) matched to one case (case = 1) by the variable "trimgroupid".

 

Each observation has it's own ID (uniqueid).

 

What I want to do is identify the controls that have a date of birth (dob) > 1 year away (+ or -) from it's matched case.

 

Here is what I have:

 

data have;
	infile datalines delimiter=',';
	length uniqueid $5 trimgroupid $7;
	format dob mmddyy10.;
	input uniqueid $ trimgroupid $ dob mmddyy10. case;
	datalines;
	X0131,XX01491,03/10/2000,1
	X1831,XX01491,12/05/2002,0
	X3691,XX01491,06/02/2000,0
	X1971,XX04997,02/05/2010,1
	X2611,XX04997,03/13/2011,0
	X4371,XX04997,06/25/2009,0
	X4621,XX04997,01/01/2009,0
	;
run;

 

And here is what I want:

data want;
	infile datalines delimiter=',';
	length uniqueid $5 trimgroupid $7;
	format dob mmddyy10.;
	input uniqueid $ trimgroupid $ dob mmddyy10. case;
	datalines;
	X1831,XX01491,12/05/2002,0
	X2611,XX04997,03/13/2011,0
	X4621,XX04997,01/01/2009,0
	;
run;

 

For the trimgroupid value of XX01491 there is 1 control whose dob is > 1 year away from the case it is matched to.

For the trimgroupid value of XX04997 there are 2 controls whose dob is > 1 year away from the case it is matched to.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If the data are already sorted by trimgroupid, then a MERGE statement (with a rename), accompanied by a BY statement will do what you want:

 

data have;
  infile datalines delimiter=',';
  length uniqueid $5 trimgroupid $7;
  format dob mmddyy10.;
  input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X0131,XX01491,03/10/2000,1
X1831,XX01491,12/05/2002,0
X3691,XX01491,06/02/2000,0
X1971,XX04997,02/05/2010,1
X2611,XX04997,03/13/2011,0
X4371,XX04997,06/25/2009,0
X4621,XX04997,01/01/2009,0
run;
data want;
  merge have (where=(case=1) rename=(dob=case_dob))
        have (where=(case=0));
  by trimgroupid;
  if intck('year',case_dob,dob,'continuous')>0 or 
     intck('year',dob,case_dob,'continuous')>0 ;
run;

Since the control dob's can be either before or after the matching case dob, the subsetting IF checks on the interval from case_dob to dob, and also from dob to case_dob.   Note this relies on the presence of exactly one case per trimgroupid.  The program also has the advantage of storing the case_dob value in each of the qualifying controls.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Shmuel
Garnet | Level 18

To simplify the code I used 365 days as a year, neglecting a leap year:

data have;
	infile datalines delimiter=',';
	length uniqueid $5 trimgroupid $7;
	format dob mmddyy10.;
	input uniqueid $ trimgroupid $ dob mmddyy10. case;
	datalines;
	X0131,XX01491,03/10/2000,1
	X1831,XX01491,12/05/2002,0
	X3691,XX01491,06/02/2000,0
	X1971,XX04997,02/05/2010,1
	X2611,XX04997,03/13/2011,0
	X4371,XX04997,06/25/2009,0
	X4621,XX04997,01/01/2009,0
	;
run;
proc sort data=have; by trimgroupid dob; run;
data want;
 set have;
  by trimgroupid dob;
     if dif(dob) ge 365;
run;
JJ_83
Obsidian | Level 7

This solution also works, but for my particular dataset the other approach fit better. 

Thank you!

mkeintz
PROC Star

If the data are already sorted by trimgroupid, then a MERGE statement (with a rename), accompanied by a BY statement will do what you want:

 

data have;
  infile datalines delimiter=',';
  length uniqueid $5 trimgroupid $7;
  format dob mmddyy10.;
  input uniqueid $ trimgroupid $ dob mmddyy10. case;
datalines;
X0131,XX01491,03/10/2000,1
X1831,XX01491,12/05/2002,0
X3691,XX01491,06/02/2000,0
X1971,XX04997,02/05/2010,1
X2611,XX04997,03/13/2011,0
X4371,XX04997,06/25/2009,0
X4621,XX04997,01/01/2009,0
run;
data want;
  merge have (where=(case=1) rename=(dob=case_dob))
        have (where=(case=0));
  by trimgroupid;
  if intck('year',case_dob,dob,'continuous')>0 or 
     intck('year',dob,case_dob,'continuous')>0 ;
run;

Since the control dob's can be either before or after the matching case dob, the subsetting IF checks on the interval from case_dob to dob, and also from dob to case_dob.   Note this relies on the presence of exactly one case per trimgroupid.  The program also has the advantage of storing the case_dob value in each of the qualifying controls.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JJ_83
Obsidian | Level 7

This worked great, thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 532 views
  • 2 likes
  • 3 in conversation