DATA Step, Macro, Functions and more

Case-control 1-4 matching by date

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 138
Accepted Solution

Case-control 1-4 matching by date

Hello,

I have a dataset composed of 1.5 million subjects which are divided into cases and controls. I need SAS coding to create a new dataset composed of all the cases plus four randomly selected controls for each case, matched by date interviewed.

Any help is appreciated and I am also available to provide more details if needed.

Thanks!


Accepted Solutions
Solution
‎12-26-2012 10:17 PM
Respected Advisor
Posts: 4,646

Re: Case-control 1-4 matching by date

You can match cases and 4 controls that were interviewed on the same day (dateInt) randomly this way:

data have;
input id case DateInt :yymmdd. @@;
format dateInt date11.;
datalines;
1  0 20120101  2 0 20120103  3 1 20120101
4  0 20120103  5 0 20120101  6 1 20120103
7  0 20120101  8 0 20120102  9 0 20120103
10 1 20120103 11 0 20120103 12 0 20120103
13 0 20120103 14 0 20120103 15 0 20120103
;

/* Assign a random order to all cases and controls */
data tmp;
call streaminit(98768976);
set have;
rnd = rand("UNIFORM");
run;

proc sort data=tmp; by dateInt case rnd; run;

proc rank data=tmp out=tmpR;
by dateInt case;
var rnd;
run;

/* Match the first case with the first 4 controls, the second case with controls

     5 to 8 and so on, within each date. */
proc sql;
create table want as
select ca.dateInt, ca.id as caseId, co.id as controlId
from tmpR as ca inner join tmpR as co
     on ca.dateInt=co.dateInt and ca.rnd=ceil(co.rnd/4)
where ca.case and not co.case;
drop table tmp, tmpR;
select * from want;
quit;

PG

PG

View solution in original post


All Replies
Solution
‎12-26-2012 10:17 PM
Respected Advisor
Posts: 4,646

Re: Case-control 1-4 matching by date

You can match cases and 4 controls that were interviewed on the same day (dateInt) randomly this way:

data have;
input id case DateInt :yymmdd. @@;
format dateInt date11.;
datalines;
1  0 20120101  2 0 20120103  3 1 20120101
4  0 20120103  5 0 20120101  6 1 20120103
7  0 20120101  8 0 20120102  9 0 20120103
10 1 20120103 11 0 20120103 12 0 20120103
13 0 20120103 14 0 20120103 15 0 20120103
;

/* Assign a random order to all cases and controls */
data tmp;
call streaminit(98768976);
set have;
rnd = rand("UNIFORM");
run;

proc sort data=tmp; by dateInt case rnd; run;

proc rank data=tmp out=tmpR;
by dateInt case;
var rnd;
run;

/* Match the first case with the first 4 controls, the second case with controls

     5 to 8 and so on, within each date. */
proc sql;
create table want as
select ca.dateInt, ca.id as caseId, co.id as controlId
from tmpR as ca inner join tmpR as co
     on ca.dateInt=co.dateInt and ca.rnd=ceil(co.rnd/4)
where ca.case and not co.case;
drop table tmp, tmpR;
select * from want;
quit;

PG

PG
Trusted Advisor
Posts: 2,113

Re: Case-control 1-4 matching by date

PG's approach works for exact date matches.  If you want to look for dates 'near' the case-date, do a google search on

nearest neighbor match site:sas.com

for a bunch of related results.

Frequent Contributor
Posts: 138

Re: Case-control 1-4 matching by date

Thank you for the reply. Upon entering the code, I received the following error message: "Column case could not be found in the table/view identified with the correlation name CA" (and another one for the correlation name CO). As I am not conversant in proc sql I am unable to troubleshoot myself and any assistance would be appreciated.

Thanks!

Respected Advisor
Posts: 4,646

Re: Case-control 1-4 matching by date

The message means that dataset tmpR doesn't include a variable named case. It should be there if the previous step (proc rank) ran without error. - PG

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 442 views
  • 0 likes
  • 3 in conversation