BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hello,

I have a dataset composed of 1.5 million subjects which are divided into cases and controls. I need SAS coding to create a new dataset composed of all the cases plus four randomly selected controls for each case, matched by date interviewed.

Any help is appreciated and I am also available to provide more details if needed.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You can match cases and 4 controls that were interviewed on the same day (dateInt) randomly this way:

data have;
input id case DateInt :yymmdd. @@;
format dateInt date11.;
datalines;
1  0 20120101  2 0 20120103  3 1 20120101
4  0 20120103  5 0 20120101  6 1 20120103
7  0 20120101  8 0 20120102  9 0 20120103
10 1 20120103 11 0 20120103 12 0 20120103
13 0 20120103 14 0 20120103 15 0 20120103
;

/* Assign a random order to all cases and controls */
data tmp;
call streaminit(98768976);
set have;
rnd = rand("UNIFORM");
run;

proc sort data=tmp; by dateInt case rnd; run;

proc rank data=tmp out=tmpR;
by dateInt case;
var rnd;
run;

/* Match the first case with the first 4 controls, the second case with controls

     5 to 8 and so on, within each date. */
proc sql;
create table want as
select ca.dateInt, ca.id as caseId, co.id as controlId
from tmpR as ca inner join tmpR as co
     on ca.dateInt=co.dateInt and ca.rnd=ceil(co.rnd/4)
where ca.case and not co.case;
drop table tmp, tmpR;
select * from want;
quit;

PG

PG

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

You can match cases and 4 controls that were interviewed on the same day (dateInt) randomly this way:

data have;
input id case DateInt :yymmdd. @@;
format dateInt date11.;
datalines;
1  0 20120101  2 0 20120103  3 1 20120101
4  0 20120103  5 0 20120101  6 1 20120103
7  0 20120101  8 0 20120102  9 0 20120103
10 1 20120103 11 0 20120103 12 0 20120103
13 0 20120103 14 0 20120103 15 0 20120103
;

/* Assign a random order to all cases and controls */
data tmp;
call streaminit(98768976);
set have;
rnd = rand("UNIFORM");
run;

proc sort data=tmp; by dateInt case rnd; run;

proc rank data=tmp out=tmpR;
by dateInt case;
var rnd;
run;

/* Match the first case with the first 4 controls, the second case with controls

     5 to 8 and so on, within each date. */
proc sql;
create table want as
select ca.dateInt, ca.id as caseId, co.id as controlId
from tmpR as ca inner join tmpR as co
     on ca.dateInt=co.dateInt and ca.rnd=ceil(co.rnd/4)
where ca.case and not co.case;
drop table tmp, tmpR;
select * from want;
quit;

PG

PG
Doc_Duke
Rhodochrosite | Level 12

PG's approach works for exact date matches.  If you want to look for dates 'near' the case-date, do a google search on

nearest neighbor match site:sas.com

for a bunch of related results.

Walternate
Obsidian | Level 7

Thank you for the reply. Upon entering the code, I received the following error message: "Column case could not be found in the table/view identified with the correlation name CA" (and another one for the correlation name CO). As I am not conversant in proc sql I am unable to troubleshoot myself and any assistance would be appreciated.

Thanks!

PGStats
Opal | Level 21

The message means that dataset tmpR doesn't include a variable named case. It should be there if the previous step (proc rank) ran without error. - PG

PG

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1986 views
  • 1 like
  • 3 in conversation