BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lvb1
Calcite | Level 5

I have a dataset with potential case-control pairs. I would like to select for every case four controls, without replacement. 

 

Have:

casecontrol
11
12
13
14
15
16
22
24
28
210
212
214
31
34
39
311
313
314
316
320
322


Want:

casecontrol
11
12
13
14
28
210
212
214
39
311
313
316

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Great to hear - please mark it as a solution, then 😊

View solution in original post

12 REPLIES 12
lvb1
Calcite | Level 5

Yes, Smantha, I did:

 

proc surveyselect data=cc out=RandOrderCC noprint
method=SRS /* sample w/o replacement seed=123 */
samprate=1 /* proportion of observations in sample */
OUTRANDOM; 
run;

 

That didn't give four controls for every case. I should have N cases * 4 controls. Depending on the seed, I get a different number of controls.

 

(Also: if I have for case 1 controls 1 2 3 4 and for case 2 I also have control 2, I would like to avoid that the selection procedure uses control 2 for case 2, since I only have 4 controls for case 1. I would like to have as many controls as possible.)

VENKATAMAHESH
Calcite | Level 5
data given;
input case control;
datalines;
1 1
1 2
1 3
1 4
1 5
1 6
2 2
2 4
2 8
2 10
2 12
2 14
3 1
3 4
3 9
3 11
3 13
3 14
3 16
3 20
3 22
;
run;

proc sort nodupkey;
by control;
run;

proc sort;
by case control;
run;

%macro function(dsn,value);
data &dsn;
set given;
if case = &value then output &dsn;
run;

data &dsn;
set &dsn;
if _n_ < 5 then output;
run;
%mend function;

%function(one,1);
%function(two,2);
%function(three,3);

data final;
set one two three;
run;
lvb1
Calcite | Level 5

Dear Venkatamahesh,

 

your code generates indeed the want table I gave in my original request. However, by sorting on control, you also delete records which are still valid. For example if you use:

 

datalines;

1 1
1 2
1 3
1 4
1 5
1 6
2 2
2 4
2 8
2 10
2 12
2 14
3 1
3 4
3 9
3 11
3 13
3 14
3 16
3 20
3 22
4 1
4 2
4 3
4 4
4 5
4 6

 

The fourth case is completely deleted, but case 4 should be matched to control 5 and 6.  

smantha
Lapis Lazuli | Level 10
data cc;
input case control;
datalines;
1 1
1 2
1 3
1 4
1 5
1 6
2 2
2 4
2 8
2 10
2 12
2 14
3 1
3 4
3 9
3 11
3 13
3 14
3 16
3 20
3 22
4 1
4 2
4 3
4 4
4 5
4 6
;
run;

proc sort; by case control; 
run;

data _null_;
set cc nobs=n;
call symputx('nrow',n);
stop;
run;
options symbolgen;
data want;
set cc ;
array test{&nrow.};
retain test1-test&nrow. count;
by case;
if first.case then count=1;

if count <5 then do;
 put _all_;
 if control not in test then do;
   count=count+1;
   output;
   test[_n_]= control;
 end;
end;


drop test1-test&nrow. count;
run;
lvb1
Calcite | Level 5

Hi Smanta, the codes works for a small set of data, but I have 400.000 potential case control matches. The code ran for three days and the computer blocked. The log file was about 177GB. I will need something else I'm afraid. 

 

s_lassen
Meteorite | Level 14

You can do it in a single datastep, quite simple:

data have;
input case control;
datalines;
1 1
1 2
1 3
1 4
1 5
1 6
2 2
2 4
2 8
2 10
2 12
2 14
3 1
3 4
3 9
3 11
3 13
3 14
3 16
3 20
3 22
;
run;

data want;
  do _N_=1 by 1 until(last.case);
    set have;
    by case;
    end;
  out=4;
  do _N_=_N_ to 1 by -1;
    set have;
    if rand('uniform')<out/_N_ then do;
      output;
      out=out-1;
      end;
    end;
  drop out;  /* Yes, that's a valid SAS statement... */
run;
lvb1
Calcite | Level 5

Sorry, that didn't work. As said, once I have four cases, the code doesn't generate the correct output.

s_lassen
Meteorite | Level 14

Sorry, I misunderstood your requirements. I thought you needed a random selection of cases, but it seems it does not have to be. On the other hand you need CONTROL to be unique. Is that correct?

 

If I have understood that correctly, here is a way that may work:

data want;
  set have;
  stop;
run;

proc sql;
  create unique index control on want(control);
quit;

data want;
  if 0 then modify want;
  out=4;
  do until(last.case);
    set have;
    by case;
    if out=0 then continue;
    _iorc_=0;
    output;
    if _iorc_ then continue; /* unique key restraint violated */
    out=out-1;
    end;
  _error_=0;
run;
lvb1
Calcite | Level 5

This works, thanks.

s_lassen
Meteorite | Level 14

Great to hear - please mark it as a solution, then 😊

kjmathes03
Fluorite | Level 6

Hello,

 

I have the same problem and when I run the first step of your code -

 

data want;

set have;

stop;

run;

 

I don;t under stand this step as it just reduces my data to zero records?

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3299 views
  • 2 likes
  • 5 in conversation