DATA Step, Macro, Functions and more

Matching cases with controls

Accepted Solution Solved
Reply
Contributor
Posts: 61
Accepted Solution

Matching cases with controls

Hi,

I want to match cases with controls on these variables: age (± 3 years), edu (± 2), race, parity (0,≥1) and date of blood sample (dobw).  I am providing sample data as my original data are still not ready.  I also could not enter variable dobw. 

 

If possible, please give some guidance on entering in SAS correctly.

 

data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;
run;


Accepted Solutions
Solution
‎02-24-2016 06:08 PM
Super User
Posts: 5,431

Re: Matching cases with controls

Attaching a sample of desired output based on your inputs helps describing your requirement.
SQL join with between-and criteria is one way to handle matching on ranges.
Data never sleeps

View solution in original post


All Replies
Solution
‎02-24-2016 06:08 PM
Super User
Posts: 5,431

Re: Matching cases with controls

Attaching a sample of desired output based on your inputs helps describing your requirement.
SQL join with between-and criteria is one way to handle matching on ranges.
Data never sleeps
Contributor
Posts: 61

Re: Matching cases with controls

Hi,
I could not find the attachment.  May be I am missing; I am a new member and don't know much about the communication.

 

 

Contributor
Posts: 61

Re: Matching cases with controls

Hi,

 

I accidently clicked the 'accept the solution' tab; I have not been able to access the solution on this page. I will appreciate if I could access the solution.

Thanks

Abdul

Respected Advisor
Posts: 4,927

Re: Matching cases with controls

If you have a licence to SAS/OR, you can easily find an optimal assignment with proc optnet, even with many controls per case:

 

data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;

/* Find all possible matches */
proc sql;
create table matches as
select 
    s.id as sampleId, 
    c.id as controlId,
    euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2
from 
    sample as s left join 
    controls as c
    on  c.age between s.age-3 and s.age+3 and
        c.edu between s.edu-2 and s.edu+2 and
        c.race = s.race and
        (c.parity=0) = (s.parity=0)
order by sampleId, distance;
select * from matches;
quit;

/* How many controls per sample ? */
%let controlsPerSample=2;

/* Prepare data for linear assingment algorithm */
data links;
set matches;
where not missing(controlId);
controlNode = controlId;
do i = 1 to min(99, &controlsPerSample);
    sampleNode = sampleId + i/100;
    output;
    end;
format sampleNode 10.2;
keep sampleNode controlNode distance;
run;

/* Perform best linear assignment algorithm */
proc optnet data_links=links graph_direction=directed;
data_links_var from=sampleNode to=controlNode weight=distance;
linear_assignment out=bestMatches;
run;

proc sql;
select 
    floor(sampleNode) as sampleId,
    controlNode as controlId
from bestMatches;
quit;
PG
Contributor
Posts: 61

Re: Matching cases with controls

Dear PG,

Excellent!  I am very much grateful to you for your time and effort!

 

Here is the output data.  There was only one problem, it did not match with no 17 sample; may be there was no match.

Can you write me the code for the date of blood sample withdrawn (dobw)?  I have to receive this information from our data management unit, I ll be bit flexible and take it in upto two weeks±.  Also, is there any simple way of entering date in sas.

Best regards

The SAS System

 

sampleId controlsId
570
599
1243
12101
22125
2251
2932
2931
4058
4093
Respected Advisor
Posts: 4,927

Re: Matching cases with controls

Glad you could run my code. I hope you can adapt it to your precise requirements. Revise the distance calculation to make sure it weights the matching criteria the way you want. I guess you will want to include the dobw in the distance calculation when it becomes available.

 

You are right, there was no match for case 17. 

 

SAS can handle a great variety of date formats. Something like 2016-02-25 will do just fine.

PG
Contributor
Posts: 61

Re: Matching cases with controls

Hi PG,

 

I can't run SAS codes for analysis.  The final data in the form of matched set contains only the IDs of both cases and controls. Can yo uguide me how to bring all the variables alongwith IDs in the final matchd file?

 

Thanks

 

Contributor
Posts: 61

Re: Matching cases with controls

Hi PG,

 

How can I stack each sample with its two matched controls?

 

Thanks

Respected Advisor
Posts: 4,927

Re: Matching cases with controls

Not sure what you mean by stacking, but here are two possibilities

 

/* Stacked matches are either ... */
proc sql;
create table finalMatches as
select 
    floor(sampleNode) as caseId "Case Id",
    controlNode as controlId
from bestMatches;
quit;

proc transpose data=finalMatches out=stackedMatches1(drop=_name_) prefix=control; 
by caseId;
var controlId;
run;

/* ... Or */
data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
    ctrl = 0;
    group + 1;
    type = "Case";
    Id = caseId;
    output;
    end;
ctrl + 1;
type = cats("Control", ctrl);
Id = controlNode;
output;
keep group type id;
run;
PG
Contributor
Posts: 61

Re: Matching cases with controls

Hi PG,

Thanks again for guiding me! The second code, with stack…, worked; I wanted the data in this shape.

 

As this datafile, in this case stackmatches 2, contains only IDs and no other information. For analysis when I run SAS code, to which datafile should I apply this code, stachmatches 2 does not have information on other variables.

 

Respected Advisor
Posts: 4,927

Re: Matching cases with controls

To get a full dataset, ready for analysis, try:

 

data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id" ctrlId="Control Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
    ctrlId = .;
    group + 1;
    type = "Case";
    Id = caseId;
    output;
    end;
ctrlId + 1;
type = "Control";
Id = controlNode;
output;
keep group type ctrlId id;
run;

proc sql;
create table allMatches as
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join sample as b on a.id=b.id
union all corresponding
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join controls as b on a.id=b.id
order by group, type, ctrlId;
quit;
PG
Contributor
Posts: 61

Re: Matching cases with controls

Dear PGstats,

Once again thanks a lot!

I was able to get data with all variables and could perform analysis.

As I shared with you this is not my real dataset; I will be getting my dataset soon and write codes for my real dataset and see if there is any problem will let you know.  I hope you won't mind.

 

Regards

Respected Advisor
Posts: 4,927

Re: Matching cases with controls

Best of luck!

PG
Contributor
Posts: 61

Re: Matching cases with controls

Hi PGstats,

 

I am writing this after exploring the web/SAS pages to find out answer but could not.

 

How the following figures, such as 7 and 5 as demonitors and format are determined? As I plan to add my another matching variable, the date of blood withdrawal, with a matching range of 2 weeks, I need this information for completing the code in addition to my knowledge.

 

euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2.

Thanks

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 25 replies
  • 522 views
  • 4 likes
  • 3 in conversation