Hi,
I want to match cases with controls on these variables: age (± 3 years), edu (± 2), race, parity (0,≥1) and date of blood sample (dobw). I am providing sample data as my original data are still not ready. I also could not enter variable dobw.
If possible, please give some guidance on entering in SAS correctly.
data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;
run;
Hi,
I could not find the attachment. May be I am missing; I am a new member and don't know much about the communication.
Hi,
I accidently clicked the 'accept the solution' tab; I have not been able to access the solution on this page. I will appreciate if I could access the solution.
Thanks
Abdul
If you have a licence to SAS/OR, you can easily find an optimal assignment with proc optnet, even with many controls per case:
data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;
/* Find all possible matches */
proc sql;
create table matches as
select
s.id as sampleId,
c.id as controlId,
euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2
from
sample as s left join
controls as c
on c.age between s.age-3 and s.age+3 and
c.edu between s.edu-2 and s.edu+2 and
c.race = s.race and
(c.parity=0) = (s.parity=0)
order by sampleId, distance;
select * from matches;
quit;
/* How many controls per sample ? */
%let controlsPerSample=2;
/* Prepare data for linear assingment algorithm */
data links;
set matches;
where not missing(controlId);
controlNode = controlId;
do i = 1 to min(99, &controlsPerSample);
sampleNode = sampleId + i/100;
output;
end;
format sampleNode 10.2;
keep sampleNode controlNode distance;
run;
/* Perform best linear assignment algorithm */
proc optnet data_links=links graph_direction=directed;
data_links_var from=sampleNode to=controlNode weight=distance;
linear_assignment out=bestMatches;
run;
proc sql;
select
floor(sampleNode) as sampleId,
controlNode as controlId
from bestMatches;
quit;
Dear PG,
Excellent! I am very much grateful to you for your time and effort!
Here is the output data. There was only one problem, it did not match with no 17 sample; may be there was no match.
Can you write me the code for the date of blood sample withdrawn (dobw)? I have to receive this information from our data management unit, I ll be bit flexible and take it in upto two weeks±. Also, is there any simple way of entering date in sas.
Best regards
The SAS System |
5 | 70 |
5 | 99 |
12 | 43 |
12 | 101 |
22 | 125 |
22 | 51 |
29 | 32 |
29 | 31 |
40 | 58 |
40 | 93 |
Glad you could run my code. I hope you can adapt it to your precise requirements. Revise the distance calculation to make sure it weights the matching criteria the way you want. I guess you will want to include the dobw in the distance calculation when it becomes available.
You are right, there was no match for case 17.
SAS can handle a great variety of date formats. Something like 2016-02-25 will do just fine.
Hi PG,
I can't run SAS codes for analysis. The final data in the form of matched set contains only the IDs of both cases and controls. Can yo uguide me how to bring all the variables alongwith IDs in the final matchd file?
Thanks
Hi PG,
How can I stack each sample with its two matched controls?
Thanks
Not sure what you mean by stacking, but here are two possibilities
/* Stacked matches are either ... */
proc sql;
create table finalMatches as
select
floor(sampleNode) as caseId "Case Id",
controlNode as controlId
from bestMatches;
quit;
proc transpose data=finalMatches out=stackedMatches1(drop=_name_) prefix=control;
by caseId;
var controlId;
run;
/* ... Or */
data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
ctrl = 0;
group + 1;
type = "Case";
Id = caseId;
output;
end;
ctrl + 1;
type = cats("Control", ctrl);
Id = controlNode;
output;
keep group type id;
run;
Hi PG,
Thanks again for guiding me! The second code, with stack…, worked; I wanted the data in this shape.
As this datafile, in this case stackmatches 2, contains only IDs and no other information. For analysis when I run SAS code, to which datafile should I apply this code, stachmatches 2 does not have information on other variables.
To get a full dataset, ready for analysis, try:
data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id" ctrlId="Control Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
ctrlId = .;
group + 1;
type = "Case";
Id = caseId;
output;
end;
ctrlId + 1;
type = "Control";
Id = controlNode;
output;
keep group type ctrlId id;
run;
proc sql;
create table allMatches as
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join sample as b on a.id=b.id
union all corresponding
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join controls as b on a.id=b.id
order by group, type, ctrlId;
quit;
Dear PGstats,
Once again thanks a lot!
I was able to get data with all variables and could perform analysis.
As I shared with you this is not my real dataset; I will be getting my dataset soon and write codes for my real dataset and see if there is any problem will let you know. I hope you won't mind.
Regards
Best of luck!
Hi PGstats,
I am writing this after exploring the web/SAS pages to find out answer but could not.
How the following figures, such as 7 and 5 as demonitors and format are determined? As I plan to add my another matching variable, the date of blood withdrawal, with a matching range of 2 weeks, I need this information for completing the code in addition to my knowledge.
euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2.
Thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.