BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wajmsu
Obsidian | Level 7

Hi,

I want to match cases with controls on these variables: age (± 3 years), edu (± 2), race, parity (0,≥1) and date of blood sample (dobw).  I am providing sample data as my original data are still not ready.  I also could not enter variable dobw. 

 

If possible, please give some guidance on entering in SAS correctly.

 

data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20
Attaching a sample of desired output based on your inputs helps describing your requirement.
SQL join with between-and criteria is one way to handle matching on ranges.
Data never sleeps

View solution in original post

25 REPLIES 25
LinusH
Tourmaline | Level 20
Attaching a sample of desired output based on your inputs helps describing your requirement.
SQL join with between-and criteria is one way to handle matching on ranges.
Data never sleeps
wajmsu
Obsidian | Level 7

Hi,
I could not find the attachment.  May be I am missing; I am a new member and don't know much about the communication.

 

 

wajmsu
Obsidian | Level 7

Hi,

 

I accidently clicked the 'accept the solution' tab; I have not been able to access the solution on this page. I will appreciate if I could access the solution.

Thanks

Abdul

PGStats
Opal | Level 21

If you have a licence to SAS/OR, you can easily find an optimal assignment with proc optnet, even with many controls per case:

 

data sample;
input id age edu race parity wt ht;
datalines;
05 32 9 1 0 90 5.2
12 35 12 1 2 110 5.0
17 40 10 2 1 130 4.9
22 25 14 3 0 110 5.4
29 27 12 2 1 95 5.0
40 28 10 3 2 130 6
;
data controls;
input id age edu race parity wt ht;
datalines;
10 33 11 2 1 98 5.7
11 35 10 2 2 120 5.9
19 42 11 1 3 139 5.9
26 26 14 0 1 99 5.0
31 27 10 2 1 95 5.5
32 29 12 2 2 125 6.2
37 31 10 2 1 190 7.2
43 35 12 1 1 110 5.0
46 45 10 1 1 130 4.9
51 28 12 3 0 110 5.4
55 22 10 1 1 95 5.0
58 28 10 3 2 130 6
70 32 9 1 0 90 5.2
79 37 11 1 2 110 5.0
85 48 14 2 1 130 4.9
88 29 14 3 0 110 5.4
90 25 14 2 1 95 5.0
93 27 11 3 2 130 6
99 35 10 1 0 90 5.2
101 33 11 1 2 110 5.0
114 38 16 2 1 130 4.9
125 25 13 3 0 110 5.4
129 25 10 2 1 95 5.0
130 24 11 3 2 130 6
;

/* Find all possible matches */
proc sql;
create table matches as
select 
    s.id as sampleId, 
    c.id as controlId,
    euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2
from 
    sample as s left join 
    controls as c
    on  c.age between s.age-3 and s.age+3 and
        c.edu between s.edu-2 and s.edu+2 and
        c.race = s.race and
        (c.parity=0) = (s.parity=0)
order by sampleId, distance;
select * from matches;
quit;

/* How many controls per sample ? */
%let controlsPerSample=2;

/* Prepare data for linear assingment algorithm */
data links;
set matches;
where not missing(controlId);
controlNode = controlId;
do i = 1 to min(99, &controlsPerSample);
    sampleNode = sampleId + i/100;
    output;
    end;
format sampleNode 10.2;
keep sampleNode controlNode distance;
run;

/* Perform best linear assignment algorithm */
proc optnet data_links=links graph_direction=directed;
data_links_var from=sampleNode to=controlNode weight=distance;
linear_assignment out=bestMatches;
run;

proc sql;
select 
    floor(sampleNode) as sampleId,
    controlNode as controlId
from bestMatches;
quit;
PG
wajmsu
Obsidian | Level 7

Dear PG,

Excellent!  I am very much grateful to you for your time and effort!

 

Here is the output data.  There was only one problem, it did not match with no 17 sample; may be there was no match.

Can you write me the code for the date of blood sample withdrawn (dobw)?  I have to receive this information from our data management unit, I ll be bit flexible and take it in upto two weeks±.  Also, is there any simple way of entering date in sas.

Best regards

The SAS System

 

sampleId controlsId
570
599
1243
12101
22125
2251
2932
2931
4058
4093
PGStats
Opal | Level 21

Glad you could run my code. I hope you can adapt it to your precise requirements. Revise the distance calculation to make sure it weights the matching criteria the way you want. I guess you will want to include the dobw in the distance calculation when it becomes available.

 

You are right, there was no match for case 17. 

 

SAS can handle a great variety of date formats. Something like 2016-02-25 will do just fine.

PG
wajmsu
Obsidian | Level 7

Hi PG,

 

I can't run SAS codes for analysis.  The final data in the form of matched set contains only the IDs of both cases and controls. Can yo uguide me how to bring all the variables alongwith IDs in the final matchd file?

 

Thanks

 

wajmsu
Obsidian | Level 7

Hi PG,

 

How can I stack each sample with its two matched controls?

 

Thanks

PGStats
Opal | Level 21

Not sure what you mean by stacking, but here are two possibilities

 

/* Stacked matches are either ... */
proc sql;
create table finalMatches as
select 
    floor(sampleNode) as caseId "Case Id",
    controlNode as controlId
from bestMatches;
quit;

proc transpose data=finalMatches out=stackedMatches1(drop=_name_) prefix=control; 
by caseId;
var controlId;
run;

/* ... Or */
data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
    ctrl = 0;
    group + 1;
    type = "Case";
    Id = caseId;
    output;
    end;
ctrl + 1;
type = cats("Control", ctrl);
Id = controlNode;
output;
keep group type id;
run;
PG
wajmsu
Obsidian | Level 7

Hi PG,

Thanks again for guiding me! The second code, with stack…, worked; I wanted the data in this shape.

 

As this datafile, in this case stackmatches 2, contains only IDs and no other information. For analysis when I run SAS code, to which datafile should I apply this code, stachmatches 2 does not have information on other variables.

 

PGStats
Opal | Level 21

To get a full dataset, ready for analysis, try:

 

data stackedMatches2;
length type $8;
label group="Group" type="Type" id="Id" ctrlId="Control Id";
set bestMatches;
caseId = floor(sampleNode);
if caseId ne lag(caseId) then do;
    ctrlId = .;
    group + 1;
    type = "Case";
    Id = caseId;
    output;
    end;
ctrlId + 1;
type = "Control";
Id = controlNode;
output;
keep group type ctrlId id;
run;

proc sql;
create table allMatches as
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join sample as b on a.id=b.id
union all corresponding
select a.group, a.type, a.ctrlId, b.*
from stackedMatches2 as a inner join controls as b on a.id=b.id
order by group, type, ctrlId;
quit;
PG
wajmsu
Obsidian | Level 7

Dear PGstats,

Once again thanks a lot!

I was able to get data with all variables and could perform analysis.

As I shared with you this is not my real dataset; I will be getting my dataset soon and write codes for my real dataset and see if there is any problem will let you know.  I hope you won't mind.

 

Regards

PGStats
Opal | Level 21

Best of luck!

PG
wajmsu
Obsidian | Level 7

Hi PGstats,

 

I am writing this after exploring the web/SAS pages to find out answer but could not.

 

How the following figures, such as 7 and 5 as demonitors and format are determined? As I plan to add my another matching variable, the date of blood withdrawal, with a matching range of 2 weeks, I need this information for completing the code in addition to my knowledge.

 

euclid((c.age-s.age)/7, (c.edu-s.edu)/5) as distance format=5.2.

Thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 3533 views
  • 4 likes
  • 3 in conversation