BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10
data a;
  input patientid  dg_date mmddyy10. age sex;
datalines;
1 5/5/2009 18 1
2  1/2/2007 60 1
3  2/2/2004 70 0
;
run;
 

 

data b;
  input patientid  date2 mmddyy10. age sex;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5  1/2/2007 60 1
5 1/1/2007 60 1
6  2/2/2004 79 0
6 3/3/2004  79 0
6  1/1/2004  79 0
;
run;
 

I have two data a and b. Data a has the diagnosis date, I want to match data a to data b by selecting ids which match on age, sex, and have any date "date2" that matches exactly the dg_date.

output (1 case and 0 control) 

 

id label

1  1

2  1

4  0

5  0

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Yes. I could. Just add one more PROC SORT at bottom of code.

 

data a;
  input patientid  dg_date : mmddyy10. age sex;
  format  dg_date  mmddyy10.;
datalines;
1 5/5/2009 18 1
2  1/2/2007 60 1
3  2/2/2004 70 0
;
run;

data b;
  input patientid  date2 : mmddyy10. age sex;
  format date2  mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5  1/2/2007 60 1
5 1/1/2007 60 1
6  2/2/2004 79 0
6 3/3/2004  79 0
6  1/1/2004  79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8  1/2/2007 60 1
8 1/1/2007 60 1
9  2/2/2004 79 0
9 3/3/2004  79 0
9  1/1/2004  79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11  1/2/2007 60 1
11 1/1/2007 60 1
12  2/2/2004 79 0
12 3/3/2004  79 0
12  1/1/2004  79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14  1/2/2007 60 1
14 1/1/2007 60 1
15  2/2/2004 79 0
15 3/3/2004  79 0
15  1/1/2004  79 0
;
run;

proc sort data=a out=temp_a;
by  dg_date  age sex;
run;
proc sort data=b out=temp_b;
by  date2  age sex;
run;
data temp;
 merge temp_a(in=ina) 
 temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
 by  dg_date  age sex;
 if ina and inb;
run;
data want;
 set temp;
 id=patientid;label=1;output;
 id=_patientid;label=0;output;
 keep id label;
run;

proc sort data=want out=final_want nodupkey;
by id label ;
run;

View solution in original post

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, if I understand you correctly:

proc sql;
  create table want as
  select  a.id,
          case when b.patientid ne . then 1 else 0 end as label
  from    a a
  left join b b
  on      a.age=b.age
  and     a.sex=b.sex
  and     a.dg_date=date2;
quit;
lillymaginta
Lapis Lazuli | Level 10

Thank you RW9. Although the specification you included seems correct, the output is not correct so it includes only ids from data a. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, you would need one for each way then:

proc sql;
  create table want as
  select  a.id,
          case when b.patientid ne . then 1 else 0 end as label
  from    a a
  left join b b
  on      a.age=b.age
  and     a.sex=b.sex
  and     a.dg_date=date2
union all
select a.id,
case when b.patientid ne . then 1 else 0 end as label
from b a
left join a b
on a.age=b.age
and a.sex=b.sex
and a.date2=b.dg_date
;
quit;
lillymaginta
Lapis Lazuli | Level 10

thank you for the prompt response but the output is still not correct. id's number 3 and 6 should not be included in the dataset and the output listed id 5 once as 1 (case) and one as 0 (control), it should be only a control (0). 

Ksharp
Super User
data a;
  input patientid  dg_date : mmddyy10. age sex;
  format  dg_date  mmddyy10.;
datalines;
1 5/5/2009 18 1
2  1/2/2007 60 1
3  2/2/2004 70 0
;
run;
data b;
  input patientid  date2 : mmddyy10. age sex;
  format date2  mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5  1/2/2007 60 1
5 1/1/2007 60 1
6  2/2/2004 79 0
6 3/3/2004  79 0
6  1/1/2004  79 0
;
run;

proc sort data=a out=temp_a;
by  dg_date  age sex;
run;
proc sort data=b out=temp_b;
by  date2  age sex;
run;
data temp;
 merge temp_a(in=ina) 
 temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
 by  dg_date  age sex;
 if ina and inb;
run;
data want;
 set temp;
 id=patientid;label=1;output;
 id=_patientid;label=0;output;
 keep id label;
run;
lillymaginta
Lapis Lazuli | Level 10

Perfect Ksharp! I have a large pool of controls, so would it be possible to use the same code to match 1 case to more than one control (e.g. 16)? 

 

Ksharp
Super User

Technically , it could. 

Post your data and output to explain your question more details.

lillymaginta
Lapis Lazuli | Level 10
data a;
  input patientid  dg_date : mmddyy10. age sex;
  format  dg_date  mmddyy10.;
datalines;
1 5/5/2009 18 1
2  1/2/2007 60 1
3  2/2/2004 70 0
;
run;

data b;
  input patientid  date2 : mmddyy10. age sex;
  format date2  mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5  1/2/2007 60 1
5 1/1/2007 60 1
6  2/2/2004 79 0
6 3/3/2004  79 0
6  1/1/2004  79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8  1/2/2007 60 1
8 1/1/2007 60 1
9  2/2/2004 79 0
9 3/3/2004  79 0
9  1/1/2004  79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11  1/2/2007 60 1
11 1/1/2007 60 1
12  2/2/2004 79 0
12 3/3/2004  79 0
12  1/1/2004  79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14  1/2/2007 60 1
14 1/1/2007 60 1
15  2/2/2004 79 0
15 3/3/2004  79 0
15  1/1/2004  79 0
;
run;

output

id label

1  1

2  1

4  0

5  0

7 0

8 0

10 0

11 0

13 0

14 0

 

Just to avoid lengthy data, in this data, each case from a would match to 4 controls from b 

Ksharp
Super User

Yes. I could. Just add one more PROC SORT at bottom of code.

 

data a;
  input patientid  dg_date : mmddyy10. age sex;
  format  dg_date  mmddyy10.;
datalines;
1 5/5/2009 18 1
2  1/2/2007 60 1
3  2/2/2004 70 0
;
run;

data b;
  input patientid  date2 : mmddyy10. age sex;
  format date2  mmddyy10.;
datalines;
4 5/5/2009 18 1
4 10/6/2009 18 1
5  1/2/2007 60 1
5 1/1/2007 60 1
6  2/2/2004 79 0
6 3/3/2004  79 0
6  1/1/2004  79 0
7 5/5/2009 18 1
7 10/6/2009 18 1
8  1/2/2007 60 1
8 1/1/2007 60 1
9  2/2/2004 79 0
9 3/3/2004  79 0
9  1/1/2004  79 0
10 5/5/2009 18 1
10 10/6/2009 18 1
11  1/2/2007 60 1
11 1/1/2007 60 1
12  2/2/2004 79 0
12 3/3/2004  79 0
12  1/1/2004  79 0
13 5/5/2009 18 1
13 10/6/2009 18 1
14  1/2/2007 60 1
14 1/1/2007 60 1
15  2/2/2004 79 0
15 3/3/2004  79 0
15  1/1/2004  79 0
;
run;

proc sort data=a out=temp_a;
by  dg_date  age sex;
run;
proc sort data=b out=temp_b;
by  date2  age sex;
run;
data temp;
 merge temp_a(in=ina) 
 temp_b(rename=(date2=dg_date patientid=_patientid) in=inb);
 by  dg_date  age sex;
 if ina and inb;
run;
data want;
 set temp;
 id=patientid;label=1;output;
 id=_patientid;label=0;output;
 keep id label;
run;

proc sort data=want out=final_want nodupkey;
by id label ;
run;
sms1891
Quartz | Level 8
Hi Ksharp,
I am looking for exactly the same (matching controls for cases based on Age, Sex and Date) but I did not follow the code which picks ups 3 controls for every case (randomly). How to achieve this random selection? Can you please provide the sas code using the same variables in this example?
Thanks,
Sat
Ksharp
Super User
Sorry. I can't understand your question. if you want randomly select obs ,
could try PROC SURVEYSELECT .

Better start a new topic to let more sas user to discuss it .

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1668 views
  • 2 likes
  • 4 in conversation