DATA Step, Macro, Functions and more

Merging two datasets

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Merging two datasets

Hi

 

Thank you for all the great advise in this group.

 

I have two datasets that I want to merge, however i'm merging on dates and closest proximity to dates so this is causing me some problems.

 

Data

 

A1

 

data haveA1;

input  Record_id  admission_number  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. numbermeet;

 

datalines;

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1

1 2 03-03-2010 31-03-2010 04-03-2010 df147 1

2 1 03-04-2010 31-08-2010 04-03-2010 df201 1 ;

 

run;

 

Data A2

 

data haveA1;

input  Record_id date_of_ques ddmmyy10.  numberquest var1 var2 var3;

 

datalines;

1 07-01-2010 1 A B D

1 09-01-2010 2 F D G

1 09-02-2010 3 A D G

1 09-03-2010 4 F D G

1 09-03-2017 5 X T V

2 07-02-2009 1 C R V

2 03-09-2010 2 C T V

;

run;

 

I would like to use the following criteria for merging:

merge on record_id;

merge data A2 on closest  admission in data A1 based on admission_date and discharge_date in A1 and date_of_ques in A2.

The date_of_ques has to be after admission_date

The date of questionnaire (date_of_ques) can maximum be 365 days from discharge_date, but can not be within next admission for same person (record_id).

 

Want:

 

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  07-01-2010 1 A B D

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  09-01-2010 2 F D G

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  09-02-2010 3 A D G

1 2 03-03-2010 31-03-2010 04-03-2010 df147 1  09-03-2010 4 F D G

2 1 03-04-2010 31-08-2010 04-03-2010 df201 1 03-09-2010 2 C T V ;

 

I hope this makes sence or let me know.

 

Regards

 

Solvej

 

 


Accepted Solutions
Solution
a week ago
PROC Star
Posts: 2,231

Re: Merging two datasets

[ Edited ]

1. It would be good if you tested your program before providing it. It cannot run as supplied.

2. Post code using {i}

3. This does some of the work.

   Not too sure why 1.2 matches only one record in your result. 

   You can probably extrapolate and add to my code to get rid of the unwanted matches.

data have1;
input  Record_id  admission_number (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. ;
cards;
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1
2 1 03-04-2010 31-08-2010 04-03-2010 df201 1
; 
run;

data have2;
input  Record_id date_of_ques ddmmyy10.  numberquest var1 $ var2 $ var3 $;
format date_of_ques ddmmyyd10. ;
datalines;
1 07-01-2010 1 A B D
1 09-01-2010 2 F D G
1 09-02-2010 3 A D G
1 09-03-2010 4 F D G
1 09-03-2017 5 X T V
2 07-02-2009 1 C R V
2 03-09-2010 2 C T V
;
run;
proc sql;
  select have1.*, have2.*
  from have1 
        left join 
      have2
        on  have1.Record_id = have2.Record_id 
        and date_of_ques - Discharge_date between -365 and 365
        left join 
      have1 next 
        on  have1.Record_id          = next.Record_id 
        and have1.admission_number+1 = next.admission_number 
        and next.Admission_date      < have2.date_of_ques 
  where next.Record_id is null     
  order by record_id, admission_number, numberquest                         
;

 

Record_id admission_number Admission_date Discharge_date date_of_diagnosis diagnosis Record_id date_of_ques numberquest var1 var2 var3
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 07-01-2010 1 A B D
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-01-2010 2 F D G
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 07-01-2010 1 A B D
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-01-2010 2 F D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-03-2010 4 F D G
2 1 03-04-2010 31-08-2010 04-03-2010 df201 2 03-09-2010 2 C T V

 

 

View solution in original post


All Replies
Solution
a week ago
PROC Star
Posts: 2,231

Re: Merging two datasets

[ Edited ]

1. It would be good if you tested your program before providing it. It cannot run as supplied.

2. Post code using {i}

3. This does some of the work.

   Not too sure why 1.2 matches only one record in your result. 

   You can probably extrapolate and add to my code to get rid of the unwanted matches.

data have1;
input  Record_id  admission_number (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. ;
cards;
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1
2 1 03-04-2010 31-08-2010 04-03-2010 df201 1
; 
run;

data have2;
input  Record_id date_of_ques ddmmyy10.  numberquest var1 $ var2 $ var3 $;
format date_of_ques ddmmyyd10. ;
datalines;
1 07-01-2010 1 A B D
1 09-01-2010 2 F D G
1 09-02-2010 3 A D G
1 09-03-2010 4 F D G
1 09-03-2017 5 X T V
2 07-02-2009 1 C R V
2 03-09-2010 2 C T V
;
run;
proc sql;
  select have1.*, have2.*
  from have1 
        left join 
      have2
        on  have1.Record_id = have2.Record_id 
        and date_of_ques - Discharge_date between -365 and 365
        left join 
      have1 next 
        on  have1.Record_id          = next.Record_id 
        and have1.admission_number+1 = next.admission_number 
        and next.Admission_date      < have2.date_of_ques 
  where next.Record_id is null     
  order by record_id, admission_number, numberquest                         
;

 

Record_id admission_number Admission_date Discharge_date date_of_diagnosis diagnosis Record_id date_of_ques numberquest var1 var2 var3
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 07-01-2010 1 A B D
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-01-2010 2 F D G
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 07-01-2010 1 A B D
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-01-2010 2 F D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-03-2010 4 F D G
2 1 03-04-2010 31-08-2010 04-03-2010 df201 2 03-09-2010 2 C T V

 

 

Contributor
Posts: 29

Re: Merging two datasets

Thank you so much ChrisNZ.

 

I have tweaked the code a bit in order to only include questionnaires that were filled in after admission date. See the complete code below. I have only changed this line

 

and date_of_ques between admission_date and Discharge_date + 365 .

 

It seems to be working however I get this message when running the code. Should I be concerned?

 

WARNING: Column named record_id is duplicated in a select expression (or a view). Explicit

references to it will be to the first one.

WARNING: Variable Record_id already exists on file WORK.WANT.

 

Complete code:

{

proc sql;

create table want as

select have1.*, have2.*

from have1

left join

 

have2

on have1.Record_id = have2.Record_id

and date_of_ques between admission_date and Discharge_date + 365

left join

 

have1 next

on have1.Record_id = next.Record_id

and have1.admission_number+1 = next.admission_number

and next.Admission_date < have2.date_of_ques

where next.Record_id is null

order by record_id, admission_number, numberquest

 

;

quit; }

 

PROC Star
Posts: 2,231

Re: Merging two datasets

[ Edited ]

Just read the message, it's very clear.

You run 

select have1.*, have2.*

so extract 2 variables with the same name from each of the tables. 

Rewrite the select statement to only select the columns you want.

Contributor
Posts: 29

Re: Merging two datasets

HI ChrisNZ

I have a minor problem with the code.

If the first part regarding a maximum of 365 days from dicharge date (admission 1) to questionnaire date is correct but it turns out that the questionnaire is linked with an admission which is after admission one and this specific admission is within the 365 days. Hence this part is not satisfied

{i}
and next.Admission_date < have2.date_of_ques
{i}

then I loose admission 1 in the output data . Can this be fixed?

I hope this makes sence

Kind regards Solvej
PROC Star
Posts: 2,231

Re: Merging two datasets

I do not understand your requirement.

 

Know that you can use this syntax if the variables contain SAS dates:

 

and next.Admission_date + 1 < have2.date_of_ques

to shift by one day.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 148 views
  • 1 like
  • 2 in conversation