BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Solvej
Obsidian | Level 7

Hi

 

Thank you for all the great advise in this group.

 

I have two datasets that I want to merge, however i'm merging on dates and closest proximity to dates so this is causing me some problems.

 

Data

 

A1

 

data haveA1;

input  Record_id  admission_number  (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. numbermeet;

 

datalines;

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1

1 2 03-03-2010 31-03-2010 04-03-2010 df147 1

2 1 03-04-2010 31-08-2010 04-03-2010 df201 1 ;

 

run;

 

Data A2

 

data haveA1;

input  Record_id date_of_ques ddmmyy10.  numberquest var1 var2 var3;

 

datalines;

1 07-01-2010 1 A B D

1 09-01-2010 2 F D G

1 09-02-2010 3 A D G

1 09-03-2010 4 F D G

1 09-03-2017 5 X T V

2 07-02-2009 1 C R V

2 03-09-2010 2 C T V

;

run;

 

I would like to use the following criteria for merging:

merge on record_id;

merge data A2 on closest  admission in data A1 based on admission_date and discharge_date in A1 and date_of_ques in A2.

The date_of_ques has to be after admission_date

The date of questionnaire (date_of_ques) can maximum be 365 days from discharge_date, but can not be within next admission for same person (record_id).

 

Want:

 

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  07-01-2010 1 A B D

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  09-01-2010 2 F D G

1 1 01-01-2010 01-02-2010 02-01-2010 df200 1  09-02-2010 3 A D G

1 2 03-03-2010 31-03-2010 04-03-2010 df147 1  09-03-2010 4 F D G

2 1 03-04-2010 31-08-2010 04-03-2010 df201 1 03-09-2010 2 C T V ;

 

I hope this makes sence or let me know.

 

Regards

 

Solvej

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

1. It would be good if you tested your program before providing it. It cannot run as supplied.

2. Post code using {i}

3. This does some of the work.

   Not too sure why 1.2 matches only one record in your result. 

   You can probably extrapolate and add to my code to get rid of the unwanted matches.

data have1;
input  Record_id  admission_number (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. ;
cards;
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1
2 1 03-04-2010 31-08-2010 04-03-2010 df201 1
; 
run;

data have2;
input  Record_id date_of_ques ddmmyy10.  numberquest var1 $ var2 $ var3 $;
format date_of_ques ddmmyyd10. ;
datalines;
1 07-01-2010 1 A B D
1 09-01-2010 2 F D G
1 09-02-2010 3 A D G
1 09-03-2010 4 F D G
1 09-03-2017 5 X T V
2 07-02-2009 1 C R V
2 03-09-2010 2 C T V
;
run;
proc sql;
  select have1.*, have2.*
  from have1 
        left join 
      have2
        on  have1.Record_id = have2.Record_id 
        and date_of_ques - Discharge_date between -365 and 365
        left join 
      have1 next 
        on  have1.Record_id          = next.Record_id 
        and have1.admission_number+1 = next.admission_number 
        and next.Admission_date      < have2.date_of_ques 
  where next.Record_id is null     
  order by record_id, admission_number, numberquest                         
;

 

Record_id admission_number Admission_date Discharge_date date_of_diagnosis diagnosis Record_id date_of_ques numberquest var1 var2 var3
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 07-01-2010 1 A B D
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-01-2010 2 F D G
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 07-01-2010 1 A B D
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-01-2010 2 F D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-03-2010 4 F D G
2 1 03-04-2010 31-08-2010 04-03-2010 df201 2 03-09-2010 2 C T V

 

 

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

1. It would be good if you tested your program before providing it. It cannot run as supplied.

2. Post code using {i}

3. This does some of the work.

   Not too sure why 1.2 matches only one record in your result. 

   You can probably extrapolate and add to my code to get rid of the unwanted matches.

data have1;
input  Record_id  admission_number (Admission_date Discharge_date date_of_diagnosis) (:ddmmyy10.)  diagnosis $;
format Admission_date Discharge_date date_of_diagnosis ddmmyyd10. ;
cards;
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1
2 1 03-04-2010 31-08-2010 04-03-2010 df201 1
; 
run;

data have2;
input  Record_id date_of_ques ddmmyy10.  numberquest var1 $ var2 $ var3 $;
format date_of_ques ddmmyyd10. ;
datalines;
1 07-01-2010 1 A B D
1 09-01-2010 2 F D G
1 09-02-2010 3 A D G
1 09-03-2010 4 F D G
1 09-03-2017 5 X T V
2 07-02-2009 1 C R V
2 03-09-2010 2 C T V
;
run;
proc sql;
  select have1.*, have2.*
  from have1 
        left join 
      have2
        on  have1.Record_id = have2.Record_id 
        and date_of_ques - Discharge_date between -365 and 365
        left join 
      have1 next 
        on  have1.Record_id          = next.Record_id 
        and have1.admission_number+1 = next.admission_number 
        and next.Admission_date      < have2.date_of_ques 
  where next.Record_id is null     
  order by record_id, admission_number, numberquest                         
;

 

Record_id admission_number Admission_date Discharge_date date_of_diagnosis diagnosis Record_id date_of_ques numberquest var1 var2 var3
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 07-01-2010 1 A B D
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-01-2010 2 F D G
1 1 01-01-2010 01-02-2010 02-01-2010 df200 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 07-01-2010 1 A B D
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-01-2010 2 F D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-02-2010 3 A D G
1 2 03-03-2010 31-03-2010 04-03-2010 df147 1 09-03-2010 4 F D G
2 1 03-04-2010 31-08-2010 04-03-2010 df201 2 03-09-2010 2 C T V

 

 

Solvej
Obsidian | Level 7

Thank you so much ChrisNZ.

 

I have tweaked the code a bit in order to only include questionnaires that were filled in after admission date. See the complete code below. I have only changed this line

 

and date_of_ques between admission_date and Discharge_date + 365 .

 

It seems to be working however I get this message when running the code. Should I be concerned?

 

WARNING: Column named record_id is duplicated in a select expression (or a view). Explicit

references to it will be to the first one.

WARNING: Variable Record_id already exists on file WORK.WANT.

 

Complete code:

{

proc sql;

create table want as

select have1.*, have2.*

from have1

left join

 

have2

on have1.Record_id = have2.Record_id

and date_of_ques between admission_date and Discharge_date + 365

left join

 

have1 next

on have1.Record_id = next.Record_id

and have1.admission_number+1 = next.admission_number

and next.Admission_date < have2.date_of_ques

where next.Record_id is null

order by record_id, admission_number, numberquest

 

;

quit; }

 

ChrisNZ
Tourmaline | Level 20

Just read the message, it's very clear.

You run 

select have1.*, have2.*

so extract 2 variables with the same name from each of the tables. 

Rewrite the select statement to only select the columns you want.

Solvej
Obsidian | Level 7
HI ChrisNZ

I have a minor problem with the code.

If the first part regarding a maximum of 365 days from dicharge date (admission 1) to questionnaire date is correct but it turns out that the questionnaire is linked with an admission which is after admission one and this specific admission is within the 365 days. Hence this part is not satisfied

{i}
and next.Admission_date < have2.date_of_ques
{i}

then I loose admission 1 in the output data . Can this be fixed?

I hope this makes sence

Kind regards Solvej
ChrisNZ
Tourmaline | Level 20

I do not understand your requirement.

 

Know that you can use this syntax if the variables contain SAS dates:

 

and next.Admission_date + 1 < have2.date_of_ques

to shift by one day.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 959 views
  • 1 like
  • 2 in conversation