BookmarkSubscribeRSS Feed
DhanaMalla
Calcite | Level 5

Hi I am trying to merge two datasets with visit dates. Both datasets have repeated visit date and this (below) is the message I see in log. Please help

 

DATA FIRSTQTR.MRGD_KHCMHDL;

MERGE FIRSTQTR.KHC_DATA_A FIRSTQTR.MHDL_DATA_D;

BY VISIT_DT;

RUN;

the log says;

 MERGE statements has more than one dataset with repeats of By values.

 

 

 

10 REPLIES 10
Reeza
Super User

A data step merge cannot do a many to many join which is what this would be. In the majority of cases this means you need another variable in your BY statement that would uniquely identify each match - BY some sort of ID as well? 

 

Are you sure many to many is what you want? That would mean if you have 2 records in one file and 3 records in another you're going to end up with 6 records in total.

 


@DhanaMalla wrote:

Hi I am trying to merge two datasets with visit dates. Both datasets have repeated visit date and this (below) is the message I see in log. Please help

 

DATA FIRSTQTR.MRGD_KHCMHDL;

MERGE FIRSTQTR.KHC_DATA_A FIRSTQTR.MHDL_DATA_D;

BY VISIT_DT;

RUN;

the log says;

 MERGE statements has more than one dataset with repeats of By values.

 

 

 


 

DhanaMalla
Calcite | Level 5

HI Reeza,

So then I should porbably use Visit id and  DOB variable.  One of my friends said he did it by converting the dates to sequential serial numbers and then concatonated those two numbers into one 10 digit number, and matched on that.  He did it in Epiinfo, there might be a way to do what he did in SAS, but I don't know. It would be great if you can help with that. 
Thank you so much.

Reeza
Super User

@DhanaMalla wrote:

One of my friends said he did it by converting the dates to sequential serial numbers and then concatonated those two numbers into one 10 digit number, and matched on that. 


That's the old fashioned way of doing it. You can have two variables in your BY statement. 

 

Your question and Merge question so far, neither actually show what you want as output, just have two different source data sets. If you can't show us what you want as the output, we can make guesses but that's about it.

DhanaMalla
Calcite | Level 5
Reeza,
I know I am not being very explicit about what I have and what I want. Let me try to explain as clear as I can. I did ORDER=FREQ and NLEVELS
To identify duplicates, and this is what I found out.
I have two datasets. This data set has other variables for example test type and test result. Datasets A (lab data) has 932 unique DOB, of these Of 932 DOB:
443 DOB IS REPEATED TWICE
94 IS REPEATED THREE TIMES
23 REPEATED FOURTIMES
9 REPEATED 5 TIMES
4 REPEATED SIX TIMES
2 REPEATED 7 TIMES

This data set also has other variables for example test type and test result. Data set B has 899 (clinic data) DOB, Of 899 DOB:
127 is repeated twice
37 is repeated three times
3 is repeated four times.
And both dataset has visit date which is repeated as many as 39 times meaning that date that many test were conducted.

So what I am trying to get is find out number and type of tests done, and number of test results negative or positive by merging these two data set. your help will be really appreciated.
Reeza
Super User

I understand what you have.  I don't understand what you want.

 

Take one ID, with say X records in data1 and Y records in data2 and show exactly what you want as output.

Ideally, you would also include cases such as:

  • 1 record in data1 and multiple in data2
  • multiple in data1 and 1 in data2
  • multiple in data1 and multiple in data2
  • 1/multiple in data2 and 0 in data2
  • 0 in data2 and 1/multiple in data2

These are the case types I'd check to ensure the merge was happening the way I wanted it to. 

 


@DhanaMalla wrote:
Reeza,
I know I am not being very explicit about what I have and what I want. Let me try to explain as clear as I can. I did ORDER=FREQ and NLEVELS
To identify duplicates, and this is what I found out.
I have two datasets. This data set has other variables for example test type and test result. Datasets A (lab data) has 932 unique DOB, of these Of 932 DOB:
443 DOB IS REPEATED TWICE
94 IS REPEATED THREE TIMES
23 REPEATED FOURTIMES
9 REPEATED 5 TIMES
4 REPEATED SIX TIMES
2 REPEATED 7 TIMES

This data set also has other variables for example test type and test result. Data set B has 899 (clinic data) DOB, Of 899 DOB:
127 is repeated twice
37 is repeated three times
3 is repeated four times.
And both dataset has visit date which is repeated as many as 39 times meaning that date that many test were conducted.

So what I am trying to get is find out number and type of tests done, and number of test results negative or positive by merging these two data set. your help will be really appreciated.

 

 

DhanaMalla
Calcite | Level 5

Hi Reeza,

I have 1109 visit date data in set 1 (clinic visit data)and I have 1704 visit date data in another set Lab test data). I have 899 unique patient in set 1 and 932 in set 2. Don't know how to say. So I have multiple in data1 and multiple in data2 in some case but in some case I have 1 in set one ands multiple in another case. Same person can have multiple tests on the same day and same day multiples patients get tested. I am trying to find the number of tests and test results.

Reeza
Super User

 Let's bring this back to your other question then:

 

data LAB;

input visit_dt date mmddyy10. Bdate date mmddyy10. Test  testresult $;

format date mmddyy10.;

cards;

03/13/2018  02/27/1931  1  N

02/26/2018 10/21/1944  10  P

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   N

01/26/2018  04/14/1949   1  N

02/06/2018  04/14/1949  1  P

; RUN;

 

data CLINIC;

input visit_dt Bdate date clinic $ diag;

format date mmddyy10.;

cards;

03/13/2018  02/27/1931  mmc 200

01/19/2018 10/06/1934  mmc  300

02/13/2018  03/06/1978  mmc  300

02/13/2018  03/06/1978  mmc  300

01/26/2018  04/14/1949   mmc 200

02/06/2018  04/14/1949  mmc  300

;

RUN;

 

Let's look at the coloured lines above which are the same person (assumption). What EXACTLY do you want as your final output?

 


@DhanaMalla wrote:

Hi Reeza,

Don't know how to say. So I have multiple in data1 and multiple in data2 in some case but in some case I have 1 in set one ands multiple in another case. Same person can have multiple tests on the same day and same day multiples patients get tested. I am trying to find the number of tests and test results.

 

If you don't know, there's no way I can know, especially from the outside looking in. This is a logical decision, not a programming problem per se. 

 

If your final requirement is # of tests and results then you may have different options that don't require merging, unless you need to link the lab test to the results directly and if you do, this is a horrible method to do it and I'd expect your system to have identifiers that link the two - whether or not you have access to that is a different story, but they would have to exist somewhere or you'd have a massive data integrity problem.


 

DhanaMalla
Calcite | Level 5

Hi Reeza,

Thank you for still being with me. Hope you will be with me until I am able to solve this problem.  So I guess this is what I want

03/13/2018  02/27/1931  1  N mmc 200

03/13/2018  02/27/1931  1  N mmc 200

01/19/2018 10/06/1934  mmc  300

02/26/2018 10/21/1944  10  P

02/13/2018  03/06/1978  1   N mmc  300

02/13/2018  03/06/1978  1   N mmc  300

01/26/2018  04/14/1949   1  N mmc 200

02/06/2018  04/14/1949  1  P mmc  300

I want everything for the lab data with added info from clinic if available.

 

Reeza,

May be this might give you an idea better.

 

Lab

Clinic

A

A

B

B

C

C

C

 

D

D

D

D

E

E

F

F

G

G

G

 

H

MISSING

I

I

J

J

 

J

 

J

In the table above, person A & B visit clinic and get tested so I have info in both data sets. Person C visited clinic once, but got two tests done. Person D visited clinic twice, tested twice. Person E & F visited clinic and got tested. Person G visited clinic once and tested twice or had two different tests done. Person H is not in clinic data set. Person J visited clinic four times but only had two tests done. This is the pattern of the two datasets I am trying to merge. Your help will be appreciated.

Reeza
Super User

For each of your data sets go through and create a sequence number. I'm assuming you also have some sort of ID variable?

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 

Once you have your sequence number you can merge using a "BY ID testdate Sequence" 

 


@DhanaMalla wrote:

Hi Reeza,

Thank you for still being with me. Hope you will be with me until I am able to solve this problem.  So I guess this is what I want

03/13/2018  02/27/1931  1  N mmc 200

03/13/2018  02/27/1931  1  N mmc 200

01/19/2018 10/06/1934  mmc  300

02/26/2018 10/21/1944  10  P

02/13/2018  03/06/1978  1   N mmc  300

02/13/2018  03/06/1978  1   N mmc  300

01/26/2018  04/14/1949   1  N mmc 200

02/06/2018  04/14/1949  1  P mmc  300

I want everything for the lab data with added info from clinic if available.

 

Reeza,

May be this might give you an idea better.

 

Lab

Clinic

A

A

B

B

C

C

C

 

D

D

D

D

E

E

F

F

G

G

G

 

H

MISSING

I

I

J

J

 

J

 

J

In the table above, person A & B visit clinic and get tested so I have info in both data sets. Person C visited clinic once, but got two tests done. Person D visited clinic twice, tested twice. Person E & F visited clinic and got tested. Person G visited clinic once and tested twice or had two different tests done. Person H is not in clinic data set. Person J visited clinic four times but only had two tests done. This is the pattern of the two datasets I am trying to merge. Your help will be appreciated.


 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1223 views
  • 0 likes
  • 2 in conversation