Hi I am trying to merge two datasets with visit dates. Both datasets have repeated visit date and this (below) is the message I see in log. Please help
DATA FIRSTQTR.MRGD_KHCMHDL;
MERGE FIRSTQTR.KHC_DATA_A FIRSTQTR.MHDL_DATA_D;
BY VISIT_DT;
RUN;
the log says;
MERGE statements has more than one dataset with repeats of By values.
A data step merge cannot do a many to many join which is what this would be. In the majority of cases this means you need another variable in your BY statement that would uniquely identify each match - BY some sort of ID as well?
Are you sure many to many is what you want? That would mean if you have 2 records in one file and 3 records in another you're going to end up with 6 records in total.
@DhanaMalla wrote:
Hi I am trying to merge two datasets with visit dates. Both datasets have repeated visit date and this (below) is the message I see in log. Please help
DATA FIRSTQTR.MRGD_KHCMHDL;
MERGE FIRSTQTR.KHC_DATA_A FIRSTQTR.MHDL_DATA_D;
BY VISIT_DT;
RUN;
the log says;
MERGE statements has more than one dataset with repeats of By values.
HI Reeza,
So then I should porbably use Visit id and DOB variable. One of my friends said he did it by converting the dates to sequential serial numbers and then concatonated those two numbers into one 10 digit number, and matched on that. He did it in Epiinfo, there might be a way to do what he did in SAS, but I don't know. It would be great if you can help with that.
Thank you so much.
@DhanaMalla wrote:
One of my friends said he did it by converting the dates to sequential serial numbers and then concatonated those two numbers into one 10 digit number, and matched on that.
That's the old fashioned way of doing it. You can have two variables in your BY statement.
Your question and Merge question so far, neither actually show what you want as output, just have two different source data sets. If you can't show us what you want as the output, we can make guesses but that's about it.
I understand what you have. I don't understand what you want.
Take one ID, with say X records in data1 and Y records in data2 and show exactly what you want as output.
Ideally, you would also include cases such as:
These are the case types I'd check to ensure the merge was happening the way I wanted it to.
@DhanaMalla wrote:
Reeza,
I know I am not being very explicit about what I have and what I want. Let me try to explain as clear as I can. I did ORDER=FREQ and NLEVELS
To identify duplicates, and this is what I found out.
I have two datasets. This data set has other variables for example test type and test result. Datasets A (lab data) has 932 unique DOB, of these Of 932 DOB:
443 DOB IS REPEATED TWICE
94 IS REPEATED THREE TIMES
23 REPEATED FOURTIMES
9 REPEATED 5 TIMES
4 REPEATED SIX TIMES
2 REPEATED 7 TIMES
This data set also has other variables for example test type and test result. Data set B has 899 (clinic data) DOB, Of 899 DOB:
127 is repeated twice
37 is repeated three times
3 is repeated four times.
And both dataset has visit date which is repeated as many as 39 times meaning that date that many test were conducted.
So what I am trying to get is find out number and type of tests done, and number of test results negative or positive by merging these two data set. your help will be really appreciated.
Hi Reeza,
I have 1109 visit date data in set 1 (clinic visit data)and I have 1704 visit date data in another set Lab test data). I have 899 unique patient in set 1 and 932 in set 2. Don't know how to say. So I have multiple in data1 and multiple in data2 in some case but in some case I have 1 in set one ands multiple in another case. Same person can have multiple tests on the same day and same day multiples patients get tested. I am trying to find the number of tests and test results.
Let's bring this back to your other question then:
data LAB;
input visit_dt date mmddyy10. Bdate date mmddyy10. Test testresult $;
format date mmddyy10.;
cards;
03/13/2018 02/27/1931 1 N
02/26/2018 10/21/1944 10 P
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 N
01/26/2018 04/14/1949 1 N
02/06/2018 04/14/1949 1 P
; RUN;
data CLINIC;
input visit_dt Bdate date clinic $ diag;
format date mmddyy10.;
cards;
03/13/2018 02/27/1931 mmc 200
01/19/2018 10/06/1934 mmc 300
02/13/2018 03/06/1978 mmc 300
02/13/2018 03/06/1978 mmc 300
01/26/2018 04/14/1949 mmc 200
02/06/2018 04/14/1949 mmc 300
;
RUN;
Let's look at the coloured lines above which are the same person (assumption). What EXACTLY do you want as your final output?
@DhanaMalla wrote:
Hi Reeza,
Don't know how to say. So I have multiple in data1 and multiple in data2 in some case but in some case I have 1 in set one ands multiple in another case. Same person can have multiple tests on the same day and same day multiples patients get tested. I am trying to find the number of tests and test results.
If you don't know, there's no way I can know, especially from the outside looking in. This is a logical decision, not a programming problem per se.
If your final requirement is # of tests and results then you may have different options that don't require merging, unless you need to link the lab test to the results directly and if you do, this is a horrible method to do it and I'd expect your system to have identifiers that link the two - whether or not you have access to that is a different story, but they would have to exist somewhere or you'd have a massive data integrity problem.
Hi Reeza,
Thank you for still being with me. Hope you will be with me until I am able to solve this problem. So I guess this is what I want
03/13/2018 02/27/1931 1 N mmc 200
03/13/2018 02/27/1931 1 N mmc 200
01/19/2018 10/06/1934 mmc 300
02/26/2018 10/21/1944 10 P
02/13/2018 03/06/1978 1 N mmc 300
02/13/2018 03/06/1978 1 N mmc 300
01/26/2018 04/14/1949 1 N mmc 200
02/06/2018 04/14/1949 1 P mmc 300
I want everything for the lab data with added info from clinic if available.
Reeza,
May be this might give you an idea better.
Lab | Clinic |
A | A |
B | B |
C | C |
C | |
D | D |
D | D |
E | E |
F | F |
G | G |
G |
|
H | MISSING |
I | I |
J | J |
J | |
J |
In the table above, person A & B visit clinic and get tested so I have info in both data sets. Person C visited clinic once, but got two tests done. Person D visited clinic twice, tested twice. Person E & F visited clinic and got tested. Person G visited clinic once and tested twice or had two different tests done. Person H is not in clinic data set. Person J visited clinic four times but only had two tests done. This is the pattern of the two datasets I am trying to merge. Your help will be appreciated.
For each of your data sets go through and create a sequence number. I'm assuming you also have some sort of ID variable?
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
Once you have your sequence number you can merge using a "BY ID testdate Sequence"
@DhanaMalla wrote:
Hi Reeza,
Thank you for still being with me. Hope you will be with me until I am able to solve this problem. So I guess this is what I want
03/13/2018 02/27/1931 1 N mmc 200
03/13/2018 02/27/1931 1 N mmc 200
01/19/2018 10/06/1934 mmc 300
02/26/2018 10/21/1944 10 P
02/13/2018 03/06/1978 1 N mmc 300
02/13/2018 03/06/1978 1 N mmc 300
01/26/2018 04/14/1949 1 N mmc 200
02/06/2018 04/14/1949 1 P mmc 300
I want everything for the lab data with added info from clinic if available.
Reeza,
May be this might give you an idea better.
Lab
Clinic
A
A
B
B
C
C
C
D
D
D
D
E
E
F
F
G
G
G
H
MISSING
I
I
J
J
J
J
In the table above, person A & B visit clinic and get tested so I have info in both data sets. Person C visited clinic once, but got two tests done. Person D visited clinic twice, tested twice. Person E & F visited clinic and got tested. Person G visited clinic once and tested twice or had two different tests done. Person H is not in clinic data set. Person J visited clinic four times but only had two tests done. This is the pattern of the two datasets I am trying to merge. Your help will be appreciated.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.