Hi ,
i am having problem verifying if the date i merged is right. This is what i did
data LAB;
input visit_dt date mmddyy10. Bdate date mmddyy10. Test testresult $;
format date mmddyy10.;
cards;
03/13/2018 02/27/1931 1 N
02/26/2018 10/21/1944 10 P
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 N
01/26/2018 04/14/1949 1 N
02/06/2018 04/14/1949 1 P
; RUN;
data CLINIC;
input visit_dt Bdate date clinic $ diag;
format date mmddyy10.;
cards;
03/13/2018 02/27/1931 mmc 200
01/19/2018 10/06/1934 mmc 300
02/13/2018 03/06/1978 mmc 300
02/13/2018 03/06/1978 mmc 300
01/26/2018 04/14/1949 mmc 200
02/06/2018 04/14/1949 mmc 300
;
RUN;
The data above is just an example. I have 1704 records in lab data set and 1109 records in clinic datatset. I need to have 1704 records after it is merged because that is number of tests done. I tired sorting by visit_dt and merged, the note says ;
MERGE statement has more than one data set with repeats of by values. The SAS system stopped processing this step because of errors.
Your data steps with example data are not correct (too many variables in the input statement, and missing informats). Test them and take care of the errors.
Also show the merge code.
Hello,
than you for your response. I did it again
data LAB;
input visit_dt date9. @12 Bdate date9. @22 Test $ @24 testresult $;
format Bdate mmddyy10. visit_dt mmddyy10. ;
cards;
13mar2018 27feb1931 gc n
26feb2018 21oct1944 gc n
13feb2018 06mar1948 gc p
13feb2018 06mar1948 gc n
26jan2018 14apr1949 gc p
02feb2018 14apr1949 gc n
; RUN;
proc print data=lab;
run;
proc sort data=lab;
by visit_dt bdate;
run;
data CLINIC;
input visit_dt date9. @12 Bdate date9. @22 clinic $ diag @26 ;
format Bdate mmddyy10. visit_dt mmddyy10. ;
cards;
13mar2018 27feb1931 mmc 200
19jan2018 06oct1934 mmc 300
26feb2018 21oct1944 mmc 300
26jan2018 03jun1978 mmc 300
13feb2018 06mar1948 mmc 200
26jan2018 14apr1949 mmc 300
02feb2018 14apr1949 mmc 300
;
RUN;
proc print data=clinic;
run;
proc sort data=clinic;
by visit_dt bdate;
run;
data labcli;
merge lab clinic;
by visit_dt bdate;
run;
it wirked perfectly here but in real datasets I am not sure if it merged correctly because when I did frequency analysis for test the result was not accurate and te log says; merge statement has multiple repeat values in by variable etc....Please help:(
In a large data base, you will unvariably have events where people with identical birthdates will visit on the same day (eg think about twins).
You need to deal with that. It might be that you need a unique key for a visit that is also communicated to the lab.
Paste from the LOG the code and messages. Best is to paste into code box opened using the forum's {I} icon.
Any error came from something else than the "more than one data set with repeats of by values". That is note because the result is often not what you might intend.
A brief example follows of repeats of by values. Look at the two data sets and then think about what you might be thinking of how the merge should work. Then run the code and examine the output to see if matches your expectations.
data work.one; input id value ; datalines; 1 3 1 4 1 5 ; data work.two; input id value2; datalines; 1 8 1 6 1 2 1 1 ; data work.error; merge work.one work.two; by id; run;
Just looking at the data, no computers ...
How do you know how the observations are supposed to match up?
Your first data set contains:
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 N
Your second data set contains:
02/13/2018 03/06/1978 mmc 300
02/13/2018 03/06/1978 mmc 300
Are we to assume that these match (first row with first row, second row with second row)? How do we know whether that match should be reversed (first row with second row, second row with first row)? In this case, you have two identical rows so the outcome would be the same either way. But your data set doesn't contain a reliable way to match the data when both data sets have multiple rows for the same combination of dates. The situation only gets worse if you have two rows in one data set, and three possible matches in the other data set. (You did mention that your data sets don't contain the same number of observations.)
That's what SAS's message is trying to warn you about. But regardless of whether SAS warns you, how can you reliably match the two data sets? Should there be another variable such as a PatientID?
FROM THIS :
First data set contents;
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 N
Second data set contains:
02/13/2018 03/06/1978 mmc 300
02/13/2018 03/06/1978 mmc 300
THIS IS WHAT I WANT
02/13/2018 03/06/1978 1 N mmc 300
02/13/2018 03/06/1978 1 N mmc 300
For the case you posted, that's a perfectly fine solution. How about this case:
First data set contents;
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 N
02/13/2018 03/06/1978 1 P
Second data set contains:
02/13/2018 03/06/1978 mmc 300
02/13/2018 03/06/1978 mmc 200
What should the result be?
As I mentioned earlier, these two data sets (lab and clinic) have two common variables and they are repeated in both datasets. Same individual may have several test at the same date and result of different can be different. And on the same date many individual get tested. All I want to know is number and type of tests done and their results.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.