BookmarkSubscribeRSS Feed
Dhana18
Obsidian | Level 7

Hi ,

i am having problem verifying if the date i merged is right. This is what i did

 

data LAB;

input visit_dt date mmddyy10. Bdate date mmddyy10. Test  testresult $;

format date mmddyy10.;

cards;

03/13/2018  02/27/1931  1  N

02/26/2018 10/21/1944  10  P

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   N

01/26/2018  04/14/1949   1  N

02/06/2018  04/14/1949  1  P

; RUN;

 

data CLINIC;

input visit_dt Bdate date clinic $ diag;

format date mmddyy10.;

cards;

03/13/2018  02/27/1931  mmc 200

01/19/2018 10/06/1934  mmc  300

02/13/2018  03/06/1978  mmc  300

02/13/2018  03/06/1978  mmc  300

01/26/2018  04/14/1949   mmc 200

02/06/2018  04/14/1949  mmc  300

;

RUN;

 

The data above is just an example. I have 1704 records in lab data set and 1109 records in clinic datatset. I need to have 1704 records after it is merged because that is number of tests done.  I tired sorting by visit_dt and merged, the note says ;

 

MERGE statement has more than one data set with repeats of by values. The SAS system stopped processing this step because of errors.

 

8 REPLIES 8
Kurt_Bremser
Super User

Your data steps with example data are not correct (too many variables in the input statement, and missing informats). Test them and take care of the errors.

Also show the merge code.

DhanaMalla
Calcite | Level 5

Hello,

than you for your response. I did it again

data LAB;

input visit_dt date9. @12 Bdate date9. @22 Test $ @24 testresult $;

format Bdate mmddyy10. visit_dt mmddyy10. ;

cards;

 

13mar2018 27feb1931 gc n

26feb2018 21oct1944 gc n

13feb2018 06mar1948 gc p

13feb2018 06mar1948 gc n

26jan2018 14apr1949 gc p

02feb2018 14apr1949 gc n

; RUN;

proc print data=lab;

run;

proc sort data=lab;

by visit_dt bdate;

run;

data CLINIC;

input visit_dt date9. @12 Bdate date9. @22 clinic $ diag @26 ;

format Bdate mmddyy10. visit_dt mmddyy10. ;

cards;

 

13mar2018 27feb1931 mmc 200

19jan2018 06oct1934 mmc 300

26feb2018 21oct1944 mmc 300

26jan2018 03jun1978 mmc 300

13feb2018 06mar1948 mmc 200

26jan2018 14apr1949 mmc 300

02feb2018 14apr1949 mmc 300

;

RUN;

proc print data=clinic;

run;

proc sort data=clinic;

by visit_dt bdate;

run;

data labcli;

merge lab clinic;

by visit_dt bdate;

run;

 it wirked perfectly here but in real datasets I am not sure if it merged correctly because when I did frequency analysis for test the result was not accurate and te log says; merge statement has multiple repeat values in by variable etc....Please help:(

 

Kurt_Bremser
Super User

In a large data base, you will unvariably have events where people with identical birthdates will visit on the same day (eg think about twins).

You need to deal with that. It might be that you need a unique key for a visit that is also communicated to the lab.

ballardw
Super User

Paste from the LOG the code and messages. Best is to paste into code box opened using the forum's {I} icon.

 

Any error came from something else than the "more than one data set with repeats of by values". That is note because the result is often not what you might intend.

 

A brief example follows of repeats of by values. Look at the two data sets and then think about what you might be thinking of how the merge should work. Then run the code and examine the output to see if matches your expectations.

data work.one;
   input id value  ;
datalines;
1 3
1 4
1 5
;

data work.two;
   input id value2;
datalines;
1 8
1 6
1 2
1 1
;


data work.error;
   merge work.one work.two;
   by id;
run;
Astounding
PROC Star

Just looking at the data, no computers ...

 

How do you know how the observations are supposed to match up?

 

Your first data set contains:

 

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   N

 

Your second data set contains:

 

02/13/2018  03/06/1978  mmc  300

02/13/2018  03/06/1978  mmc  300

 

Are we to assume that these match (first row with first row, second row with second row)?  How do we know whether that match should be reversed (first row with second row, second row with first row)?  In this case, you have two identical rows so the outcome would be the same either way.  But your data set doesn't contain a reliable way to match the data when both data sets have multiple rows for the same combination of dates.  The situation only gets worse if you have two rows in one data set, and three possible matches in the other data set.  (You did mention that your data sets don't contain the same number of observations.)

 

That's what SAS's message is trying to warn you about.  But regardless of whether SAS warns you, how can you reliably match the two data sets?  Should there be another variable such as a PatientID?

DhanaMalla
Calcite | Level 5

 

FROM THIS :

First data set contents;

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   N

 

Second data set contains:

02/13/2018  03/06/1978  mmc  300

02/13/2018  03/06/1978  mmc  300

 

THIS IS WHAT I WANT

02/13/2018  03/06/1978  1   N mmc  300

02/13/2018  03/06/1978  1   N mmc  300

Astounding
PROC Star

For the case you posted, that's a perfectly fine solution.  How about this case:

 

First data set contents;

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   N

02/13/2018  03/06/1978  1   P

 

Second data set contains:

02/13/2018  03/06/1978  mmc  300

02/13/2018  03/06/1978  mmc  200

 

What should the result be?

DhanaMalla
Calcite | Level 5

As I mentioned earlier, these two data sets (lab and clinic) have two common variables and they are repeated in both datasets. Same individual may have several test at the same date and result of different can be different. And on the same date many individual get tested. All I want to know is number and type of tests done and their results.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 867 views
  • 0 likes
  • 5 in conversation