Re: Many to Many Merge

robertrao · Posted 01-25-2013 04:16 PM

Diag dataset

person diag_code

101 999

101 888

101 777

101 666

101 555

Proc dataset

person Proc_code

101 111

101 222

101 333

robertrao · Posted 01-25-2013 04:18 PM

Hi Team,

I have two datasets like shown...For the same patient i want to add the procedures codes to the existing diagnosis dataset(with diag codes) he has....

Can I do this merge and still take it easy with the

NOTE: MERGE statement has more than one data set with repeats of BY values.!!!!!!!!

even if the PROCS are more in number for the same patient or VICE VERSA?????

Regards

ballardw · Posted 01-25-2013 04:56 PM

Merge may not do what you are looking for. Can you show your desired output? There are a couple of ways to interpret your request and some details will be helpful.

LarryWorley · Posted 01-25-2013 04:57 PM

How do you want your final data set to look? I suspect the merge you are doing does not give you what you want?

robertrao · Posted 01-25-2013 05:06 PM

DATA WANT

person diag_code proc _code

101 999 111

101 888 222

101 777 333

101 666

101 555

102 xxx code1

102 yyy code2

102 zzz code3

102 code4

102 code5

hdodson_pacificmetrics_com · Posted 01-25-2013 05:27 PM

robertrao,

I'm going to take your title literally as your question. I do not think that the data step's MERGE statement is capable of doing a true Many-to-Many merge. However PROC SQL does:

data Diag;
input @1 person 3. @5 diag_code 3.;
datalines;
101 999
101 888
101 777
101 666
101 555
102 666
102 555
;
run;

data Proc;
input @1 person 3. @5 Proc_code 3.;
datalines;
101 111
101 222
101 333
102 222
102 333
;
run;

PROC SQL;
     CREATE TABLE many_to_many AS
     SELECT Diag.person AS Diag_id,
                     Proc.person AS Proc_id,
                     diag_code,
                     Proc_code
     FROM work.Diag,
                 work.Proc
     WHERE Diag_id = Proc_id;
QUIT;

The blue code above performs a many-to-many merge.

Hope this helps,

Huey

robertrao · Posted 01-25-2013 05:33 PM

Thanks a lot...I have been doing like what i have shown with the concept.....

"it just appends the procedure codes of the same patient to the existing Diagnosis codes

Tom · Posted 01-25-2013 06:31 PM

Looks like a pretty useless format for analysis. Might be useful for a report.

You need to do something to prevent SAS from carrying the last value from the dataset with fewer observations onto all of the rest of rows for that person.

Use code like this.

data diag;

input person diag ;

cards;

101 999

101 888

101 777

101 666

101 555

;

data proc ;

input person proc;

cards;

101 111

101 222

101 333

;

data want ;

merge diag proc;

by person;

output;

call missing(diag,proc);

run;

robertrao · Posted 01-28-2013 04:01 PM

Hi,

I will reframe my question again....

Both the datasets have a fixed observations for the person number....for example in diag dataset any person you take has 50 observations....BUT might have less than 50 diag codes....other occupy a missing code(like shown below)..

data diag;

input person diag ;

cards;

101 999

101 888

101 777

101 666

101 555

101

..

101

The proc dataset on the other hand has less than 50(diag dataset)...say 35 per person......but in that 35 only few has proc codes....

BASICALLY BOTH THE DATASETS ARE THE RESULT OF A TRANSPOSED DATASETS WHERE 50 is the MAXIMUM codes ....FOR A particular patient and likewise 35 is the max procedures for a particular patient in the Proc dataset

hdodson_pacificmetrics_com · Posted 01-28-2013 04:22 PM

robertrao,

Please tell us, in detail, what you plan to use the resultant "merged" dataset for.

I ask for this because, as Tom stated, it is hard to see a practical use for the dataset you desire. That said, there might be a better method to achieve you overall goal.

This would help us to help you.

Thanks,

Huey

robertrao · Posted 01-28-2013 04:28 PM

The resultant is my final dataset containing all the diagnosis codes as well as procedural codes for that patient(have many patients)

I am also wondering why there is no cross merge occuring(for example...each of the obs from procedural is not alligning with each of the ones in the diag dataset???

Regards

hdodson_pacificmetrics_com · Posted 01-28-2013 06:39 PM

robertrao,

You did not say what you plan to do with the dataset.

I don't understand what you mean by "cross merge" and "alligning". I'm sorry.

Would this resultant dataset work instead (?):

person Code Code_Type

101 999 Diagnosis

101 888 Diagnosis

101 777 Diagnosis

101 666 Diagnosis

101 555 Diagnosis

101 111 Procedure

101 222 Procedure

data Diag;
input person 3. @5 Diag_code 3.;
datalines;
101 999
101 888
101 777
101 666
101 555
101
101
;
run;

data Proc;
input person 3. @5 Proc_code 3.;
datalines;
101 111
101 222
101
;
run;

data Merged;
set Diag (rename=(Diag_code = Code))
Proc (rename=(Proc_code = Code))
indsname = current_dataset;

where not missing(Code);

if upcase(strip(current_dataset)) = "WORK.DIAG" then Code_Type = "Diagnosis";
else Code_Type = "Procedure";
run;

robertrao · Posted 01-28-2013 08:55 PM

Basically I have diagnosis dataset initially like:

person dia1 dia2 dia3 dia4 dia5

101 111 222

102 333 444 555 666 777 ..........HAS ALL 5 CODES.................................

103 888..............HAS ONLY ONE CODE.............................................................

I transpose the above and for each person I get 5 observations because a single person in our list has the maximum of 5 diagnosis codes..

person dia

101 111

101 222

101

102 333

102 444

102 555

102 666

102 777

103 888

103

I want to merge the above dataset to a procedure dataset......which has a max proceure of 3(so 3 observations per person)...which was similarly got by transposing like above....

person proc

101 pr1

101 pr2

101

102 pr1

102

103 pr1

103 pr2

103 pr3

Now I want to merge these two....

I use the following code:

data procedures_descrip;

merge dia(in=A) proc;

by person;

if A;

run;

I GOT THE BELOW with a note:

mERGE HAS REPEATS OF bY variables

can i ignore that note..if this is the result i wanted?????????? or does that note mean i went wrong and needed corrections???????

person dia proc

101 111 pr1

101 222 pr2

101

102 333 pr1

102 444

102 555

102 666

102 777

103 888 pr1

103 pr2

103

Linlin · Posted 01-28-2013 09:03 PM

The note is fine if you get what you want.

Tom · Posted 01-28-2013 09:21 PM

First, why not just include a counter variable when you transpose the data so that you can use that in the merge. That will make sure that first PROC aligns with the first DIAG and also eliminate the warning about many-to-many merge.

data diagtall;

set diagwide;

array d diag1-diag50;

do i=1 to dim(d);

diag=d(i); output;

end;

drop diag1-diag50;

run;

data proctall;

set procwide ;

array p proc1-proc3 ;

do i=1 to dim(p);

proc = p(i);

output;

end;

drop proc1-proc3 ;

run;

data want ;

merge diagwide procwide

by person i ;

run;

Or better still merge the two original datasets by person first and then transpose them.

data want ;

merge diagtall proctall ;

by person ;

array p proc1-proc3 ;

array d diag1 -diag50 ;

do i=1 to dim(d);

diag = d(i);

if i <= dim(p) then proc= p(i) ;

else call missing(proc);

output;

end;

run;

SAS Innovate 2025: Call for Content

Classroom Training Available!