DATA Step, Macro, Functions and more

Many to Many Merge

Reply
Super Contributor
Posts: 1,041

Many to Many Merge

Diag  dataset

person   diag_code

101        999

101        888

101        777

101        666

101        555

Proc  dataset

person   Proc_code

101        111

101        222

101        333

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Posted in reply to robertrao

Hi Team,

I have two datasets like shown...For the same patient i want to add the procedures codes to the existing diagnosis dataset(with diag codes) he has....

Can I do this merge and still take it easy with the

NOTE: MERGE statement has more than one data set with repeats of BY values.!!!!!!!!

even if the PROCS are more in number for the same patient or VICE VERSA?????

Regards

Super User
Posts: 11,343

Re: Many to Many Merge

Posted in reply to robertrao

Merge may not do what you are looking for. Can you show your desired output? There are a couple of ways to interpret your request and some details will be helpful.

Frequent Contributor
Posts: 129

Re: Many to Many Merge

Posted in reply to robertrao

How do you want your final data set to look?  I suspect the merge you are doing does not give you what you want? 

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Posted in reply to LarryWorley

DATA WANT

person   diag_code   proc _code

101        999               111

101        888               222

101        777               333

101        666             

101        555

102       xxx             code1    

102       yyy             code2

102       zzz             code3

102                         code4

102                         code5

Contributor
Posts: 22

Re: Many to Many Merge

Posted in reply to robertrao

robertrao,

     I'm going to take your title literally as your question. I do not think that the data step's MERGE statement is capable of doing a true Many-to-Many merge. However PROC SQL does:

data Diag;
     input @1 person 3. @5 diag_code 3.;
     datalines;
101 999
101 888
101 777
101 666
101 555
102 666
102 555
;
run;

data Proc;
     input @1 person 3. @5 Proc_code 3.;
     datalines;
101 111
101 222
101 333
102 222
102 333
;
run;

PROC SQL;
     CREATE TABLE many_to_many AS
     SELECT  Diag.person AS Diag_id,
                     Proc.person AS Proc_id,
                     diag_code,
                     Proc_code
     FROM work.Diag,
                 work.Proc
     WHERE Diag_id = Proc_id;  
QUIT;

The blue code above performs a many-to-many merge.

Hope this helps,

Huey

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Posted in reply to hdodson_pacificmetrics_com

Thanks a lot...I have been doing like what i have shown with the concept.....

"it just appends the procedure codes of the same patient to the existing Diagnosis codes

Super User
Super User
Posts: 7,060

Re: Many to Many Merge

Posted in reply to robertrao

Looks like a pretty useless format for analysis.  Might be useful for a report.

You need to do something to prevent SAS from carrying the last value from the dataset with fewer observations onto all of the rest of rows for that person.

Use code like this.

data diag;

  input person diag ;

cards;

101  999

101  888

101  777

101  666

101  555

;

data proc ;

  input person proc;

cards;

101  111

101  222

101  333

;

data want ;

  merge diag proc;

  by person;

  output;

  call missing(diag,proc);

run;

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Hi,

I will reframe my question again....

Both the datasets have a fixed observations for the person number....for example in diag dataset any person you take has 50 observations....BUT might have less than 50 diag codes....other occupy a missing code(like shown below)..

data diag;

  input person diag ;

cards;

101  999

101  888

101  777

101  666

101  555

101  

101

101

101

101

101

..

..

..

101

101

The proc dataset on the other hand has less than 50(diag dataset)...say 35 per person......but in that 35 only few has proc codes....

BASICALLY BOTH THE DATASETS ARE THE RESULT OF A TRANSPOSED DATASETS WHERE 50 is the MAXIMUM codes ....FOR A particular patient and likewise 35 is the max procedures for a particular patient in the Proc dataset

Contributor
Posts: 22

Re: Many to Many Merge

Posted in reply to robertrao

robertrao,

     Please tell us, in detail, what you plan to use the resultant "merged" dataset for.

     I ask for this because, as Tom stated, it is hard to see a practical use for the dataset you desire. That said, there might be a better method to achieve you overall goal.

     This would help us to help you.

Thanks,

Huey

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Posted in reply to hdodson_pacificmetrics_com

The resultant is my final dataset containing all the diagnosis codes as well as procedural codes for that patient(have many patients)


I am also wondering why there is no cross merge occuring(for example...each of the obs from procedural is not alligning with each of the ones in the diag dataset???

Regards

Contributor
Posts: 22

Re: Many to Many Merge

Posted in reply to robertrao

robertrao,

    

     You did not say what you plan to do with the dataset.

     I don't understand what you mean by "cross merge" and "alligning". I'm sorry.

     Would this resultant dataset work instead (?):

person      Code      Code_Type

101           999           Diagnosis

101           888           Diagnosis

101           777           Diagnosis

101           666           Diagnosis

101           555           Diagnosis

101           111           Procedure

101           222           Procedure

data Diag;
     input person 3. @5 Diag_code 3.;
     datalines;
101 999
101 888
101 777
101 666
101 555
101
101
;
run;


data Proc;
     input person 3. @5 Proc_code 3.;
     datalines;
101 111
101 222
101
;
run;


data Merged;
set Diag (rename=(Diag_code = Code))
  Proc (rename=(Proc_code = Code))
  indsname = current_dataset;

where not missing(Code);

if upcase(strip(current_dataset)) = "WORK.DIAG" then Code_Type = "Diagnosis";
else Code_Type = "Procedure";
run;

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Posted in reply to hdodson_pacificmetrics_com

Basically I have diagnosis dataset initially like:

person    dia1    dia2   dia3   dia4   dia5

101       111       222  

102       333       444      555  666    777 ..........HAS ALL 5 CODES.................................

103      888..............HAS ONLY ONE CODE.............................................................   

I transpose the above and for each person I get 5 observations because a single person in our list has the maximum of 5 diagnosis codes..

person    dia  

101       111     

101        222

101      

101      

101

102       333                 

102      444

102      555

102      666

102      777

103     888

103

103

103

103

I want to merge the above dataset to a procedure dataset......which has a max proceure of 3(so 3 observations per person)...which was similarly got by transposing like above....

person   proc

101       pr1

101       pr2

101

102      pr1

102    

102

103     pr1

103    pr2

103   pr3

Now I want to merge these two....

I use the following code:

data procedures_descrip;

merge dia(in=A)  proc;

by person;

if A;

run;

I GOT THE BELOW with a note:

mERGE HAS REPEATS OF bY variables

can i ignore that note..if this is the result i wanted?????????? or does that note mean i went wrong and needed corrections???????

person    dia     proc  

101       111      pr1  

101        222      pr2

101                 

101                

101      

102       333    pr1                 

102      444

102      555

102      666

102      777

103     888     pr1

103                pr2

103               pr2

103

103

Super Contributor
Posts: 1,636

Re: Many to Many Merge

Posted in reply to robertrao

The note is fine if you get what you want.

Super User
Super User
Posts: 7,060

Re: Many to Many Merge

Posted in reply to robertrao

First, why not just include a counter variable when you transpose the data so that you can use that in the merge. That will make sure that first PROC aligns with the first DIAG and also eliminate the warning about many-to-many merge.

data diagtall;

  set diagwide;

  array d diag1-diag50;

  do i=1 to dim(d);

     diag=d(i); output;

  end;

  drop diag1-diag50;

run;

data proctall;

  set procwide ;

  array p proc1-proc3 ;

  do i=1 to dim(p);

     proc = p(i);

     output;

  end;

  drop proc1-proc3 ;

run;

data want ;

  merge diagwide procwide

  by person i ;

run;

Or better still merge the two original datasets by person first and then transpose them.

data want ;

  merge diagtall proctall ;

  by person ;

  array p proc1-proc3 ;

  array d diag1 -diag50 ;

  do i=1 to dim(d);

     diag = d(i);

     if i <= dim(p) then proc= p(i) ;

     else call missing(proc);

     output;

  end;

run;

Ask a Question
Discussion stats
  • 17 replies
  • 525 views
  • 3 likes
  • 6 in conversation