Diag dataset
person diag_code
101 999
101 888
101 777
101 666
101 555
Proc dataset
person Proc_code
101 111
101 222
101 333
Hi Team,
I have two datasets like shown...For the same patient i want to add the procedures codes to the existing diagnosis dataset(with diag codes) he has....
Can I do this merge and still take it easy with the
NOTE: MERGE statement has more than one data set with repeats of BY values.!!!!!!!!
even if the PROCS are more in number for the same patient or VICE VERSA?????
Regards
Merge may not do what you are looking for. Can you show your desired output? There are a couple of ways to interpret your request and some details will be helpful.
How do you want your final data set to look? I suspect the merge you are doing does not give you what you want?
DATA WANT
person diag_code proc _code
101 999 111
101 888 222
101 777 333
101 666
101 555
102 xxx code1
102 yyy code2
102 zzz code3
102 code4
102 code5
robertrao,
I'm going to take your title literally as your question. I do not think that the data step's MERGE statement is capable of doing a true Many-to-Many merge. However PROC SQL does:
data Diag;
input @1 person 3. @5 diag_code 3.;
datalines;
101 999
101 888
101 777
101 666
101 555
102 666
102 555
;
run;
data Proc;
input @1 person 3. @5 Proc_code 3.;
datalines;
101 111
101 222
101 333
102 222
102 333
;
run;
PROC SQL;
CREATE TABLE many_to_many AS
SELECT Diag.person AS Diag_id,
Proc.person AS Proc_id,
diag_code,
Proc_code
FROM work.Diag,
work.Proc
WHERE Diag_id = Proc_id;
QUIT;
The blue code above performs a many-to-many merge.
Hope this helps,
Huey
Thanks a lot...I have been doing like what i have shown with the concept.....
"it just appends the procedure codes of the same patient to the existing Diagnosis codes
Looks like a pretty useless format for analysis. Might be useful for a report.
You need to do something to prevent SAS from carrying the last value from the dataset with fewer observations onto all of the rest of rows for that person.
Use code like this.
data diag;
input person diag ;
cards;
101 999
101 888
101 777
101 666
101 555
;
data proc ;
input person proc;
cards;
101 111
101 222
101 333
;
data want ;
merge diag proc;
by person;
output;
call missing(diag,proc);
run;
Hi,
I will reframe my question again....
Both the datasets have a fixed observations for the person number....for example in diag dataset any person you take has 50 observations....BUT might have less than 50 diag codes....other occupy a missing code(like shown below)..
data diag;
input person diag ;
cards;
101 999
101 888
101 777
101 666
101 555
101
101
101
101
101
101
..
..
..
101
101
The proc dataset on the other hand has less than 50(diag dataset)...say 35 per person......but in that 35 only few has proc codes....
BASICALLY BOTH THE DATASETS ARE THE RESULT OF A TRANSPOSED DATASETS WHERE 50 is the MAXIMUM codes ....FOR A particular patient and likewise 35 is the max procedures for a particular patient in the Proc dataset
robertrao,
Please tell us, in detail, what you plan to use the resultant "merged" dataset for.
I ask for this because, as Tom stated, it is hard to see a practical use for the dataset you desire. That said, there might be a better method to achieve you overall goal.
This would help us to help you.
Thanks,
Huey
The resultant is my final dataset containing all the diagnosis codes as well as procedural codes for that patient(have many patients)
I am also wondering why there is no cross merge occuring(for example...each of the obs from procedural is not alligning with each of the ones in the diag dataset???
Regards
robertrao,
You did not say what you plan to do with the dataset.
I don't understand what you mean by "cross merge" and "alligning". I'm sorry.
Would this resultant dataset work instead (?):
person Code Code_Type
101 999 Diagnosis
101 888 Diagnosis
101 777 Diagnosis
101 666 Diagnosis
101 555 Diagnosis
101 111 Procedure
101 222 Procedure
data Diag;
input person 3. @5 Diag_code 3.;
datalines;
101 999
101 888
101 777
101 666
101 555
101
101
;
run;
data Proc;
input person 3. @5 Proc_code 3.;
datalines;
101 111
101 222
101
;
run;
data Merged;
set Diag (rename=(Diag_code = Code))
Proc (rename=(Proc_code = Code))
indsname = current_dataset;
where not missing(Code);
if upcase(strip(current_dataset)) = "WORK.DIAG" then Code_Type = "Diagnosis";
else Code_Type = "Procedure";
run;
Basically I have diagnosis dataset initially like:
person dia1 dia2 dia3 dia4 dia5
101 111 222
102 333 444 555 666 777 ..........HAS ALL 5 CODES.................................
103 888..............HAS ONLY ONE CODE.............................................................
I transpose the above and for each person I get 5 observations because a single person in our list has the maximum of 5 diagnosis codes..
person dia
101 111
101 222
101
101
101
102 333
102 444
102 555
102 666
102 777
103 888
103
103
103
103
I want to merge the above dataset to a procedure dataset......which has a max proceure of 3(so 3 observations per person)...which was similarly got by transposing like above....
person proc
101 pr1
101 pr2
101
102 pr1
102
102
103 pr1
103 pr2
103 pr3
Now I want to merge these two....
I use the following code:
data procedures_descrip;
merge dia(in=A) proc;
by person;
if A;
run;
I GOT THE BELOW with a note:
mERGE HAS REPEATS OF bY variables
can i ignore that note..if this is the result i wanted?????????? or does that note mean i went wrong and needed corrections???????
person dia proc
101 111 pr1
101 222 pr2
101
101
101
102 333 pr1
102 444
102 555
102 666
102 777
103 888 pr1
103 pr2
103 pr2
103
103
The note is fine if you get what you want.
First, why not just include a counter variable when you transpose the data so that you can use that in the merge. That will make sure that first PROC aligns with the first DIAG and also eliminate the warning about many-to-many merge.
data diagtall;
set diagwide;
array d diag1-diag50;
do i=1 to dim(d);
diag=d(i); output;
end;
drop diag1-diag50;
run;
data proctall;
set procwide ;
array p proc1-proc3 ;
do i=1 to dim(p);
proc = p(i);
output;
end;
drop proc1-proc3 ;
run;
data want ;
merge diagwide procwide
by person i ;
run;
Or better still merge the two original datasets by person first and then transpose them.
data want ;
merge diagtall proctall ;
by person ;
array p proc1-proc3 ;
array d diag1 -diag50 ;
do i=1 to dim(d);
diag = d(i);
if i <= dim(p) then proc= p(i) ;
else call missing(proc);
output;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.