Hello,
I am trying to merge two tables (example below) by two variables (PatientID and AdmissionID). When I do so, the data in one of the variables (patientWeight) is dropped. If I merge exclusively on PatientID however, this isn't the case.
Here is an example of table 1 (merged_txainformation):
PatientID AdmissionID txaAdministrationType
1 HSC1 Infusion
1 HSC2 Infusion
2 34253 Bolus
3 32223 Infusion
Here is an example of table 2 (weightinfo):
PatientID AdmissionID PatientWeight
1 HSC1 68
1 HSC2 78
1 HSC3 110
2 34253 96
3 32223 .
Ideally, I would like the table to look like the following:
PatientID AdmissionID txaAdministrationType PatientWeight
1 HSC1 Infusion 68
1 HSC2 Infusion 78
2 34253 Bolus 96
3 32223 Infusion .
I've tried the following code:
data mergedTXAweight;
merge merged_txaInformation (in=TXA)
weightinfo (in=Ottweight);
by PatientID AdmissionID;
if TXA then
output;
run;
If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing.
Of note, PatientID is a numeric variable (length = 😎 in both tables, and AdmissionID is a character variable (length = 10) in both tables. I'm not sure if this is the issue?
Thanks in advance!
Brett
Thanks for the reply - I really appreciate it! As best I can tell they match? Here is an excerpt from the proc contents I've run on both of the tables.
Is there a different way for me to figure out if there are non-printing characters? Could it be because of the informats or label?
@bretthouston wrote:
Hello,
I am trying to merge two tables (example below) by two variables (PatientID and AdmissionID). When I do so, the data in one of the variables (patientWeight) is dropped. If I merge exclusively on PatientID however, this isn't the case.
Here is an example of table 1 (merged_txainformation):
PatientID AdmissionID txaAdministrationType
1 HSC1 Infusion
1 HSC2 Infusion
2 34253 Bolus
3 32223 Infusion
Here is an example of table 2 (weightinfo):
PatientID AdmissionID PatientWeight
1 HSC1 68
1 HSC2 78
1 HSC3 110
2 34253 96
3 32223 .
Ideally, I would like the table to look like the following:
PatientID AdmissionID txaAdministrationType PatientWeight
1 HSC1 Infusion 68
1 HSC2 Infusion 78
2 34253 Bolus 96
3 32223 Infusion .
I've tried the following code:
data mergedTXAweight;
merge merged_txaInformation (in=TXA)
weightinfo (in=Ottweight);
by PatientID AdmissionID;
if TXA then
output;
run;
If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing.
Of note, PatientID is a numeric variable (length = 😎 in both tables, and AdmissionID is a character variable (length = 10) in both tables. I'm not sure if this is the issue?
Thanks in advance!
Brett
What does this have to do with what is going on? Is this a different data set?
If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing.
If you made a separate data set merged_txaInformation that is different than the set you used above then you may have done something such that the AdmissionId values do not align.
I suggest that you try running your code without the "if TXA then output; " and see if you have data where the PatientId and Admission values look a bit odd sequentially. Note that if one of your data sets somehow has leading spaces in the AdmissionId that "HSC3" will not be identical to " HSC3" and the merge doesn't happen as you expect.
Your data works find for me. Are you sure the ADMISSIONID values actually match? Does one have non-printing characters, such as TAB, or leading spaces, case differences?
data one ;
input PatientID AdmissionID $ txaAdministrationType :$20.;
cards;
1 HSC1 Infusion
1 HSC2 Infusion
2 34253 Bolus
3 32223 Infusion
;
data two;
input PatientID AdmissionID $ PatientWeight ;
cards;
1 HSC1 68
1 HSC2 78
1 HSC3 110
2 34253 96
3 32223 .
;
data want;
merge one (in=in1) two (in=in2) ;
by patientid admissionid ;
if in1;
run;
proc print; run;
txa Patient Admission Administration Patient Obs ID ID Type Weight 1 1 HSC1 Infusion 68 2 1 HSC2 Infusion 78 3 2 34253 Bolus 96 4 3 32223 Infusion .
Thanks for the reply - I really appreciate it! As best I can tell they match? Here is an excerpt from the proc contents I've run on both of the tables.
Is there a different way for me to figure out if there are non-printing characters? Could it be because of the informats or label?
I compressed the values in one of the AdmissionID tables, and now it is merging properly. Presumably there we leading characters I couldn't see. Thanks for the advice!
@bretthouston wrote:
I compressed the values in one of the AdmissionID tables, and now it is merging properly. Presumably there we leading characters I couldn't see. Thanks for the advice!
I'm glad you solved the problem. In that case I would suggest changing your marked solution from my hints to your compressed data solution.
I was going to mention that a character format shorter than the length of a character variable is also something to be careful with.
What will happen when you either print data or use a table viewer is that the last character(s) may not appear. Which visually may lead you to believe that the values are identical when they actually are not.
data example; input x $ 1-10; format x $9.; datalines; 1234567890 ; Proc print data=example; title 'Default format $9.'; run; /* compare with*/ proc print data=example; title 'Longer format $10.'; format x $10.; run; title;
So if I were attempting to merge with another data set where X was actually only 9 characters then the merge would come out wonky in some form.
There are a number of functions you can use to find specific sorts of characters that might not be printable. ANYCTRL, ANYSPACE will return the first position of either a control character (a number of odd things) or space type character (blank, tab, carriage return, line feed or form feed) or zero if not found.
A brief example:
data example; input x $ 1-10; if anyspace(x)>0 then put 'Some space found in x: ' x=; else put 'No space found in x'; datalines; 12345 7890 abcdefghij ;
This is an example of why I always try to remove formats from character variables. You have define ADMISSIONID is length 10 but assigned a display format that only has a width of 9. So your printouts will look the same even when the 10tth character is different.
To remove a format from a variable use the FORMAT statement that lists the variable(s) without any format specification.
data want;
set have;
format _character_ ;
run;
Formats convert values to text. They are used to tell SAS how to display the variable's values. SAS already knows how to display character variables, so it does not need to have any special format attached.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.