BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bretthouston
Obsidian | Level 7

Hello,

 

I am trying to merge two tables (example below) by two variables (PatientID and AdmissionID). When I do so, the data in one of the variables (patientWeight) is dropped. If I merge exclusively on PatientID however, this isn't the case.

 

Here is an example of table 1 (merged_txainformation):

 

PatientID   AdmissionID txaAdministrationType

1                    HSC1             Infusion

1                    HSC2             Infusion

2                    34253             Bolus

3                   32223              Infusion

 

Here is an example of table 2 (weightinfo):

PatientID   AdmissionID   PatientWeight

1                    HSC1                  68

1                    HSC2                  78

1                    HSC3                  110

2                    34253                  96

3                    32223                  .

 

Ideally, I would like the table to look like the following:

PatientID   AdmissionID   txaAdministrationType PatientWeight

1                    HSC1             Infusion                     68

1                    HSC2             Infusion                     78

2                    34253             Bolus                        96

3                   32223              Infusion                     .

 

I've tried the following code:

data mergedTXAweight;
merge merged_txaInformation (in=TXA)
weightinfo (in=Ottweight);
by PatientID AdmissionID;
if TXA then
output;
run;

 

If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing. 

 

Of note, PatientID is a numeric variable (length = 😎 in both tables, and AdmissionID is a character variable (length = 10) in both tables. I'm not sure if this is the issue? 

 

Thanks in advance!

Brett

1 ACCEPTED SOLUTION

Accepted Solutions
bretthouston
Obsidian | Level 7

Thanks for the reply - I really appreciate it! As best I can tell they match? Here is an excerpt from the proc contents I've run on both of the tables.

 

Is there a different way for me to figure out if there are non-printing characters? Could it be because of the informats or label?

 

 

 

 

View solution in original post

6 REPLIES 6
ballardw
Super User

@bretthouston wrote:

Hello,

 

I am trying to merge two tables (example below) by two variables (PatientID and AdmissionID). When I do so, the data in one of the variables (patientWeight) is dropped. If I merge exclusively on PatientID however, this isn't the case.

 

Here is an example of table 1 (merged_txainformation):

 

PatientID   AdmissionID txaAdministrationType

1                    HSC1             Infusion

1                    HSC2             Infusion

2                    34253             Bolus

3                   32223              Infusion

 

Here is an example of table 2 (weightinfo):

PatientID   AdmissionID   PatientWeight

1                    HSC1                  68

1                    HSC2                  78

1                    HSC3                  110

2                    34253                  96

3                    32223                  .

 

Ideally, I would like the table to look like the following:

PatientID   AdmissionID   txaAdministrationType PatientWeight

1                    HSC1             Infusion                     68

1                    HSC2             Infusion                     78

2                    34253             Bolus                        96

3                   32223              Infusion                     .

 

I've tried the following code:

data mergedTXAweight;
merge merged_txaInformation (in=TXA)
weightinfo (in=Ottweight);
by PatientID AdmissionID;
if TXA then
output;
run;

 

If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing. 

 

Of note, PatientID is a numeric variable (length = 😎 in both tables, and AdmissionID is a character variable (length = 10) in both tables. I'm not sure if this is the issue? 

 

Thanks in advance!

Brett


What does this have to do with what is going on? Is this a different data set?

If I merge exclusively by PatientID then the values in PatientWeight do result in the merged table, but so does the row with PatientID #1 and AdmissionID HSC3 (I only want the people in the merged_txaInformation table). If I merge by PatientID and AdmissionID, then the values in PatientWeight are all missing. 

 

If you made a separate data set merged_txaInformation that is different than the set you used above then you may have done something such that the AdmissionId values do not align.

 

I suggest that you try running your code without the "if TXA then output; " and see if you have data where the PatientId and Admission values look a bit odd sequentially. Note that if one of your data sets somehow has leading spaces in the AdmissionId that "HSC3" will not be identical to " HSC3" and the merge doesn't happen as you expect.

 

 

 

Tom
Super User Tom
Super User

Your data works find for me.  Are you sure the ADMISSIONID values actually match?  Does one have non-printing characters, such as TAB, or leading spaces, case differences?

 

data one ;
 input PatientID  AdmissionID $ txaAdministrationType :$20.;
cards;
1                    HSC1             Infusion
1                    HSC2             Infusion
2                    34253             Bolus
3                   32223              Infusion
;

data two;
  input PatientID  AdmissionID $   PatientWeight ;
cards;
1                    HSC1                  68
1                    HSC2                  78
1                    HSC3                  110
2                    34253                  96
3                    32223                  .
;

data want;
 merge one (in=in1) two (in=in2) ;
 by patientid admissionid ;
 if in1;
run;

proc print; run;
                                    txa
       Patient    Admission    Administration    Patient
Obs       ID         ID             Type          Weight

 1        1         HSC1          Infusion          68
 2        1         HSC2          Infusion          78
 3        2         34253         Bolus             96
 4        3         32223         Infusion           .

bretthouston
Obsidian | Level 7

Thanks for the reply - I really appreciate it! As best I can tell they match? Here is an excerpt from the proc contents I've run on both of the tables.

 

Is there a different way for me to figure out if there are non-printing characters? Could it be because of the informats or label?

 

 

 

 

bretthouston
Obsidian | Level 7

I compressed the values in one of the AdmissionID tables, and now it is merging properly. Presumably there we leading characters I couldn't see. Thanks for the advice!

ballardw
Super User

@bretthouston wrote:

I compressed the values in one of the AdmissionID tables, and now it is merging properly. Presumably there we leading characters I couldn't see. Thanks for the advice!


I'm glad you solved the problem. In that case I would suggest changing your marked solution from my hints to your compressed data solution.

 

I was going to mention that a character format shorter than the length of a character variable is also something to be careful with.

What will happen when you either print data or use a table viewer is that the last character(s) may not appear. Which visually may lead you to believe that the values are identical when they actually are not.

 

data example;
   input x $ 1-10;
   format x $9.;
datalines;
1234567890
;

Proc print data=example;
   title 'Default format $9.';
run;
/* compare with*/

proc print data=example;
   title 'Longer format $10.';
   format x $10.;
run;  title;

So if I were attempting to merge with another data set where X was actually only 9 characters then the merge would come out wonky in some form.

 

There are a number of functions you can use to find specific sorts of characters that might not be printable. ANYCTRL, ANYSPACE will return the first position of either a control character  (a number of odd things) or space type character (blank, tab, carriage return, line feed or form feed) or zero if not found.

A brief example:

data example;
   input x $ 1-10;
   if anyspace(x)>0 then put 'Some space found in x: ' x=;
   else put 'No space found in x';
datalines;
12345 7890
abcdefghij
;
Tom
Super User Tom
Super User

This is an example of why I always try to remove formats from character variables. You have define ADMISSIONID is length 10 but assigned a display format that only has a width of 9.  So your printouts will look the same even when the 10tth character is different.

 

To remove a format from a variable use the FORMAT statement that lists the variable(s) without any format specification.

data want;
  set have;
  format _character_ ;
run;

Formats convert values to text.  They are used to tell SAS how to display the variable's values. SAS already knows how to display character variables, so it does not need to have any special format attached. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 651 views
  • 1 like
  • 3 in conversation