BookmarkSubscribeRSS Feed
Newtrix
Fluorite | Level 6

The sort key is the same on both, and is formatted identically on both as a character with idential width. The merged dataset simply stacks one on top of the other. 

 

Code:

PROC SORT DATA=AE.LOG_AE;
BY Subject;
RUN;
PROC SORT DATA=AE.Treatment;
    BY Subject;
RUN;
 
DATA AEListings;
MERGE AE.Log_AE AE.Treatment;
BY Subject;
RUN;
 
Any and all help is welcome. Thanks.
9 REPLIES 9
Tom
Super User Tom
Super User

So something is different.

First make sure the sort actually worked (check the log).

Now look at examples values of SUBJECT from the two different datasets that you THINK are the same to see why SAS does not consider them the same.  Obvious things to look for:

 

Numeric variables.  The values differ by such a small amount that it is not visible when printed.  Print the values using HEX16. format to see the exact bit pattern used to represent the values.

 

Character variables.  Make sure the case of letters is the same.  Make sure that there are not leading spaces, Print the values using the $QUOTE. format.   Make sure that there are not hidden characters (such as tab, non-breaking space, carraige return) etc.  Print the values with the $HEX. format.  Make sure that you have not attached a format with too small a width so that displayed values are truncated, hiding difference in the full value.  Print the value without any format attached. Make sure the two datasets are using the same setting of the ENCODING= option.

Newtrix
Fluorite | Level 6

The variables in each dataset are 3-digit numbers ranging from 201-208 but are classified as characters, formatted $30. There are no leading or trailing spaces, no carriage returns, etc. Both were uploaded from TXT files with no ENCODING commands in the uploading Macro.

Tom
Super User Tom
Super User

What do you mean by "formatted $30"?

Did you attach the $30. format specification to the variable?

What LENGTH did you define for the variable?  If you did not explicitly define its length and instead first referenced the variable in a format statement that attached the $30. format to it then it will have a length of 30 instead of the 3 you need to store your values.

 

If the length is 30 then a three character string can by stored in 28 different ways inside of the 30 character value.  (With from 0 to 27 leading spaces). And if you only ever PRINT the values using ODS output (the "pretty" output) then the leading spaces will disappear and the value "201" will look exactly the same as the value "     201".

 

Try removing the leading spaces by using the LEFT() function.

 

Note: If you assign the numeric value 201 to character variable of length 30 it will have 27 leading spaces.  Also if you read in the values using the $CHAR informat instead of the $ informat then leading spaces will be preserved.

Newtrix
Fluorite | Level 6

I have since created a numeric variable for each set with the identical info, INFORMAT 5.

Sorted by the new variable, the merge still doesn't work. One gets stacked on the other.

Tom
Super User Tom
Super User

Please take the time to show a couple of examples of the values that "stack" (or perhaps interleave) instead of merging.

 

Example demonstrating the effect of leading spaces on character values.

data a;
  input subject $char5.;
  sourceA='FROM A';
cards;
  201
 202
203
;

data b;
  input subject $char5.;
  sourceB='FROM B';
cards;
201
202
203
;

data want;
 merge a b;
 by subject;
run;

proc print;
  format subject $quote.;
run;

Tom_0-1747861288775.png

 

Newtrix
Fluorite | Level 6

Thanks for your time on this. I did try changing to numeric (see post just above) with the same result. There are no leading or trailing characters - 3 digits, that's all. This is the first time I've ever had such issues with a merge.

Newtrix
Fluorite | Level 6

I meant "spaces" just now, not characters.

Tom
Super User Tom
Super User

@Newtrix wrote:

Thanks for your time on this. I did try changing to numeric (see post just above) with the same result. There are no leading or trailing characters - 3 digits, that's all. This is the first time I've ever had such issues with a merge.


Do the obvious checks again.  Make sure you are using MERGE and not SET. Make sure each step finished without errors or unexpected notes.  Make sure you are checking the datasets that are actually mentioned in the LOG.

 

Then pick a value that you think should be in both and try to find the observations with that value for each source.  If you cannot find them in one of the sources then check what value is actually there for that observation (perhaps you can select it by some other variable to avoid looking at all values).

 

For numeric values check the actual value by using the HEX16. format so you can see what it actually is.

 

Example

data a;
  do subject=200,300,400; output; end;
run;
data b;
  set a;
  subject+1E-10;
run;
data both;
  merge a(in=in1) b(in=in2);
  by subject;
  ina=in1; inb=in2;
run;

proc print;
run;
proc print;
 format subject hex16.;
run;

Result

Obs    subject    ina    inb

 1       200       1      0
 2       200       0      1
 3       300       1      0
 4       300       0      1
 5       400       1      0
 6       400       0      1

Obs             subject    ina    inb

 1     4069000000000000     1      0
 2     4069000000000DBE     0      1
 3     4072C00000000000     1      0
 4     4072C000000006DF     0      1
 5     4079000000000000     1      0
 6     40790000000006DF     0      1

 

 

 

 

Quentin
Super User

Pick a subject value that you know is in both datasets, and check that PROC PRINT can find that value.  So if the subject ID is 333, then try:

proc print data=AE.Log_AE;
  where subject='333';
run;

proc print data=AE.Treatment;
  where subject='333';
run;

Does that confirm that the subject exists in both datasets?

 

If so, please post the log from running:

data want;
  merge AE.Log_AE  (where=(subject='333') in=a)
             AE.Treatment (where=(subject='333') in=b)
  ;
  by subject;
  put (subject a b)(=) ;
run;
The Boston Area SAS Users Group (BASUG) is hosting an in person Meeting & Training on June 27!
Full details and registration info at https://www.basug.org/events.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 962 views
  • 8 likes
  • 3 in conversation