Walternate
Obsidian | Level 7

Hi,

 

I have a dataset at the person level but with duplicate rows. It has ID and character variables A, B, and C. I wanted unique rows, so I ran this code:

 

proc sort nodupkey data=have;
  by ID char_A char_B char_C;
run;

 

It worked without producing an error message, but when looking through the data I noticed that at least one duplicate row remained.

 

ID   Char_A   Char_B   Char_C
1    abc- d   def_g    ghi
1    abc- d   def_g    ghi

 

I'm not sure why this row remained in the data, as it looks like most of the duplicate rows were correctly deleted. Is there a way to troubleshoot and figure out whether there's some minor difference between the character variables or some other reason that the duplicate row wasn't removed?

 

Thanks!

 

 

 

1 REPLY 1
data_null__
Jade | Level 19

Display the values of the BY variables for the suspect observations using the $HEX format. I expect you will find they are different: there is probably a character that displays as a space but is not one, or the values have a different number of leading spaces.
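A minimal sketch of that check, using the dataset and variable names from the question. It assumes ID is numeric and that each character variable is at most 20 bytes long; the width 40 in $HEX40. is an assumption, so set it to twice the actual variable length to see every byte.

```sas
/* Write the BY variables for the suspect ID to the log, */
/* first as displayed and then byte by byte in hex.      */
data _null_;
  set have;
  where ID = 1;   /* assumes a numeric ID; use ID = '1' if it is character */
  put ID= char_A= char_B= char_C=;
  put char_A= $hex40.;   /* width 40 assumes length <= 20 bytes */
  put char_B= $hex40.;
  put char_C= $hex40.;
run;
```

Compare the hex strings for the two rows position by position. An ordinary space appears as 20; other byte values in the same position, such as 09 (tab) or A0 (non-breaking space in a Latin-1 session), mark a character that only looks like a space. Extra 20 pairs at the start of a value indicate leading spaces.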
