GN0001
Barite | Level 11
Hello team,
I have duplicate rows that I need to remove. I tried:
Proc sort data=mydata nodupkey;
By _all_;
Run;
It doesn’t remove the duplicate observations.
Regards
Blueblue
Blue Blue
5 REPLIES
ballardw
Super User

@HB wrote:

I think you need out=

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1ab4sjiq6wxkvn1npo1pq1cvhxt.htm 


Out= is not required, though it is a good idea. See this log:

149  /* make a temporary data set to test the nodupkey with*/
150  data class;
151     set sashelp.class;
152  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


153
154  proc sort data=class nodupkey;
155     by sex age;
156  run;

NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: 8 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 11 observations and 5 variables.

I used the Sashelp.class data set so others can test the code, making a copy in the Work library in the data step.

Then used Proc Sort with the Nodupkey option and no Out= option.

Notice that the resulting work.class data set now has 11 observations, not the 19 from copying Sashelp.class.

Proc sort will happily corrupt your source data set. So if you think you may ever want the data in work.class as it was before the sort, use an Out= option to create a new data set.
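As a minimal sketch of that advice, assuming a fresh 19-observation copy of Sashelp.class in work.class: the data set names class_nodup and class_dups are made up for illustration. Out= and Dupout= are both documented Proc Sort options.

```sas
/* Sketch only: class_nodup and class_dups are hypothetical names. */
/* OUT= writes the de-duplicated rows to a new data set, so        */
/* WORK.CLASS keeps all of its original observations.              */
/* DUPOUT= saves the observations that NODUPKEY removed, so you    */
/* can inspect what was dropped.                                   */
proc sort data=class out=class_nodup dupout=class_dups nodupkey;
   by sex age;
run;
```

Going by the log above, class_nodup should end up with 11 observations and class_dups with the 8 that were deleted, while work.class itself is left alone.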

 

 

HB
Barite | Level 11
"Proc sort will happily corrupt your source data"

LOL.
ballardw
Super User

When you use BY _ALL_ there is a chance that values that look the same at first glance actually aren't. Formats for numeric values typically round. So with a value of 1.999999999 and a format of BEST5., the displayed value will be 2.
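A small sketch of the numeric case, using made-up data set names (demo, demo_nodup, demo_round) and an arbitrary rounding tolerance that you would need to choose for your own data:

```sas
/* Sketch only: two values that display alike but are stored differently. */
data demo;
   format x best5.;
   x = 2;           output;
   x = 1.999999999; output;
run;

/* With BEST5. both rows should print as 2. */
proc print data=demo;
run;

/* PROC SORT compares the stored values, not the formatted ones, */
/* so NODUPKEY keeps both rows here.                             */
proc sort data=demo out=demo_nodup nodupkey;
   by _all_;
run;

/* One possible fix (an assumption, not the only option): round to a */
/* tolerance that suits the data before removing duplicates.         */
data demo_round;
   set demo;
   x = round(x, 0.000001);
run;

proc sort data=demo_round out=demo_round_nodup nodupkey;
   by _all_;
run;
```

Printing with a wider format such as BEST32. is a quick way to see whether two "identical" numbers really are identical.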

 

Sometimes you have leading spaces in character values, and depending on how you display them you may think they are the same, but the leading space is significant when the variable is used as a By variable. Formats come into play with character values as well. If the format is $5. and you have the values Blackstone and Blackheart, they both display as Black because of the format. So you may want to check character variable lengths and format lengths.
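Here is a rough sketch of the character case. The data set names (names, names_clean, names_nodup) and the values are invented for the example. LEFT() is used to move leading blanks to the end of the value.

```sas
/* Sketch only: made-up names to show the two character pitfalls. */
data names;
   length name $12;
   format name $5.;
   name = 'Blackstone';  output;
   name = 'Blackheart';  output;
   name = ' Blackheart'; output;   /* note the leading space */
run;

/* Check the stored length and the attached format of each variable. */
proc contents data=names;
run;

/* Print with a format wide enough to show the full stored values. */
proc print data=names;
   format name $12.;
run;

/* Left-align the values so the leading space no longer makes */
/* ' Blackheart' differ from 'Blackheart'.                    */
data names_clean;
   set names;
   name = left(name);
run;

proc sort data=names_clean out=names_nodup nodupkey;
   by _all_;
run;
```

After the clean-up step the second and third rows hold the same value, so Nodupkey should remove one of them. Blackstone and Blackheart remain distinct even though $5. shows both as Black.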

 

Also check for any formats that group several distinct values under one displayed label.

 

 
