GN0001
Barite | Level 11
Hello team,
I have duplicated rows and I need to remove them. I wrote:
proc sort data=mydata nodupkey;
   by _all_;
run;
It doesn’t remove the duplicate observations.
Regards
Blueblue
ballardw
Super User

@HB wrote:

I think you need out=


https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1ab4sjiq6wxkvn1npo1pq1cvhxt.htm 


Not needed, though it is a good idea. See this log:

149  /* make a temporary data set to test the nodupkey with*/
150  data class;
151     set sashelp.class;
152  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


153
154  proc sort data=class nodupkey;
155     by sex age;
156  run;

NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: 8 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 11 observations and 5 variables.

I used the Sashelp.class data set so others can test the code, copying it to the Work library with a DATA step.

Then I used Proc Sort with the Nodupkey option and no Out= option.

Notice that the resulting work.class data set now has 11 observations, not the 19 from copying Sashelp.class.

Proc sort will happily corrupt your source data set. So if you think you may ever want to use the data in work.class as it was before the sort, use the Out= option to write the result to a new data set.
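A minimal sketch of the safer version of the same step, reusing the Sashelp.class copy from the log above. CLASS_NODUP and CLASS_DUPS are hypothetical output names; DUPOUT= (which requires NODUPKEY or NODUPRECS) collects the deleted rows so you can inspect what was removed. Based on the log, CLASS_NODUP should end up with 11 observations, CLASS_DUPS with 8, and WORK.CLASS should keep all 19.

```sas
/* Copy SASHELP.CLASS to WORK so the original is never touched */
data class;
   set sashelp.class;
run;

/* OUT= writes the de-duplicated rows to a new data set and DUPOUT=
   writes the removed rows to another one; WORK.CLASS is left as-is */
proc sort data=class out=class_nodup dupout=class_dups nodupkey;
   by sex age;
run;
```

Looking at the DUPOUT= data set is often the quickest way to see whether NODUPKEY treated rows as duplicates the way you expected.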


HB
Barite | Level 11
"Proc sort will happily corrupt your source data"

LOL.
ballardw
Super User

When you use BY _ALL_, there is a chance that values that look the same at first glance actually aren't. Formats for numeric values typically round: with a value of 1.999999999 and a format of BEST5., the displayed value will be 2.
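A small sketch of the rounding effect, using a hypothetical data set DEMO with one numeric variable X. With BEST5. attached, both rows are expected to print as 2; printing with a wider format should reveal the stored difference. Because PROC SORT compares the stored values rather than the formatted ones, NODUPKEY is expected to keep both rows.

```sas
/* Hypothetical demo data: two values that differ in storage
   but should look the same under the BEST5. format */
data demo;
   input x;
   format x best5.;
   datalines;
1.999999999
2
;
run;

/* With BEST5. attached, both rows should display as 2 */
proc print data=demo;
run;

/* A wider format for this step only shows the stored values */
proc print data=demo;
   format x best16.;
run;

/* NODUPKEY compares stored values, so neither row should be removed */
proc sort data=demo out=demo_nodup nodupkey;
   by x;
run;
```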
Sometimes character values have leading spaces. Depending on how you display them, two values can look the same, but the leading space is significant when the variable is used as a BY variable. Formats come into play with character values as well: if the format is $5. and you have the values Blackstone and Blackheart, both display as Black because of the format. So you may want to check character variable lengths and format widths.
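A sketch of both character cases, using a hypothetical data set NAMES. $CHAR12. is used on input so the leading blanks in the third row are kept. The DATA _NULL_ step flags values that start with a blank: LEFT() moves leading blanks to the end, so the comparison is unequal only when a leading blank exists. Since PROC SORT compares stored values, all four rows are expected to survive NODUPKEY even though some of them may look alike on screen.

```sas
/* Hypothetical data: two names that share their first five letters,
   plus "Black" stored with and without leading blanks */
data names;
   infile datalines truncover;
   length name $ 12;
   input name $char12.;
   format name $5.;
   datalines;
Blackstone
Blackheart
  Black
Black
;
run;

/* Write a note to the log for every value that begins with a blank */
data _null_;
   set names;
   if name ne left(name) then put 'Leading blank found in row ' _n_=;
run;

/* Stored values all differ, so no row should be deleted */
proc sort data=names out=names_nodup nodupkey;
   by name;
run;
```

If the leading blanks really are noise, one option is to clean them (for example with LEFT or STRIP in a DATA step) before running the sort.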
Also look for formats that group several distinct values under one displayed label.
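A sketch of how to check for this, assuming a hypothetical user-defined format AGEGRP attached to a copy of Sashelp.class. PROC CONTENTS lists each variable's length and attached format, which is where $5., BEST5., or a grouping format would show up. A FORMAT statement that names variables but no format removes the attached formats for that step only. The final sort is expected to give the same 11 rows as the unformatted sort in the log, because NODUPKEY compares the stored ages, not the Child/Teen labels.

```sas
/* Hypothetical grouping format: many ages share one label */
proc format;
   value agegrp low-12  = 'Child'
                13-high = 'Teen';
run;

data class_fmt;
   set sashelp.class;
   format age agegrp.;
run;

/* Shows each variable's type, length, and attached format */
proc contents data=class_fmt;
run;

/* Print the stored values by removing formats for this step only */
proc print data=class_fmt;
   var name sex age;
   format _all_;
run;

/* Duplicates are judged on stored SEX and AGE, not on the labels */
proc sort data=class_fmt out=class_fmt_nodup nodupkey;
   by sex age;
run;
```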
