BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Greek
Obsidian | Level 7

Hello!

I am new to SAS and even though I use it daily, I still haven't been able to have the basics down. I have a data set with repetitive values and I would like to get rid of them. For example, I would like to transform the first table below into the second table below. I am trying to count each employee only one time. Does anybody know how to delete observations vertically?

observationemployeeYSBSPay
117100000
217100000
317100000
428120000
528120000
628120000
728120000
828120000
9311110000
10411123000

observationemployeeYSBSPay
117100000
228120000
3311110000
4411123000


1 ACCEPTED SOLUTION

Accepted Solutions
Vince28_Statcan
Quartz | Level 8

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

     create table want as

     select distinct *

     from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

     by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

View solution in original post

3 REPLIES 3
Vince28_Statcan
Quartz | Level 8

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

     create table want as

     select distinct *

     from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

     by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

Greek
Obsidian | Level 7

It worked! Smiley Happy

Thank you so much!

Tom
Super User Tom
Super User

Note that NODUPS does NOT do the same thing as DISTINCT * in SQL unless you include ALL variables in the sort order.

PROC SORT only compares adjacent records for duplication, so if there is an intervening record that is different for one of the non BY variables then the non adjacent duplicate records will not be removed.

If you do not care about the non BY variables and want to just get a single observation per by group then use the NODUPKEY option instead.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 693 views
  • 0 likes
  • 3 in conversation