Help using Base SAS procedures

Vertical data manipulation

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Vertical data manipulation

Hello!

I am new to SAS and even though I use it daily, I still haven't been able to have the basics down. I have a data set with repetitive values and I would like to get rid of them. For example, I would like to transform the first table below into the second table below. I am trying to count each employee only one time. Does anybody know how to delete observations vertically?

observationemployeeYSBSPay
117100000
217100000
317100000
428120000
528120000
628120000
728120000
828120000
9311110000
10411123000

observationemployeeYSBSPay
117100000
228120000
3311110000
4411123000



Accepted Solutions
Solution
‎07-30-2013 10:23 AM
Super Contributor
Posts: 339

Re: Vertical data manipulation

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

     create table want as

     select distinct *

     from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

     by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

View solution in original post


All Replies
Solution
‎07-30-2013 10:23 AM
Super Contributor
Posts: 339

Re: Vertical data manipulation

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

     create table want as

     select distinct *

     from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

     by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

Contributor
Posts: 45

Re: Vertical data manipulation

It worked! Smiley Happy

Thank you so much!

Super User
Super User
Posts: 6,498

Re: Vertical data manipulation

Note that NODUPS does NOT do the same thing as DISTINCT * in SQL unless you include ALL variables in the sort order.

PROC SORT only compares adjacent records for duplication, so if there is an intervening record that is different for one of the non BY variables then the non adjacent duplicate records will not be removed.

If you do not care about the non BY variables and want to just get a single observation per by group then use the NODUPKEY option instead.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 176 views
  • 0 likes
  • 3 in conversation