Solved: Vertical data manipulation

Greek · Posted 07-30-2013 10:10 AM

Hello!

I am new to SAS and even though I use it daily, I still haven't been able to have the basics down. I have a data set with repetitive values and I would like to get rid of them. For example, I would like to transform the first table below into the second table below. I am trying to count each employee only one time. Does anybody know how to delete observations vertically?

observation	employee	YSBS	Pay
1	1	7	100000
2	1	7	100000
3	1	7	100000
4	2	8	120000
5	2	8	120000
6	2	8	120000
7	2	8	120000
8	2	8	120000
9	3	11	110000
10	4	11	123000

observation	employee	YSBS	Pay
1	1	7	100000
2	2	8	120000
3	3	11	110000
4	4	11	123000

Vince28_Statcan · Posted 07-30-2013 10:23 AM

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

create table want as

select distinct *

from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

View solution in original post

Vince28_Statcan · Posted 07-30-2013 10:23 AM

the 2 simple approaches when ALL variables of records are idential to kill such duplicates are

proc sql;

create table want as

select distinct *

from have;

quit;

distinct * only selects the first occurence of fully identical records, sorted or not.

A sas procedure alternative is

proc sort data=have out=want nodups;

by employee ysbs;

run;

If you don't use the out= newdataset option, this procedure will alter your existing dataset and remove the duplicates in it making it unable to recover. If it's an intermediate dataset it can be worthwhile to do so to save memory. The SQL approach has the advantage of not wasting computing time on re-sorting the data.

I'm sure other people will propose different approaches, this is how I do it.

Vincent

Greek · Posted 07-30-2013 10:31 AM

It worked!

Thank you so much!

Tom · Posted 07-30-2013 11:19 AM

Note that NODUPS does NOT do the same thing as DISTINCT * in SQL unless you include ALL variables in the sort order.

PROC SORT only compares adjacent records for duplication, so if there is an intervening record that is different for one of the non BY variables then the non adjacent duplicate records will not be removed.

If you do not care about the non BY variables and want to just get a single observation per by group then use the NODUPKEY option instead.

Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

Re: Vertical data manipulation

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away