Help using Base SAS procedures

Getting one record from near duplicate records

Reply
N/A
Posts: 0

Getting one record from near duplicate records

Hi All

I have a dataset that has near duplicate records - not exactly duplicates - take for example

1234 PHay ABSS EFGH $20524 20Jun2008
1234 PHay ABCD EFGH $11000 25Jun2008

Now, what I am trying to do is take only the latest record by date (in this case I want to retreive only the second field) - how can I acheive this?

I tried the Max() SQL Function, but then I need to use a function on all of the columns, as I need to get all the columns.

If I used MAX(Date), MAX(Salary), etc I would get back the first records' $20524 value, which is not correct.

Any help much appreciated.

Thanks!! Message was edited by: sdcruz
Super User
Posts: 5,256

Re: Getting one record from near duplicate records

If you want to use SQL, here's on solution:

26 proc sql;
27 select id, name, str1, str2, salary, date
28 from salary
29 group by id
30 having date = max(date);
NOTE: The query requires remerging summary statistics back with the original data.
31 quit;

As you see, this creates one summary query that is merged with original query, so this might be the best method if your data is huge. Other methods could involve two subsequent PROC SORT, the second one using NODUPKEY on your id column.

/Linus
Data never sleeps
Respected Advisor
Posts: 3,892

Re: Getting one record from near duplicate records

That should do the job:

proc sort data=yourdata;
by id date;
run;

data want;
set yourdata;
by id date;
if last.id then output;
run;
N/A
Posts: 0

Re: Getting one record from near duplicate records

Hi Guys

Thanks alot for your help - yes I resolved this by using the PROC SQL procedure as outlined by Linus.

Thanks again - much appreciated!

Cheers
Shelton.
Ask a Question
Discussion stats
  • 3 replies
  • 121 views
  • 0 likes
  • 3 in conversation