Re: Selecting distinct rows by first date

ammarhm · Posted 11-21-2014 09:42 AM

EDITED: PLEASE SEE BELOW IN RED

Hi all

Could anyone please help me with this:

I have a data set that looks like this:

Person	Earning	Site	Date
1	12	A	2001-01-23
1	34	B	2001-02-03
1	23	B	2001-02-25
2	45	B	2001-03-04
2	32	A	2001-03-27
3	7	A	2001-04-04
3	34	B	2001-05-05
3	65	B	2001-05-26
3	33	A	2001-06-06
3	23	B	2001-07-02
4	33	B	2001-02-03
4	12	A	2001-03-05

What I am trying to do is the following

Person	Earning	Site	Date
1	Earning first date person 1	Site at first earning	First earning date person 1
2	Earning first date person 2	Site at first earning	First earning date person 2
3	Earning first date person 3	Site at first earning	First earning date person 3
4	Earning first date person 4	Site at first earning	First earning date person 4

This is what I have done:

proc sql;

create table want as

select distinct

Person,

Earning,

Site,

Date

from have

group by Person

having date=min(date);

Quit;

In other words, i am trying to select the distinct rows per person by the first earning date.

The problem is that there are still some duplicates. When I look at the new tables I see that the duplicates have the same "Person" but different "Site" which I really dont understand,

MY MISTAKE, ACTUALLY THE PROBLEM SEEMED TO BE THAT THE DUPLICATES HAD THE SAME DATE BUT DIFFERENT SITE, ANY WHAY OF SOLVING THIS (I DONT CARE ABOUT WHICH SITE WOULD BE SELECTED IN THAT CASE)

Any help is appreciated

Best regards

RW9 · Posted 11-21-2014 09:51 AM

Well, I would do:

proc sql;

create table WANT as

select A.*

from HAVE A

right join (select PERSON,min(DATE) as MDATE from HAVE) B

on A.PERSON=B.PERSON

and A.DATE=B.MDATE;

quit;

You could also do it with a having clause (although I have not checked this way):

proc sql;

create table WANT as

select *

from HAVE

having date=min(date);

quit;

ammarhm · Posted 11-21-2014 10:17 AM

Not sure if this will work, as I need the min date per person and not for all the group (?)

Best regards

RW9 · Posted 11-21-2014 10:29 AM

Yep, forgot the group by:

proc sql;

create table WANT as

select A.*

from HAVE A

right join (select PERSON,min(DATE) as MDATE from HAVE group by PERSON) B

on A.PERSON=B.PERSON

and A.DATE=B.MDATE;

quit;

ammarhm · Posted 11-21-2014 10:50 AM

@ RW9

Well, interesting way of doing it, will do I am sure, but it dose not seem very efficient way of solving it, especially when dealing with several million records

Best regards

RW9 · Posted 11-21-2014 11:58 AM

Maybe I am overthinking this. Why not:

proc sort data=have;

by person date;

run;

data want;

set have;

by person;

if first.person output;

run;

DBailey · Posted 11-21-2014 10:43 AM

proc sql;

create table WANT as

select A.*

from HAVE A

where a.date = (select min(date) from have where person = a.person);

quit;

ammarhm · Posted 11-21-2014 10:48 AM

Again, I need the first date by person, this code would in this case just select the first date in all the table

Best regards

DBailey · Posted 11-24-2014 10:29 AM

no...it uses a correlated subquery to tie the minimum to the person in the record. When it process a record, it executes the correlated subquery to find the minimum date for that person.

Ksharp · Posted 11-25-2014 08:38 AM

Yeah. But if min is also tied.

Didn't RW9's data step code work ?

Catch up on SAS Innovate 2026