Programming the statistical procedures from SAS

Selecting distinct rows by first date

Reply
Regular Contributor
Posts: 163

Selecting distinct rows by first date

EDITED: PLEASE SEE BELOW IN RED

Hi all

Could anyone please help me with this:

I have a data set that looks like this:

                

PersonEarningSiteDate
112A2001-01-23
134B2001-02-03
123B2001-02-25
245B2001-03-04
232A2001-03-27
37A2001-04-04
334B2001-05-05
365B2001-05-26
333A2001-06-06
323B2001-07-02
433B2001-02-03
412A2001-03-05

What I am trying to do is the following

PersonEarningSiteDate
1Earning first date person 1Site at first earningFirst earning date person 1
2Earning first date person 2Site at first earningFirst earning date person 2
3Earning first date person 3Site at first earningFirst earning date person 3
4Earning first date person 4Site at first earningFirst earning date person 4

This is what I have done:

proc sql;

create table want as

select distinct

  Person,

  Earning,

  Site,

  Date

  from have

  group by Person

having date=min(date);

Quit;

In other words, i am trying to select the distinct rows per person by the first earning date.

The problem is that there are still some duplicates. When I look at the new tables I see that the duplicates have the same "Person" but different "Site" which I really dont understand,

MY MISTAKE, ACTUALLY THE PROBLEM SEEMED TO BE THAT THE DUPLICATES HAD THE SAME DATE BUT DIFFERENT SITE, ANY WHAY OF SOLVING THIS (I DONT CARE ABOUT WHICH SITE WOULD BE SELECTED IN THAT CASE)

Any help is appreciated

Best regards

Super User
Super User
Posts: 7,565

Re: Selecting distinct rows by first date

Well, I would do:

proc sql;

     create table WANT as

     select    A.*

     from       HAVE A

     right join (select PERSON,min(DATE) as MDATE from HAVE) B

     on          A.PERSON=B.PERSON

     and        A.DATE=B.MDATE;

quit;

You could also do it with a having clause (although I have not checked this way):

proc sql;

     create table WANT as

     select     *

     from       HAVE

     having    date=min(date);

quit;

Regular Contributor
Posts: 163

Re: Selecting distinct rows by first date

Not sure if this will work, as I need the min date per person and not for all the group (?)

Best regards

Super User
Super User
Posts: 7,565

Re: Selecting distinct rows by first date

Yep, forgot the group by:

proc sql;

     create table WANT as

     select    A.*

     from       HAVE A

     right join (select PERSON,min(DATE) as MDATE from HAVE group by PERSON) B

     on          A.PERSON=B.PERSON

     and        A.DATE=B.MDATE;

quit;

Regular Contributor
Posts: 163

Re: Selecting distinct rows by first date

@ RW9

Well, interesting way of doing it, will do I am sure, but it dose not seem very efficient way of solving it, especially when dealing with several million records

Best regards

Super User
Super User
Posts: 7,565

Re: Selecting distinct rows by first date

Maybe I am overthinking this.  Why not:

proc sort data=have;

     by person date;

run;

data want;

     set have;

     by person;

     if first.person output;

run;

Super Contributor
Posts: 578

Re: Selecting distinct rows by first date

proc sql;

     create table WANT as

     select    A.*

     from       HAVE A

     where a.date = (select min(date) from have where person = a.person);

quit;

Regular Contributor
Posts: 163

Re: Selecting distinct rows by first date

Again, I need the first date by person, this code would in this case just select the first date in all the table

Best regards

Super Contributor
Posts: 578

Re: Selecting distinct rows by first date

no...it uses a correlated subquery to tie the minimum to the person in the record.  When it process a record, it executes the correlated subquery to find the minimum date for that person.

Super User
Posts: 9,769

Re: Selecting distinct rows by first date

Yeah. But if min is also tied.

Didn't RW9's data step code work ?

Ask a Question
Discussion stats
  • 9 replies
  • 848 views
  • 3 likes
  • 4 in conversation