EDITED: PLEASE SEE BELOW IN RED
Hi all
Could anyone please help me with this:
I have a data set that looks like this:
Person | Earning | Site | Date |
---|---|---|---|
1 | 12 | A | 2001-01-23 |
1 | 34 | B | 2001-02-03 |
1 | 23 | B | 2001-02-25 |
2 | 45 | B | 2001-03-04 |
2 | 32 | A | 2001-03-27 |
3 | 7 | A | 2001-04-04 |
3 | 34 | B | 2001-05-05 |
3 | 65 | B | 2001-05-26 |
3 | 33 | A | 2001-06-06 |
3 | 23 | B | 2001-07-02 |
4 | 33 | B | 2001-02-03 |
4 | 12 | A | 2001-03-05 |
What I am trying to do is the following
Person | Earning | Site | Date |
---|---|---|---|
1 | Earning first date person 1 | Site at first earning | First earning date person 1 |
2 | Earning first date person 2 | Site at first earning | First earning date person 2 |
3 | Earning first date person 3 | Site at first earning | First earning date person 3 |
4 | Earning first date person 4 | Site at first earning | First earning date person 4 |
This is what I have done:
proc sql;
create table want as
select distinct
Person,
Earning,
Site,
Date
from have
group by Person
having date=min(date);
Quit;
In other words, i am trying to select the distinct rows per person by the first earning date.
The problem is that there are still some duplicates. When I look at the new tables I see that the duplicates have the same "Person" but different "Site" which I really dont understand,
MY MISTAKE, ACTUALLY THE PROBLEM SEEMED TO BE THAT THE DUPLICATES HAD THE SAME DATE BUT DIFFERENT SITE, ANY WHAY OF SOLVING THIS (I DONT CARE ABOUT WHICH SITE WOULD BE SELECTED IN THAT CASE)
Any help is appreciated
Best regards
Well, I would do:
proc sql;
create table WANT as
select A.*
from HAVE A
right join (select PERSON,min(DATE) as MDATE from HAVE) B
on A.PERSON=B.PERSON
and A.DATE=B.MDATE;
quit;
You could also do it with a having clause (although I have not checked this way):
proc sql;
create table WANT as
select *
from HAVE
having date=min(date);
quit;
Not sure if this will work, as I need the min date per person and not for all the group (?)
Best regards
Yep, forgot the group by:
proc sql;
create table WANT as
select A.*
from HAVE A
right join (select PERSON,min(DATE) as MDATE from HAVE group by PERSON) B
on A.PERSON=B.PERSON
and A.DATE=B.MDATE;
quit;
@ RW9
Well, interesting way of doing it, will do I am sure, but it dose not seem very efficient way of solving it, especially when dealing with several million records
Best regards
Maybe I am overthinking this. Why not:
proc sort data=have;
by person date;
run;
data want;
set have;
by person;
if first.person output;
run;
proc sql;
create table WANT as
select A.*
from HAVE A
where a.date = (select min(date) from have where person = a.person);
quit;
Again, I need the first date by person, this code would in this case just select the first date in all the table
Best regards
no...it uses a correlated subquery to tie the minimum to the person in the record. When it process a record, it executes the correlated subquery to find the minimum date for that person.
Yeah. But if min is also tied.
Didn't RW9's data step code work ?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.