Hi all
Could anyone please help me with this:
I have a data set that looks like this:
Person | Earning | Date |
---|---|---|
1 | 12 | 2001-01-23 |
1 | 34 | 2001-02-03 |
1 | 23 | 2001-02-25 |
2 | 45 | 2001-03-04 |
2 | 32 | 2001-03-27 |
3 | 7 | 2001-04-04 |
3 | 34 | 2001-05-05 |
3 | 65 | 2001-05-26 |
3 | 33 | 2001-06-06 |
3 | 23 | 2001-07-02 |
4 | 33 | 2001-02-03 |
4 | 12 | 2001-03-05 |
What I am trying to do is the following
Person | Sum Earning | Earning Period |
---|---|---|
1 | total sum for person 1 | Last earning date - First earning day person 1 |
2 | total sum for person 2 | Last earning date - First earning day person 2 |
3 | total sum for person 3 | Last earning date - First earning day person 3 |
4 | total sum for person 4 | Last earning date - First earning day person 4 |
I a guess this would be done through arrays but I am really clueless....
Could anyone please help me with the code??
Kind regards
ammarhm
I hope this is what you need as the output.
data have;
input person earning Date;
format Date yymmdd10.;
informat Date yymmdd10.;
datalines;
1 12 2001-01-23
1 34 2001-02-03
1 23 2001-02-25
2 45 2001-03-04
2 32 2001-03-27
3 7 2001-04-04
3 34 2001-05-05
3 65 2001-05-26
3 33 2001-06-06
3 23 2001-07-02
4 33 2001-02-03
4 12 2001-03-05
;
run;
proc sql;
create table want1 as
select distinct person, sum(earning) as sum_earning,max(date)-min(date) as earning_period
from have
group by person;
quit;
proc sql;
create table want2 as
select distinct person, earning as last_earning
from have
group by person
having date=max(date);
quit;
data want;
merge want1(in=a) want2(in=b);
by person;
if a=b;
run;
You don't need arrays.
Great one data_null
Much appreciated
Just another question
Would you be so kind and tell me how to add another column to the output dataset showing the last earning per person?
Kind regards
To add the last earning date column
data have;
input person earning Date;
format Date yymmdd10.;
informat Date yymmdd10.;
datalines;
1 12 2001-01-23
1 34 2001-02-03
1 23 2001-02-25
2 45 2001-03-04
2 32 2001-03-27
3 7 2001-04-04
3 34 2001-05-05
3 65 2001-05-26
3 33 2001-06-06
3 23 2001-07-02
4 33 2001-02-03
4 12 2001-03-05
;
run;
Edited:
proc sql;
create table want1 as
select distinct person, sum(earning) as sum_earning,max(date)-min(date) as earning_period,
max(date) as max_date format yymmdd10.
from have
group by person;
quit;
Thanks Hima
I was looking into having the last earning (not the last earning date)
Any suggestions? The proc sql is a great option
Kind regards
You mean the last earning for person 1 is 23 person 2 is 32 etc...
Yes exactly, thanks in advance
The real reason is that I want to modify the table so I would get the sum of all earnings per person excluding the last one, so I thought I might do a sum(earning)-last earning
you get the point, i am sure
I hope this is what you need as the output.
data have;
input person earning Date;
format Date yymmdd10.;
informat Date yymmdd10.;
datalines;
1 12 2001-01-23
1 34 2001-02-03
1 23 2001-02-25
2 45 2001-03-04
2 32 2001-03-27
3 7 2001-04-04
3 34 2001-05-05
3 65 2001-05-26
3 33 2001-06-06
3 23 2001-07-02
4 33 2001-02-03
4 12 2001-03-05
;
run;
proc sql;
create table want1 as
select distinct person, sum(earning) as sum_earning,max(date)-min(date) as earning_period
from have
group by person;
quit;
proc sql;
create table want2 as
select distinct person, earning as last_earning
from have
group by person
having date=max(date);
quit;
data want;
merge want1(in=a) want2(in=b);
by person;
if a=b;
run;
Yes that can be done with PROC SUMMARY IDGROUP.
NULL ,
You can use RANGE as well.No need to overwrite the table .
proc summary data=person nway; class person; output out=summary sum(earning)= range(date)= /autoname; run;
Xia Keshan
Good point. I have modified.
proc sort data=have;
by person date;
run;
data want(drop=date earning);
set have;
by person date;
retain first_earning_date;
format first_earning_date last_earning_date ddmmyy10.;
if first.person then do;
sum_earning=0;
first_earning_date=date;
end;
sum_earning+earning;
if last.person then do;
last_earning_date=date;
output;
end;
run;
proc print data=want;
run;
data have;
input person earning Date;
format Date yymmdd10.;
informat Date yymmdd10.;
datalines;
1 12 2001-01-23
1 34 2001-02-03
1 23 2001-02-25
2 45 2001-03-04
2 32 2001-03-27
3 7 2001-04-04
3 34 2001-05-05
3 65 2001-05-26
3 33 2001-06-06
3 23 2001-07-02
4 33 2001-02-03
4 12 2001-03-05
;
run;
proc sql;
create table want as
select distinct person, sum(earning) as sum_earning,max(date)-min(date) as earning_period
from have
group by person;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.