* basic options;
OPTIONS MISSING=' ' SORTPGM=BEST ERRORS=1 LINESIZE=132 CENTER
PAGESIZE=42 NUMBER OBS=MAX;
data last_period;
input id hours;
cards;
1 1
2 1
3 1
4 2
5 3
6 3
;
data this_period;
input id hours;
cards;
1 1
2 2
3 3
5 5
6 5
;
* I want to find the average gain in hours for each id that shows up in both files;
* for this data, ID 1 has gained 0 hours from last period to this period, ID 2 has gained 1 hours, ID 3 has gained 2 hours;
* 4 is not in the second file and should be ignored;
* ID's 5 and 6 have both gained 2 hours;
* average gain should be ( 0 + 1 + 2 + 2 + 2 ) / 5 = 1.4
* using PROC SQL I can do;
proc sql;
create table average_gain as
select avg(b.hours-a.hours) as hours_gained
from last_period a inner join
this_period b on
a.id = b.id;
quit;
proc print data=average_gain noobs;
var hours_gained;
title1 'Average additonal hours gained by PROC SQL';
run;
* this works perfectly and, exactly as predicted, prints out 1.4;
* I want to do the same thing without PROC SQL;
* I think I should be able to do better than the following;
proc sort data=last_period;
by id;
run;
proc sort data=this_period;
by id;
run;
data this_period2;
set this_period;
keep id hours1;
hours1 = hours;
run;
data average_gain2;
merge last_period (in = x) this_period2 (in = y);
by id;
if x and y;
run;
data average_gain3;
set average_gain2;
hours_gained = hours1 - hours;
run;
proc means data= average_gain3 mean;
var hours_gained;
title1 'Average additonal hours gained by PROC MEANS';
run;
* prints out 1.4000000
* Am I doing this efficiently? How would you do it? Any ideas for me?;
<after the sort which is needed for merge>
data combined;
merge last_period (in=x) this_period(in=y rename=(hours=hours1));
by id;
if x and y;
hours_gained=hours1-hours;
run;
then proc means;
<after the sort which is needed for merge>
data combined;
merge last_period (in=x) this_period(in=y rename=(hours=hours1));
by id;
if x and y;
hours_gained=hours1-hours;
run;
then proc means;
Okay. That's a bunch better.
I woudl have never figured out this part:
this_period(in=y rename=(hours=hours1));
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.