Slope Calculation

Frequent Contributor
Posts: 94

Slope Calculation

Attached file is a small subset of my large data file.  It is the data for one employee over months.

Column A = Month the work was completed

Column B = An integer assigned to the month - this is used to calculate the slope via excel

Column C = Employee #.

Column L = The slope I calculated in Excel.  The formula is:  =SLOPE(IF(\$C\$2:\$C\$25000=C2,\$M\$2:\$M\$25000),IF(\$C\$2:\$C\$25000=C2,\$B\$2:\$B\$25000))

As I mentioned the excel file is quite large. Thus, it is a very slow to try to calculate the slope with these many data rows and I hope to be able to do this in SAS EG.  I am looking for help / direction on how to calculate the SLOPE in SAS EG.

There are up to 12-months worth of data for each employee.  I would like to be able to write a program that says, for each employee_ID, take the monthly_weighted_approval_rate for all the months the employee worked and calculate the slope, and, if possible, the R-Square.

I researched the forum and found PROC REG and PROC GLM, but frankly I do not know where to how to start with these.

Any help is appreciated!

Thanks,

Jenne

Posts: 5,614

Re: Slope Calculation

Your workbook has only 70 lines of non missing data. This is how to do the regressions with proc reg :

libname xl excel "&SASForum.\datasets\slope.xls";

/* Keep a single value of Monthly_Weighted_Approval per month */
proc sql;
create table test as
select Emp_ID, Event_month, mean(Monthly_Weighted_Approval) as MWA
from xl.'sheet2\$'n
where Monthly_Weighted_Approval is not missing
group by Emp_ID, Event_month;
quit;

libname xl clear;

/* Do the regression, request output statistics in PE and FS datasets */
proc reg data=test;
by Emp_ID notsorted;
model MWA = Event_Month;
ods output parameterEstimates=PE FitStatistics=FS;
run;

/* Join statistics into a table.
Multiply the slope by 30 to express it per month */
proc sql;
create table stats as
select PE.Emp_ID, PE.Estimate*30 as slope, PE.ProbT, FS.nValue2 as AdjRSquare
from PE, FS
where pe.Emp_ID=fs.Emp_ID and PE.Variable="Event_Month" and FS.Label2="Adj R-Sq";
select * from stats;
quit;

PG

Corrected by PG (replaced divide by multiply)

PG
Frequent Contributor
Posts: 94

Re: Slope Calculation

Thank you PGStats!  Your response is very helpful and gives me the other stats I need.

However, when ran, I am not getting the same answer had I calculated manually via excel formula.

1.    The slope in table STATS is different compared to the result using the excel formula:  =SLOPE(IF(\$C\$2:C\$45=C2,\$L\$2:\$L\$45),IF(\$C\$2:\$C\$45=C2,\$B\$2:\$B\$45))

• Using the Excel formula, slope = -0.0185881530393469
• Using the code above, the slope = -0.000023351791:

 OPR_ID slope Probt AdjRSquare abc123 -0.000023351791 0.04577284 0.4313875

2.  "Divide the slope by 30 to express it per month" -  Why is this done?

Posts: 5,614

Re: Slope Calculation

Oops, major oops! Sorry. It should be MULTIPLIED by 30. I already did the correction in the code pasted above.

Variable Event_month is imported as a SAS date which is expressed in days. The slope from the MWA = Event_Month regression is thus expressed in MWA per day. MWA per month is thus 30 times as much. You shouldn't expect the slopes to be exactly the same as your Excel regression is on a varying number of observations every month. In my suggested code, only one value is kept per month. The two slopes should be close though.

PG

PG
Frequent Contributor
Posts: 94

Re: Slope Calculation

Hi PG - This worked!  Thank you!  Few more follow up questions:

2.  What is the code to calculate R-Sqare instead of AR2?

3.  I got this warning when I ran the /* Do the regression, request output statistics in PE and FS datasets */ portion of the code.  Thoughts?

WARNING: The range of variable Event_Month is so small relative to its mean that there may be loss of accuracy in the computations.

You may need to rescale the variable to have a larger value of RANGE/abs(MEAN), for example, by using PROC STANDARD M=0;

Thank you again!!!

Posts: 2,127

Re: Slope Calculation

Jen,

Adjusted R-Squared is the R-squared penalized for the number of variable in the model.  When you have just one predictor, I recall it as being the same as R-Squared.

The formula's for both are in the PROC REG documentation on support.sas.com.

BTW, since you posted this to the EGuide forum, you can do the pieces that PG described using the EGuide GUI as well as with code.

Doc Muhlbaier

Duke

Frequent Contributor
Posts: 94

Re: Slope Calculation

Hi Duke,

Thanks for the direction to the support.sas.com documentation.

Using the code that PG described, I was able to get a slope very close to the one calculated via excel.  However, the R-squared I got from the Linear Regression via excel and the Adjusted R-Squared from PG's code are very different.  I will see if I can find the code in the documentation and then do a comparison.

I haven't tried the process via the EG GUI interface yet.  My project process is via the GUI except the slope calculation.  I am on a time crunch to get this done for work, but will try it via the GUI later in the week and let you know.

Thanks!

Frequent Contributor
Posts: 94

Re: Slope Calculation

Hi Duke - Did I mess up and should have posted my question to a different forum?

Thanks!

Posts: 2,127

Re: Slope Calculation

Jen,

On the forum question, your question fit here or it could have fit on the Stat Procedures forum.  I made the side comment because many analysts (as opposed to stat programmers) need to use the GUI because they aren't very knowledgeable coders.  EGuide is getting more "programmer friendly," though you will see from some other posts that many think it has a ways to go.

Doc

Posts: 5,614

Re: Slope Calculation

I guess question 1 was already answered. As for question 2 and 3, the following version including calculations with and without duplicate

monthly weighted approval rates should answer :

libname xl excel "&SASForum.\datasets\slope.xls";

/* Flag the first observation of every month (variable first).
Express the month as the number of months since the beginning of the century to avoid
loss of accuracy */

data test(keep=Emp_ID month Monthly_Weighted_Approval first);
set xl.'sheet2\$'n;
by Event_month notsorted;
if Monthly_Weighted_Approval ne .;
first = first.Event_month;
month = intck("MONTH", '01JAN2000'd, Event_Month);
run;

libname xl clear;

/* Do the regression on a single observation per month,
request output statistics in PEfirst and FSfirst datasets */
proc reg data=test(where=(first)) plots=none;
by Emp_ID notsorted;
model Monthly_Weighted_Approval = month;
ods output parameterEstimates=PEfirst FitStatistics=FSfirst;
run;

/* Do the regression on all duplicate observations,
request output statistics in PEall and FSall datasets */
proc reg data=test plots=none;
by Emp_ID notsorted;
model Monthly_Weighted_Approval = month;
ods output parameterEstimates=PEall FitStatistics=FSall;
run;

/* Join statistics into tables. */
proc sql;
create table statsFirst as
select PE.Emp_ID, PE.Estimate as slope, PE.ProbT, FS.nValue2 as RSquare
from PEfirst as PE, FSfirst as FS
where pe.Emp_ID=fs.Emp_ID and PE.Variable="Month" and FS.Label2="R-Square";
create table statsAll as
select PE.Emp_ID, PE.Estimate as slope, PE.ProbT, FS.nValue2 as RSquare
from PEall as PE, FSall as FS
where pe.Emp_ID=fs.Emp_ID and PE.Variable="Month" and FS.Label2="R-Square";

title "One observation per month";
select * from statsFirst;
title "All duplicate observations";
select * from statsAll;
quit;

PG

PG
Discussion stats
• 9 replies
• 4407 views
• 6 likes
• 3 in conversation