I am learning this program for the first time for one of my biology classes. My professor hasn't really explained much to us. He just puts the code in SAS and the whole class rushes to scribble it down. Needless to say, I'm completely lost. I'm not sure what each piece of code is for or whether I'm even using it correctly. Is there a current list of SAS code for biology with explanations of why we use each specific test? I have a short list of tests he mentioned in passing during class.
Tests:
One Sample T-test
2 Sample T-test
Paired T-test
Wilcoxon Test
ANOVA 1 and 2 way
Factorial ANOVA
Mann-Whitney U test
Chi-squared
Fisher's Exact
Tukey HSD and LSD
Regression
Correlation - Spearman
I've been looking through the SAS help website and I'm still not finding all the answers I need. I have this final exam on Wednesday. If anyone could point me in a direction or knows the answer, it would be appreciated. Thank you!
You can find at least some of these references in:
These aren't specific to biology at all, but each stat listed does have a link to the SAS procedure or technique you can use to calculate it.
If you line up each test with a SAS Procedure you could read the overview section and put together a list of definitions.
I would recommend the STATSOFT textbook though, to quickly get an idea of what each test is. Have you taken a statistics course at university? A first-level stats course covers most of what you're mentioning.
Most are under Basic and Descriptive statistics but you can also look them up in the Glossary.
http://www.statsoft.com/Textbook/Basic-Statistics
This is a lot to learn by Wednesday. Good luck!
I did take stats at my university, but our professor pushed more for Excel or doing the problems by hand. I have a basic foundation of terms and procedures. The SAS system is completely new to me, and I'm not quite sure why some of the code is needed.
Thank you for all the responses with the links. I am looking through them now. If there was a better background from my professor I might understand the output a little better.
Another way to do this is to look at what PROCs your code contains and review the documentation for those procedures. Specifically, walking through the examples is a good idea. I've tried to identify the relevant tests for you below, but reviewing the code samples is a better bet.
Tests:
PROC TTEST
One Sample T-test
2 Sample T-test
Paired T-test
PROC ANOVA
ANOVA 1 and 2 way
Factorial ANOVA
PROC NPAR1WAY
Mann-Whitney U test
PROC FREQ
Chi-squared
Fisher's Exact
Not sure??
Tukey HSD and LSD
PROC REG/GLM?
Regression
PROC CORR
Correlation - Spearman
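To make the list above more concrete, here is a rough sketch of what a minimal call to each procedure might look like. It runs against SASHELP.CLASS, which ships with SAS; the dataset PAIRS, its values, and the null value 60 are made up purely for illustration. For the "Not sure??" entry: as far as I know, Tukey HSD and LSD are requested as options on the MEANS statement of PROC GLM (or PROC ANOVA). I use PROC GLM for the ANOVA examples because SASHELP.CLASS is unbalanced, and PROC ANOVA is intended for balanced designs. The Wilcoxon signed-rank test for paired or one-sample data is not shown; PROC UNIVARIATE reports it in its tests for location.

```sas
/* One-sample t-test: is mean height different from 60? (60 is an arbitrary value) */
proc ttest data=sashelp.class h0=60;
  var height;
run;

/* Two-sample t-test: compare mean height between the two levels of sex */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;

/* Paired t-test: hypothetical before/after measurements on the same subjects */
data pairs;
  input before after;
  datalines;
12 14
15 15
11 13
14 17
13 12
;
run;

proc ttest data=pairs;
  paired before*after;
run;

/* Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) */
proc npar1way data=sashelp.class wilcoxon;
  class sex;
  var height;
run;

/* Chi-square test and Fisher's exact test on a two-way table */
proc freq data=sashelp.class;
  tables sex*age / chisq fisher;
run;

/* One-way ANOVA with Tukey HSD and LSD comparisons of the age groups */
proc glm data=sashelp.class;
  class age;
  model height = age;
  means age / tukey lsd;
run;
quit;

/* Two-way factorial ANOVA: two main effects plus their interaction */
proc glm data=sashelp.class;
  class sex age;
  model height = sex age sex*age;
run;
quit;

/* Simple linear regression */
proc reg data=sashelp.class;
  model weight = height;
run;
quit;

/* Spearman rank correlation */
proc corr data=sashelp.class spearman;
  var height weight;
run;
```

With only 19 rows in SASHELP.CLASS the results are not meaningful; the point is only to see which statements each PROC expects.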
That is extremely helpful! In the part of the code that our professor posted it included:
Data assumps;
set assumps;
absresid = abs(resid);
Run;
Proc plot data = assumps vpercent = 50 50;
plot resid * pred
absresid * pred / VREF = 0
Quit;
proc univariate normal plot data= assumps;
var resid;
quit;
This all follows a PROC GLM step that includes the statement [output out = assumps p=pred r=resid;]. I know the PROC UNIVARIATE step is supposed to help determine normality, but I'm not sure what these steps actually do. They are nowhere in my notes and just appeared one day in the SAS code he brought up so we could copy it. Is this code supposed to be included after every PROC GLM?
Learning to use the documentation is worth some time in general when you are learning the language.
PROC GLM - OUTPUT statement.
OUT=<SAS-data-set> -> This is the name of the data set created.
P(PREDICTED) = <variable-name> -> name of the variable created that corresponds to the predicted value
R(RESIDUAL) = <variable-name> -> name of the variable created that corresponds to the residual value
So the GLM procedure creates a dataset called ASSUMPS that contains at least two variables, one for the predicted value and one for the residual value of the model.
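As a small illustration of the OUTPUT statement, here is a sketch using SASHELP.CLASS. The model itself (weight on height) is only an assumption for the example, since the original PROC GLM step was not posted; the names ASSUMPS, PRED and RESID match the professor's statement.

```sas
/* Fit a model and save predicted values and residuals to ASSUMPS */
proc glm data=sashelp.class;
  model weight = height;
  output out=assumps p=pred r=resid;
run;
quit;

/* Look at the first few rows: the original variables plus PRED and RESID */
proc print data=assumps(obs=5);
  var weight height pred resid;
run;
```

ASSUMPS contains every variable from the input data set plus the two new ones, which is why the later steps can refer to pred and resid.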
A data step can be used for a variety of things. In this case it's creating a variable using the ABS() function.
ABS() -> Returns the absolute value.
So in this step you're creating a variable called ABSRESID that contains the absolute value of the residual.
Data assumps;
set assumps;
absresid = abs(resid);
Run;
Are you sure it was PROC PLOT and not SGPLOT? I can't find a reference to PROC PLOT but believe it was used back in the day.
This procedure is creating two graphs, one for residual by predicted and one for the absolute value of the residual by the predicted value.
The posted code is also missing a semicolon at the end of the PLOT statement. With that added:
Proc plot data = assumps vpercent = 50 50;
plot resid * pred absresid * pred / VREF = 0;
Quit;
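If you would rather have graphical output than the text-based plots PROC PLOT produces, something like the following PROC SGPLOT sketch should give roughly the same two pictures. It assumes ASSUMPS already contains pred, resid and absresid from the earlier steps.

```sas
/* Residuals against predicted values, with a reference line at zero */
proc sgplot data=assumps;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;

/* Absolute residuals against predicted values */
proc sgplot data=assumps;
  scatter x=pred y=absresid;
run;
```

In both plots you are hoping to see no pattern: a funnel shape or a trend suggests the variance of the residuals is not constant.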
The next step, PROC UNIVARIATE, checks whether the variable RESID is normally distributed. This is one of the assumptions of linear regression: the errors are normally distributed.
proc univariate normal plot data= assumps;
var resid;
quit;
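A possible variation, again assuming ASSUMPS with a resid variable exists: the NORMAL option prints the tests for normality, and the HISTOGRAM and QQPLOT statements add graphical checks in place of the text plots from the PLOT option.

```sas
proc univariate data=assumps normal;
  var resid;
  /* Histogram with a fitted normal curve overlaid */
  histogram resid / normal;
  /* Q-Q plot against a normal distribution with estimated mean and std dev */
  qqplot resid / normal(mu=est sigma=est);
run;
```

Points falling close to the reference line in the Q-Q plot are consistent with normally distributed residuals.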
If you need to do this in real life, I would consider the following. This would produce diagnostic and residual plots that help assess both normality and any issues with your regression, e.g. outliers.
proc glm data=sashelp.class plots=(diagnostics residuals);
model weight = height age;
run;quit;
If you want help understanding the output you can look at either the examples in the PROC or at this page, which has annotated output for many common procs you're looking at, based on your questions.
A warning about coding following someone else's examples:
@dleighton wrote:
That is extremely helpful! In the part of the code that our professor posted it included:
Data assumps;
set assumps;
absresid = abs(resid);
Run;
Please be extremely cautious with the code structure Data datasetname; Set datasetname; I would recommend almost never using it.
The step replaces the original data. Depending on possible code logic errors (high probability when learning a programming language) or typos, it is very easy to destroy your original data. If you use this approach you should make sure that you can always get back to the original data. One example from code I inherited involved recoding data from a 1 and 2 value coding to a 0 and 1 coding.
data example; set example; var = var -1; /* other code was here*/ run;
Looks simple and much like your professor's example. However, the "other code" indicated above needed a change. So the change was made and the step was rerun. Now the values that had been 2 were reduced to 1 after the FIRST pass and further reduced to 0 by the second pass. So all values of var became either missing or 0.
You have been warned. Less obvious is the case where the code switches 0 to 1 and 1 to 0 (which I also inherited). That resulted in the rate of a reportable item changing from 56 percent to 44 percent, and everyone in the organization thought there had been a drastic change from the previous year's 55 percent.
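One way to avoid the problem is to write the result to a new data set and, for recodes, to a new variable, so that rerunning the step always starts from the untouched original. The names ASSUMPS_ABS, EXAMPLE_RECODED and VAR01 below are made up for illustration, and EXAMPLE is assumed to already exist with a variable var coded 1 and 2.

```sas
/* Read ASSUMPS but write to a new data set; ASSUMPS itself is not modified */
data assumps_abs;
  set assumps;
  absresid = abs(resid);
run;

/* Recode into a new variable in a new data set; rerunning gives the same result */
data example_recoded;
  set example;
  var01 = var - 1;   /* 1 -> 0, 2 -> 1; the original var is kept as is */
run;
```

No matter how many times these steps are run, the input data sets and the original variable stay the same.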