Re: SAS Code list and explanations

dleighton · Posted 12-11-2016 11:03 AM

I am learning this program for the first time for one of my biology classes. My professor hasn't really explained much to us. He just puts the code in SAS and the whole class rushes to scribble it down. Needless to say, I'm completely lost. I'm not sure what each piece of code is for or whether I'm even using it correctly. Is there a current list of SAS code for biology with explanations of why we use each specific test? I have a short list of tests he mentioned in passing during class.

Tests:

One Sample T-test

2 Sample T-test

Paired T-test

Wilcoxon Test

ANOVA 1 and 2 way

Factorial ANOVA

Mann-Whitney U test

Chi-squared

Fisher's Exact

Tukey HSD and LSD

Regression

Correlation - Spearman

I've been looking through the SAS help website and I'm still not finding all the answers I need. I have this final exam on Wednesday. If anyone could point me in a direction or knows the answer, it would be appreciated. Thank you!

ChrisHemedinger · Posted 12-11-2016 12:08 PM

You can find at least some of these references in:

Usage Note 30333: FASTats: Frequently Asked-For Statistics

These aren't specific to biology at all, but each stat listed does have a link to the SAS procedure or technique you can use to calculate it.


Reeza · Posted 12-11-2016 12:49 PM

If you line up each test with a SAS Procedure you could read the overview section and put together a list of definitions.

I would recommend the STATSOFT textbook though, to quickly get an idea of what each test is. Have you taken a statistics course at university? A first-level stats course covers most of what you're mentioning.

Most are under Basic and Descriptive statistics but you can also look them up in the Glossary.

http://www.statsoft.com/Textbook/Basic-Statistics

This is a lot to learn for Wednesday...Good Luck

dleighton · Posted 12-11-2016 12:58 PM

I did take stats at my university, but our professor pushed more for Excel or doing the problems by hand. I have a basic foundation of terms and procedures. The SAS system is completely new to me, and I'm not quite sure why some of the code is needed.

Thank you for all the responses with the links. I am looking through them now. If there was a better background from my professor I might understand the output a little better.

Cynthia_sas · Posted 12-11-2016 01:03 PM

Hi:
The Statistics 1 e-learning class is free as self-paced e-learning for any adult learner. You can activate the class by setting up a SAS Profile and clicking the Start button on this page: https://support.sas.com/edu/elearning.html?ctry=us&productType=library and the first lesson has an overview of statistics.

cynthia

Reeza · Posted 12-11-2016 02:21 PM

Another way to do this is to look at what PROCs your code contains and review the documentation for those procedures. Specifically, walking through the examples is a good idea. I've tried to identify the relevant tests for you below, but reviewing the code samples is a better bet.

Tests:

PROC TTEST
One Sample T-test
2 Sample T-test
Paired T-test

PROC ANOVA
ANOVA 1 and 2 way
Factorial ANOVA

PROC NPAR1WAY
Mann-Whitney U test

PROC FREQ
Chi-squared
Fisher's Exact

Not sure??
Tukey HSD and LSD

PROC REG/GLM?
Regression

PROC CORR
Correlation - Spearman
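To make the mapping above concrete, here is a minimal sketch of one call per procedure. It is an illustration only, not the professor's code: it assumes the SASHELP sample data sets (sashelp.class, sashelp.heart) are installed, the PAIRS data set and the H0 value of 60 are made up for the example, and the variable choices are arbitrary. It also fills in the "Not sure??" entry: Tukey HSD and LSD comparisons are available as options on the MEANS statement of PROC GLM (and PROC ANOVA).

```sas
/* One-sample t-test: is mean height different from 60? (60 is an arbitrary value) */
proc ttest data=sashelp.class h0=60;
   var height;
run;

/* Two-sample t-test: compare mean height between the two sexes */
proc ttest data=sashelp.class;
   class sex;
   var height;
run;

/* Hypothetical paired measurements, made up purely for illustration */
data pairs;
   input before after;
   datalines;
12 14
15 15
11 13
14 17
13 12
;
run;

/* Paired t-test on the before/after columns */
proc ttest data=pairs;
   paired before*after;
run;

/* Wilcoxon rank-sum (Mann-Whitney U) test */
proc npar1way data=sashelp.class wilcoxon;
   class sex;
   var height;
run;

/* Chi-squared and Fisher's exact test on a 2x2 table */
proc freq data=sashelp.heart;
   tables sex*status / chisq fisher;
run;

/* One-way ANOVA with Tukey HSD and LSD comparisons of the group means */
proc glm data=sashelp.class;
   class age;
   model height = age;
   means age / tukey lsd;
run;
quit;

/* Spearman rank correlation */
proc corr data=sashelp.class spearman;
   var height weight;
run;
```

For each block, the overview and examples sections of the matching procedure documentation explain what the output tables mean.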

dleighton · Posted 12-11-2016 02:41 PM

@Reeza

That is extremely helpful! In the part of the code that our professor posted it included:

Data assumps;

set assumps;

absresid = abs(resid);

Run;

Proc plot data = assumps vpercent = 50 50;

plot resid * pred

absresid * pred / VREF = 0

Quit;

proc univariate normal plot data= assumps;

var resid;

quit;

This all follows a PROC GLM step that includes the statement [output out = assumps p=pred r=resid;]. I know the PROC UNIVARIATE step is supposed to help determine normality, but I'm not sure what these steps actually do. They are nowhere in my notes and just appeared one day in the SAS code he brought up so we could copy it. Is this code supposed to be included after every PROC GLM step?

Reeza · Posted 12-11-2016 04:23 PM

In general, when learning the language, it is worth spending some time learning to use the documentation.

PROC GLM - OUTPUT statement.

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_glm_syntax17...

OUT=<SAS-data-set> -> This is the name of the data set created.

P(PREDICTED) = <variable-name> -> name of the variable created that corresponds to the predicted value

R(RESIDUAL) = <variable-name> -> name of the variable created that corresponds to the residual value

So the GLM procedure creates a dataset called ASSUMPS that contains at least two variables, one for the predicted value and one for the residual value of the model.
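As a rough sketch of what that looks like end to end (the model here is only an assumption for illustration, since the professor's MODEL statement was not posted), you can run a small regression on sashelp.class and print the first few rows of the output data set to see the new variables:

```sas
/* Illustrative model only: the real MODEL statement from class was not posted */
proc glm data=sashelp.class;
   model weight = height;
   /* Write the input data plus predicted values and residuals to ASSUMPS */
   output out=assumps p=pred r=resid;
run;
quit;

/* Look at the first few rows: PRED and RESID are the new columns */
proc print data=assumps(obs=5);
   var weight height pred resid;
run;
```

For each row, RESID is the observed WEIGHT minus PRED.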

A data step can be used for a variety of things, in this case it's creating a variable using the ABS() function.

http://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#p0xkrj83an7...

ABS() -> Returns the absolute value.

So in this step you're creating a variable called ABSRESID that contains the absolute value of the residual.

Data assumps;
set assumps;
absresid = abs(resid);
Run;

Are you sure it was PROC PLOT and not SGPLOT? I can't find a reference to PROC PLOT but believe it was used back in the day.

This procedure is creating two graphs, one for residual by predicted and one for the absolute value of the residual by the predicted value.

It's also missing a semi-colon.

Proc plot data = assumps vpercent = 50 50;
plot resid * pred absresid * pred / VREF = 0;
Quit;
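If you have a reasonably recent SAS version, a rough SGPLOT equivalent of the first of those two plots might look like the sketch below. It assumes the ASSUMPS data set with the PRED and RESID variables already exists from the PROC GLM OUTPUT statement, and it is not a line-for-line translation of the PROC PLOT step.

```sas
/* Residual vs. predicted scatter plot with a horizontal reference line at 0 */
proc sgplot data=assumps;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;
```

You are looking for a shapeless cloud of points around the zero line. A funnel or a curve suggests unequal variance or a missing term in the model.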

The next step, PROC UNIVARIATE, checks whether the variable RESID is normally distributed. This is one of the assumptions of linear regression: the errors are normally distributed.

proc univariate normal plot data= assumps;
var resid;
quit;
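The NORMAL option adds a "Tests for Normality" table (Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling) to the output. If you would rather judge normality from graphs, a possible variation is sketched below. It assumes the same ASSUMPS data set and RESID variable, and the HISTOGRAM and QQPLOT statements are my addition, not part of the professor's code.

```sas
proc univariate data=assumps normal;
   var resid;
   /* Histogram of the residuals with a fitted normal curve overlaid */
   histogram resid / normal;
   /* Q-Q plot against a normal distribution with estimated mean and std dev */
   qqplot resid / normal(mu=est sigma=est);
run;
```

Points that fall close to the reference line in the Q-Q plot are consistent with normally distributed residuals.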

If you need to do this in real life, I would consider the following. It produces diagnostic and residual plots that help you assess both normality and any issues with your regression, e.g. outliers.

proc glm data=sashelp.class plots=(diagnostics residuals);
model weight = height age;
run;
quit;

If you want help understanding the output, you can look at either the examples in the documentation for each PROC or at this page, which has annotated output for many of the common procs you're asking about.

http://www.ats.ucla.edu/stat/AnnotatedOutput/

ballardw · Posted 12-12-2016 10:47 AM

A warning about coding following someone else's examples:

@dleighton wrote:

@Reeza

That is extremely helpful! In the part of the code that our professor posted it included:

Data assumps;

set assumps;

absresid = abs(resid);

Run;

Please be extremely cautious with the code structure Data datasetname; Set datasetname; and I would recommend almost never using it.

The step replaces the original data. With code logic errors (highly probable when learning a programming language) or typos, it is very easy to destroy your original data. If you use this approach, you should make sure that you can always get back to the original data. One example from code I inherited involved recoding data from a 1 and 2 value coding to a 0 and 1.

data example;
   set example;
   var = var -1;
   /* other code was here*/
run;

It looks simple and much like your professor's example. However, the "other code" indicated above needed a change, so the change was made and the step was rerun. The values that had been 2 were reduced to 1 by the FIRST pass and further reduced to 0 by the second pass. So all values of var became either missing or 0.

You have been warned. Less obvious is the case where the code switches 0 to 1 and 1 to 0 (which I also inherited). That resulted in the rate of a reportable item changing from 56 percent to 44 percent, and everyone in the organization thought there had been a drastic change from the previous year's 55 percent.
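One way to sidestep the problem is sketched below. The data set EXAMPLE and its values are made up for illustration, and the names EXAMPLE_RECODED and VAR01 are hypothetical. The idea is to write to a new data set and a new variable, so the step can be rerun any number of times without touching the source.

```sas
/* Made-up source data with the original 1/2 coding */
data example;
   input id var;
   datalines;
1 1
2 2
3 2
;
run;

/* Read EXAMPLE, write EXAMPLE_RECODED: the source is never overwritten */
data example_recoded;
   set example;
   var01 = var - 1;   /* new variable; the original VAR is kept as-is */
run;
```

Because EXAMPLE is only read, rerunning the second step always gives the same VAR01 values, and VAR is still there to check the recode against.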
