dleighton
Calcite | Level 5

I am learning this program for the first time for one of my biology classes. My professor hasn't really explained much to us. He just puts the code in SAS and the whole class rushes to scribble it down. Needless to say, I'm completely lost. I'm not sure what each piece of code is for or whether I'm even using it correctly. Is there a current list of SAS code for biology with explanations of why we use each specific test? I have a short list of tests he mentioned in passing during class.

 

Tests: 

One Sample T-test

2 Sample T-test

Paired T-test 

Wilcoxon Test

ANOVA 1 and 2 way

Factorial ANOVA

Mann-Whitney U test

Chi-squared

Fisher's Exact

Tukey HSD and LSD 

Regression

Correlation - Spearman 

 

I've been looking through the SAS help website and I'm still not finding all the answers I need. I have this final exam on Wednesday; if anyone could point me in a direction or knows the answer, it would be appreciated. Thank you!

ChrisHemedinger
Community Manager

You can find at least some of these references in:

 

Usage Note 30333: FASTats: Frequently Asked-For Statistics

 

These aren't specific to biology at all, but each stat listed does have a link to the SAS procedure or technique you can use to calculate it.

 

Reeza
Super User

If you line up each test with a SAS Procedure you could read the overview section and put together a list of definitions. 

 

I would recommend the STATSOFT textbook, though, to quickly get an idea of what each test is. Have you taken a statistics course at university? A first-level stats course covers most of what you're mentioning.

 

Most are under Basic and Descriptive statistics but you can also look them up in the Glossary. 

 

http://www.statsoft.com/Textbook/Basic-Statistics

 

This is a lot to learn for Wednesday...Good Luck 

dleighton
Calcite | Level 5

I did take stats at my university, but our professor pushed more for Excel or doing the problems by hand. I have a basic foundation of terms and procedures. The SAS system is completely new to me, and I'm not quite sure why some of the coding is needed.

 

Thank you for all the responses with the links. I am looking through them now. If there was a better background from my professor I might understand the output a little better. 

Cynthia_sas
SAS Super FREQ
Hi:
The Statistics 1 e-learning class is free as self-paced e-learning for any adult learner. You can activate the class by setting up a SAS Profile and clicking the Start button on this page: https://support.sas.com/edu/elearning.html?ctry=us&productType=library and the first lesson has an overview of statistics.

cynthia
Reeza
Super User

Another way to do this is to look at what PROCs your code contains and review the documentation for those procedures. Specifically, walking through the examples is a good idea. I've tried to identify the relevant tests for you below, but reviewing the code samples is a better bet. 

 

Tests:

PROC TTEST
One Sample T-test
2 Sample T-test
Paired T-test

 

PROC ANOVA
ANOVA 1 and 2 way
Factorial ANOVA

 

PROC NPAR1WAY
Mann-Whitney U test

 

PROC FREQ
Chi-squared
Fisher's Exact

 

Not sure??
Tukey HSD and LSD

 

PROC REG/GLM? 
Regression

 

PROC CORR

Correlation - Spearman
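
For what it's worth, here is a rough sketch of what minimal calls for the list above might look like. Treat it as a starting point, not as your professor's code: the PAIRED_DEMO data are made up purely for illustration, and the SASHELP data sets (CLASS, CARS, HEART) are just convenient samples that ship with SAS. Tukey HSD and LSD are requested as options on the MEANS statement of PROC GLM or PROC ANOVA. I used PROC GLM instead of PROC ANOVA because SASHELP.CARS is unbalanced.

```sas
/* Made-up paired measurements, for illustration only */
data paired_demo;
  input before after;
  diff = after - before;   /* difference used for the signed-rank test */
  datalines;
12.1 13.0
11.4 11.9
13.2 13.1
10.8 11.6
12.5 13.4
11.9 12.2
;
run;

/* One-sample t-test: is mean height different from 60? */
proc ttest data=sashelp.class h0=60;
  var height;
run;

/* Two-sample t-test: compare height between the two sexes */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;

/* Paired t-test */
proc ttest data=paired_demo;
  paired before*after;
run;

/* Wilcoxon signed-rank test (paired, nonparametric):
   see "Tests for Location" in the PROC UNIVARIATE output */
proc univariate data=paired_demo;
  var diff;
run;

/* Wilcoxon rank-sum test, equivalent to the Mann-Whitney U test */
proc npar1way data=sashelp.class wilcoxon;
  class sex;
  var height;
run;

/* One-way ANOVA with Tukey HSD and LSD comparisons of group means */
proc glm data=sashelp.cars;
  class origin;
  model mpg_city = origin;
  means origin / tukey lsd;
run; quit;

/* Two-way factorial ANOVA with an interaction term */
proc glm data=sashelp.cars;
  class origin drivetrain;
  model mpg_city = origin drivetrain origin*drivetrain;
run; quit;

/* Chi-square and Fisher's exact test on a 2x2 table */
proc freq data=sashelp.heart;
  tables sex*status / chisq fisher;
run;

/* Simple linear regression */
proc reg data=sashelp.class;
  model weight = height;
run; quit;

/* Spearman rank correlation */
proc corr data=sashelp.class spearman;
  var height weight;
run;
```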

dleighton
Calcite | Level 5

@Reeza

 

     That is extremely helpful! In the part of the code that our professor posted it included:

                     Data assumps;

                     set assumps;

                     absresid = abs(resid);

                     Run;

 

                     Proc plot data = assumps vpercent = 50 50;

                     plot resid * pred

                                       absresid * pred / VREF = 0

                     Quit; 

   

                    proc univariate normal plot data= assumps;

                    var resid;

                    quit; 

 

 This all follows a PROC GLM step that includes the statement [output out = assumps p=pred r=resid;]. I know the PROC UNIVARIATE step is supposed to help determine normality, but I'm not sure what these steps actually do. They are nowhere in my notes and just appeared one day in the SAS code he brought up for us to copy. Are these steps supposed to be included in every GLM analysis?

 

Reeza
Super User

In general, learning to use the documentation is worth some time when you're learning the language.

 

PROC GLM - OUTPUT statement.

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_glm_syntax17...

 

OUT=<SAS-data-set> -> This is the name of the data set created.

P(PREDICTED) = <variable-name> -> name of the variable created that corresponds to the predicted value

R(RESIDUAL) = <variable-name> -> name of the variable created that corresponds to the residual value

 

So the GLM procedure creates a dataset called ASSUMPS that contains at least two variables, one for the predicted value and one for the residual value of the model. 
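
As a small sketch of that OUTPUT statement in action (using SASHELP.CLASS as a stand-in, since I don't have your professor's data or model), you can print the first few rows to see the new variables sitting next to the original ones:

```sas
/* Fit a model and save predicted values and residuals to ASSUMPS */
proc glm data=sashelp.class;
  model weight = height;
  output out=assumps p=pred r=resid;
run; quit;

/* ASSUMPS holds the original variables plus PRED and RESID */
proc print data=assumps(obs=5);
  var name weight pred resid;
run;
```

For each row, RESID is the observed WEIGHT minus PRED.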

 

 

A data step can be used for a variety of things; in this case it's creating a variable using the ABS() function.

http://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#p0xkrj83an7...

 

ABS() -> Returns the absolute value. 

 

So in this step you're creating a variable called ABSRESID that contains the absolute value of the residual. 

 

 

Data assumps;
set assumps;
absresid = abs(resid);
Run;

 

 

Are you sure it was PROC PLOT and not SGPLOT? I can't find a reference to PROC PLOT but believe it was used back in the day.

This procedure is creating two graphs, one for residual by predicted and one for the absolute value of the residual by the predicted value. 

 

It's also missing a semicolon at the end of the PLOT statement.

 

Proc plot data = assumps vpercent = 50 50;
plot resid * pred absresid * pred / VREF = 0;
Quit; 
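
If SGPLOT is available to you, a rough equivalent of the first of those two graphs might look like the sketch below. I'm assuming a model on SASHELP.CLASS just so the example runs on its own; swap in your own PROC GLM step.

```sas
/* Stand-in model so the example is self-contained */
proc glm data=sashelp.class;
  model weight = height;
  output out=assumps p=pred r=resid;
run; quit;

/* Residuals against predicted values, with a reference line at 0 */
proc sgplot data=assumps;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```

You're looking for points scattered evenly around the zero line with no funnel shape or curve.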

The next step, PROC UNIVARIATE, checks whether the variable RESID is normally distributed. This is an assumption of linear regression: the errors are normally distributed.

 

proc univariate normal plot data= assumps;
var resid;
quit; 

 

If you need to do this in real life, I would consider the following. This would produce diagnostic and residual plots that would help assess both normality and any issues with your regression, e.g., outliers.

 

proc glm data=sashelp.class plots=(diagnostics residuals);
model weight = height age;
run;quit;

If you want help understanding the output you can look at either the examples in the PROC or at this page, which has annotated output for many common procs you're looking at, based on your questions.

 

http://www.ats.ucla.edu/stat/AnnotatedOutput/

ballardw
Super User

A warning about coding following someone else's examples:


@dleighton wrote:

@Reeza

 

     That is extremely helpful! In the part of the code that our professor posted it included:

                     Data assumps;

                     set assumps;

                     absresid = abs(resid);

                     Run;

Please be extremely cautious with the code structure Data datasetname; Set datasetname; I would recommend almost never using it.

The step replaces the original data. Depending on possible code logic errors (high probability when learning a programming language) or typos, it is very easy to destroy your original data. If you use this approach you should make sure that you can always get back to the original data. One example from code I inherited involved recoding data from a 1 and 2 value coding to a 0 and 1.

 

data example;
   set example;
   var = var -1;
   /* other code was here*/
run;

Looks simple and much like your professor's example. However, the "other code" indicated above needed a change, so the change was made and the step was rerun. The values that had been 2 were reduced to 1 after the FIRST pass and further reduced to 0 with the second pass. So all values of var became either missing or 0.

 

 

You have been warned. Less obvious is the case where the code switches 0 to 1 and 1 to 0 (which I also inherited). That resulted in the rate of a reportable item changing from 56 percent to 44 percent, and everyone in the organization thought there had been a drastic change from the previous year's 55 percent.
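
One way to sidestep the problem, sketched here with made-up data and hypothetical names (EXAMPLE, EXAMPLE_RECODED, VAR01), is to write to a new data set and a new variable so the step can be rerun any number of times without touching the original:

```sas
/* Made-up original data, coded 1 and 2 */
data example;
  input var;
  datalines;
1
2
2
1
;
run;

/* Read EXAMPLE, write EXAMPLE_RECODED: rerunning this step always
   starts from the untouched original, so VAR01 is always 0 or 1 */
data example_recoded;
  set example;
  var01 = var - 1;
run;
```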
