Statistical Procedures

Programming the statistical procedures from SAS
saza
Quartz | Level 8

I was given a dataset called Cancer and was told to find the N, Mean, SD, Median, Min, and Max for the variables "Exposure" and "Mortality." I thought this was simple since there is a linear correlation between both variables so I used the code:

 

proc univariate data=cancer.cancer;
var exposure mortality;
run;

This generated all the values I needed. However, the only answer marked "right" was the N value I obtained; everything else was marked incorrect. Am I supposed to be using different code, or multiplying by something? I will attach the dataset and problem for reference.

 

Synopsis: 

Since World War II, plutonium for use in atomic weapons has been produced at an Atomic Energy
Commission facility in Hanford, WA. One of the major safety issues has been the storage of radioactive
wastes. Over the years, significant quantities of these substances, including Sr90 and Cs137, have leaked into
the nearby Columbia River, which flows along the Washington-Oregon border into the Pacific Ocean.

To measure the health consequences of this contamination, an index of exposure was calculated for each of
the nine Oregon counties bordering either the Columbia River or the Pacific Ocean. This particular index was
based on several factors, including the county's stream distance from Hanford and the average distance of its
population from the water. As a covariate, the cancer mortality rate was determined for each county.


The SAS data set, cancer, is located on SAS on Demand and in your BIOS-517 library.

  • The data set contains the following variables:
    • County: Name of county
    • Exposure: Index of exposure
    • Mortality: Cancer mortality per 100,000 person-yrs

You may assume that the dataset is clean, but you should still do univariate analyses to familiarize yourself
with each variable.

 

 

 

1 ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

I am just guessing, but were your answers submitted electronically and graded by computer?  If so, you might search the assignment or computer instructions to determine how many decimal digits you should submit for non-integer values. For example, PROC UNIVARIATE might tell you that the mean of the Exposure variable is 4.61777778, but the software might be expecting 4.6178. Check with your peers and your instructor to determine the format for the answers.

 

Still, I am surprised you didn't get correct answers for MIN and MAX, which only have one correct answer.
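
One way to act on this suggestion is to produce the requested statistics with a fixed number of decimals, using PROC MEANS and its MAXDEC= option. This is only a sketch, not the assignment's required method. The choice of four decimal places is an assumption taken from the 4.6178 example above, so substitute whatever precision the grader expects.

```sas
/* Sketch: request only the six statistics the assignment names.     */
/* MAXDEC=4 is an assumed precision; change it to match the grader.  */
proc means data=cancer.cancer n mean std median min max maxdec=4;
   var exposure mortality;
run;
```

MAXDEC= changes only how the values are displayed, so the underlying statistics are the same ones PROC UNIVARIATE reports.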


3 REPLIES
Reeza
Super User

What values did you get and what was the "correct" answer?

 

Did you read in a CSV file or different file type to create the cancer file attached or was it provided as is?
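
A quick way to check how the data were read in is to inspect the data set itself. The sketch below assumes the same cancer.cancer reference used in the original post. With only nine counties, printing every row is practical.

```sas
/* List variable names, types, and formats to confirm how the data were read in. */
proc contents data=cancer.cancer;
run;

/* Print all rows so the values can be checked by eye against the assignment. */
proc print data=cancer.cancer;
run;
```

If Exposure or Mortality shows up as a character variable, or the row count is not nine, the problem is in the import step rather than in PROC UNIVARIATE.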

 



ballardw
Super User

You really need to describe what you consider "wrong". The example data set has only 9 records, with one character variable for the county name and two numeric measure variables, so there really isn't much going on in that data set.

 

It may be that you are seeing more decimal places in the output than you expect. So if a mean, as reported by PROC UNIVARIATE, is 4.61777778 and you were told to expect something like 4.6 or 4.62, then the difference is a rounding choice someone made. Or did you expect to see more digits?
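
To see how the rounding choice alone changes the reported answer, the example mean quoted in this thread can be rounded to several precisions with the ROUND function. This is a hedged illustration only: the variable names m, r1, r2, and r4 are made up for the sketch, and the literal value is the one quoted above, not a new result.

```sas
/* The literal 4.61777778 is the example mean quoted in this thread. */
data _null_;
   m  = 4.61777778;
   r1 = round(m, 0.1);     /* nearest tenth          */
   r2 = round(m, 0.01);    /* nearest hundredth      */
   r4 = round(m, 0.0001);  /* nearest ten-thousandth */
   put r1= r2= r4=;        /* writes the three rounded values to the log */
run;
```

Any of the three could be the "right" answer, depending on the precision the grader was set up to accept.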

 

A variable such as Exposure in this context could well have an instrumental limit of accuracy, and decimals past a certain point imply more precision than the instruments could measure. Or, with deaths per 100,000 people, if you carry too many decimals you start talking about fractional person-deaths. Except in movies like "The Princess Bride", people tend to be dead or alive, not "mostly dead". So the mortality would seldom be reported with more than 2 or 3 decimals.

 
