I was given a dataset called Cancer and was told to find the N, Mean, SD, Median, Min, and Max for the variables "Exposure" and "Mortality." I thought this would be simple, since there is a linear correlation between the two variables, so I used this code:
proc univariate data=cancer.cancer;
var exposure mortality;
run;
This generated all the values I needed. However, the only answer marked "right" was the N value I obtained; everything else was marked incorrect. Am I supposed to be using different code, or multiplying something? I will attach the dataset and problem for reference.
Synopsis:
Since World War II, plutonium for use in atomic weapons has been produced at an Atomic Energy
Commission facility in Hanford, WA. One of the major safety issues has been the storage of radioactive
wastes. Over the years, significant quantities of these substances, including Sr90 and Cs137, have leaked into
the nearby Columbia River, which flows along the Washington-Oregon border into the Pacific Ocean.
To measure the health consequences of this contamination, an index of exposure was calculated for each of
the nine Oregon counties bordering either the Columbia River or the Pacific Ocean. This particular index was
based on several factors, including the county's stream distance from Hanford and the average distance of its
population from the water. As a covariate, the cancer mortality rate was determined for each county.
The SAS data set, cancer, is located on SAS on Demand and in your BIOS-517 library.
The data set contains the following variables:
- County: Name of county
- Exposure: Index of exposure
- Mortality: Cancer mortality per 100,000 person-yrs
You may assume that the dataset is clean, but you should still do univariate analyses to familiarize yourself
with each variable.
Accepted Solutions
I am just guessing, but were your answers submitted electronically and graded by computer? If so, you might search the assignment or computer instructions to determine how many decimal digits you should submit for non-integer values. For example, PROC UNIVARIATE might tell you that the mean of the Exposure variable is 4.61777778, but the software might be expecting 4.6178. Check with your peers and your instructor to determine the format for the answers.
Still, I am surprised you didn't get correct answers for MIN and MAX, which only have one correct answer.
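If the grader does expect a fixed number of decimals, one way to get them directly is PROC MEANS with the MAXDEC= option. This is only a sketch: the libref and variable names come from the original post, and the choice of four decimals is an assumption to be replaced by whatever the assignment specifies.

```sas
/* Request only the six statistics the assignment asks for.          */
/* MAXDEC=4 is an assumed precision; change it to match the grader.  */
proc means data=cancer.cancer n mean std median min max maxdec=4;
   var exposure mortality;
run;
```

MAXDEC= only changes how the results are displayed, not how they are computed, so the values should agree with PROC UNIVARIATE up to rounding.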
What values did you get and what was the "correct" answer?
Did you read in a CSV file or some other file type to create the attached cancer file, or was it provided as is?
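A quick way to answer both questions is to look at the data set itself before computing anything. A minimal sketch, assuming the data set is reachable as cancer.cancer as in the original post:

```sas
/* Show variable names, types (character vs. numeric), and formats. */
proc contents data=cancer.cancer;
run;

/* List every record; the data set is small enough to check by eye. */
proc print data=cancer.cancer;
run;
```

If Exposure or Mortality turns out to be stored as character, or the values look different from what the assignment describes, that would point to a problem in how the file was read rather than in the PROC UNIVARIATE step.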
You really need to describe what you consider "wrong". The example data set only has 9 records with one character variable for the county name and then measure variables so there really isn't much going on in that data set.
It may be that you are seeing more decimal places in the output than you expect. If a mean, as reported by PROC UNIVARIATE, is 4.61777778 and you were told to expect something like 4.6 or 4.62, then the difference is a rounding choice someone made. Or did you expect to see more digits?
A variable such as Exposure in this context could well have an instrumental limit of accuracy, and decimals past a certain point imply more precision than the instruments could measure. Likewise, with deaths per 100,000 people, carrying too many decimals means you start talking about fractional person-deaths. Except in movies like "The Princess Bride," people tend to be dead or alive, not "mostly dead." So mortality would seldom be reported with more than 2 or 3 decimals.
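To see what the different rounding choices do to the mean quoted above, here is a small sketch using the ROUND function in a DATA step. The variable names m, r1, r2, and r4 are made up for illustration; the only value taken from this thread is 4.61777778.

```sas
data _null_;
   m  = 4.61777778;        /* mean of Exposure as quoted in this thread */
   r1 = round(m, 0.1);     /* one decimal place                         */
   r2 = round(m, 0.01);    /* two decimal places                        */
   r4 = round(m, 0.0001);  /* four decimal places                       */
   put r1= r2= r4=;        /* write the rounded values to the log       */
run;
```

The log should show 4.6, 4.62, and 4.6178, so which one counts as "right" depends entirely on the precision the grader was set up to accept.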