Hello guys, I was working on this code and I am struggling to create a permanent dataset from two datasets
that I merged. Secondly, is the equality-of-variances test a good way to answer the fifth question? Thanks.
I have included my code and my practice questions.
/* QUESTION 1 */
DATA Black;
INFILE "C:\Users\iyink\Downloads\black22fall.txt" DLM='09'x FIRSTOBS=2;
INPUT ID sex race age height weight bpsystol bpdiast tcresult iron heartatk diabetes DOB;
INFORMAT DOB date9. ;
FORMAT DOB date9.;
RUN;
/* QUESTION 2 */
LIBNAME perm2 "C:\Users\iyink\Downloads";
PROC PRINT DATA=perm2.white22fall;
RUN;
PROC SORT DATA=Black NODUPS; BY ID; RUN;
PROC SORT DATA=perm2.white22fall NODUPS; BY ID; RUN;
DATA MyCombined;
MERGE Black(IN=a) perm2.white22fall(IN=b);
BY ID;
IF a=1 OR b=1 THEN OUTPUT MyCombined;
RUN;
PROC PRINT DATA=MyCombined;
RUN;
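/* QUESTION 3
Yes, there are mild and extreme outliers for iron in this dataset.
Boundaries, using Q1 = 76 and Q3 = 120 (IQR = 44):
mild outliers: iron < 10 or iron > 186
extreme outliers: iron < -56 or iron > 252
Persons with mild outliers = 28
Persons with extreme outliers = 2 */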
PROC MEANS DATA=MyCombined N NMISS MIN P25 P50 P75 MAX MEAN STD MAXDEC=1;
VAR iron;
RUN;
PROC SGPLOT DATA=MyCombined;
HBOX iron;
RUN;
PROC SGPLOT DATA=MyCombined;
HBOX iron/EXTREME;
RUN;
PROC PRINT DATA=MyCombined;
WHERE iron < 76 - 1.5*(120-76)
OR iron > 120 + 1.5*(120-76);
VAR ID iron;
RUN;
PROC PRINT DATA=MyCombined;
WHERE iron < 76 - 3*(120-76)
OR iron > 120 + 3*(120-76);
VAR ID iron;
RUN;
/* QUESTION 4 */
PROC MEANS DATA=MyCombined MEAN CLM MAXDEC=2;
VAR tcresult;
RUN;
PROC UNIVARIATE DATA=MyCombined MU0=200;
VAR tcresult;
RUN;
ODS GRAPHICS ON;
PROC TTEST DATA=MyCombined H0=200 PLOTS(ONLY) = (SUMMARYPLOT);
VAR tcresult;
RUN; ODS GRAPHICS OFF;
/* Ha: mean tcresult ^= 200
H0: mean tcresult = 200
P-value from Student's t test = <.0001
Decision: since <.0001 < 0.05, we reject H0.
Conclusion: mean tcresult is significantly different from 200.
The 95% CL of mean tcresult is (215.3, 219.9), which does not cover 200; this also indicates that mean tcresult is significantly different from 200. */
/* QUESTION 5 */
PROC SORT DATA=MyCombined;
BY sex;
RUN;
PROC FREQ DATA=MyCombined;
BY sex;
TABLE heartatk/BINOMIAL (LEVEL=2);
RUN;
PROC FREQ DATA=MyCombined;
TABLE sex*heartatk/CHISQ RISKDIFF;
RUN;
/* Prevalence of heart attack in men = 58/851 = 0.0682 = 6.82% (95% CL: 5.12%, 8.51%)
Prevalence of heart attack in women = 31/956 = 0.0324 = 3.24% (95% CL: 2.12%, 4.37%)
Prevalence of heart attack overall (men and women combined) = 89/1807 = 0.0493 = 4.93%
Ha: proportion of heartatk(=1) for men ^= proportion of heartatk(=1) for women
H0: proportion of heartatk(=1) for men = proportion of heartatk(=1) for women
P-value = 0.0005
Decision: the difference between the two proportions is 3.57%, with 95% CL ( %, %), and p-value = 0.0005 < 0.05, so we reject H0.
Conclusion: the p-value is < 0.05 and the 95% CL does not cover 0, so the difference is significantly different from 0. Men and women do not have similar
proportions. */
ODS GRAPHICS ON;
PROC TTEST DATA=MyCombined PLOTS(ONLY) = (SUMMARYPLOT);
CLASS sex;
VAR heartatk;
RUN;
ODS GRAPHICS OFF;
/* EQUALITY OF VARIANCES TEST
H0: variances are the same
Ha: variances are different
P-value = <.0001 < 0.05
Reject H0.
The variances are not equal.
Two-sample t test using the equal variance method
H0: heart attack in men = heart attack in women
Ha: heart attack in men ^= heart attack in women
P-value = 0.0005 < 0.05
Reject H0.
Men and women do not have similar heart attack prevalence.
95% CL of heart attack for men: (0.0512, 0.0851) and 95% CL of heart attack for women: (0.0212, 0.0437). */
Download black22fall.txt (tab delimited data file) and white22fall.sas7bdat. The same variables were collected from two sub populations independently.
- (10) Use a DATA step to read in black22fall.txt that is a tab delimited data file. Make this a temporary dataset and name it black.
- (10) Use a DATA step to combine the black dataset from B.1 and white22fall.sas7bdat. Make this a permanent dataset and name it MyCombined.
When I ran question 2, MyCombined came back to me as a temporary dataset and not a permanent one.
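From what I have read, a one-level name like MyCombined goes to the temporary WORK library, and a permanent dataset needs a two-level name (libref.dataset). Here is a minimal sketch of what I think that would look like. It assumes the libref perm2 points to a folder I can write to, and I have not confirmed this is what the assignment expects.

```sas
/* Assumption: perm2 is the libref for the folder holding white22fall.sas7bdat */
LIBNAME perm2 "C:\Users\iyink\Downloads";

/* The two-level name perm2.MyCombined writes the dataset into that folder,
   so it should still be there after the SAS session ends */
DATA perm2.MyCombined;
    MERGE Black(IN=a) perm2.white22fall(IN=b);
    BY ID;
RUN;
```

If this is right, the later steps would need DATA=perm2.MyCombined instead of DATA=MyCombined.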
Use MyCombined data to answer the following questions:
- (10) Is there any outlier or extreme outlier for the variable iron? Calculate/show the boundaries for outliers and extreme outliers, and identify these people with mild and extreme outlying values in the output, respectively.
- (10) The desirable cholesterol level is less than 200 mg/dL. Is this study population’s cholesterol (tcresult) different from 200? Write down the null and alternative hypotheses, p-value, decision (reject or fail to reject the null), and conclusion.
- (10) What is the prevalence of heart attack (heartatk = 1) for men and for women? Report the prevalence and 95% confidence interval. Do men and women have different prevalence? Write down the null and alternative hypotheses, p-value, decision (reject or fail to reject the null), and conclusion.
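For the outlier question, the boundaries in my WHERE clauses are typed in by hand from the PROC MEANS output (Q1 = 76, Q3 = 120). Below is a rough sketch of computing them from the data instead. The dataset names iron_q and iron_flags and the variable flag are made up for illustration, and I have not checked the result against the counts I reported.

```sas
/* Save Q1 and Q3 of iron into a one-row dataset (iron_q is a made-up name) */
PROC MEANS DATA=MyCombined NOPRINT;
    VAR iron;
    OUTPUT OUT=iron_q Q1=q1 Q3=q3;
RUN;

/* Attach the quartiles to every row and flag values outside the fences */
DATA iron_flags;
    IF _N_ = 1 THEN SET iron_q(KEEP=q1 q3);   /* q1 and q3 are retained for all rows */
    SET MyCombined(KEEP=ID iron);
    LENGTH flag $8;
    iqr = q3 - q1;
    /* Skip missing iron: a missing value compares as less than any number */
    IF NOT MISSING(iron) THEN DO;
        IF iron < q1 - 3*iqr OR iron > q3 + 3*iqr THEN flag = 'extreme';
        ELSE IF iron < q1 - 1.5*iqr OR iron > q3 + 1.5*iqr THEN flag = 'mild';
    END;
    IF flag NE ' ';                            /* keep only the flagged rows */
RUN;

PROC PRINT DATA=iron_flags;
    VAR ID iron flag;
RUN;
```

In this sketch a value beyond the 3*IQR fences is labeled extreme only, so the mild group excludes the extreme ones.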
I used the equality-of-variances test to answer question 5. I feel like I am supposed to use a different procedure for this. What do you guys think?