iyinope
Fluorite | Level 6
Hello everyone, I was working on this code and I am struggling to create a permanent dataset from two datasets
that I merged. Secondly, is the equality of variances test a good way to answer the fifth question? Thanks.
I have included my code and my practice questions.




/* QUESTION 1 */
DATA Black;
    INFILE "C:\Users\iyink\Downloads\black22fall.txt" DLM='09'x FIRSTOBS=2;
    INPUT ID sex race age height weight bpsystol bpdiast tcresult iron heartatk diabetes DOB;
    INFORMAT DOB date9.;
    FORMAT DOB date9.;
RUN;

/* QUESTION 2 */
LIBNAME exam2 "C:\Users\iyink\Downloads";
RUN;

PROC PRINT DATA=perm2.white22fall;
RUN;

PROC SORT DATA=Black NODUPS;
    BY ID;
RUN;

PROC SORT DATA=perm2.white22fall NODUPS;
    BY ID;
RUN;

DATA MyCombined;
    MERGE Black(IN=a) perm2.white22fall(IN=b);
    BY ID;
    IF a=1 OR b=1 THEN OUTPUT MyCombined;
RUN;

PROC PRINT DATA=MyCombined;
RUN;

/* QUESTION 3
   Yes, we have outliers and extreme outliers for iron in this dataset.
   Those with mild outliers = 28 persons
   Those with extreme outliers = 2 persons */
PROC MEANS DATA=MyCombined N NMISS MIN P25 P50 P75 MAX MEAN STD MAXDEC=1;
    VAR iron;
RUN;

PROC SGPLOT DATA=MyCombined;
    HBOX iron;
RUN;

PROC SGPLOT DATA=MyCombined;
    HBOX iron / EXTREME;
RUN;

/* Mild outliers: outside Q1 - 1.5*IQR or Q3 + 1.5*IQR, with Q1 = 76 and Q3 = 120 */
PROC PRINT DATA=MyCombined;
    WHERE iron < 76 - 1.5*(120-76) OR iron > 120 + 1.5*(120-76);
    VAR ID iron;
RUN;

/* Extreme outliers: outside Q1 - 3*IQR or Q3 + 3*IQR */
PROC PRINT DATA=MyCombined;
    WHERE iron < 76 - 3*(120-76) OR iron > 120 + 3*(120-76);
    VAR ID iron;
RUN;

/* QUESTION 4 */
PROC MEANS DATA=MyCombined MEAN CLM MAXDEC=2;
    VAR tcresult;
RUN;

PROC UNIVARIATE DATA=MyCombined MU0=200;
    VAR tcresult;
RUN;

ODS GRAPHICS ON;
PROC TTEST DATA=MyCombined H0=200 PLOTS(ONLY) = (SUMMARYPLOT);
    VAR tcresult;
RUN;
ODS GRAPHICS OFF;

/* Ha: CLevel ^= 200
   H0: CLevel = 200
   P-value from Student's t test: <0.0001
   Decision: Since <0.0001 < 0.05, we reject H0.
   Conclusion: tcresult is significantly different from 200.
   The 95% CL of tcresult is (215.3, 219.9), which does not contain 200;
   this also indicates that tcresult is significantly different from 200. */

/* QUESTION 5 */
PROC SORT DATA=MyCombined;
    BY sex;
RUN;

PROC FREQ DATA=MyCombined;
    BY sex;
    TABLE heartatk / BINOMIAL (LEVEL=2);
RUN;

PROC FREQ DATA=MyCombined;
    TABLE sex*heartatk / CHISQ RISKDIFF;
RUN;

/* Prevalence of heart attack in men = 58/851 = 0.0682 = 6.82% (95% CL: 5.12, 8.51)
   Prevalence of heart attack in women = 31/956 = 0.0324 = 3.24% (95% CL: 2.12, 4.37)
   Prevalence of heart attack in the general population = 89/1807 = 0.05
   Ha: Proportion of heartatk (=1) for men ^= proportion of heartatk (=1) for women
   H0: Proportion of heartatk (=1) for men = proportion of heartatk (=1) for women
   P-value = 0.0005
   The difference between the two proportions is 3.57%, with 95% CL ( %, %) and p-value = 0.0005 < 0.05.
   Thus, reject H0: the p-value is < 0.05 and the 95% CL does not cover 0,
   so the difference is significantly different from 0.
   Men and women do not have similar proportions.
   Conclusion: */

ODS GRAPHICS ON;
PROC TTEST DATA=MyCombined PLOTS(ONLY) = (SUMMARYPLOT);
    CLASS sex;
    VAR heartatk;
RUN;
ODS GRAPHICS OFF;

/* EQUALITY OF VARIANCES TEST METHOD
   H0: variances are the same
   Ha: variances are different
   P-value: <0.0001 < 0.05
   Reject H0. The variances are not equal.

   2-sample t test using the equal variance method
   H0: Heart attack in men = heart attack in women
   Ha: Heart attack in men ^= heart attack in women
   P-value = 0.0005 < 0.05
   Reject H0. Men and women do not have similar heart attack prevalence.
   95% CL of heart attack for men: (0.0512, 0.0851)
   95% CL of heart attack for women: (0.0212, 0.0437) */



Download black22fall.txt (tab delimited data file) and white22fall.sas7bdat.  The same variables were collected from two sub populations independently.

 

  1. (10) Use a DATA step to read in black22fall.txt that is a tab delimited data file.  Make this a temporary dataset and name it black.

 

 

 

  2. (10) Use a DATA step to combine the black dataset from B.1 and white22fall.sas7bdat.  Make this a permanent dataset and name it MyCombined.
When I ran my code for question 2, MyCombined was created as a temporary dataset, not a permanent one.

 

 

 

Use MyCombined data to answer the following questions:

 

  3. (10) Is there any outlier or extreme outlier for the variable iron?  Calculate/show the boundaries for outliers and extreme outliers, and identify the people with mild and extreme outlying values in the output, respectively.

                                                                                                           

 

 

  4. (10) The desirable cholesterol level is less than 200 mg/dL.  Is this study population’s cholesterol (tcresult) different from 200?  Write down the null and alternative hypotheses, p-value, decision (reject or fail to reject the null), and conclusion.

 

 

 

  5. (10) What is the prevalence of heart attack (heartatk = 1) for men and for women?  Report the prevalence and 95% confidence interval.  Do men and women have different prevalence?  Write down the null and alternative hypotheses, p-value, decision (reject or fail to reject the null), and conclusion.

I used the equality of variances test to answer question 5, but I feel I am supposed to use a different procedure for this. What do you think?
1 REPLY
ballardw
Super User

A permanent data set is one stored in a library. Create the library with a LIBNAME statement, which should have been covered in your class, and place the result there. To use the data set afterwards, refer to it as libname.dataset in any DATA= option or SET statement.
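
A minimal sketch of that advice, not a definitive fix. It assumes the libref perm2 that the posted code refers to (the posted LIBNAME statement assigns exam2 instead, so one of the two names would need to change) and the folder path from the post. It also assumes Black and perm2.white22fall have already been sorted by ID, as in the posted PROC SORT steps.

```sas
/* Assumption: perm2 is the intended libref; the path is the one used in the post. */
LIBNAME perm2 "C:\Users\iyink\Downloads";

/* A two-level name (libref.member) stores MyCombined in that folder.           */
/* A one-level name such as MyCombined goes to the temporary WORK library by    */
/* default, which is why the posted DATA step produced a temporary data set.    */
DATA perm2.MyCombined;
    MERGE Black(IN=a) perm2.white22fall(IN=b);
    BY ID;
RUN;

/* Later steps must use the same two-level name. */
PROC PRINT DATA=perm2.MyCombined(OBS=10);
RUN;
```

After this runs, a file named mycombined.sas7bdat should appear in the folder the libref points to, and every later DATA= option would reference perm2.MyCombined rather than MyCombined.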

 

Without seeing the data I am not sure that a MERGE is the proper way to combine the datasets. A MERGE may leave commonly named variables holding the values from only one data set. Stacking the data with SET or PROC APPEND may be more appropriate.
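
The stacking alternative could look like the sketch below. It assumes the two files hold different people measured on the same variables, as the assignment text suggests, and that the variables have matching types in both data sets. The libref perm2 and the path are taken from the post and are assumptions about the intended setup.

```sas
/* Assumption: perm2 is the intended libref; the path is the one used in the post. */
LIBNAME perm2 "C:\Users\iyink\Downloads";

/* SET with two data sets concatenates them: every row of Black is followed by  */
/* every row of white22fall, so no values are overwritten by a matching ID.     */
DATA perm2.MyCombined;
    SET Black perm2.white22fall;
RUN;

/* Check that the observation count equals the sum of the two inputs. */
PROC CONTENTS DATA=perm2.MyCombined;
RUN;
```

Unlike MERGE, this needs no prior sort and no BY statement. PROC APPEND with BASE= and DATA= gives a similar result but adds rows to an existing data set instead of building a new one.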
