This analysis uses complex variables and multiple regression. I'll show the code and the log. CVDf1, CVDf2, CVDf3, CVDf4 got lost at the end.
data source;
set projects.source;
CVDf1=
- pmeat17KCsW * 5.49 * 0.0443
- rmeat17KCsW * 50.70 * 0.0560
- fish17KCsW * 10.01 * 0.0412
- milk17KCsW * 25.37 * 0.0398
- poultry16KCsW * 45.06 * 0.0867
- eggs16KCsW * 19.47 * 0.1544
- SFA16KCsW * 191.27 * 0.0603 * 0.46
- PUFA17KCsW * 82.24 * 0.1033 * 0.46
- TFA17KCsW * 13.40 * 0.0128 * 0.46
- Alcohol17KCsW * 81.71 * 0.0047
+ Sugarb17KCsW * 297.65 * 0.0136
+ potatoes16KCsW * 84.16 * 0.0024
- corn16KCsW * 34.67 * 0.0037
- fruits17KCsW * 40.39 * 0.1291
- Vegetables17KCsW * 80.14 * 0.0127
- nutsseeds17KCsW * 8.51 * 0.0797
- wgrains17KCsW * 55.65 * 0.0376
- legumes17KCsW * 51.66 * 0.0005
+ rice16KCsW * 141.23 * 0.0001
- swtpot16KCsW * 22.67 * 0.0270
;
CVDf2=
+ smoke17msW * 0.2046 * 0.08899
+ SLTobacco17msW * 0.0680 * 0.08179
+ kidneydz17msW * 0.056 * 0.037636
;
CVDf3=
+ T1DM17msW * 10.34 * 0.1169
+ T2DM17msW * 17.47 * 0.05193
;
run;quit;
proc corr data=source fisher;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM;
var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
* Works ok to here;
Proc reg data=source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM;
;
model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
/ selection=STEPWISE
slentry=.25 slstay=.25;
run; quit;
* Works ok to here;
*CVD formula R2=0.4236;
data source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+ CVDf1 * 0.01476
+ CVDf2 * 16.45730
+ CVDf3 * 0.11176
+ SBP17msW * 0.17041
- sex_IDsW * 0.17070
;
run; quit;
proc corr data=source fisher;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
*Here SAS can't find CVDf1 CVDf2 CVDf3 CVDf4;
Here is the log;
The CORR Procedure
1 With Variables: CVD2017m
5 Variables: CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 7846 0 21.61400 0 -74.89858 33.74918 Combination of 20 diet risk factors
CVDf2 7846 0 0.01796 0 -0.02383 0.04322 Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 7846 0 1.94134 0 -2.18859 20.35708 Types 1 and 2 DM
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
CVD2017m (CVD/100k/year ages 15-69) with:
Variable r Prob > |r|
CVDf1 0.41439 <.0001
CVDf2 0.41217 <.0001
CVDf3 0.31755 <.0001
SBP17msW 0.19492 <.0001
sex_IDsW -0.39527 <.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 7846 0.41439 0.44090 0.0000264 0.41437 0.395873 0.432532 <.0001
CVDf2 CVD2017m 7846 0.41217 0.43822 0.0000263 0.41215 0.393608 0.430349 <.0001
CVDf3 CVD2017m 7846 0.31755 0.32892 0.0000202 0.31753 0.297494 0.337290 <.0001
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
Number of Observations Read 7846
Number of Observations Used 7846
Stepwise Selection: Step 1
Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 1347.15123 1347.15123 1626.24 <.0001
Error 7844 6497.84877 0.82838
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -5.1482E-14 0.01028 2.07946E-23 0.00 1.0000
CVDf1 0.01917 0.00047543 1347.15123 1626.24 <.0001
Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 2547.22254 1273.61127 1885.50 <.0001
Error 7843 5297.77746 0.67548
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.5343E-14 0.00928 1.61312E-23 0.00 1.0000
CVDf1 0.01823 0.00042990 1214.49377 1797.98 <.0001
CVDf2 21.80856 0.51740 1200.07130 1776.62 <.0001
Bounds on condition number: 1.0027, 4.0109
Stepwise Selection: Step 3
Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 3041.15785 1013.71928 1654.84 <.0001
Error 7842 4803.84215 0.61258
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.4852E-14 0.00884 1.57839E-23 0.00 1.0000
CVDf1 0.01438 0.00043120 681.70150 1112.84 <.0001
CVDf2 23.56816 0.49661 1379.71384 2252.30 <.0001
CVDf3 0.13686 0.00482 493.93531 806.32 <.0001
Bounds on condition number: 1.1212, 9.7564
Stepwise Selection: Step 4
Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 3237.12365 809.28091 1377.11 <.0001
Error 7841 4607.87635 0.58766
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2225E-13 0.00865 1.17256E-22 0.00 1.0000
CVDf1 0.01453 0.00042241 695.29876 1183.16 <.0001
CVDf2 23.97928 0.48692 1425.21520 2425.22 <.0001
CVDf3 0.11914 0.00482 359.11233 611.08 <.0001
SBP17msW 0.16191 0.00887 195.96581 333.47 <.0001
Bounds on condition number: 1.1686, 17.406
Stepwise Selection: Step 5
Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 3323.34051 664.66810 1152.45 <.0001
Error 7840 4521.65949 0.57674
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2845E-13 0.00857 1.29454E-22 0.00 1.0000
CVDf1 0.01476 0.00041890 716.32252 1242.01 <.0001
CVDf2 16.45730 0.78178 255.58277 443.15 <.0001
CVDf3 0.11176 0.00481 311.03348 539.29 <.0001
SBP17msW 0.17041 0.00881 215.71423 374.02 <.0001
sex_IDsW -0.17070 0.01396 86.21686 149.49 <.0001
Bounds on condition number: 2.6811, 43.454
All variables left in the model are significant at the 0.2500 level.
All variables have been entered into the model.
Summary of Stepwise Selection
Step Variable Entered Variable Removed Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 CVDf1 Combination of 20 diet risk factors 1 0.1717 0.1717 3424.47 1626.24 <.0001
2 CVDf2 Kidney Dz, Smoking tobacco, sublingual tobacco 2 0.1530 0.3247 1345.69 1776.62 <.0001
3 CVDf3 Types 1 and 2 DM 3 0.0630 0.3877 491.270 806.32 <.0001
4 SBP17msW Systolic BP mm Hg 4 0.0250 0.4126 153.489 333.47 <.0001
5 sex_IDsW Sex male 1 and female 2 5 0.0110 0.4236 6.0000 149.49 <.0001
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
[Figure: panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure
1 With Variables: CVD2017m
6 Variables: CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 0 . . . . . Combination of 20 diet risk factors
CVDf2 0 . . . . . Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 0 . . . . . Types 1 and 2 DM
CVDf4 0 . . . . . Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
CVD2017m (CVD/100k/year ages 15-69) with:
Variable r Prob > |r| N
CVDf1 . . 0
CVDf2 . . 0
CVDf3 . . 0
CVDf4 . . 0
SBP17msW 0.19492 <.0001 7846
sex_IDsW -0.39527 <.0001 7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 0 . . . . . . .
CVDf2 CVD2017m 0 . . . . . . .
CVDf3 CVD2017m 0 . . . . . . .
CVDf4 CVD2017m 0 . . . . . . .
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
I use SAS OnDemand for Academics (SAS 9.4).
Where did CVDf1, CVDf2, CVDf3, CVDf4 go?
Thanks.
You did not include your log; you included your output. I would have expected to see errors for the PROC CORR step, because your LABEL statement looks problematic: I would have expected the label text to be in quotation marks.
Also, if you put the LABEL statement in the DATA step, you do not have to repeat it for each PROC.
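A minimal sketch of that suggestion, assuming a small made-up data set (work.labeled and the columns x1, x2, and f1 are hypothetical, not taken from your data). The label text is quoted, and because the LABEL statement sits in the DATA step, the label is stored with the data set and later PROC steps pick it up without repeating it:

```sas
/* Toy data; x1, x2 and f1 are hypothetical names used only for illustration */
data work.labeled;
  input x1 x2;
  f1 = 2*x1 - x2;
  /* Quoted label text, defined once in the DATA step */
  label f1 = "Combination of two toy risk factors";
  datalines;
1 2
3 4
5 6
;
run;

/* No LABEL statement here: the stored label is shown in the output */
proc means data=work.labeled n mean;
  var f1;
run;
```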
This part of the code is the problem. It is an order-of-operations issue: you are now reading from the original file, projects.source, not the data set called SOURCE that you created in the WORK library. That means the variables you created previously do not exist in the data set this step writes.
data source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+ CVDf1 * 0.01476
+ CVDf2 * 16.45730
+ CVDf3 * 0.11176
+ SBP17msW * 0.17041
- sex_IDsW * 0.17070
;
run; quit;
You probably want to do this instead:
data source2;
set source;
You don't seem to be able to follow how the data is flowing yet, so for now my recommendation would be to not re-use the same names ANYWHERE in your code. Use a unique name for each data set so that you can trace things.
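As a rough illustration of the unique-name approach, here is a sketch on toy data. The data set names work.raw, work.step1, and work.step2 and the columns raw1, raw2, f1, and f2 are all made up for the example; work.raw merely stands in for projects.source. Each DATA step reads the data set written by the previous step, so derived variables are carried forward:

```sas
/* Hypothetical stand-in for projects.source */
data work.raw;
  input raw1 raw2;
  datalines;
1 2
3 4
5 6
;
run;

/* Step 1: read the raw data and derive f1 */
data work.step1;
  set work.raw;
  f1 = raw1 + raw2;
run;

/* Step 2: read step1 (not raw), so f1 is still available */
data work.step2;
  set work.step1;
  f2 = 0.5 * f1;
run;

/* Both derived variables are present and non-missing */
proc means data=work.step2 n mean;
  var f1 f2;
run;
```

With distinct names, each data set keeps its contents, so you can inspect any intermediate step to see where a variable first appears.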
I don't have the data or the submitted log, so I'm guessing, but I think the second DATA step should read work.source, not projects.source.
The other thing I noticed is that you repeat the same LABEL statement over and over again.
I think that if you specify it in the first DATA step, you don't need to specify it every time.
My apologies if this was done intentionally.
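One way to settle this without guessing is to list the variables a data set actually contains. A minimal sketch, using the sashelp.class sample table that ships with SAS as a stand-in (work.check and the derived column ratio are hypothetical names):

```sas
/* sashelp.class is a sample table supplied with SAS */
data work.check;
  set sashelp.class;
  ratio = weight / height;  /* hypothetical derived variable */
run;

/* SHORT prints only the variable names; ratio should appear in the list */
proc contents data=work.check short;
run;
```

Running the same PROC CONTENTS against work.source right before the failing PROC CORR would show whether CVDf1 to CVDf4 are present at that point.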
It was helpful to use work.source instead of source. However, I still lose CVDf1, CVDf2, CVDf3, and CVDf4 after a regression step. The final answer is CVDf5, and the code does produce it. That answer depends on CVDf1, CVDf2, CVDf3, and CVDf4. If I stop before the final CVDf5 DATA step and the PROC CORR that follows it, then CVDf1, CVDf2, CVDf3, and CVDf4 are included in work.source. If I run through the final step to derive CVDf5, then CVDf5 is included in the output, but CVDf1, CVDf2, CVDf3, and CVDf4 are not. The log says: WARNING: Variable CVDF1 not found in data set WORK.SOURCE.
How does it lose CVDf1, CVDf2, CVDf3, and CVDf4 in the final proc corr? The last data step must have those variables because CVDf5, the final answer, depends on them.
Thanks
SAS code:
data work.source;
set projects.source;
CVDf1=
- pmeat17KCsW * 5.49 * 0.0443
- rmeat17KCsW * 50.70 * 0.0560
- fish17KCsW * 10.01 * 0.0412
- milk17KCsW * 25.37 * 0.0398
- poultry16KCsW * 45.06 * 0.0867
- eggs16KCsW * 19.47 * 0.1544
- SFA16KCsW * 191.27 * 0.0603 * 0.46
- PUFA17KCsW * 82.24 * 0.1033 * 0.46
- TFA17KCsW * 13.40 * 0.0128 * 0.46
- Alcohol17KCsW * 81.71 * 0.0047
+ Sugarb17KCsW * 297.65 * 0.0136
+ potatoes16KCsW * 84.16 * 0.0024
- corn16KCsW * 34.67 * 0.0037
- fruits17KCsW * 40.39 * 0.1291
- Vegetables17KCsW * 80.14 * 0.0127
- nutsseeds17KCsW * 8.51 * 0.0797
- wgrains17KCsW * 55.65 * 0.0376
- legumes17KCsW * 51.66 * 0.0005
+ rice16KCsW * 141.23 * 0.0001
- swtpot16KCsW * 22.67 * 0.0270
;
CVDf2=
+ smoke17msW * 0.2046 * 0.08899
+ SLTobacco17msW * 0.0680 * 0.08179
+ kidneydz17msW * 0.056 * 0.037636
;
CVDf3=
+ T1DM17msW * 10.34 * 0.1169
+ T2DM17msW * 17.47 * 0.05193
;
run;quit;
proc corr data=work.source fisher;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM;
var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
Proc reg data=work.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM;
;
model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
/ selection=STEPWISE
slentry=.25 slstay=.25;
run; quit;
*
CVDf1 0.01476 0.00041890 716.32252 1242.01 <.0001
CVDf2 16.45730 0.78178 255.58277 443.15 <.0001
CVDf3 0.11176 0.00481 311.03348 539.29 <.0001
SBP17msW 0.17041 0.00881 215.71423 374.02 <.0001
sex_IDsW -0.17070 0.01396 86.21686 149.49 <.0001
1 CVDf1 Combination of 20 diet risk factors 1 0.1717 0.1717 3424.47 1626.24 <.0001
2 CVDf2 Child wt, air pollution, Kidney Dz 2 0.1530 0.3247 1345.69 1776.62 <.0001
3 CVDf3 Types 1 and 2 DM 3 0.0630 0.3877 491.270 806.32 <.0001
4 SBP17msW Systolic BP mm Hg 4 0.0250 0.4126 153.489 333.47 <.0001
5 sex_IDsW Sex male 1 and female 2 5 0.0110 0.4236 6.0000 149.49 <.0001
;
*CVD formula R2=0.4236;
data work.source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+ CVDf1 * 0.01476
+ CVDf2 * 16.45730
+ CVDf3 * 0.11176
+ SBP17msW * 0.17041
- sex_IDsW * 0.17070
;
run; quit;
proc corr data=work.source fisher;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
*CVD formula R2=0.4236;
data work.source;
set projects.source;
CVDf5=
- pmeat17KCsW * 0.09
- rmeat17KCsW * 1.14
- fish17KCsW * 0.17
- milk17KCsW * 0.39
- poultry16KCsW * 1.54
- eggs16kcsW * 1.23
- SFA16KCsW * 2.10
- PUFA17KCsW * 1.56
- TFA17KCsW * 0.03
- ALCOHOL17KCsW * 0.13
+ Sugarb17KCsW * 1.59
+ potatoes16KCsW * 0.09
- corn16kcsW * 0.06
- fruits17KCsW * 2.12
- vegetables17KCsW* 0.38
- nutsseeds17KCsW * 0.27
- wgrains17KCsW * 0.88
- legumes17kcsW * 0.01
+ rice16kcsW * 0.00
- swtpot16kcsW * 0.27
+ smoke17msW * 8.45
+ SLTobacco17msW * 2.57
+ kidneydz17msW * 0.98
+ T1DM17msW * 3.80
+ T2DM17msW * 2.85
+ SBP17msW * 4.83
- sex_IDsW * 4.84
;
run; quit;
proc corr data=work.source fisher;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex
CVDf5=Final CVD risk factor formula;
var CVDf5 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
results:
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
Number of Observations Read 7846
Number of Observations Used 7846
Stepwise Selection: Step 1
Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 1347.15123 1347.15123 1626.24 <.0001
Error 7844 6497.84877 0.82838
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -5.1482E-14 0.01028 2.07946E-23 0.00 1.0000
CVDf1 0.01917 0.00047543 1347.15123 1626.24 <.0001
Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 2547.22254 1273.61127 1885.50 <.0001
Error 7843 5297.77746 0.67548
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.5343E-14 0.00928 1.61312E-23 0.00 1.0000
CVDf1 0.01823 0.00042990 1214.49377 1797.98 <.0001
CVDf2 21.80856 0.51740 1200.07130 1776.62 <.0001
Bounds on condition number: 1.0027, 4.0109
Stepwise Selection: Step 3
Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 3041.15785 1013.71928 1654.84 <.0001
Error 7842 4803.84215 0.61258
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.4852E-14 0.00884 1.57839E-23 0.00 1.0000
CVDf1 0.01438 0.00043120 681.70150 1112.84 <.0001
CVDf2 23.56816 0.49661 1379.71384 2252.30 <.0001
CVDf3 0.13686 0.00482 493.93531 806.32 <.0001
Bounds on condition number: 1.1212, 9.7564
Stepwise Selection: Step 4
Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 3237.12365 809.28091 1377.11 <.0001
Error 7841 4607.87635 0.58766
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2225E-13 0.00865 1.17256E-22 0.00 1.0000
CVDf1 0.01453 0.00042241 695.29876 1183.16 <.0001
CVDf2 23.97928 0.48692 1425.21520 2425.22 <.0001
CVDf3 0.11914 0.00482 359.11233 611.08 <.0001
SBP17msW 0.16191 0.00887 195.96581 333.47 <.0001
Bounds on condition number: 1.1686, 17.406
Stepwise Selection: Step 5
Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 3323.34051 664.66810 1152.45 <.0001
Error 7840 4521.65949 0.57674
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2845E-13 0.00857 1.29454E-22 0.00 1.0000
CVDf1 0.01476 0.00041890 716.32252 1242.01 <.0001
CVDf2 16.45730 0.78178 255.58277 443.15 <.0001
CVDf3 0.11176 0.00481 311.03348 539.29 <.0001
SBP17msW 0.17041 0.00881 215.71423 374.02 <.0001
sex_IDsW -0.17070 0.01396 86.21686 149.49 <.0001
Bounds on condition number: 2.6811, 43.454
All variables left in the model are significant at the 0.2500 level.
All variables have been entered into the model.
Summary of Stepwise Selection
Step Variable Entered Variable Removed Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 CVDf1 Combination of 20 diet risk factors 1 0.1717 0.1717 3424.47 1626.24 <.0001
2 CVDf2 Kidney Dz, Smoking tobacco, sublingual tobacco 2 0.1530 0.3247 1345.69 1776.62 <.0001
3 CVDf3 Types 1 and 2 DM 3 0.0630 0.3877 491.270 806.32 <.0001
4 SBP17msW Systolic BP mm Hg 4 0.0250 0.4126 153.489 333.47 <.0001
5 sex_IDsW Sex male 1 and female 2 5 0.0110 0.4236 6.0000 149.49 <.0001
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
[Figure: panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure
1 With Variables: CVD2017m
6 Variables: CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 0 . . . . . Combination of 20 diet risk factors
CVDf2 0 . . . . . Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 0 . . . . . Types 1 and 2 DM
CVDf4 0 . . . . . Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
CVD2017m (CVD/100k/year ages 15-69) with:
Variable r Prob > |r| N
CVDf1 . . 0
CVDf2 . . 0
CVDf3 . . 0
CVDf4 . . 0
SBP17msW 0.19492 <.0001 7846
sex_IDsW -0.39527 <.0001 7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 0 . . . . . . .
CVDf2 CVD2017m 0 . . . . . . .
CVDf3 CVD2017m 0 . . . . . . .
CVDf4 CVD2017m 0 . . . . . . .
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
The CORR Procedure
1 With Variables: CVD2017m
3 Variables: CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf5 7846 2.6151E-12 18.11747 2.05182E-8 -48.22646 83.18385 Final CVD risk factor formula
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
CVD2017m (CVD/100k/year ages 15-69) with:
Variable r Prob > |r|
CVDf5 0.65244 <.0001
SBP17msW 0.19492 <.0001
sex_IDsW -0.39527 <.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf5 CVD2017m 7846 0.65244 0.77953 0.0000416 0.65241 0.639517 0.664941 <.0001
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
Log;
217 *CVD formula R2=0.4236;
218 data work.source;
219 set projects.source;
220 CVDf5=
221 -pmeat17KCsW*0.09
222 -rmeat17KCsW*1.14
223 -fish17KCsW*0.17
224 -milk17KCsW*0.39
225 -poultry16KCsW*1.54
226 -eggs16kcsW*1.23
227 -SFA16KCsW*2.10
228 -PUFA17KCsW*1.56
229 -TFA17KCsW*0.03
230 -ALCOHOL17KCsW*0.13
231 +Sugarb17KCsW*1.59
232 +potatoes16KCsW*0.09
233 -corn16kcsW*0.06
234 -fruits17KCsW*2.12
235 -vegetables17KCsW*0.38
236 -nutsseeds17KCsW*0.27
237 -wgrains17KCsW*0.88
238 -legumes17kcsW*0.01
239 +rice16kcsW*0.00
240 -swtpot16kcsW*0.27
241 +smoke17msW*8.45
242 +SLTobacco17msW*2.57
243 +kidneydz17msW*0.98
244 +T1DM17msW*3.80
245 +T2DM17msW*2.85
246 +SBP17msW*4.83
247 -sex_IDsW*4.84
248
249 ;
250 run;
NOTE: There were 7846 observations read from the data set PROJECTS.SOURCE.
NOTE: The data set WORK.SOURCE has 7846 observations and 1273 variables.
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
user cpu time 0.02 seconds
system cpu time 0.10 seconds
memory 5043.34k
OS Memory 68032.00k
Timestamp 04/30/2021 02:16:01 AM
Step Count 299 Switch Count 15
Page Faults 0
Page Reclaims 743
Page Swaps 0
Voluntary Context Switches 51
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 167952
250 ! quit;
251
252 proc corr data=work.source fisher;
253 label
254 CVDf1=Combination of 20 diet risk factors
255 CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
WARNING: Variable CVDF1 not found in data set WORK.SOURCE.
256 CVDf3=Types 1 and 2 DM
WARNING: Variable CVDF2 not found in data set WORK.SOURCE.
257 CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex
WARNING: Variable CVDF3 not found in data set WORK.SOURCE.
258 CVDf5=Final CVD risk factor formula;
WARNING: Variable CVDF4 not found in data set WORK.SOURCE.
259 var CVDf5 SBP17msW sex_IDsW;
260 with CVD2017m;
261 run;
NOTE: PROCEDURE CORR used (Total process time):
real time 0.08 seconds
user cpu time 0.06 seconds
system cpu time 0.02 seconds
memory 2373.78k
OS Memory 64952.00k
Timestamp 04/30/2021 02:16:01 AM
Step Count 300 Switch Count 13
Page Faults 0
Page Reclaims 246
Page Swaps 0
Voluntary Context Switches 36
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
261 ! quit;
262
263 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
275
I think you need to change set projects.source to set work.source in the following two places.
Are CVDf1-CVDf5 actually included in projects.source?
data work.source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
data work.source;
set projects.source;
CVDf5=
You did not include your log; you included your output. I would have expected to see errors for the PROC CORR step because your LABEL statement looks problematic. I would have expected the label text to be in quotation marks.
Also, if you put the LABEL in the data step you do not have to repeat it for each PROC.
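A minimal sketch of both points (quoted label text, and a LABEL statement placed once in the DATA step), assuming the SASHELP.CLASS sample data set is available. The data set name work.class_labeled and the variable wt_ht_ratio are made up for illustration and are not from the original program:

```sas
/* Attach the label once, in the DATA step, with the text in quotes. */
data work.class_labeled;
    set sashelp.class;
    wt_ht_ratio = weight / height;
    label wt_ht_ratio = "Weight-to-height ratio (illustrative)";
run;

/* The label is stored with the data set, so neither procedure
   needs its own LABEL statement. */
proc corr data=work.class_labeled fisher;
    var wt_ht_ratio age;
    with weight;
run;

proc means data=work.class_labeled;
    var wt_ht_ratio;
run;
```

Quoting the text also avoids any ambiguity when a label contains commas or words that look like variable names.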
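This part of the code is the problem, and it is an order-of-operations issue. You're now reading from the original source file (projects.source), not the file you created in the work library called SOURCE. Because the step also writes to the name SOURCE, it replaces the data set you built earlier, so the variables you created previously no longer exist.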
data source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+ CVDf1 * 0.01476
+ CVDf2 * 16.45730
+ CVDf3 * 0.11176
+ SBP17msW * 0.17041
- sex_IDsW * 0.17070
;
run; quit;
You probably want to do this instead:
data source2;
set source;
You don't seem to be able to follow how the data is flowing yet, so for now my recommendation would be to not re-use the same names ANYWHERE in your code. Use a unique name for each data set so that you can trace things.
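To make the data flow concrete, here is a small sketch of the same mistake and its fix, assuming the SASHELP.CLASS sample data set is available. The data set names step1, step2_wrong and step2 and the variables f1 and f2 are hypothetical stand-ins for CVDf1 and CVDf4:

```sas
/* Step 1: create f1 and store it in WORK.STEP1. */
data work.step1;
    set sashelp.class;
    f1 = weight / height;
run;

/* Mistake: SET reads the original table again, which has no f1.
   SAS creates f1 as a new variable in this step, so f1 and f2 are
   missing on every row and the log notes that f1 is uninitialized. */
data work.step2_wrong;
    set sashelp.class;
    f2 = f1 * 2;
run;

/* Fix: read the data set that step 1 wrote, so f1 is carried forward. */
data work.step2;
    set work.step1;
    f2 = f1 * 2;
run;

/* Compare the non-missing counts for f1 and f2 in the two versions. */
proc means data=work.step2_wrong n mean;
    var f1 f2;
run;

proc means data=work.step2 n mean;
    var f1 f2;
run;
```

The same rule explains the original program: a DATA step that writes to an existing name replaces that data set, so repeating "data source; set projects.source;" discards whatever the earlier step added.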
Well, using source2 and source3 did the job. However, it doesn't explain why that was necessary and when it might again be suddenly necessary.
Thanks for your help.
Result:
The CORR Procedure
1 With Variables: CVD2017m
5 Variables: CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 7846 0 21.61400 0 -74.89858 33.74918 Combination of 20 diet risk factors
CVDf2 7846 0 0.01796 0 -0.02383 0.04322 Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 7846 0 1.94134 0 -2.18859 20.35708 Types 1 and 2 DM
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
CVD2017m (CVD/100k/year ages 15-69) 0.41439 0.41217 0.31755 0.19492 -0.39527
Prob > |r| <.0001 <.0001 <.0001 <.0001 <.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 7846 0.41439 0.44090 0.0000264 0.41437 0.395873 0.432532 <.0001
CVDf2 CVD2017m 7846 0.41217 0.43822 0.0000263 0.41215 0.393608 0.430349 <.0001
CVDf3 CVD2017m 7846 0.31755 0.32892 0.0000202 0.31753 0.297494 0.337290 <.0001
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
Number of Observations Read 7846
Number of Observations Used 7846
Stepwise Selection: Step 1
Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 1347.15123 1347.15123 1626.24 <.0001
Error 7844 6497.84877 0.82838
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -5.1482E-14 0.01028 2.07946E-23 0.00 1.0000
CVDf1 0.01917 0.00047543 1347.15123 1626.24 <.0001
Bounds on condition number: 1, 1
Stepwise Selection: Step 2
Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 2547.22254 1273.61127 1885.50 <.0001
Error 7843 5297.77746 0.67548
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.5343E-14 0.00928 1.61312E-23 0.00 1.0000
CVDf1 0.01823 0.00042990 1214.49377 1797.98 <.0001
CVDf2 21.80856 0.51740 1200.07130 1776.62 <.0001
Bounds on condition number: 1.0027, 4.0109
Stepwise Selection: Step 3
Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 3041.15785 1013.71928 1654.84 <.0001
Error 7842 4803.84215 0.61258
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -4.4852E-14 0.00884 1.57839E-23 0.00 1.0000
CVDf1 0.01438 0.00043120 681.70150 1112.84 <.0001
CVDf2 23.56816 0.49661 1379.71384 2252.30 <.0001
CVDf3 0.13686 0.00482 493.93531 806.32 <.0001
Bounds on condition number: 1.1212, 9.7564
Stepwise Selection: Step 4
Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 4 3237.12365 809.28091 1377.11 <.0001
Error 7841 4607.87635 0.58766
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2225E-13 0.00865 1.17256E-22 0.00 1.0000
CVDf1 0.01453 0.00042241 695.29876 1183.16 <.0001
CVDf2 23.97928 0.48692 1425.21520 2425.22 <.0001
CVDf3 0.11914 0.00482 359.11233 611.08 <.0001
SBP17msW 0.16191 0.00887 195.96581 333.47 <.0001
Bounds on condition number: 1.1686, 17.406
Stepwise Selection: Step 5
Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 3323.34051 664.66810 1152.45 <.0001
Error 7840 4521.65949 0.57674
Corrected Total 7845 7845.00000
Variable Parameter Estimate Standard Error Type II SS F Value Pr > F
Intercept -1.2845E-13 0.00857 1.29454E-22 0.00 1.0000
CVDf1 0.01476 0.00041890 716.32252 1242.01 <.0001
CVDf2 16.45730 0.78178 255.58277 443.15 <.0001
CVDf3 0.11176 0.00481 311.03348 539.29 <.0001
SBP17msW 0.17041 0.00881 215.71423 374.02 <.0001
sex_IDsW -0.17070 0.01396 86.21686 149.49 <.0001
Bounds on condition number: 2.6811, 43.454
All variables left in the model are significant at the 0.2500 level.
All variables have been entered into the model.
Summary of Stepwise Selection
Step Variable Entered Variable Removed Label Number Vars In Partial R-Square Model R-Square C(p) F Value Pr > F
1 CVDf1 Combination of 20 diet risk factors 1 0.1717 0.1717 3424.47 1626.24 <.0001
2 CVDf2 Kidney Dz, Smoking tobacco, sublingual tobacco 2 0.1530 0.3247 1345.69 1776.62 <.0001
3 CVDf3 Types 1 and 2 DM 3 0.0630 0.3877 491.270 806.32 <.0001
4 SBP17msW Systolic BP mm Hg 4 0.0250 0.4126 153.489 333.47 <.0001
5 sex_IDsW Sex male 1 and female 2 5 0.0110 0.4236 6.0000 149.49 <.0001
The REG Procedure
Model: MODEL1
Dependent Variable: CVD2017msW CVD/100k/year ages 15-69
Panel of heat maps of residuals by regressors for CVD2017msW.
The CORR Procedure
1 With Variables: CVD2017m
6 Variables: CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 7846 0 21.61400 0 -74.89858 33.74918 Combination of 20 diet risk factors
CVDf2 7846 0 0.01796 0 -0.02383 0.04322 Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 7846 0 1.94134 0 -2.18859 20.35708 Types 1 and 2 DM
CVDf4 7846 9.0822E-14 0.65083 7.1259E-10 -1.75968 2.95661 Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
CVD2017m (CVD/100k/year ages 15-69) 0.41439 0.41217 0.31755 0.65087 0.19492 -0.39527
Prob > |r| <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 7846 0.41439 0.44090 0.0000264 0.41437 0.395873 0.432532 <.0001
CVDf2 CVD2017m 7846 0.41217 0.43822 0.0000263 0.41215 0.393608 0.430349 <.0001
CVDf3 CVD2017m 7846 0.31755 0.32892 0.0000202 0.31753 0.297494 0.337290 <.0001
CVDf4 CVD2017m 7846 0.65087 0.77680 0.0000415 0.65084 0.637900 0.663415 <.0001
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
The CORR Procedure
1 With Variables: CVD2017m
7 Variables: CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
CVD2017m 7846 543.66067 288.00939 4265562 73.47499 1844 CVD/100k/year ages 15-69
CVDf1 7846 0 21.61400 0 -74.89858 33.74918 Combination of 20 diet risk factors
CVDf2 7846 0 0.01796 0 -0.02383 0.04322 Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3 7846 0 1.94134 0 -2.18859 20.35708 Types 1 and 2 DM
CVDf4 7846 9.0822E-14 0.65083 7.1259E-10 -1.75968 2.95661 Mult reg CVDf1 CVDf2 CVDf3 SBP sex
CVDf5 7846 2.6151E-12 18.11747 2.05182E-8 -48.22646 83.18385 Final CVD risk factor formula
SBP17msW 7846 4.9045E-13 1.00000 3.84807E-9 -2.43011 3.23505 Systolic BP mm Hg
sex_IDsW 7846 0 1.00000 0 -0.99994 0.99994 Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW
CVD2017m (CVD/100k/year ages 15-69) 0.41439 0.41217 0.31755 0.65087 0.65244 0.19492 -0.39527
Prob > |r| <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable With Variable N Sample Correlation Fisher's z Bias Adjustment Correlation Estimate 95% Confidence Limits p Value for H0:Rho=0
CVDf1 CVD2017m 7846 0.41439 0.44090 0.0000264 0.41437 0.395873 0.432532 <.0001
CVDf2 CVD2017m 7846 0.41217 0.43822 0.0000263 0.41215 0.393608 0.430349 <.0001
CVDf3 CVD2017m 7846 0.31755 0.32892 0.0000202 0.31753 0.297494 0.337290 <.0001
CVDf4 CVD2017m 7846 0.65087 0.77680 0.0000415 0.65084 0.637900 0.663415 <.0001
CVDf5 CVD2017m 7846 0.65244 0.77953 0.0000416 0.65241 0.639517 0.664941 <.0001
SBP17msW CVD2017m 7846 0.19492 0.19745 0.0000124 0.19491 0.173531 0.216106 <.0001
sex_IDsW CVD2017m 7846 -0.39527 -0.41803 -0.0000252 -0.39525 -0.413755 -0.376410 <.0001
SAS code:
data work.source;
set projects.source;
label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM;
CVDf1=
- pmeat17KCsW * 5.49 * 0.0443
- rmeat17KCsW * 50.70 * 0.0560
- fish17KCsW * 10.01 * 0.0412
- milk17KCsW * 25.37 * 0.0398
- poultry16KCsW * 45.06 * 0.0867
- eggs16KCsW * 19.47 * 0.1544
- SFA16KCsW * 191.27 * 0.0603 * 0.46
- PUFA17KCsW * 82.24 * 0.1033 * 0.46
- TFA17KCsW * 13.40 * 0.0128 * 0.46
- Alcohol17KCsW * 81.71 * 0.0047
+ Sugarb17KCsW * 297.65 * 0.0136
+ potatoes16KCsW * 84.16 * 0.0024
- corn16KCsW * 34.67 * 0.0037
- fruits17KCsW * 40.39 * 0.1291
- Vegetables17KCsW * 80.14 * 0.0127
- nutsseeds17KCsW * 8.51 * 0.0797
- wgrains17KCsW * 55.65 * 0.0376
- legumes17KCsW * 51.66 * 0.0005
+ rice16KCsW * 141.23 * 0.0001
- swtpot16KCsW * 22.67 * 0.0270
;
CVDf2=
+ smoke17msW * 0.2046 * 0.08899
+ SLTobacco17msW * 0.0680 * 0.08179
+ kidneydz17msW * 0.056 * 0.037636
;
CVDf3=
+ T1DM17msW * 10.34 * 0.1169
+ T2DM17msW * 17.47 * 0.05193
;
run;quit;
proc corr data=work.source fisher;
var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
Proc reg data=work.source;
*label
CVDf1=Combination of 20 diet risk factors
CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3=Types 1 and 2 DM
;
model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
/ selection=STEPWISE
slentry=.25 slstay=.25;
run; quit;
*
CVDf1 0.01476 0.00041890 716.32252 1242.01 <.0001
CVDf2 16.45730 0.78178 255.58277 443.15 <.0001
CVDf3 0.11176 0.00481 311.03348 539.29 <.0001
SBP17msW 0.17041 0.00881 215.71423 374.02 <.0001
sex_IDsW -0.17070 0.01396 86.21686 149.49 <.0001
1 CVDf1 Combination of 20 diet risk factors 1 0.1717 0.1717 3424.47 1626.24 <.0001
2 CVDf2 Kidney Dz, Smoking tobacco, sublingual tobacco 2 0.1530 0.3247 1345.69 1776.62 <.0001
3 CVDf3 Types 1 and 2 DM 3 0.0630 0.3877 491.270 806.32 <.0001
4 SBP17msW Systolic BP mm Hg 4 0.0250 0.4126 153.489 333.47 <.0001
5 sex_IDsW Sex male 1 and female 2 5 0.0110 0.4236 6.0000 149.49 <.0001
;
*CVD formula R2=0.4236;
*data work.source;
* set projects.source;
data source2;
set source;
label
CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+ CVDf1 * 0.01476
+ CVDf2 * 16.45730
+ CVDf3 * 0.11176
+ SBP17msW * 0.17041
- sex_IDsW * 0.17070
;
run; quit;
proc corr data=source2 fisher;
var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
*CVD formula R2=0.4236;
data source3;
set source2;
label
CVDf5=Final CVD risk factor formula;
CVDf5=
- pmeat17KCsW * 0.09
- rmeat17KCsW * 1.14
- fish17KCsW * 0.17
- milk17KCsW * 0.39
- poultry16KCsW * 1.54
- eggs16kcsW * 1.23
- SFA16KCsW * 2.10
- PUFA17KCsW * 1.56
- TFA17KCsW * 0.03
- ALCOHOL17KCsW * 0.13
+ Sugarb17KCsW * 1.59
+ potatoes16KCsW * 0.09
- corn16kcsW * 0.06
- fruits17KCsW * 2.12
- vegetables17KCsW* 0.38
- nutsseeds17KCsW * 0.27
- wgrains17KCsW * 0.88
- legumes17kcsW * 0.01
+ rice16kcsW * 0.00
- swtpot16kcsW * 0.27
+ smoke17msW * 8.45
+ SLTobacco17msW * 2.57
+ kidneydz17msW * 0.98
+ T1DM17msW * 3.80
+ T2DM17msW * 2.85
+ SBP17msW * 4.83
- sex_IDsW * 4.84
;
run; quit;
proc corr data=source3 fisher;
var CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW;
with CVD2017m;
run;quit;
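Assignment statements in a DATA step run in order for each observation, so a later formula can use a variable computed earlier in the same step. A minimal sketch of that alternative, assuming SASHELP.CLASS is available and using hypothetical names (work.one_step, f1, f2) rather than the thread's variables:

```sas
data work.one_step;
    set sashelp.class;
    f1 = weight / height;   /* computed first */
    f2 = f1 * 2;            /* can use f1 because it was assigned above */
    label
        f1 = "Weight-to-height ratio (illustrative)"
        f2 = "Twice the ratio (illustrative)";
run;

proc corr data=work.one_step fisher;
    var f1 f2;
    with age;
run;
```

This only helps once the coefficients are known. In the program above, the CVDf4 coefficients come from the PROC REG output, so the regression has to run on the data set containing CVDf1-CVDf3 before CVDf4 can be written, which is why separate, uniquely named steps are the safer layout.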
Now I understand it.
Many thanks.
David Cundiff