Solved: Re: Lost variables between proc reg and proc corr

dkcundiffMD · Posted 04-29-2021 08:04 PM

This analysis uses complex variables and multiple regression. I'll show the code and the log. CVDf1, CVDf2, CVDf3, and CVDf4 got lost at the end.

data source;
	set projects.source;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
- Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;


* Works ok to here;


Proc reg data=source;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW 
	 	/ selection=STEPWISE 
		slentry=.25 slstay=.25;
	run; quit;
	
	* Works ok to here;


*CVD formula R2=0.4236;
data source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

proc corr data=source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit; 

*Here SAS can't find CVDf1 CVDf2 CVDf3 CVDf4;

Here is the log:

The CORR Procedure

1 With Variables:	CVD2017m
5 Variables:	CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
[Figure: Panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	0	.	.	.	.	.	Combination of 20 diet risk factors
CVDf2	0	.	.	.	.	.	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	0	.	.	.	.	.	Types 1 and 2 DM
CVDf4	0	.	.	.	.	.	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	.	.	.	.	0.19492	-0.39527
Prob > |r|	.	.	.	.	<.0001	<.0001
N	0	0	0	0	7846	7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	0	.	.	.	.	.	.	.
CVDf2	CVD2017m	0	.	.	.	.	.	.	.
CVDf3	CVD2017m	0	.	.	.	.	.	.	.
CVDf4	CVD2017m	0	.	.	.	.	.	.	.
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

I use SAS OnDemand for Academics (SAS 9.4).

Where did CVDf1, CVDf2, CVDf3, CVDf4 go?

Thanks.

Reeza · Posted 04-29-2021 10:09 PM

You did not include your log; you included your output. I would have expected to see errors for the PROC CORR step because your LABEL statement looks problematic: I would have expected the label text to be in quotation marks.

Also, if you put the LABEL statement in the DATA step, you do not have to repeat it for each PROC.
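A minimal sketch of that suggestion, using made-up toy data: the data set WORK.TOY_RAW and the variables x, y, outcome and f1 are hypothetical stand-ins for PROJECTS.SOURCE and the CVD variables. The quoted labels are assigned once in the DATA step and stored with WORK.TOY_LABELED, so PROC CORR shows them without a LABEL statement of its own.

```sas
/* Hypothetical toy data standing in for PROJECTS.SOURCE */
data work.toy_raw;
  input x y outcome;
  datalines;
1 2 10
2 1 12
3 5 20
4 3 21
;
run;

/* Derive a variable and attach quoted labels once, in the DATA step */
data work.toy_labeled;
  set work.toy_raw;
  f1 = x * 2 + y;   /* hypothetical composite variable */
  label f1      = 'Hypothetical combination of x and y'
        outcome = 'Hypothetical outcome, per 100k';
run;

/* No LABEL statement here: PROC CORR uses the labels stored with the data set */
proc corr data=work.toy_labeled;
  var f1;
  with outcome;
run;
```

Quoting the labels also lets them safely contain characters such as commas, which several of the labels in this thread do.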

This part of the code is the problem. It's an order-of-operations issue: you're now reading from the original PROJECTS.SOURCE data set, not the data set called SOURCE that you created in the WORK library. That means the variables you created previously no longer exist.

data source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

You probably want to do this instead:

data source2;
	set source;

You don't seem to be able to follow how the data is flowing yet, so for now my recommendation would be not to reuse the same names ANYWHERE in your code. Use a unique name for each data set so that you can trace things.
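A small sketch of the order-of-operations problem and the fix described here, assuming hypothetical toy data: WORK.TOY_RAW plays the role of PROJECTS.SOURCE, and f1/f2 play the roles of CVDf1/CVDf4. Each step writes a uniquely named data set, so the flow is easy to trace. The "wrong" step leaves f1 and f2 all missing, which matches the N = 0 rows for CVDf1 through CVDf4 in the last PROC CORR output.

```sas
/* Hypothetical toy data standing in for PROJECTS.SOURCE */
data work.toy_raw;
  input x y;
  datalines;
1 2
3 4
;
run;

/* Step 1: derive f1 and write it to a NEW, uniquely named data set */
data work.step1;
  set work.toy_raw;
  f1 = x * 2;
run;

/* The pattern from the question: step 2 re-reads the raw data.     */
/* f1 is not in WORK.TOY_RAW, so SAS creates f1 here as a new,      */
/* uninitialized (all-missing) variable, and f2 is missing as well. */
data work.step2_wrong;
  set work.toy_raw;
  f2 = f1 + y;
run;

/* The fix: step 2 reads the data set that actually contains f1 */
data work.step2_right;
  set work.step1;
  f2 = f1 + y;
run;

/* List the variables the data set holds, to trace the data flow */
proc contents data=work.step2_right varnum;
run;
```

The log is worth checking after each step: a note that a variable is uninitialized is usually the first sign that a SET statement is reading the wrong data set.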

japelin · Posted 04-29-2021 09:00 PM

I don't have the data or the submitted log, so I'm guessing, but I think that in the second DATA step the data set in the SET statement should be work.source, not projects.source.
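On the naming point: a one-level data set name refers to the WORK library (assuming no USER library has been assigned), so `data source;` and `data work.source;` create the same data set. What decides which variables are available is the SET statement. A minimal sketch, using a hypothetical data set name:

```sas
/* A one-level name such as DEMO_ONE_LEVEL is stored in the WORK library */
/* (assuming no USER library is assigned), so both steps below write     */
/* to the same data set, WORK.DEMO_ONE_LEVEL.                             */
data demo_one_level;
  x = 1;
run;

data work.demo_one_level;   /* replaces the data set created just above */
  x = 2;
run;

/* Prints a single observation with x = 2 */
proc print data=demo_one_level;
run;
```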

The other thing I noticed is that you are repeating the same LABEL statement over and over again.
I think if you specify it in the first DATA step, you don't need to specify it every time.
I'm sorry if you did this intentionally.

dkcundiffMD · Posted 04-29-2021 10:53 PM

It was helpful to use work.source instead of source. However, I still lose CVDf1, CVDf2, CVDf3, and CVDf4 after a regression step. The final answer is CVDf5, and SAS does compute it. That answer depends on CVDf1, CVDf2, CVDf3, and CVDf4. If I stop before the final CVDf5 DATA step and its PROC CORR, then CVDf1, CVDf2, CVDf3, and CVDf4 are included in work.source. If I run through the final step that derives CVDf5, then CVDf5 is included in the output but CVDf1, CVDf2, CVDf3, and CVDf4 are not. The log says: WARNING: Variable CVDF1 not found in data set WORK.SOURCE.

How does it lose CVDf1, CVDf2, CVDf3, and CVDf4 in the final proc corr? The last data step must have those variables because CVDf5, the final answer, depends on them.
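The question here comes down to what a DATA step writes out: only the variables read by its SET statement plus the ones assigned in that step. In the CVDf5 step below, the SET statement reads PROJECTS.SOURCE again and the CVDf5 formula uses only raw inputs (pmeat17KCsW through sex_IDsW), so CVDf1 to CVDf4 never exist in the WORK.SOURCE that step creates, and it replaces the earlier WORK.SOURCE. One way to avoid this, sketched with hypothetical toy variables and coefficients (a, b, c, f1, f2, f4, f5 are made up), is to read the raw data once and derive every composite in the same DATA step:

```sas
/* Hypothetical toy data standing in for PROJECTS.SOURCE */
data work.toy_raw2;
  input a b c;
  datalines;
1 2 3
4 5 6
;
run;

/* Read the raw data ONCE and derive every composite in the same step.  */
/* Variables assigned earlier in the step (f1, f2) can feed later ones. */
data work.toy_all;
  set work.toy_raw2;
  f1 = a * 0.5;            /* hypothetical coefficients */
  f2 = b * 0.25;
  f4 = f1 + f2 - c * 0.1;  /* built from f1 and f2, like CVDf4 */
  f5 = a + b + c;          /* built from raw inputs only, like CVDf5 */
run;

/* f1, f2, f4 and f5 are all in WORK.TOY_ALL */
proc print data=work.toy_all;
run;
```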

Thanks

SAS code:

data work.source;
	set projects.source;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
- Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

Proc reg data=work.source;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW 
	 	/ selection=STEPWISE 
		slentry=.25 slstay=.25;
	run; quit;
	
	*
	CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Child wt, air pollution, Kidney Dz	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
;
*CVD formula R2=0.4236;
data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

*CVD formula R2=0.4236;
data work.source;
	set projects.source;
CVDf5=
-	pmeat17KCsW		*	0.09
-	rmeat17KCsW		*	1.14
-	fish17KCsW		*	0.17
-	milk17KCsW		*	0.39
-	poultry16KCsW	*	1.54
-	eggs16kcsW		*	1.23
-	SFA16KCsW		*	2.10
-	PUFA17KCsW		*	1.56
-	TFA17KCsW		*	0.03
-	ALCOHOL17KCsW	*	0.13
+	Sugarb17KCsW	*	1.59
+	potatoes16KCsW	*	0.09
-	corn16kcsW		*	0.06
-	fruits17KCsW	*	2.12
-	vegetables17KCsW*	0.38
-	nutsseeds17KCsW	*	0.27
-	wgrains17KCsW	*	0.88
-	legumes17kcsW	*	0.01
+	rice16kcsW		*	0.00
-	swtpot16kcsW	*	0.27
+	smoke17msW		*	8.45
+	SLTobacco17msW	*	2.57
+	kidneydz17msW	*	0.98
+	T1DM17msW		*	3.80
+	T2DM17msW		*	2.85
+	SBP17msW		*	4.83
-	sex_IDsW		*	4.84

;
run; quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex 
	CVDf5=Final CVD risk factor formula;
	var  CVDf5 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

Results:
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Panel of heat maps of residuals by regressors for CVD2017msW.
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	0	.	.	.	.	.	Combination of 20 diet risk factors
CVDf2	0	.	.	.	.	.	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	0	.	.	.	.	.	Types 1 and 2 DM
CVDf4	0	.	.	.	.	.	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	.	.	.	.	0.19492	-0.39527
Prob > |r|	.	.	.	.	<.0001	<.0001
N	0	0	0	0	7846	7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf1	CVD2017m	0	.	.	.	.	.	.	.
CVDf2	CVD2017m	0	.	.	.	.	.	.	.
CVDf3	CVD2017m	0	.	.	.	.	.	.	.
CVDf4	CVD2017m	0	.	.	.	.	.	.	.
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The CORR Procedure

1 With Variables:	CVD2017m
3 Variables:	CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf5	7846	2.6151E-12	18.11747	2.05182E-8	-48.22646	83.18385	Final CVD risk factor formula
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf5	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.65244	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf5	CVD2017m	7846	0.65244	0.77953	0.0000416	0.65241	0.639517	0.664941	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

Log:

      
 217        *CVD formula R2=0.4236;
 218        data work.source;
 219        set projects.source;
 220        CVDf5=
 221        -pmeat17KCsW*0.09
 222        -rmeat17KCsW*1.14
 223        -fish17KCsW*0.17
 224        -milk17KCsW*0.39
 225        -poultry16KCsW*1.54
 226        -eggs16kcsW*1.23
 227        -SFA16KCsW*2.10
 228        -PUFA17KCsW*1.56
 229        -TFA17KCsW*0.03
 230        -ALCOHOL17KCsW*0.13
 231        +Sugarb17KCsW*1.59
 232        +potatoes16KCsW*0.09
 233        -corn16kcsW*0.06
 234        -fruits17KCsW*2.12
 235        -vegetables17KCsW*0.38
 236        -nutsseeds17KCsW*0.27
 237        -wgrains17KCsW*0.88
 238        -legumes17kcsW*0.01
 239        +rice16kcsW*0.00
 240        -swtpot16kcsW*0.27
 241        +smoke17msW*8.45
 242        +SLTobacco17msW*2.57
 243        +kidneydz17msW*0.98
 244        +T1DM17msW*3.80
 245        +T2DM17msW*2.85
 246        +SBP17msW*4.83
 247        -sex_IDsW*4.84
 248        
 249        ;
 250        run;
 
 NOTE: There were 7846 observations read from the data set PROJECTS.SOURCE.
 NOTE: The data set WORK.SOURCE has 7846 observations and 1273 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.11 seconds
       user cpu time       0.02 seconds
       system cpu time     0.10 seconds
       memory              5043.34k
       OS Memory           68032.00k
       Timestamp           04/30/2021 02:16:01 AM
       Step Count                        299  Switch Count  15
       Page Faults                       0
       Page Reclaims                     743
       Page Swaps                        0
       Voluntary Context Switches        51
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           167952
       
 
 250      !      quit;
 251        
 252        proc corr data=work.source fisher;
 253        label
 254        CVDf1=Combination of 20 diet risk factors
 255        CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
 WARNING: Variable CVDF1 not found in data set WORK.SOURCE.
 256        CVDf3=Types 1 and 2 DM
 WARNING: Variable CVDF2 not found in data set WORK.SOURCE.
 257        CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex
 WARNING: Variable CVDF3 not found in data set WORK.SOURCE.
 258        CVDf5=Final CVD risk factor formula;
 WARNING: Variable CVDF4 not found in data set WORK.SOURCE.
 259        var  CVDf5 SBP17msW sex_IDsW;
 260        with CVD2017m;
 261        run;
 
 NOTE: PROCEDURE CORR used (Total process time):
       real time           0.08 seconds
       user cpu time       0.06 seconds
       system cpu time     0.02 seconds
       memory              2373.78k
       OS Memory           64952.00k
       Timestamp           04/30/2021 02:16:01 AM
       Step Count                        300  Switch Count  13
       Page Faults                       0
       Page Reclaims                     246
       Page Swaps                        0
       Voluntary Context Switches        36
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 261      !     quit;
 262        
 263        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 275

japelin · Posted 04-30-2021 12:19 AM

I think you need to change "set projects.source;" to "set work.source;" in the following two places.
CVDf1-CVDf5 aren't in projects.source, are they?

data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=

data work.source;
	set projects.source;
CVDf5=
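A minimal, self-contained sketch of why the variables vanish, using a hypothetical toy table WORK.RAW in place of PROJECTS.SOURCE (all names here are illustrative): a DATA step starts from whatever its SET statement reads, so a variable computed in an earlier step exists only in the data set that step wrote.

```sas
/* Hypothetical stand-in for PROJECTS.SOURCE */
data work.raw;
    input x y;
    datalines;
1 2
3 4
;
run;

/* Step 1: create A; it exists only in WORK.STEP1 */
data work.step1;
    set work.raw;
    a = x + y;
run;

/* Wrong: rebuilding from WORK.RAW starts over without A.
   SAS writes a NOTE that A is uninitialized, and both A
   and B are missing in WORK.STEP2_WRONG. */
data work.step2_wrong;
    set work.raw;
    b = a * 2;
run;

/* Right: read the data set that actually contains A */
data work.step2;
    set work.step1;
    b = a * 2;
run;

proc print data=work.step2;
run;
```

This mirrors both symptoms in the thread. In the CVDf4 step, CVDf1-CVDf3 were referenced but never computed, so they were created as missing and PROC CORR reported N = 0 for them. In the CVDf5 step they were not referenced at all, so they were absent from WORK.SOURCE and the LABEL statement in PROC CORR triggered the "not found" warnings.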

Reeza · Posted 04-29-2021 10:09 PM

You did not include your log; you included your output. I would have expected errors in the PROC CORR step because your LABEL statement looks problematic: I would expect the label text to be enclosed in quotation marks.

Also, if you put the LABEL statement in the DATA step, you do not have to repeat it in each PROC.
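A sketch of that idea, assuming PROJECTS.SOURCE holds the input variables used in the thread (WORK.SOURCE1 is an illustrative name): the labels are attached, quoted, in the DATA step that creates the variables, and a later PROC reading that data set shows them without its own LABEL statement.

```sas
/* Attach labels once, where the variables are created */
data work.source1;
    set projects.source;
    CVDf2 = smoke17msW     * 0.2046 * 0.08899
          + SLTobacco17msW * 0.0680 * 0.08179
          + kidneydz17msW  * 0.056  * 0.037636;
    CVDf3 = T1DM17msW * 10.34 * 0.1169
          + T2DM17msW * 17.47 * 0.05193;
    label CVDf2 = 'Kidney Dz, Smoking tobacco, sublingual tobacco'
          CVDf3 = 'Types 1 and 2 DM';
run;

/* No LABEL statement needed: the labels are stored with the data */
proc corr data=work.source1 fisher;
    var CVDf2 CVDf3;
    with CVD2017m;
run;
```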

This part of the code is the problem: order of operations. You're now reading from the original PROJECTS.SOURCE data set, not the data set called SOURCE that you created in the WORK library. That means the variables you created previously no longer exist.

data source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

You probably want to do this instead:

data source2;
	set source;
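Put together, a sketch of the whole chain with one data set name per step. It assumes PROJECTS.SOURCE contains the input variables used in the thread; WORK.SOURCE1 and WORK.SOURCE2 are illustrative names, and the coefficients are copied from the original post.

```sas
/* Step 1: read the permanent table once and build the components */
data work.source1;
    set projects.source;
    CVDf1 = - pmeat17KCsW      *   5.49 * 0.0443
            - rmeat17KCsW      *  50.70 * 0.0560
            - fish17KCsW       *  10.01 * 0.0412
            - milk17KCsW       *  25.37 * 0.0398
            - poultry16KCsW    *  45.06 * 0.0867
            - eggs16KCsW       *  19.47 * 0.1544
            - SFA16KCsW        * 191.27 * 0.0603 * 0.46
            - PUFA17KCsW       *  82.24 * 0.1033 * 0.46
            - TFA17KCsW        *  13.40 * 0.0128 * 0.46
            - Alcohol17KCsW    *  81.71 * 0.0047
            + Sugarb17KCsW     * 297.65 * 0.0136
            + potatoes16KCsW   *  84.16 * 0.0024
            - corn16KCsW       *  34.67 * 0.0037
            - fruits17KCsW     *  40.39 * 0.1291
            - Vegetables17KCsW *  80.14 * 0.0127
            - nutsseeds17KCsW  *   8.51 * 0.0797
            - wgrains17KCsW    *  55.65 * 0.0376
            - legumes17KCsW    *  51.66 * 0.0005
            + rice16KCsW       * 141.23 * 0.0001
            - swtpot16KCsW     *  22.67 * 0.0270;
    CVDf2 = smoke17msW     * 0.2046 * 0.08899
          + SLTobacco17msW * 0.0680 * 0.08179
          + kidneydz17msW  * 0.056  * 0.037636;
    CVDf3 = T1DM17msW * 10.34 * 0.1169
          + T2DM17msW * 17.47 * 0.05193;
    label CVDf1 = 'Combination of 20 diet risk factors'
          CVDf2 = 'Kidney Dz, Smoking tobacco, sublingual tobacco'
          CVDf3 = 'Types 1 and 2 DM';
run;

/* Step 2: the regression reads WORK.SOURCE1 */
proc reg data=work.source1;
    model CVD2017msW = CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
          / selection=stepwise slentry=0.25 slstay=0.25;
run;
quit;

/* Step 3: build on WORK.SOURCE1 (not PROJECTS.SOURCE), so
   CVDf1-CVDf3 and their labels are carried forward */
data work.source2;
    set work.source1;
    CVDf4 = CVDf1    * 0.01476
          + CVDf2    * 16.45730
          + CVDf3    * 0.11176
          + SBP17msW * 0.17041
          - sex_IDsW * 0.17070;
    label CVDf4 = 'Mult reg CVDf1 CVDf2 CVDf3 SBP sex';
run;

/* Step 4: every CVDf variable now exists in WORK.SOURCE2 */
proc corr data=work.source2 fisher;
    var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
    with CVD2017m;
run;
```

Because no step overwrites the data set it reads, rerunning any single step cannot silently drop variables that a later step depends on.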

You don't seem to be able to follow how the data is flowing yet, so for now I recommend not reusing the same data set names ANYWHERE in your code. Use a unique name for each data set so that you can trace the flow.
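To trace which data set holds which variables, one quick check (assuming the PROJECTS libref is assigned in your session) is to query the DICTIONARY.COLUMNS table in PROC SQL. PROC CONTENTS on a single data set gives similar information, but with 1273 variables a filtered query is easier to read.

```sas
/* List every CVDf* variable and the data set it lives in,
   for the WORK and PROJECTS libraries only */
proc sql;
    select libname, memname, name
        from dictionary.columns
        where libname in ('WORK', 'PROJECTS')
          and upcase(name) like 'CVDF%'
        order by libname, memname, name;
quit;
```

If CVDf1 appears only under a WORK data set, any step that reads PROJECTS.SOURCE will not see it.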



dkcundiffMD · Posted 04-30-2021 12:06 AM

Well, using source2 and source3 did the job. However, that doesn't explain why it was necessary, or when it might suddenly become necessary again.
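Roughly, it becomes necessary whenever a step uses variables that exist only in a data set an earlier step created. Every DATA step that reads "set projects.source;" rebuilds its output from the permanent table, so anything previously computed in WORK.SOURCE is discarded when WORK.SOURCE is overwritten. For CVDf4 specifically, one option is to let PROC REG write the predicted values with its OUTPUT statement instead of retyping the coefficients. This is a sketch, assuming a hypothetical WORK.SOURCE1 that already holds CVDf1-CVDf3; the OUT= data set contains the variables of the input data set plus the new predicted-value variable.

```sas
/* Fit the final five-variable model directly (no SELECTION=),
   then write predicted values to a NEW data set. OUT= keeps the
   variables from WORK.SOURCE1, so CVDf1-CVDf3 travel along. */
proc reg data=work.source1;
    model CVD2017msW = CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
    output out=work.source2 p=CVDf4;
run;
quit;

proc corr data=work.source2 fisher;
    var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
    with CVD2017m;
run;
```

Because all five variables entered the stepwise model, fitting them directly without SELECTION= should reproduce the final-step estimates. The predicted values also include the fitted intercept, which the output reports as essentially zero (-1.2845E-13), so CVDf4 computed this way should agree with the hand-typed formula apart from rounding of the coefficients.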

Thanks for your help.

Result:

The CORR Procedure

1 With Variables:	CVD2017m
5 Variables:	CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

[Figure: Panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
CVDf4	7846	9.0822E-14	0.65083	7.1259E-10	-1.75968	2.95661	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m	0.41439	0.41217	0.31755	0.65087	0.19492	-0.39527
CVD/100k/year ages 15-69	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
CVDf4	CVD2017m	7846	0.65087	0.77680	0.0000415	0.65084	0.637900	0.663415	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
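As a consistency check on CVDf4: for a least-squares fit, the correlation between the response and its fitted values equals the multiple correlation R, the square root of the model R-Square. Correlation is unchanged by adding a constant, so leaving out the near-zero intercept does not matter. Assuming CVD2017msW is a standardized copy of CVD2017m (its corrected total sum of squares, 7845, equals N - 1, which fits that reading), the step 5 ANOVA table gives

$$
r_{\text{CVD2017m},\,\text{CVDf4}} \approx R = \sqrt{R^2} = \sqrt{\frac{SS_{\text{Model}}}{SS_{\text{Corrected Total}}}} = \sqrt{\frac{3323.34051}{7845.00000}} \approx \sqrt{0.423625} \approx 0.65087
$$

which matches the CVDf4 correlation of 0.65087 in the table above, so the hand-typed coefficients appear to reproduce the step 5 model; small differences could arise from rounding the coefficients to five decimals.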
The CORR Procedure

1 With Variables:	CVD2017m
7 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
CVDf4	7846	9.0822E-14	0.65083	7.1259E-10	-1.75968	2.95661	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
CVDf5	7846	2.6151E-12	18.11747	2.05182E-8	-48.22646	83.18385	Final CVD risk factor formula
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	CVDf4	CVDf5	SBP17msW	sex_IDsW
CVD2017m	0.41439	0.41217	0.31755	0.65087	0.65244	0.19492	-0.39527
CVD/100k/year ages 15-69	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
CVDf4	CVD2017m	7846	0.65087	0.77680	0.0000415	0.65084	0.637900	0.663415	<.0001
CVDf5	CVD2017m	7846	0.65244	0.77953	0.0000416	0.65241	0.639517	0.664941	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

SAS code:

data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
-	Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=work.source fisher;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

Proc reg data=work.source;
*label
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW 
	 	/ selection=STEPWISE 
		slentry=.25 slstay=.25;
	run; quit;
	
	*
	CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Child wt, air pollution, Kidney Dz	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
;
*CVD formula R2=0.4236;
*data work.source;
*	set projects.source;
	
data source2;
set source;	/* work.source already contains CVDf1-CVDf3; projects.source does not */
	
label 
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

proc corr data=source2 fisher;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

*CVD formula R2=0.4236;
data source3;
	set source2;	/* source2 adds CVDf4 on top of CVDf1-CVDf3 */
		label 
	CVDf5=Final CVD risk factor formula;
CVDf5=
-	pmeat17KCsW		*	0.09
-	rmeat17KCsW		*	1.14
-	fish17KCsW		*	0.17
-	milk17KCsW		*	0.39
-	poultry16KCsW	*	1.54
-	eggs16kcsW		*	1.23
-	SFA16KCsW		*	2.10
-	PUFA17KCsW		*	1.56
-	TFA17KCsW		*	0.03
-	ALCOHOL17KCsW	*	0.13
+	Sugarb17KCsW	*	1.59
+	potatoes16KCsW	*	0.09
-	corn16kcsW		*	0.06
-	fruits17KCsW	*	2.12
-	vegetables17KCsW	*	0.38
-	nutsseeds17KCsW	*	0.27
-	wgrains17KCsW	*	0.88
-	legumes17kcsW	*	0.01
+	rice16kcsW		*	0.00
-	swtpot16kcsW	*	0.27
+	smoke17msW		*	8.45
+	SLTobacco17msW	*	2.57
+	kidneydz17msW	*	0.98
+	T1DM17msW		*	3.80
+	T2DM17msW		*	2.85
+	SBP17msW		*	4.83
-	sex_IDsW		*	4.84

;
run; quit;

proc corr data=source3 fisher;

	var CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;
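Instead of retyping the step 5 parameter estimates into the CVDf4 formula, PROC REG can write the fitted values to a data set through its OUTPUT statement. A minimal sketch, assuming every regressor stays in the model (as in the stepwise summary, where all five entered), so the model is refit without SELECTION=. sashelp.class stands in for work.source, weight plays the role of CVD2017msW, and height and age play the roles of the regressors; work.scored and pred_weight are hypothetical names.

```sas
/* sashelp.class stands in for work.source.                      */
/* weight ~ CVD2017msW; height and age ~ CVDf1-CVDf3, SBP, sex.   */
proc reg data=sashelp.class;
    model weight = height age;              /* no SELECTION=, all regressors kept */
    output out=work.scored p=pred_weight;   /* fitted values, intercept included  */
run;
quit;

/* Correlate the fitted values with the response, as done for CVDf4. */
proc corr data=work.scored fisher;
    var pred_weight;
    with weight;
run;
```

This avoids rounding the coefficients by hand and keeps the score in step with the model if the list of predictors changes.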

Reeza · Posted 04-30-2021 11:58 AM

You have an original data set, A (projects.source). You made a copy, B (work.source), and added some new variables to it. These new variables exist only in data set B, not in data set A.

Then you tried to create a new copy, C (source, which is WORK.SOURCE again and so overwrites B), but you started from data set A (not B) and expected the variables from data set B to be present as well. They do not exist in A, so you cannot use them there unless you re-create them. This is why I keep recommending clearly different table names: they make it easy to follow which data sets are your inputs and which are your outputs. It was the same issue in your previous question.
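A minimal sketch of the naming pattern described above, with each step writing a new, distinctly named data set and reading the one before it. sashelp.class stands in for projects.source (A); work.step1_derived (B), work.step2_combined (C), f1, and f2 are hypothetical names, not the poster's.

```sas
/* A: sashelp.class stands in for projects.source and is never overwritten. */

/* B: copy A and derive a new variable (the role CVDf1-CVDf3 play). */
data work.step1_derived;
    set sashelp.class;
    f1 = height * 0.5;                  /* hypothetical derived score */
run;

/* C: build on B, not A, so f1 is still there.                          */
/* Reading sashelp.class here instead would leave f1 uninitialized, and */
/* both f1 and f2 would be missing on every row.                        */
data work.step2_combined;
    set work.step1_derived;
    f2 = f1 + weight * 0.1;             /* hypothetical combined score (like CVDf4) */
run;

/* Check which variables the final data set actually holds. */
proc contents data=work.step2_combined varnum;
run;
```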

dkcundiffMD · Posted 04-30-2021 12:20 PM

Now I understand it.

Many thanks.

David Cundiff
