dkcundiffMD
Quartz | Level 8

This analysis uses complex variables and multiple regression. I'll show the code and the log. CVDf1, CVDf2, CVDf3, CVDf4 got lost at the end. 

data source;
	set projects.source;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
- Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

* Works ok to here;

Proc reg data=source;
	label
	CVDf1=Combination of 20 diet risk factors
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
		/ selection=STEPWISE slentry=.25 slstay=.25;
run; quit;

* Works ok to here;

*CVD formula R2=0.4236;
data source;
	set projects.source;
label
	CVDf1=Combination of 20 diet risk factors
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
CVDf4=
+	CVDf1 * 0.01476
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;
run; quit;

proc corr data=source fisher;
	label
	CVDf1=Combination of 20 diet risk factors
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

*Here SAS can't find CVDf1 CVDf2 CVDf3 CVDf4;

Here is the log;

 

The CORR Procedure

1 With Variables:	CVD2017m
5 Variables:	CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

[Figure: panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	0	.	.	.	.	.	Combination of 20 diet risk factors
CVDf2	0	.	.	.	.	.	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	0	.	.	.	.	.	Types 1 and 2 DM
CVDf4	0	.	.	.	.	.	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	.	.	.	.	0.19492	-0.39527
Prob > |r|	.	.	.	.	<.0001	<.0001
Number of Observations	0	0	0	0	7846	7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf1	CVD2017m	0	.	.	.	.	.	.	.
CVDf2	CVD2017m	0	.	.	.	.	.	.	.
CVDf3	CVD2017m	0	.	.	.	.	.	.	.
CVDf4	CVD2017m	0	.	.	.	.	.	.	.
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

I use SAS on Demand for Academics SAS9.4

 

Where did CVDf1, CVDf2, CVDf3, CVDf4 go?

Thanks. 

1 ACCEPTED SOLUTION

Reeza
Super User

You did not include your log; you included your output. I would have expected to see errors for the PROC CORR step because your LABEL statement looks problematic. I would have expected the label text to be enclosed in quotation marks.

 

Also, if you put the LABEL in the data step you do not have to repeat it for each PROC. 
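
A minimal sketch of that idea, using a hypothetical toy data set rather than the poster's data (work.demo and the variables smoke, kidney, cvd, and risk are made-up names for illustration). The labels are assigned once, in quotation marks, in the DATA step, and the procedures that follow should pick them up without a LABEL statement of their own.

```sas
/* Toy data; all names and values here are illustrative assumptions. */
data work.demo;
    input smoke kidney cvd;
    /* Derived variable, labeled once in the step that creates it. */
    risk = 0.2 * smoke + 0.05 * kidney;
    label
        risk = "Smoking and kidney disease score"
        cvd  = "CVD outcome";
    datalines;
1 0 10
2 1 14
3 1 19
4 2 21
5 2 30
;
run;

/* No LABEL statement here: the labels are stored with work.demo. */
proc corr data=work.demo;
    var risk;
    with cvd;
run;

proc means data=work.demo;
    var risk cvd;
run;
```

Storing the labels with the data set also means there is only one place to edit if the wording of a label changes.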

 

This part of the code is the problem: it is an order-of-operations issue. You're now reading from the original file, projects.source, not the data set called SOURCE that you created in the WORK library. Because this step writes to SOURCE again, it replaces that data set, which means the variables you created previously no longer exist.

 

 

data source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

 

 

You probably want to do this instead:

data source2;
	set source;

It looks like you're not yet following how the data flows from one step to the next, so for now my recommendation would be to not re-use the same names ANYWHERE in your code. Use a unique name for each data set so that you can trace things.
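
As a rough sketch of that naming discipline on toy data (work.raw stands in for projects.source, and x1, x2, y, f1, f2 are hypothetical variables, not the ones in the question), each step reads the previous step's output and writes to a new name:

```sas
/* Hypothetical raw input, standing in for projects.source. */
data work.raw;
    input x1 x2 y;
    datalines;
1 2 3
2 1 4
3 5 9
4 3 8
;
run;

/* Step 1: derive f1 from the raw data and write a NEW data set. */
data work.step1;
    set work.raw;
    f1 = 0.5 * x1 + 0.25 * x2;
    label f1 = "First derived score";
run;

/* Step 2: read step1 (not raw), so f1 is still available. */
data work.step2;
    set work.step1;
    f2 = 2 * f1 - x2;
    label f2 = "Second derived score";
run;

/* Both derived variables exist in the final data set. */
proc corr data=work.step2;
    var f1 f2;
    with y;
run;
```

With distinct names, the log NOTE for each step shows which data set was read and which was written, which makes a wrong SET statement easier to spot.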

 


7 REPLIES
japelin
Rhodochrosite | Level 12

I don't have the data or the submitted log, so I'm guessing, but I think the second data step should read from work.source, not projects.source.

The other thing I noticed is that you repeat the same label statement over and over.
If you specify it in the first data step, I don't think you need to specify it again each time.
My apologies if this was done intentionally.
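
The difference between the two SET statements can be reproduced on toy data. This is only a sketch under assumed names (work.raw stands in for projects.source; x, y, f1, f2 are hypothetical): a step that re-reads the raw data never sees f1, so SAS creates it as an all-missing variable, while a step that reads the derived data set keeps it.

```sas
/* Hypothetical stand-in for projects.source. */
data work.raw;
    input x y;
    datalines;
1 2
2 4
3 7
;
run;

/* First step: creates f1 and stores it in work.demo. */
data work.demo;
    set work.raw;
    f1 = 2 * x;
run;

/* Re-reading the raw data: f1 is not in work.raw, so it is
   created here as a new variable that is missing on every row,
   and f2 is therefore missing as well. */
data work.bad;
    set work.raw;
    f2 = f1 + 1;
run;

/* Reading the derived data set: f1 carries over, f2 is computed. */
data work.good;
    set work.demo;
    f2 = f1 + 1;
run;

/* Compare non-missing and missing counts for the two versions. */
proc means data=work.bad n nmiss;
    var f1 f2;
run;

proc means data=work.good n nmiss;
    var f1 f2;
run;
```

The all-missing pattern in work.bad matches the N of 0 reported for CVDf1 through CVDf4 in the posted PROC CORR output.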

dkcundiffMD
Quartz | Level 8

It was helpful to use work.source instead of source. However, I still lose CVDf1, CVDf2, CVDf3, and CVDf4 after a regression step. The final answer is CVDf5, and the code does produce it. That answer depends on CVDf1, CVDf2, CVDf3, and CVDf4. If I stop before the final CVDf5 data step and its proc corr, then CVDf1, CVDf2, CVDf3, and CVDf4 are included in work.source. If I run through the final step that derives CVDf5, then CVDf5 is included in the output but CVDf1, CVDf2, CVDf3, and CVDf4 are not. The log says: WARNING: Variable CVDF1 not found in data set WORK.SOURCE.

How does it lose CVDf1, CVDf2, CVDf3, and CVDf4 in the final proc corr? The last data step must have those variables because CVDf5, the final answer, depends on them. 

Thanks

 

SAS code:

data work.source;
	set projects.source;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
- Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

Proc reg data=work.source;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
	;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW 
	 	/ selection=STEPWISE 
		slentry=.25 slstay=.25;
	run; quit;
	
	*
	CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Child wt, air pollution, Kidney Dz	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
;
*CVD formula R2=0.4236;
data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

*CVD formula R2=0.4236;
data work.source;
	set projects.source;
CVDf5=
-	pmeat17KCsW		*	0.09
-	rmeat17KCsW		*	1.14
-	fish17KCsW		*	0.17
-	milk17KCsW		*	0.39
-	poultry16KCsW	*	1.54
-	eggs16kcsW		*	1.23
-	SFA16KCsW		*	2.10
-	PUFA17KCsW		*	1.56
-	TFA17KCsW		*	0.03
-	ALCOHOL17KCsW	*	0.13
+	Sugarb17KCsW	*	1.59
+	potatoes16KCsW	*	0.09
-	corn16kcsW		*	0.06
-	fruits17KCsW	*	2.12
-	vegetables17KCsW*	0.38
-	nutsseeds17KCsW	*	0.27
-	wgrains17KCsW	*	0.88
-	legumes17kcsW	*	0.01
+	rice16kcsW		*	0.00
-	swtpot16kcsW	*	0.27
+	smoke17msW		*	8.45
+	SLTobacco17msW	*	2.57
+	kidneydz17msW	*	0.98
+	T1DM17msW		*	3.80
+	T2DM17msW		*	2.85
+	SBP17msW		*	4.83
-	sex_IDsW		*	4.84

;
run; quit;

proc corr data=work.source fisher;
	label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex 
	CVDf5=Final CVD risk factor formula;
	var  CVDf5 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

results:
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674	 	 
Corrected Total	7845	7845.00000	 	 	 
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

[Figure: panel of heat maps of residuals by regressors for CVD2017msW]
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	0	.	.	.	.	.	Combination of 20 diet risk factors
CVDf2	0	.	.	.	.	.	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	0	.	.	.	.	.	Types 1 and 2 DM
CVDf4	0	.	.	.	.	.	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	.	.	.	.	0.19492	-0.39527
Prob > |r|	.	.	.	.	<.0001	<.0001
Number of Observations	0	0	0	0	7846	7846
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf1	CVD2017m	0	.	.	.	.	.	.	.
CVDf2	CVD2017m	0	.	.	.	.	.	.	.
CVDf3	CVD2017m	0	.	.	.	.	.	.	.
CVDf4	CVD2017m	0	.	.	.	.	.	.	.
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The CORR Procedure

1 With Variables:	CVD2017m
3 Variables:	CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf5	7846	2.6151E-12	18.11747	2.05182E-8	-48.22646	83.18385	Final CVD risk factor formula
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf5	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.65244	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	95% Confidence Limits	p Value for H0:Rho=0
CVDf5	CVD2017m	7846	0.65244	0.77953	0.0000416	0.65241	0.639517	0.664941	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

Log;

      
 217        *CVD formula R2=0.4236;
 218        data work.source;
 219        set projects.source;
 220        CVDf5=
 221        -pmeat17KCsW*0.09
 222        -rmeat17KCsW*1.14
 223        -fish17KCsW*0.17
 224        -milk17KCsW*0.39
 225        -poultry16KCsW*1.54
 226        -eggs16kcsW*1.23
 227        -SFA16KCsW*2.10
 228        -PUFA17KCsW*1.56
 229        -TFA17KCsW*0.03
 230        -ALCOHOL17KCsW*0.13
 231        +Sugarb17KCsW*1.59
 232        +potatoes16KCsW*0.09
 233        -corn16kcsW*0.06
 234        -fruits17KCsW*2.12
 235        -vegetables17KCsW*0.38
 236        -nutsseeds17KCsW*0.27
 237        -wgrains17KCsW*0.88
 238        -legumes17kcsW*0.01
 239        +rice16kcsW*0.00
 240        -swtpot16kcsW*0.27
 241        +smoke17msW*8.45
 242        +SLTobacco17msW*2.57
 243        +kidneydz17msW*0.98
 244        +T1DM17msW*3.80
 245        +T2DM17msW*2.85
 246        +SBP17msW*4.83
 247        -sex_IDsW*4.84
 248        
 249        ;
 250        run;
 
 NOTE: There were 7846 observations read from the data set PROJECTS.SOURCE.
 NOTE: The data set WORK.SOURCE has 7846 observations and 1273 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.11 seconds
       user cpu time       0.02 seconds
       system cpu time     0.10 seconds
       memory              5043.34k
       OS Memory           68032.00k
       Timestamp           04/30/2021 02:16:01 AM
       Step Count                        299  Switch Count  15
       Page Faults                       0
       Page Reclaims                     743
       Page Swaps                        0
       Voluntary Context Switches        51
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           167952
       
 
 250      !      quit;
 251        
 252        proc corr data=work.source fisher;
 253        label
 254        CVDf1=Combination of 20 diet risk factors
 255        CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
 WARNING: Variable CVDF1 not found in data set WORK.SOURCE.
 256        CVDf3=Types 1 and 2 DM
 WARNING: Variable CVDF2 not found in data set WORK.SOURCE.
 257        CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex
 WARNING: Variable CVDF3 not found in data set WORK.SOURCE.
 258        CVDf5=Final CVD risk factor formula;
 WARNING: Variable CVDF4 not found in data set WORK.SOURCE.
 259        var  CVDf5 SBP17msW sex_IDsW;
 260        with CVD2017m;
 261        run;
 
 NOTE: PROCEDURE CORR used (Total process time):
       real time           0.08 seconds
       user cpu time       0.06 seconds
       system cpu time     0.02 seconds
       memory              2373.78k
       OS Memory           64952.00k
       Timestamp           04/30/2021 02:16:01 AM
       Step Count                        300  Switch Count  13
       Page Faults                       0
       Page Reclaims                     246
       Page Swaps                        0
       Voluntary Context Switches        36
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 261      !     quit;
 262        
 263        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 275        
japelin
Rhodochrosite | Level 12

I think you need to change set projects.source to set work.source in the following two places.
Are CVDf1-CVDf5 actually included in projects.source?

 

data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=

data work.source;
	set projects.source;
CVDf5=
Reeza
Super User

You did not include your log; you included your output. I would have expected to see errors for the PROC CORR step because your LABEL statement looks problematic: I would have expected the label text to be in quotation marks.

 

Also, if you put the LABEL in the data step you do not have to repeat it for each PROC. 
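
As a minimal sketch of that point, assuming the SASHELP.CLASS sample table is available (it stands in for your data; the variable ratio and the data set name work.class_labeled are made up for illustration), a quoted label set once in the DATA step carries over to later procedures:

```sas
/* Sketch only: sashelp.class stands in for the real data;
   ratio and work.class_labeled are hypothetical names. */
data work.class_labeled;
    set sashelp.class;
    ratio = weight / height;
    /* The label is stored with the data set, in quotation marks */
    label ratio = "Weight divided by height";
run;

/* No LABEL statement here: PROC CORR picks up the stored label */
proc corr data=work.class_labeled;
    var ratio;
    with age;
run;
```

Quoting the text also avoids any ambiguity about where one label ends and the next variable name begins.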

 

This part of the code is the problem: it is an order-of-operations issue. You're now reading from the original source file, not the file you created in the work library called SOURCE. That means the variables you created previously no longer exist.

 

 

data source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

 

 

You probably want to do this instead:

data source2;
	set source;

You don't seem to be able to follow how the data is flowing yet, so for now my recommendation would be to not re-use the same names ANYWHERE in your code. Use a unique name for each data set so that you can trace things.
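
A hedged sketch of what unique names can look like, again using SASHELP.CLASS as a stand-in and hypothetical names (work.step1, work.step2, ratio, ratio_x2). Each step reads the data set written by the step before it, and PROC CONTENTS with the SHORT option lists the variables so you can confirm what is actually there:

```sas
/* Sketch only: all data set and variable names are illustrative. */
data work.step1;
    set sashelp.class;        /* input: the original table */
    ratio = weight / height;  /* new variable lives in work.step1 */
run;

data work.step2;
    set work.step1;           /* input: the previous step's output */
    ratio_x2 = ratio * 2;     /* ratio is available because we read step1 */
run;

/* List the variable names in the final data set to verify the chain */
proc contents data=work.step2 short;
run;
```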

 


dkcundiffMD
Quartz | Level 8

Well, using source2 and source3 did the job. However, it doesn't explain why that was necessary and when it might again be suddenly necessary. 

Thanks for your help.

Result:

The CORR Procedure

1 With Variables:	CVD2017m
5 Variables:	CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Number of Observations Read	7846
Number of Observations Used	7846
Stepwise Selection: Step 1

Variable CVDf1 Entered: R-Square = 0.1717 and C(p) = 3424.469

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	1	1347.15123	1347.15123	1626.24	<.0001
Error	7844	6497.84877	0.82838
Corrected Total	7845	7845.00000
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-5.1482E-14	0.01028	2.07946E-23	0.00	1.0000
CVDf1	0.01917	0.00047543	1347.15123	1626.24	<.0001
Bounds on condition number: 1, 1

Stepwise Selection: Step 2

Variable CVDf2 Entered: R-Square = 0.3247 and C(p) = 1345.693

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	2	2547.22254	1273.61127	1885.50	<.0001
Error	7843	5297.77746	0.67548
Corrected Total	7845	7845.00000
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.5343E-14	0.00928	1.61312E-23	0.00	1.0000
CVDf1	0.01823	0.00042990	1214.49377	1797.98	<.0001
CVDf2	21.80856	0.51740	1200.07130	1776.62	<.0001
Bounds on condition number: 1.0027, 4.0109

Stepwise Selection: Step 3

Variable CVDf3 Entered: R-Square = 0.3877 and C(p) = 491.2699

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	3041.15785	1013.71928	1654.84	<.0001
Error	7842	4803.84215	0.61258
Corrected Total	7845	7845.00000
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-4.4852E-14	0.00884	1.57839E-23	0.00	1.0000
CVDf1	0.01438	0.00043120	681.70150	1112.84	<.0001
CVDf2	23.56816	0.49661	1379.71384	2252.30	<.0001
CVDf3	0.13686	0.00482	493.93531	806.32	<.0001
Bounds on condition number: 1.1212, 9.7564

Stepwise Selection: Step 4

Variable SBP17msW Entered: R-Square = 0.4126 and C(p) = 153.4894

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	4	3237.12365	809.28091	1377.11	<.0001
Error	7841	4607.87635	0.58766
Corrected Total	7845	7845.00000
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2225E-13	0.00865	1.17256E-22	0.00	1.0000
CVDf1	0.01453	0.00042241	695.29876	1183.16	<.0001
CVDf2	23.97928	0.48692	1425.21520	2425.22	<.0001
CVDf3	0.11914	0.00482	359.11233	611.08	<.0001
SBP17msW	0.16191	0.00887	195.96581	333.47	<.0001
Bounds on condition number: 1.1686, 17.406

Stepwise Selection: Step 5

Variable sex_IDsW Entered: R-Square = 0.4236 and C(p) = 6.0000

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	3323.34051	664.66810	1152.45	<.0001
Error	7840	4521.65949	0.57674
Corrected Total	7845	7845.00000
Variable	Parameter Estimate	Standard Error	Type II SS	F Value	Pr > F
Intercept	-1.2845E-13	0.00857	1.29454E-22	0.00	1.0000
CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
Bounds on condition number: 2.6811, 43.454

All variables left in the model are significant at the 0.2500 level.

All variables have been entered into the model.

Summary of Stepwise Selection
Step	Variable Entered	Variable Removed	Label	Number Vars In	Partial R-Square	Model R-Square	C(p)	F Value	Pr > F
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Kidney Dz, Smoking tobacco, sublingual tobacco	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
The REG Procedure

Model: MODEL1

Dependent Variable: CVD2017msW CVD/100k/year ages 15-69

Panel of heat maps of residuals by regressors for CVD2017msW.
The CORR Procedure

1 With Variables:	CVD2017m
6 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
CVDf4	7846	9.0822E-14	0.65083	7.1259E-10	-1.75968	2.95661	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	CVDf4	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.65087	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
CVDf4	CVD2017m	7846	0.65087	0.77680	0.0000415	0.65084	0.637900	0.663415	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001
The CORR Procedure

1 With Variables:	CVD2017m
7 Variables:	CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW
Simple Statistics
Variable	N	Mean	Std Dev	Sum	Minimum	Maximum	Label
CVD2017m	7846	543.66067	288.00939	4265562	73.47499	1844	CVD/100k/year ages 15-69
CVDf1	7846	0	21.61400	0	-74.89858	33.74918	Combination of 20 diet risk factors
CVDf2	7846	0	0.01796	0	-0.02383	0.04322	Kidney Dz, Smoking tobacco, sublingual tobacco
CVDf3	7846	0	1.94134	0	-2.18859	20.35708	Types 1 and 2 DM
CVDf4	7846	9.0822E-14	0.65083	7.1259E-10	-1.75968	2.95661	Mult reg CVDf1 CVDf2 CVDf3 SBP sex
CVDf5	7846	2.6151E-12	18.11747	2.05182E-8	-48.22646	83.18385	Final CVD risk factor formula
SBP17msW	7846	4.9045E-13	1.00000	3.84807E-9	-2.43011	3.23505	Systolic BP mm Hg
sex_IDsW	7846	0	1.00000	0	-0.99994	0.99994	Sex male 1 and female 2
Pearson Correlation Coefficients, N = 7846
Prob > |r| under H0: Rho=0
 	CVDf1	CVDf2	CVDf3	CVDf4	CVDf5	SBP17msW	sex_IDsW
CVD2017m (CVD/100k/year ages 15-69)	0.41439	0.41217	0.31755	0.65087	0.65244	0.19492	-0.39527
Prob > |r|	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001
Pearson Correlation Statistics (Fisher's z Transformation)
Variable	With Variable	N	Sample Correlation	Fisher's z	Bias Adjustment	Correlation Estimate	Lower 95% CL	Upper 95% CL	p Value for H0:Rho=0
CVDf1	CVD2017m	7846	0.41439	0.44090	0.0000264	0.41437	0.395873	0.432532	<.0001
CVDf2	CVD2017m	7846	0.41217	0.43822	0.0000263	0.41215	0.393608	0.430349	<.0001
CVDf3	CVD2017m	7846	0.31755	0.32892	0.0000202	0.31753	0.297494	0.337290	<.0001
CVDf4	CVD2017m	7846	0.65087	0.77680	0.0000415	0.65084	0.637900	0.663415	<.0001
CVDf5	CVD2017m	7846	0.65244	0.77953	0.0000416	0.65241	0.639517	0.664941	<.0001
SBP17msW	CVD2017m	7846	0.19492	0.19745	0.0000124	0.19491	0.173531	0.216106	<.0001
sex_IDsW	CVD2017m	7846	-0.39527	-0.41803	-0.0000252	-0.39525	-0.413755	-0.376410	<.0001

SAS code:

data work.source;
	set projects.source;
label 
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM;
CVDf1=
-	pmeat17KCsW 	* 	5.49 	* 	0.0443
-	rmeat17KCsW 	* 	50.70 	* 	0.0560
-	fish17KCsW 		* 	10.01 	* 	0.0412
-	milk17KCsW 		* 	25.37 	* 	0.0398
-	poultry16KCsW 	* 	45.06 	* 	0.0867
-	eggs16KCsW 		* 	19.47 	* 	0.1544
-	SFA16KCsW 		* 	191.27 	* 	0.0603 	* 	0.46
-	PUFA17KCsW 		* 	82.24 	* 	0.1033 	* 	0.46
-	TFA17KCsW 		* 	13.40 	* 	0.0128 	* 	0.46
-	Alcohol17KCsW 	* 	81.71 	* 	0.0047
+	Sugarb17KCsW 	* 	297.65 	* 	0.0136
+	potatoes16KCsW 	* 	84.16 	* 	0.0024
-	corn16KCsW 		* 	34.67 	* 	0.0037
-	fruits17KCsW 	* 	40.39 	* 	0.1291
- Vegetables17KCsW 	* 	80.14 	* 	0.0127
-	nutsseeds17KCsW * 	8.51 	* 	0.0797
-	wgrains17KCsW 	* 	55.65 	* 	0.0376
-	legumes17KCsW 	* 	51.66 	* 	0.0005
+	rice16KCsW 		* 	141.23 	* 	0.0001
-	swtpot16KCsW 	* 	22.67 	* 	0.0270
;

CVDf2=
+	smoke17msW 		* 0.2046 * 0.08899
+	SLTobacco17msW	* 0.0680 * 0.08179
+	kidneydz17msW  * 0.056	* 0.037636
;
CVDf3=
+	T1DM17msW	* 	10.34 * 0.1169
+	T2DM17msW	* 	17.47 * 0.05193
;
run;quit;

proc corr data=work.source fisher;
	var CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

Proc reg data=work.source;
*label
	CVDf1=Combination of 20 diet risk factors 
	CVDf2= Kidney Dz, Smoking tobacco, sublingual tobacco
	CVDf3=Types 1 and 2 DM
	;
	model CVD2017msW=CVDf1 CVDf2 CVDf3 SBP17msW sex_IDsW 
	 	/ selection=STEPWISE 
		slentry=.25 slstay=.25;
	run; quit;
	
	*
	CVDf1	0.01476	0.00041890	716.32252	1242.01	<.0001
CVDf2	16.45730	0.78178	255.58277	443.15	<.0001
CVDf3	0.11176	0.00481	311.03348	539.29	<.0001
SBP17msW	0.17041	0.00881	215.71423	374.02	<.0001
sex_IDsW	-0.17070	0.01396	86.21686	149.49	<.0001
1	CVDf1	 	Combination of 20 diet risk factors	1	0.1717	0.1717	3424.47	1626.24	<.0001
2	CVDf2	 	Child wt, air pollution, Kidney Dz	2	0.1530	0.3247	1345.69	1776.62	<.0001
3	CVDf3	 	Types 1 and 2 DM	3	0.0630	0.3877	491.270	806.32	<.0001
4	SBP17msW	 	Systolic BP mm Hg	4	0.0250	0.4126	153.489	333.47	<.0001
5	sex_IDsW	 	Sex male 1 and female 2	5	0.0110	0.4236	6.0000	149.49	<.0001
;
*CVD formula R2=0.4236;
*data work.source;
*	set projects.source;
	
data source2;
set source;
	
label 
	CVDf4=Mult reg CVDf1 CVDf2 CVDf3 SBP sex ;	
CVDf4=
+	CVDf1 * 0.01476	
+	CVDf2 * 16.45730
+	CVDf3 * 0.11176	
+	SBP17msW * 0.17041
-	sex_IDsW * 0.17070
;

run; quit;

proc corr data=source2 fisher;
	var CVDf1 CVDf2 CVDf3 CVDf4 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

*CVD formula R2=0.4236;
data source3;
	set source2;
		label 
	CVDf5=Final CVD risk factor formula;
CVDf5=
-	pmeat17KCsW		*	0.09
-	rmeat17KCsW		*	1.14
-	fish17KCsW		*	0.17
-	milk17KCsW		*	0.39
-	poultry16KCsW	*	1.54
-	eggs16kcsW		*	1.23
-	SFA16KCsW		*	2.10
-	PUFA17KCsW		*	1.56
-	TFA17KCsW		*	0.03
-	ALCOHOL17KCsW	*	0.13
+	Sugarb17KCsW	*	1.59
+	potatoes16KCsW	*	0.09
-	corn16kcsW		*	0.06
-	fruits17KCsW	*	2.12
-	vegetables17KCsW*	0.38
-	nutsseeds17KCsW	*	0.27
-	wgrains17KCsW	*	0.88
-	legumes17kcsW	*	0.01
+	rice16kcsW		*	0.00
-	swtpot16kcsW	*	0.27
+	smoke17msW		*	8.45
+	SLTobacco17msW	*	2.57
+	kidneydz17msW	*	0.98
+	T1DM17msW		*	3.80
+	T2DM17msW		*	2.85
+	SBP17msW		*	4.83
-	sex_IDsW		*	4.84

;
run; quit;

proc corr data=source3 fisher;

	var CVDf1 CVDf2 CVDf3 CVDf4 CVDf5 SBP17msW sex_IDsW;
	with CVD2017m;
run;quit;

 

Reeza
Super User
You have an original data set, A (projects.source). You made a copy, B (work.source), and added some new variables to it. These new variables are not in data set A, only in data set B.

Then you try to create a new copy, C (source), but you start from data set A (not B) and expect the variables from data set B to be present as well. They do not exist in A, so you cannot use them there unless you re-create them. This is why I'm telling you to use very different table names, so you can follow which data sets are your input and which are your output. It was the same issue with your previous question.
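
To make the A/B/C picture concrete, here is a small sketch. It assumes SASHELP.CLASS as a stand-in for data set A, and the names work.b, work.c_wrong, work.c_right, ratio and score are hypothetical. Referencing a variable that is not in the input does not stop the DATA step; SAS creates the variable with missing values and writes a "variable is uninitialized" note to the log, which is why the derived columns end up empty:

```sas
/* A: the original table (sashelp.class stands in for projects.source) */

/* B: a copy of A with one new variable */
data work.b;
    set sashelp.class;
    ratio = weight / height;
run;

/* Mistake: C starts from A again, where ratio does not exist.
   SAS creates ratio as all-missing and notes in the log that it
   is uninitialized, so score is missing on every row. */
data work.c_wrong;
    set sashelp.class;
    score = ratio * 10;
run;

/* Fix: C starts from B, so ratio is available */
data work.c_right;
    set work.b;
    score = ratio * 10;
run;
```

Checking the log for uninitialized-variable notes after each DATA step is a quick way to catch this before it reaches a procedure.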
dkcundiffMD
Quartz | Level 8

Now I understand it. 

Many thanks. 

David Cundiff
