Toni2
Lapis Lazuli | Level 10

Hi, I ran PROC PHREG (see below) for stepwise regression. My dataset has 71 observations, but SAS uses only 33 of them. I have looked at the dataset and there are no missing values. This is the note from the log:

 

"NOTE: 38 observations were deleted due either to missing or invalid values for the time, censoring, frequency or explanatory
variables or to invalid operations in generating the values for some of the explanatory variables."

 

I found that someone else had a similar issue and included this statement:

output out=resOut resmart=resmart;

in PROC PHREG (I did the same below), which let me find all 38 rows of data that SAS did not use. I am not sure why PROC PHREG excluded these 38 observations. How can I fix this so that they are included?

 

proc phreg data=test1;
model &var6 = &var17 &var8 D1 D2 D3/ selection=stepwise slentry=0.05 slstay=0.05 details rl;
output out=resOut resmart=resmart;
run;

proc print data=resOut; where resmart is missing; run;

ACCEPTED SOLUTION
ballardw
Super User

What sort of unit is the variable DDEPPC supposed to be in? Have you counted how many negative values that variable has? Is it 37, with the remaining excluded record being one where DDEPPC is positive but those other two variables are missing?

I don't think PHREG expects negative time values. The Failure Time Distribution section in the Details of the PHREG documentation starts with the following sentence, where T is the modeled variable:

Let T be a nonnegative random variable representing the failure time of an individual from a homogeneous population.


14 REPLIES
Reeza
Super User

You most likely have missing values.
Please post the output from the following:

proc freq data=test1;
table &var6. &var17. &var8. D1 D2 D3 / MISSING;
run;
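
With continuous variables, a one-way frequency table lists every distinct value, so the missing rows are easy to overlook. A more compact check, sketched here under the assumption that the macro variables &var6, &var17 and &var8 are defined as in the original post and that all model variables are numeric, is to ask PROC MEANS for the N and NMISS statistics. MIN and MAX are included because they also reveal out-of-range values:

```sas
/* Count non-missing (N) and missing (NMISS) values for each model variable. */
/* MIN and MAX show the range, which helps spot values a procedure may reject. */
proc means data=test1 n nmiss min max;
  var &var6 &var17 &var8 D1 D2 D3;
run;
```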
Toni2
Lapis Lazuli | Level 10

Hi, thanks. There are no missing values; see below:

 

 

DDEPPC  Frequency  Percent  Cumulative Frequency  Cumulative Percent
-1.23 1 1.41 1 1.41
-1.07 1 1.41 2 2.82
-0.9611 1 1.41 3 4.23
-0.9462 1 1.41 4 5.63
-0.9238 1 1.41 5 7.04
-0.9123 1 1.41 6 8.45
-0.8908 1 1.41 7 9.86
-0.8867 1 1.41 8 11.27
-0.8598 1 1.41 9 12.68
-0.7335 1 1.41 10 14.08
-0.7327 1 1.41 11 15.49
-0.7321 1 1.41 12 16.9
-0.7266 1 1.41 13 18.31
-0.6903 1 1.41 14 19.72
-0.6788 1 1.41 15 21.13
-0.6647 1 1.41 16 22.54
-0.6346 1 1.41 17 23.94
-0.5943 1 1.41 18 25.35
-0.5399 1 1.41 19 26.76
-0.519 1 1.41 20 28.17
-0.5013 1 1.41 21 29.58
-0.427 1 1.41 22 30.99
-0.4238 1 1.41 23 32.39
-0.4135 1 1.41 24 33.8
-0.411 1 1.41 25 35.21
-0.3765 1 1.41 26 36.62
-0.2412 1 1.41 27 38.03
-0.2204 1 1.41 28 39.44
-0.1523 1 1.41 29 40.85
-0.1156 1 1.41 30 42.25
-0.1072 1 1.41 31 43.66
-0.1018 1 1.41 32 45.07
-0.0858 1 1.41 33 46.48
-0.0734 1 1.41 34 47.89
-0.0145 1 1.41 35 49.3
-0.0121 1 1.41 36 50.7
-0.0079 1 1.41 37 52.11
0.0141 1 1.41 38 53.52
0.0795 1 1.41 39 54.93
0.1086 1 1.41 40 56.34
0.1386 1 1.41 41 57.75
0.1465 1 1.41 42 59.15
0.1472 1 1.41 43 60.56
0.1545 1 1.41 44 61.97
0.2226 1 1.41 45 63.38
0.2306 1 1.41 46 64.79
0.2623 1 1.41 47 66.2
0.2644 1 1.41 48 67.61
0.2706 1 1.41 49 69.01
0.2946 1 1.41 50 70.42
0.3479 1 1.41 51 71.83
0.3584 1 1.41 52 73.24
0.3657 1 1.41 53 74.65
0.4343 1 1.41 54 76.06
0.4469 1 1.41 55 77.46
0.475 1 1.41 56 78.87
0.495 1 1.41 57 80.28
0.515 1 1.41 58 81.69
0.5492 1 1.41 59 83.1
0.562 1 1.41 60 84.51
0.7407 1 1.41 61 85.92
0.7527 1 1.41 62 87.32
0.7548 1 1.41 63 88.73
0.7624 1 1.41 64 90.14
0.7959 1 1.41 65 91.55
0.8133 1 1.41 66 92.96
0.8181 1 1.41 67 94.37
0.8266 1 1.41 68 95.77
0.9507 1 1.41 69 97.18
1.4172 1 1.41 70 98.59
1.4174 1 1.41 71 100

 

resid01_lag1  Frequency  Percent  Cumulative Frequency  Cumulative Percent
. 1 1.41 1 1.41
-2.34387 1 1.41 2 2.82
-1.92675 1 1.41 3 4.23
-1.66221 1 1.41 4 5.63
-1.66076 1 1.41 5 7.04
-1.51416 1 1.41 6 8.45
-1.31411 1 1.41 7 9.86
-1.30036 1 1.41 8 11.27
-1.17059 1 1.41 9 12.68
-1.16493 1 1.41 10 14.08
-1.0268 1 1.41 11 15.49
-0.98757 1 1.41 12 16.9
-0.79868 1 1.41 13 18.31
-0.78508 1 1.41 14 19.72
-0.74925 1 1.41 15 21.13
-0.74815 1 1.41 16 22.54
-0.73537 1 1.41 17 23.94
-0.73498 1 1.41 18 25.35
-0.72178 1 1.41 19 26.76
-0.67466 1 1.41 20 28.17
-0.66885 1 1.41 21 29.58
-0.59144 1 1.41 22 30.99
-0.56058 1 1.41 23 32.39
-0.49143 1 1.41 24 33.8
-0.45108 1 1.41 25 35.21
-0.41997 1 1.41 26 36.62
-0.378 1 1.41 27 38.03
-0.30964 1 1.41 28 39.44
-0.2536 1 1.41 29 40.85
-0.18538 1 1.41 30 42.25
-0.13725 1 1.41 31 43.66
-0.0802 1 1.41 32 45.07
-0.07967 1 1.41 33 46.48
-0.07443 1 1.41 34 47.89
-0.02852 1 1.41 35 49.3
0.022919 1 1.41 36 50.7
0.032978 1 1.41 37 52.11
0.062794 1 1.41 38 53.52
0.11275 1 1.41 39 54.93
0.163709 1 1.41 40 56.34
0.183303 1 1.41 41 57.75
0.198967 1 1.41 42 59.15
0.22923 1 1.41 43 60.56
0.263706 1 1.41 44 61.97
0.279418 1 1.41 45 63.38
0.377233 1 1.41 46 64.79
0.40091 1 1.41 47 66.2
0.434839 1 1.41 48 67.61
0.458864 1 1.41 49 69.01
0.475435 1 1.41 50 70.42
0.57037 1 1.41 51 71.83
0.621019 1 1.41 52 73.24
0.657644 1 1.41 53 74.65
0.676903 1 1.41 54 76.06
0.699543 1 1.41 55 77.46
0.706565 1 1.41 56 78.87
0.72955 1 1.41 57 80.28
0.899265 1 1.41 58 81.69
1.073671 1 1.41 59 83.1
1.098557 1 1.41 60 84.51
1.145171 1 1.41 61 85.92
1.157797 1 1.41 62 87.32
1.248145 1 1.41 63 88.73
1.320147 1 1.41 64 90.14
1.386017 1 1.41 65 91.55
1.408262 1 1.41 66 92.96
1.497769 1 1.41 67 94.37
1.504349 1 1.41 68 95.77
1.526251 1 1.41 69 97.18
1.817184 1 1.41 70 98.59
1.91372 1 1.41 71 100
         
DPSDEPPC  Frequency  Percent  Cumulative Frequency  Cumulative Percent
. 1 1.41 1 1.41
-1.81 1 1.41 2 2.82
-1.5 1 1.41 3 4.23
-1.184 1 1.41 4 5.63
-1.136 1 1.41 5 7.04
-1.001 1 1.41 6 8.45
-0.9896 1 1.41 7 9.86
-0.928 1 1.41 8 11.27
-0.8819 1 1.41 9 12.68
-0.8656 1 1.41 10 14.08
-0.8478 1 1.41 11 15.49
-0.7794 1 1.41 12 16.9
-0.6462 1 1.41 13 18.31
-0.6338 1 1.41 14 19.72
-0.6045 1 1.41 15 21.13
-0.5168 1 1.41 16 22.54
-0.5144 1 1.41 17 23.94
-0.4742 1 1.41 18 25.35
-0.4275 1 1.41 19 26.76
-0.4263 1 1.41 20 28.17
-0.409 1 1.41 21 29.58
-0.4078 1 1.41 22 30.99
-0.3606 1 1.41 23 32.39
-0.3447 1 1.41 24 33.8
-0.3137 1 1.41 25 35.21
-0.3107 1 1.41 26 36.62
-0.2972 1 1.41 27 38.03
-0.2877 1 1.41 28 39.44
-0.2181 1 1.41 29 40.85
-0.1146 1 1.41 30 42.25
-0.0914 1 1.41 31 43.66
-0.0889 1 1.41 32 45.07
-0.0753 1 1.41 33 46.48
-0.065 1 1.41 34 47.89
-0.0635 1 1.41 35 49.3
-0.0625 1 1.41 36 50.7
-0.054 1 1.41 37 52.11
-0.0434 1 1.41 38 53.52
-0.0358 1 1.41 39 54.93
-0.0291 1 1.41 40 56.34
-0.0028 1 1.41 41 57.75
0.0001 1 1.41 42 59.15
0.0226 1 1.41 43 60.56
0.0377 1 1.41 44 61.97
0.0433 1 1.41 45 63.38
0.0491 1 1.41 46 64.79
0.0906 1 1.41 47 66.2
0.1026 1 1.41 48 67.61
0.1664 1 1.41 49 69.01
0.1821 1 1.41 50 70.42
0.2077 1 1.41 51 71.83
0.2491 1 1.41 52 73.24
0.2759 1 1.41 53 74.65
0.3911 1 1.41 54 76.06
0.402 1 1.41 55 77.46
0.4138 1 1.41 56 78.87
0.4245 1 1.41 57 80.28
0.4455 1 1.41 58 81.69
0.4659 1 1.41 59 83.1
0.5259 1 1.41 60 84.51
0.5788 1 1.41 61 85.92
0.7056 1 1.41 62 87.32
0.7422 1 1.41 63 88.73
0.7596 1 1.41 64 90.14
0.8395 1 1.41 65 91.55
0.8925 1 1.41 66 92.96
0.9107 1 1.41 67 94.37
0.9144 1 1.41 68 95.77
1.037 1 1.41 69 97.18
1.5387 1 1.41 70 98.59
1.7481 1 1.41 71 100
         
D1  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0 53 74.65 53 74.65
1 18 25.35 71 100
         
D2  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0 53 74.65 53 74.65
1 18 25.35 71 100
         
D3  Frequency  Percent  Cumulative Frequency  Cumulative Percent
0 53 74.65 53 74.65
1 18 25.35 71 100
Reeza
Super User
Your output shows at least 2 missing values, one for DPSDEPPC and one for resid01_lag1. Also, D1, D2 and D3 seem to have identical distributions, which is something I'd check into. Can you show your full log from the PHREG run as well?
Toni2
Lapis Lazuli | Level 10

Hi, thanks. I believe the missing observations are due to the lags and differencing in the variables. Please see the log below.

 

 

61 data test1;
SYMBOLGEN: Macro variable TD10 resolves to shortrun_dynamic
62 set &td10;
63 keep date &var6 &var17 &var8 D1 D2 D3;
SYMBOLGEN: Macro variable VAR6 resolves to DDEPPC
SYMBOLGEN: Macro variable VAR17 resolves to resid01_lag1
SYMBOLGEN: Macro variable VAR8 resolves to DPSDEPPC
64 run;

NOTE: There were 71 observations read from the data set WORK.SHORTRUN_DYNAMIC.
NOTE: The data set WORK.TEST1 has 71 observations and 7 variables.
NOTE: Compressing data set WORK.TEST1 increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

65
66
67 proc phreg data=test1;
SYMBOLGEN: Macro variable VAR6 resolves to DDEPPC
68 model &var6 = &var17 &var8 D1 D2 D3/ selection=stepwise slentry=0.05 slstay=0.05 details rl;
SYMBOLGEN: Macro variable VAR17 resolves to resid01_lag1
SYMBOLGEN: Macro variable VAR8 resolves to DPSDEPPC
69 output out=resOut resmart=resmart;
70 run;

NOTE: 38 observations were deleted due either to missing or invalid values for the time, censoring, frequency or explanatory
variables or to invalid operations in generating the values for some of the explanatory variables.
NOTE: The data set WORK.RESOUT has 71 observations and 8 variables.
NOTE: Compressing data set WORK.RESOUT increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE PHREG used (Total process time):
real time 0.10 seconds
cpu time 0.03 seconds

71
72 proc print data=resOut; where resmart is missing; run;

NOTE: There were 37 observations read from the data set WORK.RESOUT.
WHERE resmart is null;
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

73
74
75

Reeza
Super User
Look at the RESOUT data set to see which records have a missing value; those are the ones that were excluded. Visual examination should give you a clue as to what is going on.

Note that you have no censor variable indicated and D1-D3 should be in the CLASS statement. I would have expected something closer to what's below.

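proc phreg data=test1;
/* treat the dummy variables as classification variables with reference coding */
class d1 d2 d3 / param=REF;
/* censorVariable(censoredLevel) is a placeholder: substitute your own censoring variable and the value(s) that mark censored records */
model &var6*censorVariable(censoredLevel) = &var17 &var8 D1 D2 D3/ selection=stepwise slentry=0.05 slstay=0.05 details rl;
output out=resOut resmart=resmart;
run;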
Toni2
Lapis Lazuli | Level 10

Thanks. I had a look at the RESOUT data set, and the only thing that looks strange is that &var6 is negative for all 38 observations. Does that mean anything?

ballardw
Super User

How many records had missing values for one or more of the variables on your model statement?

When any one of the variables has no value, the entire record is discarded from the model because the information is incomplete and the relationship among all of the variables cannot be modeled for that record. This is the most common cause of records being dropped from a model.
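
To answer the "how many records" question directly, one option is to flag each record that has a missing value in any model variable and then count the flags. This is only a sketch: it assumes all six variables are numeric and that the macro variables are defined as in the original post, and the names check and any_missing are made up for illustration.

```sas
data check;
  set test1;
  /* NMISS returns how many of its numeric arguments are missing. */
  /* any_missing (hypothetical name) is 1 if at least one model variable is missing, else 0. */
  any_missing = (nmiss(&var6, &var17, &var8, D1, D2, D3) > 0);
run;

/* Tabulate complete versus incomplete records. */
proc freq data=check;
  tables any_missing;
run;
```

If the count of flagged records is smaller than the number of deleted observations reported in the log, something other than missing values is causing the remaining exclusions.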

Toni2
Lapis Lazuli | Level 10

There are 38 rows that were not included in the calculation. However, there are no missing values (please see the PROC FREQ output above).

ballardw
Super User

What sort of unit is the variable DDEPPC supposed to be in? Have you counted how many negative values that variable has? Is it 37, with the remaining excluded record being one where DDEPPC is positive but those other two variables are missing?

I don't think PHREG expects negative time values. The Failure Time Distribution section in the Details of the PHREG documentation starts with the following sentence, where T is the modeled variable:

Let T be a nonnegative random variable representing the failure time of an individual from a homogeneous population.
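
The counting suggested above could be done along these lines. This is a sketch that assumes numeric variables and the macro variables from the original post; check_time, neg_time and miss_x are hypothetical names. It cross-tabulates records with a negative response against records with a missing explanatory variable, so the two possible reasons for exclusion can be compared with the count in the log note.

```sas
data check_time;
  set test1;
  /* neg_time is 1 when the response is present and below zero. */
  neg_time = (not missing(&var6) and &var6 < 0);
  /* miss_x is 1 when at least one explanatory variable is missing. */
  miss_x = (nmiss(&var17, &var8, D1, D2, D3) > 0);
run;

/* Counts only: suppress row, column and overall percentages. */
proc freq data=check_time;
  tables neg_time*miss_x / norow nocol nopercent;
run;
```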

Toni2
Lapis Lazuli | Level 10

I think you are right. After examining the excluded observations, I found that all 37 have negative values of the DDEPPC variable. I also ran PHREG with a non-negative variable and it seems to have worked. Is there any workaround for this?

 

Reeza
Super User
How can you have negative times?
Toni2
Lapis Lazuli | Level 10

DDEPPC is the difference of another variable, which is why it has negative values.

Reeza
Super User
OK, but conceptually, do you understand why negative time isn't reasonable?
How you deal with it depends on the context of your data. One option is to rescale everything so that your lowest negative value becomes 0 and everything else is measured relative to it.
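
A minimal sketch of that rescaling, assuming the response is numeric and the macro variable &var6 is defined as in the original post. The names min_resp, test1_shifted and resp_shifted are invented for illustration. Whether a shifted difference is meaningful as a failure time is a separate modeling question that this code does not answer.

```sas
/* Store the smallest non-missing response value in a macro variable. */
/* FORMAT=BEST32. avoids the precision loss of the default numeric-to-text conversion. */
proc sql noprint;
  select min(&var6) format=best32. into :min_resp trimmed
  from test1;
quit;

data test1_shifted;
  set test1;
  /* Shift the response so its minimum becomes 0; missing values stay missing. */
  if not missing(&var6) then resp_shifted = &var6 - (&min_resp);
run;
```

The shifted variable resp_shifted would then replace &var6 on the MODEL statement.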
Toni2
Lapis Lazuli | Level 10

Yes. Actually, it is not negative time. It is a reduction in the variable between 2 periods.

I think it would be difficult, for the purposes of my work, to transform the data.

 

Thank you for your time and support 🙂
