- proc logistic where all dependent variable observations have same resp...

Posted 08-05-2018 10:46 AM
(2359 views)

All

I am trying to verify Microsoft R Client output from a logistic regression model with SAS. The dependent variable (yBinom2) intentionally has all values equal to 0, which is realistic in my area of work (e.g., no technologist sees red blood cells in a normal urine sample). Simulated data are below and attached (delimiter = "|"). When I run the regression model in R, estimation completes (code and output below for the logit link function). When I run it in SAS with the logit link function, I receive the error message "All observations have the same response."

I am most interested in the predicted probabilities for each sample id. Does anyone know why the SAS solution will not estimate? Are there options in SAS to handle the scenario where the dependent variable has all 0's or 1's? Thank you in advance.

**simulated data set:**

| sampleNo | y | x | yBinom1 | yBinom2 | yBinom3 |
|---|---|---|---|---|---|
| 1 | 6.68 | 6.92 | 1 | 0 | 1 |
| 2 | 6.75 | 6.66 | 0 | 0 | 1 |
| 3 | 6.85 | 6.93 | 1 | 0 | 1 |
| 4 | 6.86 | 6.98 | 1 | 0 | 1 |
| 5 | 6.67 | 6.95 | 1 | 0 | 1 |
| 6 | 6.96 | 6.69 | 1 | 0 | 1 |
| 7 | 6.91 | 6.55 | 0 | 0 | 1 |
| 8 | 6.82 | 6.87 | 1 | 0 | 1 |
| 9 | 6.55 | 6.81 | 0 | 0 | 1 |
| 10 | 6.87 | 6.75 | 1 | 0 | 1 |
| 11 | 6.52 | 6.94 | 0 | 0 | 1 |
| 12 | 6.59 | 6.79 | 0 | 0 | 1 |
| 13 | 6.6 | 6.87 | 1 | 0 | 1 |
| 14 | 6.56 | 6.68 | 0 | 0 | 1 |
| 15 | 6.65 | 6.53 | 1 | 0 | 1 |
| 16 | 6.68 | 6.88 | 0 | 0 | 1 |
| 17 | 6.64 | 6.91 | 0 | 0 | 1 |
| 18 | 6.96 | 6.99 | 0 | 0 | 1 |
| 19 | 6.9 | 6.83 | 1 | 0 | 1 |
| 20 | 6.6 | 6.91 | 0 | 0 | 1 |

**R code:**

A <- read.csv("C:/Users/BodnarJ/Desktop/functionalRequirement4_9_X/dataSim_FR_4_9_X.csv", sep = "|", header = TRUE, colClasses="character")

for(j in 2:ncol(A)){ A[,j] <- as.numeric(A[,j]) }

######################################################################

######################################################################

# binomial logistic regression model

vv <- A

logitMod <- glm( yBinom2 ~ x , data=vv , family=binomial(link="logit"))

predicted <- plogis(predict(logitMod, vv)) # predicted scores

vv$prob <- predicted

vv$probFlag <- ifelse(vv$prob > 0.5 , 1 , 0)

vv$resid <- logitMod$residuals

print( vv , row.names = F)

**R Output:**

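| sampleNo | y | x | yBinom1 | yBinom2 | yBinom3 | prob | probFlag | resid |
|---|---|---|---|---|---|---|---|---|
| 1 | 6.68 | 6.92 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 2 | 6.75 | 6.66 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 3 | 6.85 | 6.93 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 4 | 6.86 | 6.98 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 5 | 6.67 | 6.95 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 6 | 6.96 | 6.69 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 7 | 6.91 | 6.55 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 8 | 6.82 | 6.87 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 9 | 6.55 | 6.81 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 10 | 6.87 | 6.75 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 11 | 6.52 | 6.94 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 12 | 6.59 | 6.79 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 13 | 6.60 | 6.87 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 14 | 6.56 | 6.68 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 15 | 6.65 | 6.53 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 16 | 6.68 | 6.88 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 17 | 6.64 | 6.91 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 18 | 6.96 | 6.99 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 19 | 6.90 | 6.83 | 1 | 0 | 1 | 7.884924e-12 | 0 | -1 |
| 20 | 6.60 | 6.91 | 0 | 0 | 1 | 7.884924e-12 | 0 | -1 |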

**SAS Code:**

proc import datafile="C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv" dbms=csv out=work.anaDS replace;

delimiter="|";

getnames=yes;

guessingrows=7000;

run;

proc logistic data = anaDS ;

model yBinom2 = x / LINK = logit;

run;

**SAS Log:**

748 proc import datafile="C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv"

748! dbms=csv out=work.anaDS replace;

749 delimiter="|";

750 getnames=yes;

751 guessingrows=7000;

752 run;

753 /**********************************************************************

754 * PRODUCT: SAS

755 * VERSION: 9.4

756 * CREATOR: External File Interface

757 * DATE: 05AUG18

758 * DESC: Generated SAS Datastep Code

759 * TEMPLATE SOURCE: (None Specified.)

760 ***********************************************************************/

761 data WORK.ANADS ;

762 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

763 infile 'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv' delimiter =

763! '|' MISSOVER DSD lrecl=32767 firstobs=2 ;

764 informat sampleNo best32. ;

765 informat y best32. ;

766 informat x best32. ;

767 informat yBinom1 best32. ;

768 informat yBinom2 best32. ;

769 informat yBinom3 best32. ;

770 format sampleNo best12. ;

771 format y best12. ;

772 format x best12. ;

773 format yBinom1 best12. ;

774 format yBinom2 best12. ;

775 format yBinom3 best12. ;

776 input

777 sampleNo

778 y

779 x

780 yBinom1

781 yBinom2

782 yBinom3

783 ;

784 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */

785 run;

NOTE: The infile 'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv' is:

Filename=C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv,

RECFM=V,LRECL=32767,File Size (bytes)=438,

Last Modified=05Aug2018:08:39:09,

Create Time=05Aug2018:08:26:59

NOTE: 20 records were read from the infile

'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv'.

The minimum record length was 17.

The maximum record length was 18.

NOTE: The data set WORK.ANADS has 20 observations and 6 variables.

NOTE: DATA statement used (Total process time):

real time 0.05 seconds

cpu time 0.03 seconds

20 rows created in WORK.ANADS from

C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv.

NOTE: WORK.ANADS data set was successfully created.

NOTE: The data set WORK.ANADS has 20 observations and 6 variables.

NOTE: PROCEDURE IMPORT used (Total process time):

real time 0.13 seconds

cpu time 0.06 seconds

786

787 proc logistic data = anaDS ;

788 model yBinom2 = x / LINK = logit;

789 run;

ERROR: All observations have the same response. No statistics are computed.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 20 observations read from the data set WORK.ANADS.

NOTE: PROCEDURE LOGISTIC used (Total process time):

real time 0.02 seconds

cpu time 0.01 seconds

9 REPLIES


Seems to me you are asking the wrong people, SAS is giving the correct answer. You need to ask the R gurus how R can fit a model in the situation where all values of the dependent variable are constant.

--

Paige Miller

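A short numerical check may make the point above concrete. The sketch below is not from the thread: it assumes an intercept-only logit model and uses a hypothetical helper, `loglik_all_zero`, to evaluate the log-likelihood for 20 all-zero responses at increasingly negative intercepts. The value keeps rising toward 0, which suggests that no finite maximum likelihood estimate exists and that any number an iterative fitter reports is simply where it stopped.

```r
# Log-likelihood of an intercept-only logistic model when all n responses are 0.
# With y = 0 everywhere, logLik(b0) = n * log(1 - plogis(b0)) = -n * log(1 + exp(b0)).
# loglik_all_zero is a hypothetical helper, not part of any package.
loglik_all_zero <- function(b0, n = 20) {
  -n * log1p(exp(b0))
}

b0_grid <- c(-5, -10, -15, -20, -25)   # increasingly negative intercepts
ll <- loglik_all_zero(b0_grid)

# The log-likelihood increases at every step and never reaches a maximum.
print(data.frame(intercept = b0_grid,
                 prob      = plogis(b0_grid),
                 logLik    = ll))
```

Under this assumption, an error such as the one PROC LOGISTIC issues is a reasonable response to a likelihood with no interior maximum.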


You could estimate proportion confidence limits for the intercept-only model, as done with proc freq. But for models involving explanatory variables, any estimate you may get will depend heavily on pretty strong assumptions. Proc glimmix will give you estimates, but check how they depend on convergence criteria:

```
/* Simulated data: 20 observations, none of them an event */
data test;
    do x = 1 to 20;
        event = 0;
        output;
    end;
run;

/* decent estimates */
proc freq data=test;
    table event / binomial(cl=exact);
run;

/* Frivolous estimates */
proc glimmix data=test;
    model event = / dist=binomial link=logit s cl;
run;

/* Same model with a different absolute gradient convergence criterion */
proc glimmix data=test;
    model event = / dist=binomial link=logit s cl;
    nloptions absgconv=0.000001;
run;
```

PG
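For reference, the exact limit in the intercept-only case can be written in closed form. The derivation below is a sketch under the usual Clopper-Pearson definition, assuming $n$ independent observations with zero events and a two-sided level $1-\alpha$; the symbols $p_U$, $n$, and $\alpha$ are introduced here and do not appear in the thread. The lower limit is 0, and the upper limit $p_U$ is the event probability at which observing zero events has probability $\alpha/2$.

```latex
\[
P(X = 0 \mid p_U) = (1 - p_U)^{n} = \frac{\alpha}{2}
\quad\Longrightarrow\quad
p_U = 1 - \left(\frac{\alpha}{2}\right)^{1/n}
\]
\[
n = 20,\ \alpha = 0.05:
\qquad
p_U = 1 - 0.025^{1/20} \approx 0.168
\]
```

Because every value of `event` is 0, the limits in the PROC FREQ output would be expected to describe the proportion of zeros, so the event probability limits are one minus those values.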


Thank you.


Illustration of a possible scenario with all events=0

```
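/* True logistic probability curve; events are observed only for x <= 20 */
data test;
    mu = 500;
    sigma = 75;
    do x = 1 to 2000;
        p = logistic( (x-mu)/sigma );
        /* observed responses are all 0; everything else is missing */
        if x <= 20 then event = 0; else call missing(event);
        output;
    end;
    label p="true probability" ;
    keep x p event;
run;

/* Exact binomial confidence limits for the proportion of event=0 */
proc freq data=test;
    table event / binomial(cl=exact);
    ods output BinomialCLs=testCL;
run;

/* Convert the limits to limits for the event probability on observed rows */
data testgraph;
    if _n_=1 then set testCL;
    set test;
    if event = 0 then do;
        lowerLimit = 1-upperCL;
        upperLimit = 1-lowerCL;
    end;
    keep x p event lowerLimit upperLimit;
run;

/* Plot the true curve, the observed events, and the confidence band */
ods listing style=journal;
proc sgplot data=testgraph;
    band x=x lower=lowerLimit upper=upperLimit / legendlabel="95% confidence band";
    series x=x y=p;
    scatter x=x y=event;
    xaxis type=log;
    yaxis label="event probability";
run;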
```

PG


I don't think you can build a statistically valid model with all responses 1 or 0, simply because it means you don't need to predict anything. If you predict that all are 1 or 0 then you're good to go, why bother with a model at all?



So the larger piece of the puzzle is that I am building this type of model to loop through, say, 30 parameters. Some of these parameters have all response values = 0, some have all response values = 1, and some have a mixture of 0's and 1's. All of these data patterns are clinically expected.

Our other statistician tried the SAS proc logistic path to verify my results using R and her code crashed because of the all 0's and all 1's. Mine did not crash.

So R does this estimation (somehow), and the probabilities are pretty similar to PROC GLIMMIX which is giving probabilities around 4e-8. R is giving probabilities about 7e-12. So clearly these probabilities allow the same conclusion to be reached.

Thank you for your expertise.
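One way to see why two fitters can report different tiny probabilities is to change the stopping tolerance in R itself. The sketch below is an illustration, not the original analysis: it reuses the `x` values from the simulated data, sets every response to 0, and refits with two values of `glm.control(epsilon = ...)`. The helper name `fit_prob` is hypothetical. It only shows that R's reported probability moves with the tolerance; it does not attempt to reproduce the PROC GLIMMIX figure.

```r
# x values copied from the simulated data set in the original post
x <- c(6.92, 6.66, 6.93, 6.98, 6.95, 6.69, 6.55, 6.87, 6.81, 6.75,
       6.94, 6.79, 6.87, 6.68, 6.53, 6.88, 6.91, 6.99, 6.83, 6.91)
d <- data.frame(x = x, y = 0)   # every response is 0, as for yBinom2

# fit_prob is a hypothetical helper: fit the logit model with a given
# convergence tolerance and return the largest fitted probability.
fit_prob <- function(eps) {
  fit <- glm(y ~ x, data = d, family = binomial(link = "logit"),
             control = glm.control(epsilon = eps))
  max(fitted(fit))
}

p_loose   <- fit_prob(1e-4)   # looser tolerance: iteration stops earlier
p_default <- fit_prob(1e-8)   # glm's default tolerance

# Both are effectively zero, but the value reflects the stopping rule,
# not an estimate supported by the data.
print(c(loose = p_loose, default = p_default))
```

If the reported value changes by orders of magnitude with the tolerance, that supports treating it as an artifact of the algorithm rather than as a model-based prediction.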


@stat12000 wrote:

So R does this estimation (somehow), and the probabilities are pretty similar to PROC GLIMMIX which is giving probabilities around 4e-8. R is giving probabilities about 7e-12. So clearly these probabilities allow the same conclusion to be reached.

Those are 0.

Whether you are confident enough in the 'somehow' when you have to explain it to someone else is really all that matters. I would also expect any of those variables to be excluded (or to fall out with a selection algorithm) from a final model when a full model is fit.


Exclusion of such parameters is not acceptable to the FDA. For such models we are essentially trying to demonstrate that probabilities are high when expected and low when expected. I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

Another need for this type of analysis pertains to the concept of analyte carryover where you want to show the likelihood of detecting elements (RBCs, WBCs, etc.) in high concentration (abnormal) samples is very high, and then the likelihood of detecting elements (RBCs, WBCs, etc.) in low concentration (normal) samples is very low.
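For a loop over many parameters where some responses are constant, one possible pattern is to branch on the response before fitting. The sketch below is an assumption-laden illustration, not a validated or regulatory-approved method: `fit_or_bound` is a hypothetical helper that reports exact binomial limits (via `binom.test`) when all responses are identical and otherwise fits the logit model with `glm`. The data are the `x`, `yBinom1`, `yBinom2`, and `yBinom3` columns from the simulated data set.

```r
# Columns copied from the simulated data set in the original post
x <- c(6.92, 6.66, 6.93, 6.98, 6.95, 6.69, 6.55, 6.87, 6.81, 6.75,
       6.94, 6.79, 6.87, 6.68, 6.53, 6.88, 6.91, 6.99, 6.83, 6.91)
responses <- list(
  yBinom1 = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0),
  yBinom2 = rep(0, 20),
  yBinom3 = rep(1, 20)
)

# Hypothetical helper: exact limits for a constant response,
# logistic regression otherwise.
fit_or_bound <- function(y, x, conf_level = 0.95) {
  n <- length(y)
  events <- sum(y)
  if (events == 0 || events == n) {
    # Constant response: no finite logistic MLE, so report the observed
    # proportion with exact (Clopper-Pearson) confidence limits.
    ci <- binom.test(events, n, conf.level = conf_level)$conf.int
    list(method = "exact binomial", prob = rep(events / n, n),
         lower = ci[1], upper = ci[2])
  } else {
    fit <- glm(y ~ x, family = binomial(link = "logit"))
    list(method = "logistic regression", prob = unname(fitted(fit)),
         lower = NA_real_, upper = NA_real_)
  }
}

results <- lapply(responses, fit_or_bound, x = x)
for (nm in names(results)) {
  cat(nm, ":", results[[nm]]$method, "\n")
}
```

With this design the constant-response parameters are not excluded; they are reported with an interval that excludes neither 0 nor 1 from consideration but bounds the probability away from the opposite extreme.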


@stat12000 wrote:

Exclusion of such parameters is not acceptable to the FDA. For such models we are essentially trying to demonstrate that probabilities are high when expected and low when expected. I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

While I have no experience with the FDA, let me say that modeling does not always lead to truth. The truth is, if your data are all zeros, then the probability of zero is 1, regardless of the fact that your modeling method doesn't get that number. In essence, when your data are all zeros, you have the wrong modeling method.

I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

Your modeling method fails when there are all zeros in the Y variable. So don't use it.

--

Paige Miller

