stat12000
Calcite | Level 5

All

 

I am trying to verify Microsoft R Client output from a logistic regression model with SAS. The dependent variable (yBinom2) intentionally has all values == 0, which is realistic in my area of work (e.g., no technologist sees red blood cells in a normal urine sample). Simulated data are below and attached (delimiter = "|"). When I run the regression model in R, estimation completes (code and output below for the logit link function). When I run it in SAS with the logit link function, I receive the error message "All observations have the same response."

 

I am most interested in the predicted probabilities for each sample id. Does anyone know why the SAS solution will not estimate? Are there options in SAS to handle the scenario where the dependent variable has all 0's or 1's? Thank you in advance.

 

simulated data set:

 

sampleNo y x yBinom1 yBinom2 yBinom3
1 6.68 6.92 1 0 1
2 6.75 6.66 0 0 1
3 6.85 6.93 1 0 1
4 6.86 6.98 1 0 1
5 6.67 6.95 1 0 1
6 6.96 6.69 1 0 1
7 6.91 6.55 0 0 1
8 6.82 6.87 1 0 1
9 6.55 6.81 0 0 1
10 6.87 6.75 1 0 1
11 6.52 6.94 0 0 1
12 6.59 6.79 0 0 1
13 6.6 6.87 1 0 1
14 6.56 6.68 0 0 1
15 6.65 6.53 1 0 1
16 6.68 6.88 0 0 1
17 6.64 6.91 0 0 1
18 6.96 6.99 0 0 1
19 6.9 6.83 1 0 1
20 6.6 6.91 0 0 1

 

R code:

 

A <- read.csv("C:/Users/BodnarJ/Desktop/functionalRequirement4_9_X/dataSim_FR_4_9_X.csv", sep = "|",  header = TRUE, colClasses="character")

 

for(j in 2:ncol(A)){ A[,j] <- as.numeric(A[,j]) }

 

######################################################################
######################################################################
# binomial logistic regression model

 

vv <- A
logitMod <- glm( yBinom2 ~ x , data=vv , family=binomial(link="logit"))
predicted <- plogis(predict(logitMod, vv)) # predicted scores
vv$prob <- predicted
vv$probFlag <- ifelse(vv$prob > 0.5 , 1 , 0)
vv$resid <- logitMod$residuals

print( vv , row.names = F)

 

R Output:

 

sampleNo y x yBinom1 yBinom2 yBinom3 prob probFlag resid

1 6.68 6.92 1 0 1 7.884924e-12 0 -1
2 6.75 6.66 0 0 1 7.884924e-12 0 -1
3 6.85 6.93 1 0 1 7.884924e-12 0 -1
4 6.86 6.98 1 0 1 7.884924e-12 0 -1
5 6.67 6.95 1 0 1 7.884924e-12 0 -1
6 6.96 6.69 1 0 1 7.884924e-12 0 -1
7 6.91 6.55 0 0 1 7.884924e-12 0 -1
8 6.82 6.87 1 0 1 7.884924e-12 0 -1
9 6.55 6.81 0 0 1 7.884924e-12 0 -1
10 6.87 6.75 1 0 1 7.884924e-12 0 -1
11 6.52 6.94 0 0 1 7.884924e-12 0 -1
12 6.59 6.79 0 0 1 7.884924e-12 0 -1
13 6.60 6.87 1 0 1 7.884924e-12 0 -1
14 6.56 6.68 0 0 1 7.884924e-12 0 -1
15 6.65 6.53 1 0 1 7.884924e-12 0 -1
16 6.68 6.88 0 0 1 7.884924e-12 0 -1
17 6.64 6.91 0 0 1 7.884924e-12 0 -1
18 6.96 6.99 0 0 1 7.884924e-12 0 -1
19 6.90 6.83 1 0 1 7.884924e-12 0 -1
20 6.60 6.91 0 0 1 7.884924e-12 0 -1

 

SAS Code:

 

proc import datafile="C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv" dbms=csv out=work.anaDS replace;

delimiter="|";
getnames=yes;
guessingrows=7000;
run;

 

proc logistic data = anaDS ;
model yBinom2 = x / LINK = logit;
run;

 

SAS Log:

 

748 proc import datafile="C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv"
748! dbms=csv out=work.anaDS replace;
749 delimiter="|";
750 getnames=yes;
751 guessingrows=7000;
752 run;

753 /**********************************************************************
754 * PRODUCT: SAS
755 * VERSION: 9.4
756 * CREATOR: External File Interface
757 * DATE: 05AUG18
758 * DESC: Generated SAS Datastep Code
759 * TEMPLATE SOURCE: (None Specified.)
760 ***********************************************************************/
761 data WORK.ANADS ;
762 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
763 infile 'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv' delimiter =
763! '|' MISSOVER DSD lrecl=32767 firstobs=2 ;
764 informat sampleNo best32. ;
765 informat y best32. ;
766 informat x best32. ;
767 informat yBinom1 best32. ;
768 informat yBinom2 best32. ;
769 informat yBinom3 best32. ;
770 format sampleNo best12. ;
771 format y best12. ;
772 format x best12. ;
773 format yBinom1 best12. ;
774 format yBinom2 best12. ;
775 format yBinom3 best12. ;
776 input
777 sampleNo
778 y
779 x
780 yBinom1
781 yBinom2
782 yBinom3
783 ;
784 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
785 run;

NOTE: The infile 'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv' is:
Filename=C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv,
RECFM=V,LRECL=32767,File Size (bytes)=438,
Last Modified=05Aug2018:08:39:09,
Create Time=05Aug2018:08:26:59

NOTE: 20 records were read from the infile
'C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv'.
The minimum record length was 17.
The maximum record length was 18.
NOTE: The data set WORK.ANADS has 20 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds


20 rows created in WORK.ANADS from
C:\Users\BodnarJ\Desktop\functionalRequirement4_9_X\dataSim_FR_4_9_X.csv.

 

NOTE: WORK.ANADS data set was successfully created.
NOTE: The data set WORK.ANADS has 20 observations and 6 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.13 seconds
cpu time 0.06 seconds


786
787 proc logistic data = anaDS ;
788 model yBinom2 = x / LINK = logit;
789 run;

ERROR: All observations have the same response. No statistics are computed.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 20 observations read from the data set WORK.ANADS.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

PaigeMiller
Diamond | Level 26

Seems to me you are asking the wrong people; SAS is giving the correct answer. You need to ask the R gurus how R can fit a model when all values of the dependent variable are the same.

--
Paige Miller
PGStats
Opal | Level 21

You could estimate proportion confidence limits for the intercept-only model, as done with proc freq. But for models involving explanatory variables, any estimate you may get will depend heavily on pretty strong assumptions. Proc glimmix will give you estimates, but check how they depend on the convergence criteria:

 

data test;
do x = 1 to 20;
    event = 0;
    output;
    end;
run;

/* decent estimates */
proc freq data=test;
table event / binomial(cl=exact);
run;

/* Frivolous estimates */
proc glimmix data=test;
model event = / dist=binomial link=logit s cl;
run;

proc glimmix data=test;
model event = / dist=binomial link=logit s cl;
nloptions absgconv=0.000001;
run;
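
For readers without SAS at hand, here is a minimal Python sketch of the intercept-only exact limit for the all-zero case. It assumes the two-sided Clopper-Pearson interval (which is what I understand `cl=exact` to request); the function name is made up for illustration. With zero events the lower limit is 0 and the upper limit has a closed form.

```python
def exact_upper_limit_zero_events(n, conf_level=0.95):
    """Two-sided Clopper-Pearson upper limit for p when 0 events occur in n trials."""
    alpha = 1.0 - conf_level
    # With x = 0 the lower limit is 0 and the upper limit solves (1 - p)^n = alpha / 2.
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


upper = exact_upper_limit_zero_events(20)
print(f"0 events in 20 trials: 95% exact interval for p is [0, {upper:.4f}]")
```

For 20 observations with no events the upper limit comes out at roughly 0.17, so the data alone cannot rule out an event probability well above zero. Note that proc freq reports limits for the first level of the variable (here event=0), so its table shows the complement of this interval.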
PG
PGStats
Opal | Level 21

Illustration of a possible scenario with all events=0

 

data test;
mu = 500;
sigma = 75;
do x = 1 to 2000;
    p = logistic( (x-mu)/sigma );
    if x <= 20 then event = 0; else call missing(event);
    output;
    end;
label p="true probability" ;
keep x p event;
run;

proc freq data=test;
table event / binomial(cl=exact);
ods output BinomialCLs=testCL;
run;

data testgraph;
if _n_=1 then set testCL;
set test;
if event = 0 then do;
    lowerLimit = 1-upperCL;
    upperLimit = 1-lowerCL;
    end;
keep x p event lowerLimit upperLimit;
run;

ods listing style=journal;
proc sgplot data=testgraph;
band x=x lower=lowerLimit upper=upperLimit / legendlabel="95% confidence band";
series x=x y=p;
scatter x=x y=event;
xaxis type=log;
yaxis label="event probability";
run;

[Attached plot: SGPlot3.png]

 

PG
Reeza
Super User

I don't think you can build a statistically valid model when all responses are 1 or all are 0, simply because it means you don't need to predict anything. If you predict that all are 1 (or all are 0), you're good to go, so why bother with a model at all?

 




 

 

 

stat12000
Calcite | Level 5

So the larger piece of the puzzle is that I am building this type of model in a loop over, say, 30 parameters. Some of these parameters have all response values = 0, some have all response values = 1, and some have a mixture of 0's and 1's. All of these data patterns are clinically expected.

 

Our other statistician tried the SAS proc logistic path to verify my results using R and her code crashed because of the all 0's and all 1's. Mine did not crash.

 

So R does this estimation (somehow), and the probabilities are pretty similar to PROC GLIMMIX which is giving probabilities around 4e-8. R is giving probabilities about 7e-12. So clearly these probabilities allow the same conclusion to be reached.
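
A rough Python sketch of where the "somehow" may come from. With all responses equal to 0, the maximum-likelihood intercept is minus infinity, so an iterative fit never truly converges; it just stops when its stopping rule is satisfied. The sketch below is an assumption-laden illustration, not R's or GLIMMIX's actual code: it runs Newton steps on an intercept-only model, starts at 0, and uses a relative deviance-change stopping rule of the kind R's `glm` documents (default `epsilon = 1e-8`). The function name and starting value are hypothetical.

```python
import math


def fit_all_zero_intercept(n, tol, max_iter=100):
    """Newton-Raphson for an intercept-only logit model when all n responses are 0."""
    b = 0.0              # assumed starting intercept (p = 0.5)
    dev_old = math.inf
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + math.exp(-b))
        # Score is -n*p and information is n*p*(1-p), so the Newton step is -1/(1-p).
        b -= 1.0 / (1.0 - p)
        p = 1.0 / (1.0 + math.exp(-b))
        dev = -2.0 * n * math.log1p(-p)   # deviance when every y is 0
        # Stop on a small relative change in deviance.
        if abs(dev - dev_old) / (abs(dev) + 0.1) < tol:
            break
        dev_old = dev
    return b, p, iteration


for tol in (1e-4, 1e-8):
    b, p, iters = fit_all_zero_intercept(20, tol)
    print(f"tol={tol:g}: stopped after {iters} iterations, intercept={b:.2f}, p={p:.3e}")
```

Each step lowers the intercept by about 1, so the fitted probability shrinks by a factor of about e per iteration until the tolerance says stop. A tighter tolerance gives a smaller "estimate", which would explain why two programs with different stopping rules report different tiny probabilities for the same data.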

 

Thank you for your expertise.

 

Reeza
Super User

@stat12000 wrote:

So R does this estimation (somehow), and the probabilities are pretty similar to PROC GLIMMIX which is giving probabilities around 4e-8. R is giving probabilities about 7e-12. So clearly these probabilities allow the same conclusion to be reached.

 

Those are 0.

 

Whether you are confident enough in the 'somehow' when you have to explain it to someone else is really all that matters. I would also expect any of those variables to be excluded (or to fall out with a selection algorithm) from a final model when a full model is fit.

stat12000
Calcite | Level 5

Exclusion of such parameters is not acceptable to the FDA. For such models we are essentially trying to demonstrate that probabilities are high when expected to be high and low when expected to be low. I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

 

Another need for this type of analysis pertains to the concept of analyte carryover where you want to show the likelihood of detecting elements (RBCs, WBCs, etc.) in high concentration (abnormal) samples is very high, and then the likelihood of detecting elements (RBCs, WBCs, etc.) in low concentration (normal) samples is very low.

PaigeMiller
Diamond | Level 26

@stat12000 wrote:

Exclusion of such parameters is not acceptable to the FDA. For such models we are essentially trying to demonstrate that probabilities are high when expected to be high and low when expected to be low. I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

 


While I have no experience with the FDA, let me say that modeling does not always lead to truth. The truth is, if your data is all zeros, then the probability of zero is 1, regardless of the fact that your modeling method doesn't get that number. In essence, when your data is all zeros, you have the wrong modeling method.

 

I clearly understand that the true probability is zero, but a predicted probability cannot be zero; likewise for the predicted probability asymptote for all 1's. I equate this with computation of the statistical power of an effect size. Such a probability has range space 0 < power < 1, exclusive of 0 and 1.

 

Your modeling method fails when there are all zeros in the Y variable. So don't use it. 

--
Paige Miller
