dcortell
Pyrite | Level 9

Hi experts,

I'm trying to model a binary outcome (1 = opportunity generated, 0 = opportunity missing) based on a set of covariates, one of which is a binary categorical variable (presence or absence of a marketing interaction).

 

Output provided below

 

15pg.png

 

I'm getting an estimate of 0 for the categorical covariate. I remember reading somewhere that whenever a variable can be expressed as a linear combination of other covariates, it will not be given a coefficient. So the question is: how can I check whether that covariate is significant?

 

Bests

14 REPLIES
ballardw
Super User

You may have more serious data issues than collinearity.

Run Proc freq on the variable Mkt_flag60_1, since that is the only one showing an estimate of 0. The DF=0 makes me strongly suspect you only have one level of non-missing values for that variable.
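A minimal sketch of that check, assuming the data set is ateamd.model_source_imp_single and the variable in question is mkt_flag60_t (both names are taken from the PROC LOGISTIC code posted later in this thread). The MISSING option counts missing values as a level of their own:

```sas
/* One-way frequency of the flag, with missing values shown as a level. */
/* Data set and variable names are assumed from the code in this thread. */
proc freq data=ateamd.model_source_imp_single;
    tables mkt_flag60_t / missing;
run;
```

A single non-missing level in this table would be consistent with the DF=0 in the output.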

 

If the 12, 24, and possibly 60 in the variable names refer to time values, you may have a data structure issue as well.

dcortell
Pyrite | Level 9

15pg.png

mkt_flag60 takes both positive (1) and negative (0) values, although the negative value occurs only with target=1.

This makes sense, as mkt_flag tells us whether a marketing touchpoint was present and whether an opportunity followed that touchpoint (there are no records where neither a marketing touchpoint nor an opportunity happened, but there are records where an opportunity happened with no marketing touchpoint present).

The opp 12 and 24 variables count the number of opportunities in the 12 and 24 months preceding the marketing date (or preceding the 60th day before the opportunity date, if no marketing touchpoint happened).

The int_60 variable measures whether there were other marketing touchpoints BEFORE the last touchpoint described by mkt_flag60. It has values ONLY IF mkt_flag=1 (if there is no marketing touchpoint, there can't be a chain of marketing touchpoints before it).

PaigeMiller
Diamond | Level 26

What happened to variable mkt_flag60_t in your original output? Why are you showing us different variables now?

--
Paige Miller
dcortell
Pyrite | Level 9

What do you mean by "what happened to the covariate"? I just tabulated the values of the variable. I did nothing more.

dcortell
Pyrite | Level 9

Sorry, I understand what you are saying. The suffix "_t" is missing because I tabulated the numeric version of the same variable. But the values are exactly the same.

PaigeMiller
Diamond | Level 26

@dcortell wrote:

Sorry, I understand what you are saying. The suffix "_t" is missing because I tabulated the numeric version of the same variable. But the values are exactly the same.


The only thing I can say is show us the _t variable run through PROC FREQ.

--
Paige Miller
ballardw
Super User

Probably time to share the code for the logistic submitted as well.  Best would be to copy the text from the LOG with the code submitted and all messages generated by the procedure.

 

We can't tell at this point if you may have subset the data with a WHERE statement, or if using BY group processing to generate subsets. At which point your "problem" may only be with one subset of the data,  or just enough missing values for other variables in the model to remove all of the 0 values from the final model.
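One way to check the last possibility is to keep only the rows with no missing value in any model variable, which is roughly what PROC LOGISTIC does before fitting, and then tabulate the flag. This is only a sketch: the variable list is copied from the MODEL statement the original poster shared in this thread, and work.used_obs is a hypothetical scratch data set name.

```sas
/* Keep only rows with no missing value in any model variable.        */
/* CMISS counts missing arguments and accepts numeric and character.  */
data work.used_obs;   /* hypothetical scratch data set */
    set ateamd.model_source_imp_single;
    if cmiss(opp_flag60, mkt_flag60_t, mkt_flag60,
             n_opp_prev12, n_opp_prev24, tot_int_prev60,
             n_opp_prev12_w, n_opp_prev24_w) = 0;
run;

/* If only one level of the flag survives, the model cannot estimate it. */
proc freq data=work.used_obs;
    tables mkt_flag60_t;
run;
```

The "Number of Observations Read" and "Number of Observations Used" lines in the PROC LOGISTIC output give a quick first hint: a large gap between them means many rows were dropped for missing values.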

dcortell
Pyrite | Level 9
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 SYMBOLGEN:  Macro variable _SASWSTEMP_ resolves to 
             /r/ge.unx.sas.com/vol/vol110/u11/spndac/.sasstudio/.images/4f40afea-269a-46b8-867a-731823c2ccda
 SYMBOLGEN:  Some characters in the above value which were subject to macro quoting have been unquoted for printing.
 SYMBOLGEN:  Macro variable GRAPHINIT resolves to GOPTIONS RESET=ALL GSFNAME=_GSFNAME;
 NOTE: ODS statements in the SAS Studio environment may disable some output features.
 73         
 74         proc logistic data=ateamd.model_source_imp_single ;
 75          class mkt_flag60_t / descending ;
 76         
 77           model opp_flag60(event="1")=n_opp_prev12 n_opp_prev24
 78           tot_int_prev60 mkt_flag60_t n_opp_prev12_w n_opp_prev24_w
 79         
 80           n_opp_prev12*n_opp_prev24*
 81           tot_int_prev60*mkt_flag60*n_opp_prev24_w*n_opp_prev12_w
 82         
 83            / expb;
 84           ods output ParameterEstimates = model_win ;
 85         run;
 
 NOTE: PROC LOGISTIC is modeling the probability that opp_flag60=1.
 NOTE: Convergence criterion (GCONV=1E-8) satisfied.
 WARNING: The information matrix is singular and thus the convergence is questionable.  Try specifying a larger SINGULAR= value.
 NOTE: The data set WORK.MODEL_WIN has 8 observations and 9 variables.
 NOTE: Compressing data set WORK.MODEL_WIN increased size by 100.00 percent. 
       Compressed is 2 pages; un-compressed would require 1 pages.
 NOTE: There were 2018674 observations read from the data set ATEAMD.MODEL_SOURCE_IMP_SINGLE.
 NOTE: PROCEDURE LOGISTIC used (Total process time):
       real time           9.15 seconds
       cpu time            8.06 seconds
       
 
 86         
 87         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 SYMBOLGEN:  Macro variable GRAPHTERM resolves to GOPTIONS NOACCESSIBLE;
 99         
PaigeMiller
Diamond | Level 26
 WARNING: The information matrix is singular and thus the convergence is questionable.  Try specifying a larger SINGULAR= value.

Your model cannot return coefficients for all variables because the matrix is singular; your variable mkt_flag60_t is completely correlated with a linear combination of some of the other x-variables. So SAS sets the coefficient to zero.

 

You would have to remove one or more terms from the model to fix this.
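As a sketch of what removing a term could look like, the call below drops the six-way interaction and keeps only the main effects. Which term to drop is an assumption here, not something established in this thread. Note also that the interaction in the posted code refers to mkt_flag60, while the CLASS statement and the main effect use mkt_flag60_t.

```sas
/* Same call as in the posted log, minus the six-way interaction term. */
/* Dropping this particular term is an assumption for illustration.    */
proc logistic data=ateamd.model_source_imp_single;
    class mkt_flag60_t / descending;
    model opp_flag60(event="1") = n_opp_prev12 n_opp_prev24
          tot_int_prev60 mkt_flag60_t n_opp_prev12_w n_opp_prev24_w
          / expb;
    ods output ParameterEstimates = model_win;
run;
```

If the singular-matrix warning persists after this, the remaining main effects would need the same scrutiny.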

--
Paige Miller
Ksharp
Super User

Ksharp_0-1729041276957.png

When mkt_flag60_t=0, you only have opp_flag60=1 and never opp_flag60=0.

Your data has a separation problem. You cannot include mkt_flag60_t in your model (delete this variable).

That is the reason why you get DF=0 for mkt_flag60_t.
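This pattern can be checked with a two-way table of the flag against the outcome. A sketch, assuming the data set and variable names from the PROC LOGISTIC code posted in this thread:

```sas
/* Cross-tabulate the flag against the outcome, showing counts only. */
/* MISSING keeps missing values visible as their own row or column.  */
proc freq data=ateamd.model_source_imp_single;
    tables mkt_flag60_t*opp_flag60 / norow nocol nopercent missing;
run;
```

An empty cell for mkt_flag60_t=0 and opp_flag60=0 is the separation pattern in question.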

dcortell
Pyrite | Level 9
Yeah, the data are like that. An opp can or cannot be generated with a marketing touch point. However, when there is a marketing touch point then not always an opp is generated. The mkt_flag60 is the variable we wanna study as it is the one representing the mkt touch points
Ksharp
Super User
Then you cannot use PROC LOGISTIC if you really need to include the "mkt_flag60" variable.
You could try other approaches, such as PROC FREQ with a chi-square test, a Bayesian logistic model (PROC GENMOD), or Poisson regression:
http://support.sas.com/kb/24/188.html
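Two of these alternatives could be sketched as follows. The data set and variable names come from the PROC LOGISTIC code posted in this thread; the covariates kept in the PROC GENMOD model, the seed, and the prior are illustrative assumptions, not recommendations.

```sas
/* Chi-square test of association between the flag and the outcome. */
proc freq data=ateamd.model_source_imp_single;
    tables mkt_flag60_t*opp_flag60 / chisq;
run;

/* Bayesian logistic regression via the BAYES statement in PROC GENMOD.    */
/* DESCENDING makes the procedure model opp_flag60=1 (assumes 0/1 coding). */
/* The covariate subset, seed, and prior below are assumptions.            */
proc genmod data=ateamd.model_source_imp_single descending;
    class mkt_flag60_t;
    model opp_flag60 = mkt_flag60_t n_opp_prev12 n_opp_prev24
          / dist=binomial link=logit;
    bayes seed=12345 coeffprior=normal;
run;
```

With about two million observations, the MCMC sampling in the second step may take a while, and with a diffuse prior the posterior for the flag may still be poorly determined.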
PaigeMiller
Diamond | Level 26

@dcortell wrote:
Yeah, the data are like that. An opp can or cannot be generated with a marketing touch point. However, when there is a marketing touch point then not always an opp is generated. The mkt_flag60 is the variable we wanna study as it is the one representing the mkt touch points

As I said, you cannot estimate all the terms in this model. You have to remove one (or more) terms from the model.

--
Paige Miller


Discussion stats
  • 14 replies
  • 544 views
  • 0 likes
  • 4 in conversation