Hi experts,
I'm trying to model a binary outcome (1 = opportunity generated, 0 = opportunity missing) based on a set of covariates, one of which is categorical and binary (presence or absence of a marketing interaction).
Output provided below.
For the categorical covariate I'm getting an estimate of 0. I remember reading somewhere that whenever a variable can be written as a linear combination of other covariates, it is not given a coefficient. So the question is: how do I check whether that covariate is significant?
Best
You may have more serious data issues than collinearity.
Run PROC FREQ on the variable Mkt_flag60_1, since that is the only one showing an estimate of 0. The DF=0 makes me strongly suspect you have only one level of non-missing values for that variable.
If the 12 and 24, and possibly the 60, in the variable names refer to time values, you may have a data structure issue as well.
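The PROC FREQ check suggested above could look roughly like the sketch below. The data set and variable names (ateamd.model_source_imp_single, mkt_flag60_t) are taken from the PROC LOGISTIC call in the log posted later in this thread; substitute whatever names apply to the data actually being modeled.

```sas
/* One-way frequency table of the marketing flag.                  */
/* MISSING lists missing values as their own level, so it is easy  */
/* to see whether only one non-missing level is present.           */
proc freq data=ateamd.model_source_imp_single;
  tables mkt_flag60_t / missing;
run;
```

If only one non-missing level shows up, a DF of 0 for that variable is what you would expect.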
The mkt_flag60 variable has both positive and negative values (1 and 0), although the negative value is only associated with target=1.
This makes sense, as mkt_flag tells us whether a marketing touchpoint was present and whether an opportunity followed that touchpoint or not (there are no records where neither a marketing touchpoint nor an opportunity happened, but there are records where an opportunity happened with NO marketing touchpoint present).
The opp 12 and 24 variables count the number of opportunities in the 12 and 24 months preceding the marketing date (or preceding the 60th day before the opportunity date, if no marketing touchpoint happened).
The int_60 variable measures whether there were other marketing touchpoints BEFORE the last touchpoint described by mkt_flag60. It has values ONLY IF mkt_flag=1 (if there is no marketing touchpoint, there can't be a chain of marketing touchpoints before it).
What happened to the variable mkt_flag60_t in your original output? Why are you showing us different variables now?
What do you mean by what happened to the covariate? I just tabulated the values of the variable and did nothing more.
Sorry, I understand what you are saying now. The suffix "_t" is missing because I tabulated the numeric version of the same variable. But the values are exactly the same.
The only thing I can say is: show us the _t variable run through PROC FREQ.
Here you go.
It is probably time to share the code submitted for the logistic regression as well. Best would be to copy the text from the LOG, with the submitted code and all messages generated by the procedure.
At this point we can't tell whether you have subset the data with a WHERE statement, or are using BY-group processing to generate subsets. In either case your "problem" may exist in only one subset of the data. It is also possible that there are just enough missing values in the other model variables to remove all of the 0 values from the data actually used to fit the model.
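One way to test the missing-values explanation is to flag which observations have complete data for every model variable and then see how that flag splits across the levels of the marketing variable. This is only a sketch: the variable list is copied from the MODEL statement in the log posted in this thread, and work.model_check, n_missing and used_in_model are made-up names for illustration.

```sas
/* Flag observations that have no missing values in any model variable. */
data work.model_check;
  set ateamd.model_source_imp_single;
  /* CMISS counts missing values for numeric and character arguments alike */
  n_missing = cmiss(of opp_flag60 n_opp_prev12 n_opp_prev24 tot_int_prev60
                       mkt_flag60_t mkt_flag60 n_opp_prev12_w n_opp_prev24_w);
  used_in_model = (n_missing = 0);
run;

/* Cross-tabulate the flag level against the complete-case indicator. */
proc freq data=work.model_check;
  tables mkt_flag60_t*used_in_model / missing norow nocol nopercent;
run;
```

If every observation at one level of mkt_flag60_t has used_in_model=0, that level never reaches the fit. By default PROC LOGISTIC excludes observations with a missing response or explanatory variable, so the variable would be left with a single level.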
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
SYMBOLGEN: Macro variable _SASWSTEMP_ resolves to
/r/ge.unx.sas.com/vol/vol110/u11/spndac/.sasstudio/.images/4f40afea-269a-46b8-867a-731823c2ccda
SYMBOLGEN: Some characters in the above value which were subject to macro quoting have been unquoted for printing.
SYMBOLGEN: Macro variable GRAPHINIT resolves to GOPTIONS RESET=ALL GSFNAME=_GSFNAME;
NOTE: ODS statements in the SAS Studio environment may disable some output features.
73
74 proc logistic data=ateamd.model_source_imp_single ;
75 class mkt_flag60_t / descending ;
76
77 model opp_flag60(event="1")=n_opp_prev12 n_opp_prev24
78 tot_int_prev60 mkt_flag60_t n_opp_prev12_w n_opp_prev24_w
79
80 n_opp_prev12*n_opp_prev24*
81 tot_int_prev60*mkt_flag60*n_opp_prev24_w*n_opp_prev12_w
82
83 / expb;
84 ods output ParameterEstimates = model_win ;
85 run;
NOTE: PROC LOGISTIC is modeling the probability that opp_flag60=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
WARNING: The information matrix is singular and thus the convergence is questionable. Try specifying a larger SINGULAR= value.
NOTE: The data set WORK.MODEL_WIN has 8 observations and 9 variables.
NOTE: Compressing data set WORK.MODEL_WIN increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: There were 2018674 observations read from the data set ATEAMD.MODEL_SOURCE_IMP_SINGLE.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 9.15 seconds
cpu time 8.06 seconds
86
87 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
SYMBOLGEN: Macro variable GRAPHTERM resolves to GOPTIONS NOACCESSIBLE;
99
WARNING: The information matrix is singular and thus the convergence is questionable. Try specifying a larger SINGULAR= value.
Your model cannot return coefficients for all variables because the information matrix is singular: your variable mkt_flag60_t is completely correlated with a linear combination of some of the other x-variables, so SAS sets its coefficient to zero.
You would have to remove one or more terms from the model to fix this.
When mkt_flag60_t=0, you only have opp_flag60=1; there are no records with opp_flag60=0.
Your data has a separation problem. You cannot include mkt_flag60_t in your model (delete this variable).
That is the reason why you get DF=0 for mkt_flag60_t.
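The empty cell described above can be seen directly in a two-way table of the flag against the outcome. A minimal sketch, using the data set and variable names from the log posted in this thread:

```sas
/* Two-way table of marketing flag by outcome, counts only.          */
/* A zero count in the mkt_flag60_t=0, opp_flag60=0 cell would match */
/* the pattern described above.                                      */
proc freq data=ateamd.model_source_imp_single;
  tables mkt_flag60_t*opp_flag60 / missing norow nocol nopercent;
run;
```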
@dcortell wrote:
Yeah, the data are like that. An opp may or may not be generated with a marketing touchpoint. However, when there is a marketing touchpoint, an opp is not always generated. The mkt_flag60 is the variable we want to study, as it is the one representing the marketing touchpoints.
As I said, you cannot estimate all the terms in this model. You have to remove one (or more) terms from the model.
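As an illustration of removing terms, the sketch below refits the model from the posted log without the high-order interaction. Which term or terms to drop is not settled in this thread, so dropping the interaction is only an assumption for the example. If the separation pattern described earlier is the real cause, mkt_flag60_t may still show DF=0 after this change.

```sas
/* Refit with main effects only; the interaction term from the  */
/* original MODEL statement has been removed as one candidate.  */
proc logistic data=ateamd.model_source_imp_single;
  class mkt_flag60_t / descending;
  model opp_flag60(event="1") = n_opp_prev12 n_opp_prev24
        tot_int_prev60 mkt_flag60_t n_opp_prev12_w n_opp_prev24_w
        / expb;
  ods output ParameterEstimates = model_win;
run;
```

Check the log of the refit for the singular information matrix warning. If it is still there, remove further terms one at a time until it disappears.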