Logistic regression

malena · Posted 10-25-2017 12:13 PM

Good afternoon,

I recently took over running some logistic regression code from a previous programmer, and I am not sure why a "freq" line is included before the class statement in the logistic procedure. Does anyone know what it does? The probabilities I get are much higher when this line of code is included than when it is not.

proc logistic data=sponly DESCEND PLOTS(MAXPOINTS=NONE);
   freq pending;

   class
      ageg4 (ref='5') /*ref=15-34*/
      foreign (ref='1')
      marital_status (ref='1') /*ref=Single*/
      race (ref='2') /*ref=WNH*/
      edu (ref='1') /*ref=High School or Less*/
      gender (ref='2') /*ref=male*/
      weekend (ref='0') /*ref=weekday*/
      veteran_deaths (ref='2') /*ref=no veteran*/
   ;

   model opioidtotal = ageg4 foreign marital_status race edu gender weekend veteran_deaths
      / details rsquare ExpEst lackfit hierarchy=none CLPARM=Both CLODDS=Both /*CORRB CTABLE*/;
   roc;

   ** Model only for all knowns **;
   where gender <> 9 and ageg4 <> 99 and race <> 9;

   ** Create output data set for predictions **;
   output OUT=PRED_SP p=_prob lower=_lower upper=_upper / /*alpha=.01 this will be for 99% CI*/;
run;

Reeza · Posted 10-25-2017 12:21 PM

FREQ is used when you have a single line of data representing multiple observations.

So, it will definitely change your results.

The FREQ statement identifies a variable that contains the frequency of occurrence of each observation.

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_syntax18.htm&docsetVersio...

Two other things:

1. Verify that <> is working as Not Equals. It actually means MAX in a SAS data step, and I'm not sure how it evaluates in a WHERE clause. Pretty sure it will be Not Equals, but it's worth a quick check.

2. On the CLASS statement I would expect PARAM=REF to be somewhere, but I don't see that, unless you do want the default parameterization (in PROC LOGISTIC the default is effect coding, PARAM=EFFECT, rather than GLM coding).
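
For the first point, a quick check could look like the sketch below. The data set name _check and its three values are made up for illustration; only the variable name gender and the code 9 come from the posted code. If both PROC PRINT steps list the same rows, <> is behaving as Not Equals in the WHERE clause. Writing NE instead of <> avoids the ambiguity either way.

```sas
/* Hypothetical test data: 9 stands in for the "unknown" code */
data _check;
   input gender;
   datalines;
1
2
9
;
run;

/* WHERE clause written as in the original post */
proc print data=_check;
   where gender <> 9;
run;

/* Same filter with the unambiguous NE operator, for comparison */
proc print data=_check;
   where gender ne 9;
run;
```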

malena · Posted 10-25-2017 12:31 PM

Thank you. Then we should remove that line, since the data set only has one observation per person.

Reeza · Posted 10-25-2017 12:37 PM

@malena wrote:

Thank you. Then we should remove that line, since the data set only has one observation per person.

Then why do the results change if FREQ PENDING is included? The variable is obviously not all 1's. Is it coded 1/0, so that it is a method to exclude pending observations or to consider only the pending cases?
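
One way to find out is to look at the values PENDING actually takes. The sketch below assumes the data set and variable names from the original post (sponly, pending, opioidtotal) and the same WHERE filter; nothing else about the data is known here. Observations whose FREQ value is missing or less than 1 are not used in fitting the model, so if PENDING turns out to be 0/1, the FREQ statement is effectively restricting the fit to the rows where pending = 1.

```sas
/* Distribution of the FREQ variable among the rows the model sees */
proc freq data=sponly;
   where gender ne 9 and ageg4 ne 99 and race ne 9;
   tables pending / missing;
   /* Cross-tabulate against the response to see which cases are dropped or duplicated */
   tables pending * opioidtotal / missing norow nocol nopercent;
run;
```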

PaigeMiller · Posted 10-25-2017 12:50 PM

@malena wrote:

Thank you. Then we should remove that line, since the data set only has one observation per person.

Not necessarily, if the intent is to weight some observations more than others, or to exclude some observations which have a zero weight.

--
Paige Miller

WarrenKuhfeld · Posted 10-25-2017 05:06 PM

To understand how a FREQ variable works, work through this example. Compare the data sets X and X2 (the freq variable COUNT in X2 contains a count for each unique value). Compare the two analyses. They are identical. Yes, you can use zero weights and freqs to exclude observations.

data x;
   do i = 1 to 200;
      x = ceil(uniform(7) * 10);
      y = ceil(uniform(7) * 10);
      output;
      end;
   run;
   
proc reg data=x;
   model y = x;
   quit;
   
proc freq data=x noprint;
   tables x * y / out=x2;
   run;
   
proc reg data=x2;
   freq count;
   model y = x;
   quit;
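
The same comparison can be sketched for PROC LOGISTIC, which is closer to the original question. This is only an illustration: the data set names B and B2, the seed, and the simulated relationship between X and Y are all made up. B has one row per observation; B2 has one row per unique X-Y combination, with the number of occurrences in COUNT. The two fits should give the same parameter estimates.

```sas
/* Simulated one-row-per-observation data with a binary response */
data b;
   call streaminit(7);
   do i = 1 to 200;
      x = ceil(rand('uniform') * 3);
      y = (rand('uniform') < 0.2 + 0.1 * x);
      output;
   end;
run;

/* Fit on the raw rows */
proc logistic data=b descending;
   model y = x;
run;

/* Collapse to one row per X-Y combination; COUNT holds the frequency */
proc freq data=b noprint;
   tables x * y / out=b2;
run;

/* Fit on the collapsed rows, using COUNT as the frequency */
proc logistic data=b2 descending;
   freq count;
   model y = x;
run;
```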
