Good afternoon,
I recently took over running some logistic code from a previous programmer and I am not sure why a "freq" line is included before the CLASS statement in the LOGISTIC procedure. Does anyone know what it does? The probabilities I get are much higher when this line of code is included than when it is not.
proc logistic data=sponly DESCEND PLOTS(MAXPOINTS=NONE);
   freq pending;
   class
      ageg4 (ref='5') /*ref=15-34*/
      foreign (ref='1')
      marital_status (ref='1') /*ref=Single*/
      race (ref='2') /*ref=WNH*/
      edu (ref='1') /*ref=High School or Less*/
      gender (ref='2') /*ref=male*/
      weekend (ref='0') /*ref=weekday*/
      veteran_deaths (ref='2') /*ref=no veteran*/
   ;
   model opioidtotal = ageg4 foreign marital_status race edu gender weekend veteran_deaths
      / details rsquare ExpEst lackfit hierarchy=none CLPARM=Both CLODDS=Both /*CORRB CTABLE*/;
   roc;
   ** Model only for all knowns **;
   where gender <> 9 and ageg4 <> 99 and race <> 9;
   ** Create output data set for predictions **;
   output OUT=PRED_SP p=_prob lower=_lower upper=_upper / /*alpha=.01 this will be for 99% CI*/;
run;
FREQ is used when you have a single line of data representing multiple observations.
So, it will definitely change your results.
The FREQ statement identifies a variable that contains the frequency of occurrence of each observation.
Two other things:
1. Verify that <> is working as Not Equals. In a SAS DATA step expression it actually means MAX; in a WHERE clause it is interpreted as Not Equals, but it's worth a quick check, and writing NE instead removes the ambiguity.
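2. On the CLASS statement I would expect PARAM=REF to be somewhere, but I don't see it. Without it, PROC LOGISTIC uses its default effect parameterization rather than reference coding, so add PARAM=REF unless you do want the default.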
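The first point above can be checked with a few lines of code. This is a minimal sketch on a made-up data set (CHECK, MAX_OP and NE_OP are hypothetical names, not part of the original program); it only contrasts how <> behaves in a DATA step expression and in a WHERE statement.

```sas
/* Hypothetical test data: GENDER takes the values 1, 2 and 9 */
data check;
   do gender = 1, 2, 9;
      /* In a DATA step expression, <> is the MAX operator */
      max_op = (gender <> 9);   /* max(gender, 9), so 9 on every row */
      /* NE means "not equal" everywhere */
      ne_op  = (gender ne 9);   /* 1, 1, 0 */
      output;
   end;
run;

proc print data=check;
run;

/* In a WHERE statement, <> is treated as "not equal",
   so only the rows with GENDER 1 and 2 are printed */
proc print data=check;
   where gender <> 9;
run;
```

The log for the DATA step should also carry a note saying that "<>" was interpreted as MAX, which is a quick way to spot accidental use of the operator.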
Thank you. Then we should remove that line, since the dataset only has one observation per person.
@malena wrote:
Thank you. Then we should remove that line, since the dataset only has one observation per person.
Then why do the results change if FREQ PENDING is included? The variable is obviously not all 1's. Is it coded 1/0, so that it is a way to exclude pending observations, or to consider only the pending cases?
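One way to answer that question is to look at the values PENDING actually takes. The sketch below assumes the same SPONLY data set and the same filter as the model above, with NE written in place of <>; it is only a diagnostic and does not change the analysis.

```sas
/* Distribution of the FREQ variable among the rows the model would use */
proc freq data=sponly;
   where gender ne 9 and ageg4 ne 99 and race ne 9;
   tables pending / missing;
run;

/* Range and total of PENDING; if every value is a positive integer,
   the sum is the number of observations the FREQ statement implies */
proc means data=sponly n nmiss min max sum;
   where gender ne 9 and ageg4 ne 99 and race ne 9;
   var pending;
run;
```

If the table shows only 0 and 1, the FREQ statement is acting as a filter; values above 1 mean some rows are being counted several times.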
@malena wrote:
Thank you. Then we should remove that line, since the dataset only has one observation per person.
Not necessarily, if the intent is to count some observations more heavily than others, or to exclude some observations which have a zero frequency.
To understand how a FREQ variable works, work through this example. Compare the data sets X and X2 (the freq variable COUNT in X2 contains a count for each unique value). Compare the two analyses. They are identical. Yes, you can use zero weights and freqs to exclude observations.
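/* X: 200 raw observations, one row per observation */
data x;
   do i = 1 to 200;
      x = ceil(uniform(7) * 10);   /* random integer from 1 to 10 */
      y = ceil(uniform(7) * 10);
      output;
   end;
run;

/* Analysis 1: regression on the raw rows */
proc reg data=x;
   model y = x;
quit;

/* X2: one row per unique (x, y) pair; COUNT holds how often it occurs */
proc freq data=x noprint;
   tables x * y / out=x2;
run;

/* Analysis 2: same regression on the collapsed data, using COUNT as the frequency */
proc reg data=x2;
   freq count;
   model y = x;
quit;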
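The remark about using zero freqs to exclude observations can be illustrated the same way. This is a sketch under assumptions: X3 and USE_OBS are hypothetical names, and it relies on the documented behavior that an observation whose FREQ value is less than 1 is not used. The two analyses below should then give the same fit.

```sas
/* Same kind of simulated data, plus a hypothetical 1/0 flag */
data x3;
   do i = 1 to 200;
      x = ceil(uniform(7) * 10);
      y = ceil(uniform(7) * 10);
      use_obs = (x <= 5);   /* 1 = use the row, 0 = drop it */
      output;
   end;
run;

/* Rows with USE_OBS = 0 are left out because their frequency is zero */
proc reg data=x3;
   freq use_obs;
   model y = x;
quit;

/* Equivalent analysis with an explicit filter */
proc reg data=x3;
   where x <= 5;
   model y = x;
quit;
```

Comparing the "Number of Observations Used" line in the two outputs is a simple way to confirm that the flag and the WHERE filter select the same rows.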