Hi,
How can I adjust for oversampling in my Binary Logistic analysis?
This is where I left off:
ods noproctitle;
ods graphics / imagemap=on;

proc logistic data=WORK.BANK;
   /* Categorical predictors, GLM (less-than-full-rank) coding */
   class combinedages 'married or not'n 'binned job'n 'binned education'n housing
         loan contact poutcome 'contacted since by # months'n / param=glm;
   /* Model the probability that purchased = 1 */
   model purchased(event='1') = combinedages 'married or not'n 'binned job'n
         'binned education'n housing loan contact poutcome
         'contacted since by # months'n balance duration 'calls this campaing'n /
         ctable link=logit technique=fisher;
   /* Save fitted probabilities and scored observations */
   output out=work.bankstast2 predicted=pred_;
   score out=work.bankscores2;
run;
You should describe how the sample and over-sample were done.
You would assign a weight to each observation, typically the inverse of the probability that the subject was selected for the sample.
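As a rough sketch of the weighting idea, assuming the oversampling was done on the outcome itself (events kept at a higher rate than non-events) and that purchased is a numeric 0/1 variable. The population event rate of 0.05 is a placeholder and not a figure from this thread, the names pop_rate, samp_rate, samp_wt and work.bank_w are made up for illustration, and the predictor list is shortened for brevity:

```sas
/* Assumption: true population event rate, known from an outside source. */
/* 0.05 is a placeholder value only.                                      */
%let pop_rate = 0.05;

/* Event rate in the oversampled data (mean of a 0/1 variable). */
proc sql noprint;
   select mean(purchased) format=best32. into :samp_rate trimmed
   from work.bank;
quit;

/* Weight = population share / sample share within each outcome group. */
data work.bank_w;
   set work.bank;
   if purchased = 1 then samp_wt = &pop_rate / &samp_rate;
   else samp_wt = (1 - &pop_rate) / (1 - &samp_rate);
run;

/* Refit with the weights; add the remaining predictors as needed. */
proc logistic data=work.bank_w;
   class housing loan / param=glm;
   model purchased(event='1') = housing loan balance duration;
   weight samp_wt;
run;
```

These weights are proportional to the inverse selection probabilities and sum to the original sample size. If the oversampling was based on something other than the outcome, the weights would need to come from the actual selection rules instead.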
Also, if the sampling methodology is complex, you would use PROC SURVEYLOGISTIC (and the other SURVEY procedures) for the analysis. There you provide additional information about the sample, such as stratification and cluster variables and the type of sample: simple random, probability proportional to size, sequential, or others.
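For illustration only, a minimal PROC SURVEYLOGISTIC sketch. The data set work.bank_design and the variables region (stratum), branch_id (cluster) and samp_wt (sampling weight) are hypothetical, since nothing in this thread says how the sample was actually designed, and the predictor list is shortened:

```sas
/* Hypothetical design variables: replace with the real ones from the sample plan. */
proc surveylogistic data=work.bank_design;
   strata region;        /* stratification variable (assumed) */
   cluster branch_id;    /* primary sampling unit (assumed)   */
   weight samp_wt;       /* inverse selection probability     */
   class housing loan / param=glm;
   model purchased(event='1') = housing loan balance duration;
run;
```

The STRATA and CLUSTER statements are only needed if the design really had strata or clusters; with weights alone the procedure still computes design-based standard errors.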
Oversampling typically means that the rule for selecting the sample was different for some part of the population. Without knowing those differences in rules and the effect they had on participation, it is extremely difficult to guess what may be needed to handle your particular "oversample".
@sxking2 wrote:
Oh, I thought it was when you had too few events per number of observations.
Oversampling is one technique for increasing the number of events you have to work with.
Suppose that in the general population of the country one person in 1,000,000 has a characteristic.
But you have information that left-handed, red-headed people under 5 ft tall have an occurrence of 1 in 1,000 (or some other more accessible rate). You then include more left-handed, red-headed short people in the sample than a simple random selection from the population would yield, in the hope of getting more "events". Because the selection probabilities are known, you can weight the data so the result applies to the general population. In some cases this makes the sample a complex one.
Check the SCORE statement's PRIOR= and PRIOREVENT= options and put the real (population) event probability there.
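Something along these lines, as a sketch. The 0.05 is a placeholder for the real population event rate, work.bankscores_adj is a made-up output name, and the predictor list is shortened:

```sas
proc logistic data=work.bank;
   class housing loan / param=glm;
   model purchased(event='1') = housing loan balance duration;
   /* PRIOREVENT= takes the true population event probability. */
   /* 0.05 is a placeholder, not a value from this thread.     */
   score data=work.bank out=work.bankscores_adj priorevent=0.05;
run;
```

This adjusts the posterior probabilities written to the scored data set; the coefficients in the fitted model itself are not changed. PRIOR= does the same job but reads the prior probabilities from a data set.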