I'm using SAS 9.4. I have a dataset with the following variables: IDnum, exposures, cost.
I want to divide my dataset into 25 bins of equal exposures and then calculate the average cost of each of the 25 bins. I tried the following:
proc hpbin data=TESTDATA output=out numbin=25 PSEUDO_QUANTILE;
input exposures ;
ID IDnum;
run;
PROC SORT DATA=TESTDATA;
BY IDnum;
PROC SORT DATA=OUT;
BY IDnum;
RUN;
DATA ALL;
MERGE TESTDATA OUT;
BY IDnum;
RUN;
PROC MEANS DATA=ALL;
BY BIN_EXPOSURES;
VAR cost;
WEIGHT exposures;
OUTPUT OUT=TWO MEAN=AVG_cost;
RUN;
PROC HPBIN does not give the equal size exposure bins.
Any suggestions?
Do you mean equal FREQ bin ?
If 'exposures' has tie value, it would not give you equal size bin, but almost equal .
Also could try PROC RANKS :
proc ranks data=have ties=high group=10 out=want ;
var exposures;
ranks rank_exposures;
run;
Thank you for the reply.
Not the same FREQ but approximately the same number of "exposures" (the variable in the dataset) in all 25 bin.
I tried PROC RANK but it doesn't seem to do the trick.
@pmotlagh wrote:
I tried PROC RANK but it doesn't seem to do the trick.
Could you explain further?
PROC RANK did not give me the equal "exposures" bins.
Your PROC MEANS code is using
BY BIN_EXPOSURES;
but the data are not sorted by that variable. Perhaps you meant to use CLASS BIN_EXPOSURES.
Anyway, as KSharp said, you will get approximately equat bins, but tied observations might make some bins contain more than others. The Mapping table from PROC HPBIN tells you how many observations are in each bin. Run the following example and notice that all of the bins have approximately 40 observations, but some have a few more or less:
data testdata;
call streaminit(1);
do idnum = 1 to 1000;
exposures = round(rand("Normal"), 0.01);
cost = rand("Lognormal");
output;
end;
run;
proc hpbin data=testdatA output=out numbin=25 PSEUDO_QUANTILE;
input exposures;
ID idnum;
ods select Mapping;
run;
For more information about various binning methods and techniques in SAS, see "The essential guide to binning in SAS,"
which includes a discussion of PROC HPBIN.
Thanks Rick.
I think the problem then must be the many tied "exposures" in the data. Is there an option that can be used to get around it?
In the article "The essential guide to binning in SAS,":
- Scroll down the section "Quantile binning in SAS."
- Read the fourth bullet.
In the article, I say"I do not endorse that practice" because it does not make sense to place observations are in the different bins when they have exactly the same values. Caveat emptor!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.