Re: creating equal size bins

pmotlagh · Posted 02-26-2020 03:12 PM

I'm using SAS 9.4. I have a dataset with the following variables: IDnum, exposures, cost.

I want to divide my dataset into 25 bins of equal exposures and then calculate the average cost of each of the 25 bins. I tried the following:

proc hpbin data=TESTDATA output=out numbin=25 PSEUDO_QUANTILE;

input exposures ;

ID IDnum;

run;

PROC SORT DATA=TESTDATA;

BY IDnum;

PROC SORT DATA=OUT;

BY IDnum;

RUN;

DATA ALL;

MERGE TESTDATA OUT;

BY IDnum;

RUN;

PROC MEANS DATA=ALL;

BY BIN_EXPOSURES;

VAR cost;

WEIGHT exposures;

OUTPUT OUT=TWO MEAN=AVG_cost;

RUN;

PROC HPBIN does not give the equal size exposure bins.

Any suggestions?

Ksharp · Posted 02-26-2020 10:39 PM

Do you mean equal FREQ bin ?

If 'exposures' has tie value, it would not give you equal size bin, but almost equal .

Also could try PROC RANKS :

proc ranks data=have ties=high group=10 out=want ;

var exposures;

ranks rank_exposures;

run;

pmotlagh · Posted 02-27-2020 08:43 AM

Thank you for the reply.

Not the same FREQ but approximately the same number of "exposures" (the variable in the dataset) in all 25 bin.

I tried PROC RANK but it doesn't seem to do the trick.

PaigeMiller · Posted 02-27-2020 08:50 AM

@pmotlagh wrote:

I tried PROC RANK but it doesn't seem to do the trick.

Could you explain further?

--
Paige Miller

pmotlagh · Posted 02-27-2020 08:52 AM

PROC RANK did not give me the equal "exposures" bins.

Rick_SAS · Posted 02-27-2020 08:59 AM

Your PROC MEANS code is using

BY BIN_EXPOSURES;

but the data are not sorted by that variable. Perhaps you meant to use CLASS BIN_EXPOSURES.

Anyway, as KSharp said, you will get approximately equat bins, but tied observations might make some bins contain more than others. The Mapping table from PROC HPBIN tells you how many observations are in each bin. Run the following example and notice that all of the bins have approximately 40 observations, but some have a few more or less:

data testdata;
call streaminit(1);
do idnum = 1 to 1000;
  exposures = round(rand("Normal"), 0.01);
  cost = rand("Lognormal");
  output;
end;
run;

proc hpbin data=testdatA output=out numbin=25 PSEUDO_QUANTILE;
   input exposures;
   ID idnum;
   ods select Mapping;
run;

For more information about various binning methods and techniques in SAS, see "The essential guide to binning in SAS,"

which includes a discussion of PROC HPBIN.

pmotlagh · Posted 02-27-2020 10:54 AM

Thanks Rick.

I think the problem then must be the many tied "exposures" in the data. Is there an option that can be used to get around it?

Rick_SAS · Posted 02-27-2020 11:11 AM

In the article "The essential guide to binning in SAS,":

- Scroll down the section "Quantile binning in SAS."

- Read the fourth bullet.

In the article, I say"I do not endorse that practice" because it does not make sense to place observations are in the different bins when they have exactly the same values. Caveat emptor!

pmotlagh · Posted 03-02-2020 02:54 PM

Thank you.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!