BookmarkSubscribeRSS Feed
pmotlagh
Calcite | Level 5

I'm using SAS 9.4.  I have a dataset with the following variables: IDnum, exposures, cost.

I want to divide my dataset into 25 bins of equal exposures and then calculate the average cost of each of the 25 bins. I tried the following:

 

proc hpbin data=TESTDATA output=out numbin=25 PSEUDO_QUANTILE;

   input exposures ;

   ID IDnum;

run;

 

PROC SORT DATA=TESTDATA;

      BY IDnum;

PROC SORT DATA=OUT;

      BY IDnum;

RUN;

DATA ALL;

      MERGE TESTDATA OUT;

      BY IDnum;

RUN;

 

PROC MEANS DATA=ALL;

      BY BIN_EXPOSURES;

      VAR cost;

      WEIGHT exposures;

      OUTPUT OUT=TWO MEAN=AVG_cost;

RUN;

 

PROC HPBIN does not give the equal size exposure bins.

Any suggestions?

8 REPLIES 8
Ksharp
Super User

Do you mean equal FREQ bin ?

If 'exposures' has tie value, it would not give you equal size bin, but almost equal .

 

Also could try PROC RANKS :

 

proc ranks data=have ties=high group=10 out=want ;

var exposures;

ranks rank_exposures;

run;

pmotlagh
Calcite | Level 5

Thank you for the reply.

Not the same FREQ but approximately the same number of "exposures" (the variable in the dataset) in all 25 bin.

I tried PROC RANK but it doesn't seem to do the trick. 

PaigeMiller
Diamond | Level 26

@pmotlagh wrote:

 

I tried PROC RANK but it doesn't seem to do the trick. 


Could you explain further?

--
Paige Miller
pmotlagh
Calcite | Level 5

PROC RANK did not give me the equal "exposures" bins.

Rick_SAS
SAS Super FREQ

Your PROC MEANS code is using 

 BY BIN_EXPOSURES;

but the data are not sorted by that variable. Perhaps you meant to use CLASS BIN_EXPOSURES.

 

Anyway, as KSharp said, you will get approximately equat bins, but tied observations might make some bins contain more than others. The Mapping table from PROC HPBIN tells you how many observations are in each bin. Run the following example and notice that all of the bins have approximately 40 observations, but some have a few more or less:

 

data testdata;
call streaminit(1);
do idnum = 1 to 1000;
  exposures = round(rand("Normal"), 0.01);
  cost = rand("Lognormal");
  output;
end;
run;

proc hpbin data=testdatA output=out numbin=25 PSEUDO_QUANTILE;
   input exposures;
   ID idnum;
   ods select Mapping;
run;

For more information about various binning methods and techniques in SAS, see "The essential guide to binning in SAS,"

which includes a discussion of PROC HPBIN.

pmotlagh
Calcite | Level 5

Thanks Rick.

I think the problem then must be the many tied "exposures" in the data.  Is there an option that can be used to get around it?

Rick_SAS
SAS Super FREQ

In the article "The essential guide to binning in SAS,":

- Scroll down the section "Quantile binning in SAS."

- Read the fourth bullet.

 

In the article, I say"I do not endorse that practice" because it does not make sense to place observations are in the different bins when they have exactly the same values. Caveat emptor!

pmotlagh
Calcite | Level 5
Thank you.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3221 views
  • 0 likes
  • 4 in conversation