BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
benvar
Fluorite | Level 6

I have data of purchases with different payment amounts.  I want to bucket the data so that each bucket has an equal proportion of total dollars.  I am experimenting with PROC HPBIN but I don't know which bucketing method to use or if I need to use a different PROC.

 

Example of data:

 

Purchase #       Dollar Amount

1                               $10

2                               $20

3                               $10

4                               $10

5                               $50

6                               $30

7                               $10

8                               $20

9                               $10

10                             $20

11                             $10

 

Code should result in buckets like this:

 

Bucket 1: $0-$10: total of $50 (Purchases #1, 3, 4, 7, 9)

Bucket 2: $10-$20: total of $50 (Purchases #11, 2, 😎

Bucket 3: $20-$30: total of $50 (Purchases #10, 6)

Bucket 4: More than $30: total of $50 (Purchase #5)

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

As Paul says, there are assumptions that may or may not hold. Here's a program that enables the cumulative sums to exceed the target values (50).

 

proc sort data=Have out=HaveSort;
by Amount;
run;

/* manually inspect the sum */
proc means data=Have Sum;
var Amount;
run;

%let nGroups = 4;
%let SUM = 200;    /* <== input sum from PROC MEANS */

data Cumul;
retain Bin 1;
set HaveSort;
d = &SUM / &nGroups;
cusum + Amount;
diff = cusum - d;
if diff > 0 then do;
   cusum = cusum - d;
   Bin + 1;
end;
run;

proc print; run;

 

The code won't work perfectly if there are very large amounts like $200.

View solution in original post

4 REPLIES 4
Rick_SAS
SAS Super FREQ

I think that most people would not consider this "binning." Binning usually means assigning similar units to groups: For example, you could bin by the dollar amount and get one bin that has small amounts, another that has medium amounts, and one that has large amounts.

 

From your description, it sounds like you want to split the data into groups that have (approximately) sum (or proportion) of dollars. If that is correct, you might want to read the article "Split data into groups that have the same mean " 

See also the comments at the end of the article for an alternative solution.

 

benvar
Fluorite | Level 6
Yes, I need my data to be in order of dollar amount (from lowest to highest) and then divided into groups that each have the same sum of total dollars. The article you linked splits data into groups with the same mean which is not what I want. Can you please help?
hashman
Ammonite | Level 13

@benvar :

Your presumption that what you need to do is possible is based on the assumption that the data are actually structured in such a way that as you go through the amounts in ascending order, first N1 items will tally up to 50, next N2 items will also tally up  to 50, and so on. Your sample data shown here are in fact structured in this manner. If your real data also are, then try this:

data have ;                         
  input purchase amount ;           
  cards ;                           
 1  10                              
 2  20                              
 3  10                              
 4  10                              
 5  50                              
 6  30                              
 7  10                              
 8  20                              
 9  10                              
10  20                              
11  10                              
run ;                               
                                    
proc sort ;                         
  by amount ;                       
run ;                               
                                    
data want (drop = _:) ;             
  do _q = 1 by 1 until (_sum = 50) ;
    set have ;                      
    _sum = sum (_sum, amount) ;     
  end ;                             
  bucket = _n_ ;                    
  do _q = 1 to _q ;                 
    set have ;                      
    output ;                        
  end ;                             
run ;                               

Kind regards

Paul D.  

Rick_SAS
SAS Super FREQ

As Paul says, there are assumptions that may or may not hold. Here's a program that enables the cumulative sums to exceed the target values (50).

 

proc sort data=Have out=HaveSort;
by Amount;
run;

/* manually inspect the sum */
proc means data=Have Sum;
var Amount;
run;

%let nGroups = 4;
%let SUM = 200;    /* <== input sum from PROC MEANS */

data Cumul;
retain Bin 1;
set HaveSort;
d = &SUM / &nGroups;
cusum + Amount;
diff = cusum - d;
if diff > 0 then do;
   cusum = cusum - d;
   Bin + 1;
end;
run;

proc print; run;

 

The code won't work perfectly if there are very large amounts like $200.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1780 views
  • 1 like
  • 3 in conversation