BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yukine
Calcite | Level 5

Hi I have a variable that has value from for example -3000 to 20,000, i want to assign them into 10,000 buckets evenly spaced, and calculate the percentage of observations that fall between the values. How is this done in sas steps?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Without knowing how this is going to be used and which assignment makes sense for some I would start with this.

data have;
   do x = -200 to 3000;
      output;
   end;
run;

proc sql;
select max(x) into : max from have;
select min(x) into : min from have;
quit;
 
data buckets;
   set have;
   bin = floor((x-&min)/((&max-&min)/10000)) + 1;
run;

Note that with a range of -200 to 3000 if your values are integers you are not going to use all 10,000 bins. Ever. You only have 3200 values if integer valued so roughly two thirds of the bins will receive no assignment.

 

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Use PROC RANK with the GROUPS option

yukine
Calcite | Level 5

Hi Draycut, I tried out proc rank but it gives me equal number of observations in each assigned group. However, my want the group to threshold to be evenly spaced between maxinum and minimum with 10,000 buckets. I should assign the variable based on the value of the variable I looked at, so maybe group 1 will only have 1 observation, and group 30 will have 360 observations, do you know a solution for this? thank you!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You would use do loops or a calculation.  To get further answers to you question provide the information suggested in the box next to post a question - provide test data in the form of a datastep, what the output should look like and describe the logic between them.

At a rough guess:

data want;
  set have;
bin=ceil(value / 10000);
run;
proc means data=want;
...
run;
yukine
Calcite | Level 5

my dataset would be something like:

I have about 2330000 observations, I want to assign basically evenly spaced 10,000 buckets. The bucket criteira would the max(var)-min(var). So for example, my maximum value is 3000 and my minimum value is -200, So my bucket spread will be (3000+200)/10,000=0.32. And bucket 1 will be value between -200 to (-200+0.32), bucket 2 will be value between (-200+0.32) to (-200+0.32*2) and so on. I want to have a new variable that see if what bucket is falls into.

 

Var            New_var

-200              1

-53                ?

-5                  ?

-46                ?

5

8

4

56

7542

242

....

Astounding
PROC Star

I'd set up a format to define the buckets.  For example:

 

proc sql;

select max(var) into : max from have;

select min(var) into : min from have;

quit;

 

data buckets;

interval = (&max - &min) / 10000;

fmtname = 'bucket';

do label = 1 to 10000;

   start = &min + (label - 1) * interval;

   end = start + interval;

   if label=10000 then end=&max;

   output;

end;

run;

proc format cntlin=buckets;

run;

 

This gives you a format that translates each range into a bucket (1 to 10000).  For example, you could use:

 

data want;

set have;

bucket = put(var, bucket.);

run;

 

In this example, BUCKET would be character, but you could always convert to numeric if you want to:

 

bucket = input(put(var, bucket.), 5.);

 

The end point of the final bucket gets hard-coded to the max, because 10,000 iterations of floating point math might produce a slight difference in the endpoint.

 

The code is untested.  It looks right, but you will have to do the testing.

 

ballardw
Super User

Without knowing how this is going to be used and which assignment makes sense for some I would start with this.

data have;
   do x = -200 to 3000;
      output;
   end;
run;

proc sql;
select max(x) into : max from have;
select min(x) into : min from have;
quit;
 
data buckets;
   set have;
   bin = floor((x-&min)/((&max-&min)/10000)) + 1;
run;

Note that with a range of -200 to 3000 if your values are integers you are not going to use all 10,000 bins. Ever. You only have 3200 values if integer valued so roughly two thirds of the bins will receive no assignment.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2530 views
  • 0 likes
  • 5 in conversation