BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cwebb85
Calcite | Level 5

Hello,

I am trying to create 10 equal bins/deciles on my dataset by points. My data set is aggregated by week, customer, and points. 

 

I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties. 

 

I am trying to use the FLOOR function in a data step as it forces 10 bins everytime, the issues I am having with this way, is that i cannot get it to work to create to 10 bins for every week and it just does it on the entire data set. 

 

My question, is it possible to use the below FLOOR function in a PROC SQL statement that would allow me to use by GROUPS. (also, yes, i can add a by week to the below, but it doesnt acutally group it by week)

 

data xx; set x nobs=numobs;
decile=floor(_n_*10/(numobs+1));
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @cwebb85 and welcome to the SAS Support Communities!

 


@cwebb85 wrote:

I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties. 


Would it be acceptable to use random numbers as tie breakers?

/* Add random numbers as tie breakers */

data tmp / view=tmp;
call streaminit('MT64',27182818);
set have;
group=points+rand('uniform'); /* assuming only integer values in variable POINTS */
run;

/* Create deciles */

proc rank data=tmp out=want groups=10;
by week;
var group;
run;

proc sort data=want;
by week points group;
run;

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

@cwebb85 wrote:

Hello,

I am trying to create 10 equal bins/deciles on my dataset by points. My data set is aggregated by week, customer, and points. 

 

I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties. 

 

I am trying to use the FLOOR function in a data step as it forces 10 bins everytime, the issues I am having with this way, is that i cannot get it to work to create to 10 bins for every week and it just does it on the entire data set. 

 

My question, is it possible to use the below FLOOR function in a PROC SQL statement that would allow me to use by GROUPS. (also, yes, i can add a by week to the below, but it doesnt acutally group it by week)

 

data xx; set x nobs=numobs;
decile=floor(_n_*10/(numobs+1));
run;

 


I doubt SQL will work directly, as there really isn't an equivalent in SQL to the data step variable _N_. You could probably do this in a DATA step and I request that you provide sample data for us to work with, as working data step code (which you can type in yourself, or follow these instructions) and not in any other format.

--
Paige Miller
cwebb85
Calcite | Level 5

Thanks for the help, attached is a sample of the data i am working with. Actual file is much large with many weeks of data. 

PaigeMiller
Diamond | Level 26

I refuse to download files from this (or any other) public forum. So repeating:

 

I request that you provide sample data for us to work with, as working data step code (which you can type in yourself, or follow these instructions) and not in any other format.

--
Paige Miller
fja
Lapis Lazuli | Level 10 fja
Lapis Lazuli | Level 10

"Actual file is much large with many weeks of data."
Hello!
And your worry with that would be what? I haven't quite caught the point ...
--FJa

FreelanceReinh
Jade | Level 19

Hello @cwebb85 and welcome to the SAS Support Communities!

 


@cwebb85 wrote:

I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties. 


Would it be acceptable to use random numbers as tie breakers?

/* Add random numbers as tie breakers */

data tmp / view=tmp;
call streaminit('MT64',27182818);
set have;
group=points+rand('uniform'); /* assuming only integer values in variable POINTS */
run;

/* Create deciles */

proc rank data=tmp out=want groups=10;
by week;
var group;
run;

proc sort data=want;
by week points group;
run;
cwebb85
Calcite | Level 5

Thank you, I was just testing this and comparing results. I believe this will do the trick in the application i am using it for. 

ballardw
Super User

Do want equal size groups or "deciles"? They are different as a decile would mean grouping on the values of some variable.

If you want groups, not actual deciles you might try Proc Surveyselect with the groups option.

data example;
   do week=1 to 10;
      recsperweek = rand('integer',10,100);
      do rec=1 to recsperweek;
         othervar= rand('uniform');
         output;
      end;
   end;
run;

proc surveyselect data=example out=selected
   groups=10;
   strata week ;
run;

proc freq data=selected;
   tables week*groupid/norow nocol nocum nopercent;
run;

The example data set is just to have something to work with. Note that you should  be able to copy into your SAS editor and run this code. The Groupid id is added by Proc Survey select and the Proc Freq code shows how many records are assigned to each of the 10 bins. This is a random selection so where the not quite a multiple of 10 observations go is not set.

The Strata statement in Surveyselect does require the data to be sorted by that variable in the input data set.

 

To do something similar to your data step for anything resembling a "week" you would need to get a count of how many observations are in a week, then use a counter reset at the first value of each week. However, this would require each week to have an integer multiple of number of bins to have each week with equal size groups. So, does your data support that?

Run proc freq on whatever variable is defining week and see if value has a multiple of 10 in the count. I suspect not.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 906 views
  • 0 likes
  • 5 in conversation