- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am trying to create 10 equal bins/deciles on my dataset by points. My data set is aggregated by week, customer, and points.
I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties.
I am trying to use the FLOOR function in a data step as it forces 10 bins everytime, the issues I am having with this way, is that i cannot get it to work to create to 10 bins for every week and it just does it on the entire data set.
My question, is it possible to use the below FLOOR function in a PROC SQL statement that would allow me to use by GROUPS. (also, yes, i can add a by week to the below, but it doesnt acutally group it by week)
data xx; set x nobs=numobs;
decile=floor(_n_*10/(numobs+1));
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @cwebb85 and welcome to the SAS Support Communities!
@cwebb85 wrote:
I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties.
Would it be acceptable to use random numbers as tie breakers?
/* Add random numbers as tie breakers */
data tmp / view=tmp;
call streaminit('MT64',27182818);
set have;
group=points+rand('uniform'); /* assuming only integer values in variable POINTS */
run;
/* Create deciles */
proc rank data=tmp out=want groups=10;
by week;
var group;
run;
proc sort data=want;
by week points group;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@cwebb85 wrote:
Hello,
I am trying to create 10 equal bins/deciles on my dataset by points. My data set is aggregated by week, customer, and points.
I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties.
I am trying to use the FLOOR function in a data step as it forces 10 bins everytime, the issues I am having with this way, is that i cannot get it to work to create to 10 bins for every week and it just does it on the entire data set.
My question, is it possible to use the below FLOOR function in a PROC SQL statement that would allow me to use by GROUPS. (also, yes, i can add a by week to the below, but it doesnt acutally group it by week)
data xx; set x nobs=numobs; decile=floor(_n_*10/(numobs+1)); run;
I doubt SQL will work directly, as there really isn't an equivalent in SQL to the data step variable _N_. You could probably do this in a DATA step and I request that you provide sample data for us to work with, as working data step code (which you can type in yourself, or follow these instructions) and not in any other format.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the help, attached is a sample of the data i am working with. Actual file is much large with many weeks of data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I refuse to download files from this (or any other) public forum. So repeating:
I request that you provide sample data for us to work with, as working data step code (which you can type in yourself, or follow these instructions) and not in any other format.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"Actual file is much large with many weeks of data."
Hello!
And your worry with that would be what? I haven't quite caught the point ...
--FJa
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @cwebb85 and welcome to the SAS Support Communities!
@cwebb85 wrote:
I have tried using PROC RANK, however it does not create 10 equal bins every time by week because of the way that function uses Ties.
Would it be acceptable to use random numbers as tie breakers?
/* Add random numbers as tie breakers */
data tmp / view=tmp;
call streaminit('MT64',27182818);
set have;
group=points+rand('uniform'); /* assuming only integer values in variable POINTS */
run;
/* Create deciles */
proc rank data=tmp out=want groups=10;
by week;
var group;
run;
proc sort data=want;
by week points group;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, I was just testing this and comparing results. I believe this will do the trick in the application i am using it for.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do want equal size groups or "deciles"? They are different as a decile would mean grouping on the values of some variable.
If you want groups, not actual deciles you might try Proc Surveyselect with the groups option.
data example; do week=1 to 10; recsperweek = rand('integer',10,100); do rec=1 to recsperweek; othervar= rand('uniform'); output; end; end; run; proc surveyselect data=example out=selected groups=10; strata week ; run; proc freq data=selected; tables week*groupid/norow nocol nocum nopercent; run;
The example data set is just to have something to work with. Note that you should be able to copy into your SAS editor and run this code. The Groupid id is added by Proc Survey select and the Proc Freq code shows how many records are assigned to each of the 10 bins. This is a random selection so where the not quite a multiple of 10 observations go is not set.
The Strata statement in Surveyselect does require the data to be sorted by that variable in the input data set.
To do something similar to your data step for anything resembling a "week" you would need to get a count of how many observations are in a week, then use a counter reset at the first value of each week. However, this would require each week to have an integer multiple of number of bins to have each week with equal size groups. So, does your data support that?
Run proc freq on whatever variable is defining week and see if value has a multiple of 10 in the count. I suspect not.