If my table has 1Million records, I want to:
1) sort by largest to smallest by price
2) take the first 100K and call it 'group 1'.....then the next 100K and call it 'group 2' and so on
How can I do it?
proc sort would sort the data
but I am not sure how can I label the records based on the a group of x records.
As an example, I have attached a list of 30 records, and grouped them in 3 creating 10 groups(attached excel)
Let's say you want exactly 10 equal size groups, subject to
Program below is corrected, by entering an explicit OUTPUT statement, which prevents premature increment to the GROUP variable.
proc sort data=have out=need;
by descending price;
run;
data want;
set need nobs=n_need;
by descending price;
retain group 1;
output;
if last.price=1 and _n_ > group*(n_need/10) then group+1;
run;
If you have tied prices, do you have a preferred order among those ties?
Can a set of observations with tied prices be assigned to different (consecutive) groups? Or must tied prices all be in the same group?
Are you using a fixed group size (100,000), or are you aiming for 10 groups of equal size (or as near equal as tied prices might allow)?
Creating an counter is easy to do with the SUM statement.
Do read 100K per group just put the SET statement inside a do loop.
data want;
group+1;
do _n_=1 to 100000;
set have;
output;
end;
run;
But that is probably NOT what you want to do since it could very easily split some tied observations into separate groups.
Did you look at PROC RANK?
I would also vote for PROC RANK to do this.
What are you going to do with the 10 groups once you have them? Depending on your answer, we may be able to advise further.
PS: most of us will not download Excel files or other Microsoft Office files, as they can be security threats. In the future, if you want to show us something, show us what you want as text in your message (not as an attachment). If you want to show us a graphic, paste the graphic into your message (do not include it in an attachment).
data want;
set have;
if mod(_n_,10000)=1 then group+1;
run;
Let's say you want exactly 10 equal size groups, subject to
Program below is corrected, by entering an explicit OUTPUT statement, which prevents premature increment to the GROUP variable.
proc sort data=have out=need;
by descending price;
run;
data want;
set need nobs=n_need;
by descending price;
retain group 1;
output;
if last.price=1 and _n_ > group*(n_need/10) then group+1;
run;
HAVE | WANT | |||||
sub_no | price | label | sub_no | price | ||
abc | 10 | group 1 | az | 140 | ||
efg | 15 | group 1 | ay | 136 | ||
hij | 88 | group 1 | ax | 132 | ||
lmn | 20 | group2 | af | 128 | ||
opq | 26 | group2 | ae | 124 | ||
abc | 113 | group2 | ad | 121 | ||
ac | 117 | group 3 | ac | 117 | ||
ad | 121 | group 3 | abc | 113 | ||
ae | 124 | group 3 | ar | 109 | ||
af | 128 | group 4 | aq | 105 | ||
ag | 67 | group 4 | ap | 102 | ||
ah | 71 | group 4 | av | 100 | ||
ai | 75 | group 5 | ao | 98 | ||
aj | 79 | group 5 | an | 94 | ||
ak | 82 | group 5 | am | 90 | ||
al | 86 | group 6 | hij | 88 | ||
am | 90 | group 6 | al | 86 | ||
an | 94 | group 6 | ak | 82 | ||
ao | 98 | group 7 | aj | 79 | ||
ap | 102 | group 7 | at | 78 | ||
aq | 105 | group 7 | ai | 75 | ||
ar | 109 | group 8 | ah | 71 | ||
as | 59 | group 8 | ag | 67 | ||
at | 78 | group 8 | as | 59 | ||
au | 59 | group 9 | au | 59 | ||
av | 100 | group 9 | opq | 26 | ||
aw | 7 | group 9 | lmn | 20 | ||
ax | 132 | group 10 | efg | 15 | ||
ay | 136 | group 10 | abc | 10 | ||
az | 140 | group 10 | aw | 7 | ||
Ideally I want group 1 (example) to have no overlap with the price of group 2. but if there is an overlap in price, then in that case I would want to have that price grouped under group 1 ....so essentially the price is tied to the group.
Please answer my earlier question: "What are you going to do with the 10 groups once you have them? Depending on your answer, we may be able to advise further."
A suggestion: present data as text, not as a copy/paste from Excel or elsewhere. Ideally present data as working SAS data step code (examples and instructions). Your table has a lot of white space, and is too long vertically for us to see all of it at once without scrolling up and down, so it is very hard to grasp any understanding from your table of data.
The solution provided worked 🙂 thanks @mkeintz
I confirmed that I really just need to sort by price and then divide the volume of rows by 10 🙂 so below is what worked (modified @mkeintz 's query)
and it would be my final output.
proc sort data=have out=need;
by descending price;
run;
data want;
set need nobs=n_need;
by descending price;
retain group 1;
output;
if _n_ > group*(n_need/10) then group+1;
run;
Thanks everyone for helping out 🙂
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.