BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

If my table has 1Million records, I want to:

1) sort by largest to smallest by price

2) take the first 100K and call it 'group 1'.....then the next 100K and call it 'group 2' and so on 

 

How can I do it?

proc sort would sort the data

but I am not sure how can I label the records based on the a group of x records.

As an example, I have attached a list of 30 records, and grouped them in 3 creating 10 groups(attached excel)

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Let's say you want exactly 10 equal size groups, subject to

  1. you don't know exactly how many obs are in the dataset
  2. you may have tied price values
  3. you want to keep tied prices in the same group:

Program below is corrected, by entering an explicit OUTPUT statement, which prevents premature increment to the GROUP variable.

 

proc sort data=have out=need;
  by descending price;
run;
data want;
  set need nobs=n_need;
  by descending price;
  retain group 1;
  output;
  if last.price=1 and _n_ > group*(n_need/10) then group+1;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

If you have tied prices, do you have a preferred order among those ties?

 

Can a set of observations with tied prices be assigned to different (consecutive) groups?  Or must tied prices all be in the same group?

 

Are you using a fixed group size (100,000), or are you aiming for 10 groups of equal size (or as near equal as tied prices might allow)?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Creating an counter is easy to do with the SUM statement.

Do read 100K per group just put the SET statement inside a do loop.

data want;
  group+1;
  do _n_=1 to 100000;
    set have;
    output;
  end;
run;

But that is probably NOT what you want to do since it could very easily split some tied observations into separate groups.

 

Did you look at PROC RANK?

PaigeMiller
Diamond | Level 26

@new_sas_user 

I would also vote for PROC RANK to do this.

 

What are you going to do with the 10 groups once you have them? Depending on your answer, we may be able to advise further.

 

PS: most of us will not download Excel files or other Microsoft Office files, as they can be security threats. In the future, if you want to show us something, show us what you want as text in your message (not as an attachment). If you want to show us a graphic, paste the graphic into your message (do not include it in an attachment).

--
Paige Miller
Ksharp
Super User
data want;
 set have;
if mod(_n_,10000)=1 then group+1;
run;
mkeintz
PROC Star

Let's say you want exactly 10 equal size groups, subject to

  1. you don't know exactly how many obs are in the dataset
  2. you may have tied price values
  3. you want to keep tied prices in the same group:

Program below is corrected, by entering an explicit OUTPUT statement, which prevents premature increment to the GROUP variable.

 

proc sort data=have out=need;
  by descending price;
run;
data want;
  set need nobs=n_need;
  by descending price;
  retain group 1;
  output;
  if last.price=1 and _n_ > group*(n_need/10) then group+1;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
new_sas_user_4
Quartz | Level 8
HAVE       WANT    
sub_no price     label sub_no price
abc               10     group 1 az            140
efg               15     group 1 ay            136
hij               88     group 1 ax            132
lmn               20     group2 af            128
opq               26     group2 ae            124
abc            113     group2 ad            121
ac            117     group 3 ac            117
ad            121     group 3 abc            113
ae            124     group 3 ar            109
af            128     group 4 aq            105
ag               67     group 4 ap            102
ah               71     group 4 av            100
ai               75     group 5 ao               98
aj               79     group 5 an               94
ak               82     group 5 am               90
al               86     group 6 hij               88
am               90     group 6 al               86
an               94     group 6 ak               82
ao               98     group 7 aj               79
ap            102     group 7 at               78
aq            105     group 7 ai               75
ar            109     group 8 ah               71
as               59     group 8 ag               67
at               78     group 8 as               59
au               59     group 9 au               59
av            100     group 9 opq               26
aw                 7     group 9 lmn               20
ax            132     group 10 efg               15
ay            136     group 10 abc               10
az            140     group 10 aw                 7
             

 

Ideally I want group 1 (example) to have no overlap with the price of group 2. but if there is an overlap in price, then in that case I would want to have that price grouped under group 1 ....so essentially the price is tied to the group.

Ksharp
Super User
Then try Paige 's suggestion PROC RANK + " ties=dense " option.
PaigeMiller
Diamond | Level 26

@new_sas_user_4

Please answer my earlier question: "What are you going to do with the 10 groups once you have them? Depending on your answer, we may be able to advise further."

 

A suggestion: present data as text, not as a copy/paste from Excel or elsewhere. Ideally present data as working SAS data step code (examples and instructions). Your table has a lot of white space, and is too long vertically for us to see all of it at once without scrolling up and down, so it is very hard to grasp any understanding from your table of data.

--
Paige Miller
new_sas_user_4
Quartz | Level 8

The solution provided worked 🙂 thanks @mkeintz 

 

I confirmed that I really just need to sort by price and then divide the volume of rows by 10 🙂 so below is what worked (modified @mkeintz 's query)

and it would be my final output.

 

proc sort data=have out=need;
by descending price;
run;
data want;
set need nobs=n_need;
by descending price;
retain group 1;
output;
if _n_ > group*(n_need/10) then group+1;
run;

 

Thanks everyone for helping out 🙂

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 843 views
  • 1 like
  • 5 in conversation