BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jdmarino
Fluorite | Level 6

I don't know how to pull out the top N rows per by-group.  I'm a long-time SQL querier, and this kind of thing, where N is specified, is not something SQL is good at.  For example, I'd like to pull out the rows of the top 2 returns per sector.  This needs to work on a large dataset (10M+ rows) where I need the top 100 values in a handful of categories.  All good ideas are welcome.

data Stocks;

    input Ticker :$3. Sector :$4. Return;

    datalines;

ABC Tech 0.10

DEF Tech 0.30

GHI Tech 0.01

JKL Tech 0.12

YZA Food 0.01

VWX Food 0.09

STU Food 0.08

PQR Food 0.05

MNO Food 0.02

BCD Booz 0.10

EFG Booz 0.11

GMC Cars 0.01

;

/*

desired output: a dataset with the rows which are the top 2 returns in each sector even if there are fewer than N rows in the sector

DEF Tech 0.30

JKL Tech 0.12

VWX Food 0.09

STU Food 0.08

BCD Booz 0.10

EFG Booz 0.11

GMC Cars 0.01

*/

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Since from your post i see you are interested in proc steps, i used PROC RANK  and it is generating the desired output.

proc sort data=    Stocks;

    by sector;

run;

proc rank data=Stocks out=r_test descending;

by Sector;

var Return;

ranks r_Return;

run;

proc sort data=r_test;

    by sector r_Return ;

    where r_Return <=2;

run;

Please try the code. Hope you like it Smiley Happy

Thanks,

Jag

Thanks,
Jag

View solution in original post

13 REPLIES 13
data_null__
Jade | Level 19

What is the rule for ties?

Is your 10+ million rows sorted or indexed.  Maybe on Sector?

jdmarino
Fluorite | Level 6

The rule for ties: do what is easiest/most obvious.  If that means including all rows that are tied so that the total count per group > N, that's OK.

The data can be sorted or indexed.

FYI, I'm pretty sure I can do this in a data-step, but that seems a little "grow you own".  I'm guessing there is a proc or feature I've never heard of that will do it for me.  Hence this post.

data_null__
Jade | Level 19

PROC UNIVARIATE has two options one or both should meet your needs.

NEXTROBS=

NEXTRVAL=

NEXT=Number of extream

You will ID statement too.

Message was edited by: data _null_

jdmarino
Fluorite | Level 6

I can't find those options mentioned for proc univariate.  I have SAS 9.3; could those be new options?

data_null__
Jade | Level 19

They are PROC statement options.

jdmarino
Fluorite | Level 6

http://support.sas.com/cdlsearch?charset=iso-8859-1&nh=25&ct=80000&qt=nextobs++spr%3A%229.3qq_%22&sa...

I searched for "nextobs" and found nothing.  But that's ok: Jag's solution seems right.

data_null__
Jade | Level 19

typo it is NEXTROBS.  Guess you didn't think to search for PROC UNIVARIATE.

Jagadishkatam
Amethyst | Level 16

Please try the below code by proc sql

proc sql;

  create table want as select distinct a.ticker, a.sector, a.return from stocks as a left join stocks as b on (a.sector=b.sector and a.return le b.return)

  group by a.sector,a.return having count(*) <=2 order by a.sector, a.return desc;

quit;

Thanks,

Jag

Thanks,
Jag
jdmarino
Fluorite | Level 6

You know, I have used this technique before, and it does work on my sample data.  I'll try it on my big dataset, but the intermediate data may become too large for my machine.

Update: after an hour, it's still running!  No wonder I put that technique out of my brain. 

Message was edited by: John Marino

Jagadishkatam
Amethyst | Level 16

Since from your post i see you are interested in proc steps, i used PROC RANK  and it is generating the desired output.

proc sort data=    Stocks;

    by sector;

run;

proc rank data=Stocks out=r_test descending;

by Sector;

var Return;

ranks r_Return;

run;

proc sort data=r_test;

    by sector r_Return ;

    where r_Return <=2;

run;

Please try the code. Hope you like it Smiley Happy

Thanks,

Jag

Thanks,
Jag
jdmarino
Fluorite | Level 6

This is the right way to do it.  Took 18 seconds on 500k rows.  I don't know why this did not occur to me; I use proc rank pretty often.  Viva la Community!

data_null__
Jade | Level 19

Can you move that WHERE statement to the PROC RANK step as a data set option on the OUT= dataset?

Jagadishkatam
Amethyst | Level 16

thank you data_null_, we can also use where option in the same proc rank as below

proc sort data=    Stocks;

    by sector;

run;

proc rank data=Stocks out=r_test(where=(r_Return <=2)) descending;

by Sector;

var Return;

ranks r_Return;

run;

proc sort data=r_test;

    by sector r_Return ;

run;

Thanks,

Jag

Thanks,
Jag

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 49781 views
  • 9 likes
  • 3 in conversation