DATA Step, Macro, Functions and more

Control number of Output following Proc sort

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 104
Accepted Solution

Control number of Output following Proc sort

Hi there,

Can somebody guide me to control the number of output following proc sort. Number of input can be controlled by firstobs and obs in proc sort. I want to control the number of record in my output. 

data test;
input id weight;
datalines;
1 10
2 20
3 30
4 20
5 30
;
run;

proc sort data =test (firstobs=1 obs=4) out =test1; by weight; run;

Swain

Accepted Solutions
Solution
‎06-11-2016 06:46 AM
Respected Advisor
Posts: 4,920

Re: Control number of Output following Proc sort

[ Edited ]
Posted in reply to DeepakSwain

The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:

 


proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;

 

 

PG

View solution in original post


All Replies
Super Contributor
Posts: 439

Re: Control number of Output following Proc sort

Posted in reply to DeepakSwain

Proc sort will have to sort the entire table before determining the first n rows. So a subsequent datastep to achive that would not be much overhead.

 

proc sort data =test out =test1; 
   by weight; 
run;

data test2;
   set test1 (firstobs=4 obs=7);
run;

 Alternatively, if you just want the top n rows:

 

proc sql outobs=10;
   create table test2
     as select * from test
     order by weight;
quit;

Hope this helps,

- Jan.

Frequent Contributor
Posts: 104

Re: Control number of Output following Proc sort

Posted in reply to jklaverstijn

Hi Jklaverstijn,

 

 

 

Swain
Super User
Posts: 5,426

Re: Control number of Output following Proc sort

Posted in reply to DeepakSwain

Why not SQL?

Why not use a succeeding data step?

Motivate!

Data never sleeps
Frequent Contributor
Posts: 104

Re: Control number of Output following Proc sort

Hi there,

Just desire to explore alternative and keep enthusiasm.

Regards,

Deepak

Swain
Super Contributor
Posts: 439

Re: Control number of Output following Proc sort

Posted in reply to DeepakSwain

Hi Deepak,

 

I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.

 

As I said, sort will have to create the entire table before s subset base on rownumbercan be determined. So it is not much of a  matter of efficiency to have a datastep as follow-up.

 

- Jan.

Super User
Posts: 11,343

Re: Control number of Output following Proc sort

Posted in reply to jklaverstijn

jklaverstijn wrote:

Hi Deepak,

 

I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.

 

 

- Jan.


 

Not at all "silent", from the documentation:

Specifies the last observation that SAS processes in a data set.

OBS= Data Set Option



Specifies the last observation that SAS processes in a data set.

Valid in: DATA step and PROC steps
Category: Observation Control
Default: MAX
Restriction: Use with input data sets only
Restriction:

Cannot use with PROC SQL views


Respected Advisor
Posts: 4,920

Re: Control number of Output following Proc sort

I would prefer to get something like

 

NOTE: Dataset option OBS= is ignored for output datasets.

PG
Frequent Contributor
Posts: 104

Re: Control number of Output following Proc sort

Posted in reply to jklaverstijn

Hi Jan,

Thanks for your kind reply. It is helpful to me.

Regards,

Deepak

Swain
Solution
‎06-11-2016 06:46 AM
Respected Advisor
Posts: 4,920

Re: Control number of Output following Proc sort

[ Edited ]
Posted in reply to DeepakSwain

The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:

 


proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;

 

 

PG
Super Contributor
Posts: 439

Re: Control number of Output following Proc sort




proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;

 

 


What I like is the cleverness. What I like less is the obscure character of monotonic(). It is undocumented for a reason. Eg, I am uncertain if it works reliably in a threaded environment. So your mileage may vary.

 

- Jan.

Trusted Advisor
Posts: 1,117

Re: Control number of Output following Proc sort

Posted in reply to DeepakSwain

Maybe a WHERE= dataset option on the output dataset is even better than enforcing a fixed number of observations, because in case of ties the selection might be a bit arbitrary otherwise:

proc sort data=test out=test1(where=(weight>=30));
by weight; 
run;

Drawback: You have to know a reasonable threshold value.

 

Or check PROC UNIVARIATE with the NEXTROBS= option (I learned about this from one of Ksharp's posts):

ods select none;
ods output ExtremeObs=check;
proc univariate data=test nextrobs=2;
id id;
var weight;
run;
ods select all;

Another approach would be PROC SUMMARY with the IDGROUP option of the OUTPUT statement.

 

If you want to select the n largest/smallest values (as opposed to extreme observations), you can use PROC UNIVARIATE with the NEXTRVAL= option.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 457 views
  • 9 likes
  • 6 in conversation