- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
Can somebody guide me to control the number of output following proc sort. Number of input can be controlled by firstobs and obs in proc sort. I want to control the number of record in my output.
data test;
input id weight;
datalines;
1 10
2 20
3 30
4 20
5 30
;
run;
proc sort data =test (firstobs=1 obs=4) out =test1; by weight; run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc sort will have to sort the entire table before determining the first n rows. So a subsequent datastep to achive that would not be much overhead.
proc sort data =test out =test1;
by weight;
run;
data test2;
set test1 (firstobs=4 obs=7);
run;
Alternatively, if you just want the top n rows:
proc sql outobs=10;
create table test2
as select * from test
order by weight;
quit;
Hope this helps,
- Jan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jklaverstijn,
Thank you for your prompt reply. Regarding use of sql to control the number of output, I am already aware of. That is why, I have specifically mentioned about the proc sort step. I want to avoid additional data step following proc sort. In other word, I want to control my output within proc sort itself.
Thank you in advance for your kindly guidance to move forward.
Regards,
Deepak
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not SQL?
Why not use a succeeding data step?
Motivate!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
Just desire to explore alternative and keep enthusiasm.
Regards,
Deepak
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
As I said, sort will have to create the entire table before s subset base on rownumbercan be determined. So it is not much of a matter of efficiency to have a datastep as follow-up.
- Jan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jklaverstijn wrote:
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
- Jan.
Not at all "silent", from the documentation:
Specifies the last observation that SAS processes in a data set.
OBS= Data Set Option
Specifies the last observation that SAS processes in a data set.
Valid in: | DATA step and PROC steps |
Category: | Observation Control |
Default: | MAX |
Restriction: | Use with input data sets only |
Restriction: |
Cannot use with PROC SQL views |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would prefer to get something like
NOTE: Dataset option OBS= is ignored for output datasets.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jan,
Thanks for your kind reply. It is helpful to me.
Regards,
Deepak
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
What I like is the cleverness. What I like less is the obscure character of monotonic(). It is undocumented for a reason. Eg, I am uncertain if it works reliably in a threaded environment. So your mileage may vary.
- Jan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maybe a WHERE= dataset option on the output dataset is even better than enforcing a fixed number of observations, because in case of ties the selection might be a bit arbitrary otherwise:
proc sort data=test out=test1(where=(weight>=30));
by weight;
run;
Drawback: You have to know a reasonable threshold value.
Or check PROC UNIVARIATE with the NEXTROBS= option (I learned about this from one of Ksharp's posts😞
ods select none;
ods output ExtremeObs=check;
proc univariate data=test nextrobs=2;
id id;
var weight;
run;
ods select all;
Another approach would be PROC SUMMARY with the IDGROUP option of the OUTPUT statement.
If you want to select the n largest/smallest values (as opposed to extreme observations), you can use PROC UNIVARIATE with the NEXTRVAL= option.