Hi there,
Can somebody guide me to control the number of output following proc sort. Number of input can be controlled by firstobs and obs in proc sort. I want to control the number of record in my output.
data test;
input id weight;
datalines;
1 10
2 20
3 30
4 20
5 30
;
run;
proc sort data =test (firstobs=1 obs=4) out =test1; by weight; run;
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
Proc sort will have to sort the entire table before determining the first n rows. So a subsequent datastep to achive that would not be much overhead.
proc sort data =test out =test1;
by weight;
run;
data test2;
set test1 (firstobs=4 obs=7);
run;
Alternatively, if you just want the top n rows:
proc sql outobs=10;
create table test2
as select * from test
order by weight;
quit;
Hope this helps,
- Jan.
Hi Jklaverstijn,
Thank you for your prompt reply. Regarding use of sql to control the number of output, I am already aware of. That is why, I have specifically mentioned about the proc sort step. I want to avoid additional data step following proc sort. In other word, I want to control my output within proc sort itself.
Thank you in advance for your kindly guidance to move forward.
Regards,
Deepak
Why not SQL?
Why not use a succeeding data step?
Motivate!
Hi there,
Just desire to explore alternative and keep enthusiasm.
Regards,
Deepak
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
As I said, sort will have to create the entire table before s subset base on rownumbercan be determined. So it is not much of a matter of efficiency to have a datastep as follow-up.
- Jan.
@jklaverstijn wrote:
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
- Jan.
Not at all "silent", from the documentation:
Specifies the last observation that SAS processes in a data set.
Specifies the last observation that SAS processes in a data set.
Valid in: | DATA step and PROC steps |
Category: | Observation Control |
Default: | MAX |
Restriction: | Use with input data sets only |
Restriction: |
Cannot use with PROC SQL views |
I would prefer to get something like
NOTE: Dataset option OBS= is ignored for output datasets.
Hi Jan,
Thanks for your kind reply. It is helpful to me.
Regards,
Deepak
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
What I like is the cleverness. What I like less is the obscure character of monotonic(). It is undocumented for a reason. Eg, I am uncertain if it works reliably in a threaded environment. So your mileage may vary.
- Jan.
Maybe a WHERE= dataset option on the output dataset is even better than enforcing a fixed number of observations, because in case of ties the selection might be a bit arbitrary otherwise:
proc sort data=test out=test1(where=(weight>=30));
by weight;
run;
Drawback: You have to know a reasonable threshold value.
Or check PROC UNIVARIATE with the NEXTROBS= option (I learned about this from one of Ksharp's posts😞
ods select none;
ods output ExtremeObs=check;
proc univariate data=test nextrobs=2;
id id;
var weight;
run;
ods select all;
Another approach would be PROC SUMMARY with the IDGROUP option of the OUTPUT statement.
If you want to select the n largest/smallest values (as opposed to extreme observations), you can use PROC UNIVARIATE with the NEXTRVAL= option.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.