Hi there,
Can somebody guide me to control the number of output following proc sort. Number of input can be controlled by firstobs and obs in proc sort. I want to control the number of record in my output.
data test;
input id weight;
datalines;
1 10
2 20
3 30
4 20
5 30
;
run;
proc sort data =test (firstobs=1 obs=4) out =test1; by weight; run;
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
Proc sort will have to sort the entire table before determining the first n rows. So a subsequent datastep to achive that would not be much overhead.
proc sort data =test out =test1;
by weight;
run;
data test2;
set test1 (firstobs=4 obs=7);
run;
Alternatively, if you just want the top n rows:
proc sql outobs=10;
create table test2
as select * from test
order by weight;
quit;
Hope this helps,
- Jan.
Hi Jklaverstijn,
Thank you for your prompt reply. Regarding use of sql to control the number of output, I am already aware of. That is why, I have specifically mentioned about the proc sort step. I want to avoid additional data step following proc sort. In other word, I want to control my output within proc sort itself.
Thank you in advance for your kindly guidance to move forward.
Regards,
Deepak
Why not SQL?
Why not use a succeeding data step?
Motivate!
Hi there,
Just desire to explore alternative and keep enthusiasm.
Regards,
Deepak
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
As I said, sort will have to create the entire table before s subset base on rownumbercan be determined. So it is not much of a matter of efficiency to have a datastep as follow-up.
- Jan.
@jklaverstijn wrote:
Hi Deepak,
I'm afraid you cannot do that in proc sort. I checked and obs= and firstobs= just don't work. Surprisingly the options are silently ignored.
- Jan.
Not at all "silent", from the documentation:
Specifies the last observation that SAS processes in a data set.
Specifies the last observation that SAS processes in a data set.
Valid in: | DATA step and PROC steps |
Category: | Observation Control |
Default: | MAX |
Restriction: | Use with input data sets only |
Restriction: |
Cannot use with PROC SQL views |
I would prefer to get something like
NOTE: Dataset option OBS= is ignored for output datasets.
Hi Jan,
Thanks for your kind reply. It is helpful to me.
Regards,
Deepak
The only way I know to do this is with the undocumented function MONOTONIC. MONOTONIC() returns 1 on the first call and increases by 1 every time it is called. So, to keep the first two obs from proc sort (or any other proc) output:
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
proc sort data=test out=test1(where=(monotonic()<=2)); by weight; run;
What I like is the cleverness. What I like less is the obscure character of monotonic(). It is undocumented for a reason. Eg, I am uncertain if it works reliably in a threaded environment. So your mileage may vary.
- Jan.
Maybe a WHERE= dataset option on the output dataset is even better than enforcing a fixed number of observations, because in case of ties the selection might be a bit arbitrary otherwise:
proc sort data=test out=test1(where=(weight>=30));
by weight;
run;
Drawback: You have to know a reasonable threshold value.
Or check PROC UNIVARIATE with the NEXTROBS= option (I learned about this from one of Ksharp's posts😞
ods select none;
ods output ExtremeObs=check;
proc univariate data=test nextrobs=2;
id id;
var weight;
run;
ods select all;
Another approach would be PROC SUMMARY with the IDGROUP option of the OUTPUT statement.
If you want to select the n largest/smallest values (as opposed to extreme observations), you can use PROC UNIVARIATE with the NEXTRVAL= option.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.