My data is sorted in descending order by the second column and has over 100,000 rows.
_X _50501
_22001_1 1.51880
_23005_1 1.15927
_23403_1 1.12800
_23401_1 1.12679
_20104_1 1.09546
_20104_1 1.08488
_20204_0 1.06033
_21105_0 1.05820
_21506_0 1.05118
_21801a_0 1.04543
_20104_1 1.04470
I would like to use Proc Freq on just the first xxx rows.
How can I do that? Create a new data set (subset)? Use a particular IF or WHERE statement?
My objective is to get a 'tally' for the first column, but only the top so many.
My preference would be to use the following:
-- top 1%
-- top 2%
-- top 5%
-- top 10%
-- etc.
Is this somehow possible?
Help greatly appreciated.
Nicholas Kormanik
Make a macro I guess.

data have;
infile cards expandtabs truncover;
input _X : $20. _50501;
cards;
_22001_1 1.51880
_23005_1 1.15927
_23403_1 1.12800
_23401_1 1.12679
_20104_1 1.09546
_20104_1 1.08488
_20204_0 1.06033
_21105_0 1.05820
_21506_0 1.05118
_21801a_0 1.04543
_20104_1 1.04470
;
run;

%let top=0.1 ; /*<---- Change it */

%let dsid=%sysfunc(open(have));
%let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
%let dsid=%sysfunc(close(&dsid));
proc freq data=have(obs=&nobs);
table _x;
run;
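Since the question asks for several cutoffs (1%, 2%, 5%, 10%), the same idea could be wrapped in a macro that loops over a list of percentages. This is only a sketch under a few assumptions: the macro name top_freq and its parameters data=, var= and pcts= are made up for illustration, and HAVE is assumed to be already sorted in descending order, as described in the question.

```sas
/* Hypothetical macro: runs PROC FREQ on the first pct% of rows for each pct. */
/* Assumes &data is already sorted so that the "top" rows come first.         */
%macro top_freq(data=have, var=_x, pcts=1 2 5 10);
  %local dsid total rc i pct nobs;

  /* Total number of (non-deleted) observations */
  %let dsid=%sysfunc(open(&data));
  %let total=%sysfunc(attrn(&dsid,nlobs));
  %let rc=%sysfunc(close(&dsid));

  /* Walk through the blank-separated list of percentages */
  %let i=1;
  %do %while(%length(%scan(&pcts,&i,%str( ))) > 0);
    %let pct=%scan(&pcts,&i,%str( ));
    %let nobs=%sysevalf(&total*&pct/100,floor);

    /* Skip cutoffs that would select no rows */
    %if &nobs > 0 %then %do;
      title "Top &pct percent of &data (first &nobs rows)";
      proc freq data=&data(obs=&nobs);
        tables &var;
      run;
    %end;

    %let i=%eval(&i+1);
  %end;
  title;
%mend top_freq;

%top_freq(data=have, var=_x, pcts=1 2 5 10)
```

With the 11-row sample data every one of these cutoffs rounds down to zero rows, so nothing is printed; on the full 100,000-row data set each percentage should produce its own frequency table.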
Use the OBS= option on the SET or INFILE statement:
data want; set have (obs=xxx); run;
There is probably a way to get the number of observations in the data set (_obs_), but I don't know how to get it into a SAS variable to do a calculation on it.
To quickly get the number of total obs:
data _null_;
call symput('total_obs',put(numobs,best.));
set have nobs=numobs;
stop;
run;
You could already calculate your percentage(s) in the same step.
Then use the obs dataset option:
proc freq data=have (obs=&wanted_obs) ......
Example, please? All Greek to me....
Thanks!
Expanding Kurt's code.
****************************************;
%let percent=20;
****************************************;
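data _null_;
set have nobs=numobs;
xxx=int(&percent*numobs/100); /* number of rows in the requested top percentage */
call symput('topxxx',put(xxx,best.));
put xxx;
stop;
run; /* explicit step boundary so that &topxxx exists before it is used */
proc print data=have(obs=&topxxx); run;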
Hi,
You can also use the score variable to create a rank variable, and then use that in PROC FREQ with BY processing to see the counts within the top 5%, 10%, etc.
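A rough sketch of that rank idea, assuming the HAVE data set and the score variable _50501 from the question. The output data set name ranked and the rank variable name pct_group are invented for this example, and a WHERE filter is used here rather than BY processing, simply to keep it short.

```sas
/* Split the scores into 100 groups; with DESCENDING, group 0 holds the highest scores. */
/* RANKED and PCT_GROUP are hypothetical names chosen for this sketch.                  */
proc rank data=have out=ranked groups=100 descending;
  var _50501;
  ranks pct_group;
run;

/* Tally _X within roughly the top 5 percent: groups 0 through 4.          */
/* The lower bound keeps rows with a missing rank out of the selection,    */
/* because a missing value compares as less than any number in SAS.        */
proc freq data=ranked;
  where 0 <= pct_group < 5;
  tables _x;
run;
```

Unlike the OBS= approach, this does not depend on the data being sorted. Tied scores are placed in the same group, so the number of rows selected may differ slightly from an exact percentage of the row count.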
Xia Keshan, your code looks terrific. Problem, though, SAS freezes. Error message in Output title bar:
PROC FREQ suspended.
Never completes, for some reason.
Any ideas?
?? It looks right to me.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
51
52
53
54 data have;
55 infile cards expandtabs truncover;;
56 input _X : $20. _50501;
57 cards;
NOTE: The data set WORK.HAVE has 11 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
69 ;
70 run;
71
72
73 %let top=0.1 ; /*<---- Change it */
74
75
76
77 %let dsid=%sysfunc(open(have));
78 %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
79 %let dsid=%sysfunc(close(&dsid));
80 proc freq data=have(obs=&nobs);
81 table _x;
82 run;
NOTE: There were 1 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.06 seconds
cpu time 0.02 seconds
83
84
85 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
95
My bad. Sorry Xia Keshan. For some unknown reason my SAS was waiting. Had to type END at command prompt to keep it going.
Thank you for rechecking your code. And writing it.
Nicholas