Use only the top xxx rows of the data set

The data set has over 100,000 rows and is sorted in descending order by the second column.

 

_X	_50501
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470

 

I would like to run PROC FREQ on just the first xxx rows.

How can I do that? Create a new data set (a subset)? Use a particular IF or WHERE statement?

My objective is to get a tally for the first column, but only for the top so many rows.

My preference would be to use percentages:

-- top 1%
-- top 2%
-- top 5%
-- top 10%
-- etc.

Is this somehow possible?

 

Help greatly appreciated.

 

Nicholas Kormanik

 


Accepted Solutions
Solution (07-04-2016 04:01 AM)

Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik
I'd use some macro code, I guess:

data have;
  infile cards expandtabs truncover;
  input _X : $20. _50501;
cards;
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470
;
run;

/* Fraction of rows to keep, e.g. 0.1 = top 10%. Change it as needed. */
%let top=0.1;

/* Open the data set, read its observation count (NLOBS),
   multiply by the fraction, and truncate to an integer. */
%let dsid=%sysfunc(open(have));
%let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
%let dsid=%sysfunc(close(&dsid));

/* Tabulate _X using only the first &nobs rows. */
proc freq data=have(obs=&nobs);
  table _x;
run;
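
Since the question asks for several cut-offs (top 1%, 2%, 5%, 10%), the same steps could be wrapped in a macro and called once per percentage. This is only a sketch: the macro name top_freq and its parameters ds, var, and pct are hypothetical, and it assumes the data set is already sorted in descending order, as in the question.

```sas
/* Hypothetical helper: run PROC FREQ on the top &pct percent of rows. */
%macro top_freq(ds=have, var=_x, pct=10);
  %local dsid nobs rc;

  /* Open the data set; OPEN returns 0 on failure. */
  %let dsid=%sysfunc(open(&ds));
  %if &dsid = 0 %then %do;
    %put ERROR: could not open &ds..;
    %return;
  %end;

  /* Row count times percentage, truncated to an integer. */
  %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&pct/100,integer);
  %let rc=%sysfunc(close(&dsid));

  title "Top &pct percent of &ds (&nobs rows)";
  proc freq data=&ds(obs=&nobs);
    tables &var;
  run;
  title;
%mend top_freq;

%top_freq(pct=1)
%top_freq(pct=2)
%top_freq(pct=5)
%top_freq(pct=10)
```

With the 11-row sample above, the smaller percentages truncate to 0 or 1 rows, so the calls only become informative on the full data set of over 100,000 rows.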








All Replies
Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik

Use the OBS= option in the SET or INFILE statement:

data want;
  set have (obs=xxx);   /* replace xxx with the number of rows to keep */
run;

There is probably a way to get the number of observations in the data set (something like _obs_), but I don't know how to get it into a SAS variable to do a calculation on it.

Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik

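To quickly get the total number of observations:

data _null_;
  /* numobs is set at compile time, so it is available before SET executes */
  call symputx('total_obs',numobs);
  set have nobs=numobs;
  stop;
run;

You could already calculate your percentage(s) in the same step and store the result in a macro variable such as &wanted_obs.

Then use the OBS= data set option:

proc freq data=have (obs=&wanted_obs) ......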
Re: Use only the top xxx rows of the data set

Posted in reply to KurtBremser

Example, please?  All Greek to me....

 

Thanks!

 

Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik

Expanding on Kurt's code:

/* Percentage of rows to keep */
%let percent=20;

data _null_;
  set have nobs=numobs;
  xxx=int(&percent*numobs/100);   /* number of rows to keep, truncated */
  call symputx('topxxx',xxx);     /* store the count in macro variable topxxx */
  put xxx=;                       /* write the count to the log */
  stop;
run;

proc print data=have(obs=&topxxx);
run;

Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik

Hi,

You can also use the score variable to create a rank variable, then use that rank variable in PROC FREQ with BY processing to see the counts within the top 5%, 10%, etc.
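
A rough sketch of the ranking idea, assuming the sample data set have from this thread. The names ranked and pct_group are hypothetical. It filters with WHERE rather than using BY groups, because cumulative cut-offs (top 5%, top 10%) overlap. Ties can make the groups differ slightly from exact percentages, and with very few rows some groups will be empty.

```sas
/* Assign each row to one of 100 groups by score. With DESCENDING,
   group 0 holds the highest values. */
proc rank data=have out=ranked groups=100 descending;
  var _50501;
  ranks pct_group;   /* hypothetical name for the new rank variable */
run;

/* Tally _X within the top 5% (groups 0 through 4).
   The lower bound keeps rows with a missing rank out of the tally. */
proc freq data=ranked;
  where 0 <= pct_group < 5;
  tables _x;
run;
```

Changing the upper bound in the WHERE statement (for example to 10) gives the other cut-offs without re-ranking, and unlike OBS= this does not depend on the data set being sorted.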

Re: Use only the top xxx rows of the data set

Xia Keshan, your code looks terrific. One problem, though: SAS freezes. The message in the Output title bar is:

PROC FREQ suspended.

It never completes, for some reason.

Any ideas?

 

Re: Use only the top xxx rows of the data set

Posted in reply to NicholasKormanik

?? It looks right to me.

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 51         
 52         
 53         
 54         data have;
 55         infile cards expandtabs truncover;;
 56         input _X : $20. _50501;
 57         cards;
 
 NOTE: The data set WORK.HAVE has 11 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 69         ;
 
 70         run;
 71         
 72         
 73         %let top=0.1 ;  /*<---- Change it */
 74         
 75         
 76         
 77         %let dsid=%sysfunc(open(have));
 78         %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
 79         %let dsid=%sysfunc(close(&dsid));
 80         proc freq data=have(obs=&nobs);
 81         table _x;
 82         run;
 
 NOTE: There were 1 observations read from the data set WORK.HAVE.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.06 seconds
       cpu time            0.02 seconds
       
 
 83         
 84         
 85         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 95         
Re: Use only the top xxx rows of the data set

My bad. Sorry, Xia Keshan. For some unknown reason my SAS was waiting. I had to type END at the command prompt to keep it going.

 

Thank you for rechecking your code.  And writing it.

 

Nicholas

 

☑ This topic is solved.
