Barite | Level 11

## Use only the top xxx rows of the data set

The data set has over 100,000 rows and is sorted in descending order by the second column.

``````
_X	_50501
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470
``````

I would like to use Proc Freq on just the first xxx rows.

How can I do that?  Create a new data set (subset)?  Use a particular IF or WHERE statement?

My objective is to get a 'tally' for the first column, but only the top so many.

My preference would be to use the following:

-- top 1%

-- top 2%

-- top 5%

-- top 10%

-- etc.

Is this somehow possible?

Help greatly appreciated.

Nicholas Kormanik

1 ACCEPTED SOLUTION

Super User

## Re: Use only the top xxx rows of the data set

Make a macro, I guess.

```
data have;
  infile cards expandtabs truncover;
  input _X : $20. _50501;
cards;
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470
;
run;

%let top=0.1;  /* <---- Change it: fraction of rows to keep (0.1 = top 10%) */

/* Open the data set, read its logical observation count (NLOBS), */
/* multiply by &top, and truncate the result to an integer.       */
%let dsid=%sysfunc(open(have));
%let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
%let dsid=%sysfunc(close(&dsid));

/* OBS= limits PROC FREQ to the first &nobs rows. */
proc freq data=have(obs=&nobs);
  table _x;
run;
```
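Since the question asks for several cutoffs (top 1%, 2%, 5%, 10%), the same steps could be wrapped in a macro and called once per percentage. The following is only a sketch: the macro name `top_freq` and its parameters `data`, `var`, and `pct` are made up for illustration, and it rounds the row count up with `ceil` (unlike the truncation above) so that a small percentage still returns at least one row. It assumes the data set is already sorted in descending order of the score.

```
/* Sample rows from the question; the real data has over 100,000 rows. */
data have;
  input _X : $20. _50501;
  datalines;
_22001_1 1.51880
_23005_1 1.15927
_23403_1 1.12800
_23401_1 1.12679
_20104_1 1.09546
_20104_1 1.08488
_20204_0 1.06033
_21105_0 1.05820
_21506_0 1.05118
_21801a_0 1.04543
_20104_1 1.04470
;
run;

/* Hypothetical helper: PROC FREQ on the top &pct percent of rows. */
%macro top_freq(data=have, var=_x, pct=10);
  %local dsid nobs rc;
  %let dsid = %sysfunc(open(&data));
  %if &dsid = 0 %then %do;
    %put ERROR: could not open &data..;
    %return;
  %end;
  /* NLOBS = number of logical (non-deleted) observations; round up */
  %let nobs = %sysevalf(%sysfunc(attrn(&dsid, nlobs)) * &pct / 100, ceil);
  %let rc   = %sysfunc(close(&dsid));

  title "Top &pct percent of &data (&nobs rows)";
  proc freq data=&data(obs=&nobs);
    tables &var;
  run;
  title;
%mend top_freq;

%top_freq(pct=1)
%top_freq(pct=2)
%top_freq(pct=5)
%top_freq(pct=10)
```

Each call produces one frequency table, with the title showing which cutoff and how many rows were used.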
9 REPLIES 9
Pyrite | Level 9

## Re: Use only the top xxx rows of the data set

Use the OBS= option, either as a data set option on the SET statement or as an option on the INFILE statement:

data want; set have(obs=xxx); run;

There is probably a way to get the number of observations in the data set (something like _obs_), but I don't know how to get it into a SAS variable to do a calculation on it.

Super User

## Re: Use only the top xxx rows of the data set

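To quickly get the total number of observations:

``````
data _null_;
/* NOBS= is set at compile time, so numobs is available before SET executes */
call symput('total_obs',put(numobs,best.));
set have nobs=numobs;
stop;
run;
``````

Then compute the number of rows you want from &total_obs (called &wanted_obs here) and pass it to the OBS= data set option:

``proc freq data=have (obs=&wanted_obs) ......``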
Barite | Level 11

## Re: Use only the top xxx rows of the data set

Example, please?  All Greek to me....

Thanks!

Pyrite | Level 9

## Re: Use only the top xxx rows of the data set

Expanding on Kurt's code.

****************************************;
%let percent=20;
****************************************;

data _null_;
set have nobs=numobs;
xxx=int(&percent*numobs/100);
call symput('topxxx',put(xxx,best.));
put xxx;
stop;
run;

proc print data=have(obs=&topxxx); run;

Ammonite | Level 13

## Re: Use only the top xxx rows of the data set

Hi,

You can also use the score variable to create a rank variable (for example with PROC RANK) and then use that rank in PROC FREQ, with BY-group processing, to get the counts within the top 5%, 10%, etc.
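A rough sketch of the rank-based idea follows. The reply does not name a procedure, so PROC RANK with `GROUPS=100` is an assumption here, and `ranked` and `pct_rank` are made-up names. With `DESCENDING`, the highest scores fall into the lowest-numbered groups, so `pct_rank < 10` selects approximately the top 10%. A WHERE statement is used instead of BY groups because the cutoffs in the question are cumulative (the top 10% includes the top 5%).

```
/* Sample rows from the question; the real data has over 100,000 rows. */
data have;
  input _X : $20. _50501;
  datalines;
_22001_1 1.51880
_23005_1 1.15927
_23403_1 1.12800
_23401_1 1.12679
_20104_1 1.09546
_20104_1 1.08488
_20204_0 1.06033
_21105_0 1.05820
_21506_0 1.05118
_21801a_0 1.04543
_20104_1 1.04470
;
run;

/* Split the scores into 100 groups numbered 0-99, highest scores first. */
proc rank data=have out=ranked groups=100 descending ties=low;
  var _50501;
  ranks pct_rank;   /* hypothetical name for the new rank variable */
run;

/* Tally _X among rows in roughly the top 10% (groups 0 through 9). */
proc freq data=ranked;
  where pct_rank < 10;
  tables _X;
run;
```

Changing the WHERE condition (for example `pct_rank < 5`) gives the other cutoffs without recomputing the ranks. On a sample this small the groups are coarse, so the percentages are only approximate.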

Barite | Level 11

## Re: Use only the top xxx rows of the data set

Xia Keshan, your code looks terrific.  Problem, though, SAS freezes.  Error message in Output title bar:

PROC FREQ suspended.

Never completes, for some reason.

Any ideas?

Super User

## Re: Use only the top xxx rows of the data set

?? Looks right to me.

``````
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
51
52
53
54         data have;
55         infile cards expandtabs truncover;;
56         input _X : $20. _50501;
57         cards;

NOTE: The data set WORK.HAVE has 11 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time           0.00 seconds
cpu time            0.01 seconds

69         ;

70         run;
71
72
73         %let top=0.1 ;  /*<---- Change it */
74
75
76
77         %let dsid=%sysfunc(open(have));
78         %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
79         %let dsid=%sysfunc(close(&dsid));
80         proc freq data=have(obs=&nobs);
81         table _x;
82         run;

NOTE: There were 1 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE FREQ used (Total process time):
real time           0.06 seconds
cpu time            0.02 seconds

83
84
85         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
95
``````
Barite | Level 11

## Re: Use only the top xxx rows of the data set

My bad.  Sorry Xia Keshan.  For some unknown reason my SAS was waiting.  Had to type END at command prompt to keep it going.

Thank you for rechecking your code.  And writing it.

Nicholas
