NKormanik
Barite | Level 11

The data is sorted in descending order by the second column and has over 100,000 rows.

 

_X	_50501
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470

 

I would like to use Proc Freq on just the first xxx rows.

 

How can I do that?  Create a new data set (subset)?  Use a particular IF or WHERE statement?

 

My objective is to get a 'tally' for the first column, but only the top so many.

 

My preference would be to use the following:

 

-- top 1%
-- top 2%
-- top 5%
-- top 10%
-- etc.

 

Is this somehow possible?

 

Help greatly appreciated.

 

Nicholas Kormanik

 

ACCEPTED SOLUTION
Ksharp
Super User
Make a macro, I guess.

data have;
  /* Read the tab-separated sample rows; TRUNCOVER tolerates short lines */
  infile cards expandtabs truncover;
  input _X : $20. _50501;
cards;
_22001_1	1.51880
_23005_1	1.15927
_23403_1	1.12800
_23401_1	1.12679
_20104_1	1.09546
_20104_1	1.08488
_20204_0	1.06033
_21105_0	1.05820
_21506_0	1.05118
_21801a_0	1.04543
_20104_1	1.04470
;
run;


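%let top=0.1;  /* fraction of rows to keep (0.1 = top 10%); change as needed */

/* Look up the row count, then keep the integer part of count times top */
%let dsid=%sysfunc(open(have));
%let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,integer);
%let dsid=%sysfunc(close(&dsid));

/* Tally _X over only the first NOBS rows */
proc freq data=have(obs=&nobs);
  table _x;
run;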
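
Since the question asks for several cutoffs (top 1%, 2%, 5%, 10%), the same idea could be wrapped in a macro and called once per cutoff. What follows is only a sketch: the macro name top_freq and its parameters data=, var= and pct= are made up for illustration, and it assumes the data set is already sorted in descending order of the score, as described in the question.

```sas
/* Hypothetical macro: tally VAR over the top PCT percent of rows in DATA. */
/* Assumes DATA is already sorted so that the "top" rows come first.       */
%macro top_freq(data=have, var=_x, pct=10);
  %local dsid nobs rc;

  /* Open the data set to read its row count from the metadata */
  %let dsid=%sysfunc(open(&data));
  %if &dsid=0 %then %do;
    %put ERROR: could not open &data..;
    %return;
  %end;

  /* Number of rows in the requested share, rounded down */
  %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&pct/100,floor);
  %let rc=%sysfunc(close(&dsid));

  /* Skip the tally when the share works out to less than one row */
  %if &nobs<1 %then %do;
    %put NOTE: top &pct percent of &data is less than one row, nothing to tally.;
    %return;
  %end;

  title "Top &pct percent of &data (first &nobs rows)";
  proc freq data=&data(obs=&nobs);
    table &var;
  run;
  title;
%mend top_freq;

/* One call per cutoff */
%top_freq(pct=1)
%top_freq(pct=2)
%top_freq(pct=5)
%top_freq(pct=10)
```

With the 11-row sample above, only the 10% call has at least one row to tally; on the full data set of over 100,000 rows each call should produce its own frequency table.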







9 REPLIES
Jim_G
Pyrite | Level 9

Use the OBS= option on the SET or INFILE statement:

data want; set have(obs=xxx); run;

There is probably a way to get the number of obs in the data set (_obs_?), but I don't know how to get it into a SAS variable to do a calculation on it.

Kurt_Bremser
Super User

To quickly get the number of total obs:

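data _null_;
  /* NOBS= is filled in at compile time, so numobs is known before SET runs */
  call symput('total_obs',put(numobs,best.));
  set have nobs=numobs;
  stop;  /* only the count is needed, so stop after the first pass */
run;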

You could already calculate your percentage(s) in the same step.

Then use the obs dataset option:

proc freq data=have (obs=&wanted_obs) ......
NKormanik
Barite | Level 11

Example, please?  All Greek to me....

 

Thanks!

 

Jim_G
Pyrite | Level 9

Expanding Kurt's code.

 

****************************************;
%let percent=20;
****************************************;

data _null_;
  set have nobs=numobs;
  xxx=int(&percent*numobs/100);          /* number of rows in the requested share */
  call symput('topxxx',put(xxx,best.));  /* store it in a macro variable */
  put xxx;                               /* echo the count to the log */
  stop;
run;

proc print data=have(obs=&topxxx); run;

stat_sas
Ammonite | Level 13

Hi,

 

You can also use the score variable to create a rank variable, then use that in PROC FREQ with BY processing to see the counts within the top 5%, 10%, etc.
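
A rough sketch of the rank idea, under some assumptions: the variable name pct_group and the output data set name ranked are made up for illustration, and a WHERE= filter is used here instead of BY processing because the tiers in the question are nested (the top 5% contains the top 1%), whereas BY groups would give separate bands. PROC RANK with GROUPS=100 and DESCENDING puts the highest scores in group 0, so groups 0 through 4 approximate the top 5%.

```sas
/* Assign each row a percentile group by score: group 0 holds the highest values */
proc rank data=have out=ranked groups=100 descending;
  var _50501;
  ranks pct_group;   /* hypothetical name for the new rank variable */
run;

/* Tally _X for roughly the top 5%; the lower bound keeps out rows */
/* whose score, and therefore rank, is missing                     */
proc freq data=ranked(where=(0 <= pct_group < 5));
  table _x;
run;

/* Same tally for roughly the top 10% */
proc freq data=ranked(where=(0 <= pct_group < 10));
  table _x;
run;
```

Because PROC RANK assigns groups by formula and handles ties in its own way, the row counts may differ slightly from the OBS= approach, and the data does not need to be sorted first.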

NKormanik
Barite | Level 11

Xia Keshan, your code looks terrific.  Problem, though, SAS freezes.  Error message in Output title bar:

 

PROC FREQ suspended.

 

Never completes, for some reason.

 

Any ideas?

 

Ksharp
Super User

?? Looks right to me.

 


 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 51         
 52         
 53         
 54         data have;
 55         infile cards expandtabs truncover;;
 56         input _X : $20. _50501;
 57         cards;
 
 NOTE: The data set WORK.HAVE has 11 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 69         ;
 
 70         run;
 71         
 72         
 73         %let top=0.1 ;  /*<---- Change it */
 74         
 75         
 76         
 77         %let dsid=%sysfunc(open(have));
 78         %let nobs=%sysevalf(%sysfunc(attrn(&dsid,nlobs))*&top,i);
 79         %let dsid=%sysfunc(close(&dsid));
 80         proc freq data=have(obs=&nobs);
 81         table _x;
 82         run;
 
 NOTE: There were 1 observations read from the data set WORK.HAVE.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.06 seconds
       cpu time            0.02 seconds
       
 
 83         
 84         
 85         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 95         
NKormanik
Barite | Level 11

My bad.  Sorry Xia Keshan.  For some unknown reason my SAS was waiting.  Had to type END at command prompt to keep it going.

 

Thank you for rechecking your code.  And writing it.

 

Nicholas

 
