BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello:

I have the follow sample dataset.  I would like to get 20% observation of RR column and 10% observation of RR column.  Please advice how to do it.  Thanks.

 

data test;

infile datalines dsd;

input Group RR;

datalines;

372, 9.78,

840, 10.2,

320, 10.87,

341, 11.02,

341, 11.02

370, 11.18,

320, 11.61,

322, 11.63,

340, 11.85,

346, 11.97,

343, 12.47,

371, 13.51,

620, 13.89,

830, 16.98,

350, 26.8,

145, 29.9,

830, 32.53,

510, 36.05,

180, 40.64,

181, 46.82,

;

 

 My final data set for Top 20% would be:

372, 9.78,

840, 10.2,

320, 10.87,

341, 11.02,

 

My final data set for Bottom 10% would be:

180, 40.64,

181, 46.82,

 

My actual dataset is over 1K observations. Its observations varies based on the different conditions, and kind of hard to court. 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

SET with POINT= is a good solution here:

Data top20 bottom10;
  do _N_=1 to ceil(nobs*.2);
    set have point=_N_ nobs=nobs;
    output top20;
    end;
  do _N_=ceil(nobs*.9) to nobs;
    set have point=_N_;
    output bottom10;
    end;
  stop;
run;

I used the CEIL for the first loop so as to get at least one observation, even if there are less than 5 observations in the input. But you may want to use ROUND instead. In the bottom you will always get at least one obs, unless you use something like ROUND((NOBS+1)*.9).

View solution in original post

14 REPLIES 14
ballardw
Super User

Please clarify: by "top 20%" do you mean the first 20% of the records as they appear in your data set or the 20% smallest values for the variable RR? If the first is your data sorted as you want it?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I assumed he wanted first and last 20% of obs.

ybz12003
Rhodochrosite | Level 12

As you could see the sample dataset, I did sort the data first by RR.  Therefore,  I would like to get the top 20% of smallest RR value.

ballardw
Super User

@ybz12003 wrote:

As you could see the sample dataset, I did sort the data first by RR.  Therefore,  I would like to get the top 20% of smallest RR value.


I asked for clarification because you showed a data set. It could very well just have happed that you entered data by order.

Most of the requests about "top" anything usually relate to largest though. So seeing "top" with the smallest values begged for some clarity.

 

A better example would have been to show your data input with data in an unordered fashion then show sort code.

Reeza
Super User

Use PROC RANK with groups = 10 to get the variable into deciles. Then select the top 20 (rank 8 &9) and the bottom 10 (Rank 0). 

Note how ties are handled which is something you should think about, most other solutions will not deal with ties.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

proc sql noprint;
  select int(NOBS * 0.2) into :TWENTY from DICTIONARY.TABLES where libname="SASHELP" and memname="CARS";
  select int(NOBS * 0.1) into :TEN from DICTIONARY.TABLES where libname="SASHELP" and memname="CARS";
  select NOBS into :TOT from DICTIONARY.TABLES where libname="SASHELP" and memname="CARS";  
quit;

%put 20=&twenty., 10=&ten., Tot=&tot.;

data want_twenty want_ten;
  set sashelp.cars;
  if _n_ < &twenty. or _n_ > (&tot. - &twenty.) then output want_twenty;
  if _n_ < &ten. or _n_ > (&tot. - &ten.) then output want_ten;
run;
 
ybz12003
Rhodochrosite | Level 12

Hi RW9:

 

I modified your code as list below.  However,  I got an error message from the log.  Any idea why?

 

 

proc sort data=ABC.notnosmisc; by RR; run;

proc sql;

select int(NOBS *0.2) into : TWENTY from ABC.notnosmisc;

quit;

ERROR: The following columns were not found in the contributing tables: NOBS.

 

 

PaigeMiller
Diamond | Level 26

@Reeza has provided the best answer, in my opinion. It's extremely simple, and it has the ability to handle ties properly, while the other methods do not. Don't try to create code that does this for yourself, when SAS has already done so.

--
Paige Miller
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would agree here, that @Reeza has the better answer, assuming that is what you want and only that.

To note, the reason you get an error is because you have replace dictionary.tables with the name of your dataset.  Dictionary.tables is the same as sashelp.vtable in that it holds metadata about the table.  It is from there that the observation count is being taken.  So you need to select from dictionary.tables where libname and memname are your libname and memname.

Astounding
PROC Star

Given that your data set is already sorted by RR, you could take the simple route:

 

data want;

set have nobs=_nobs_;

if _n_ / _nobs_ < 0.2 or _n_ / _nobs_ > 0.9;

run;

s_lassen
Meteorite | Level 14

SET with POINT= is a good solution here:

Data top20 bottom10;
  do _N_=1 to ceil(nobs*.2);
    set have point=_N_ nobs=nobs;
    output top20;
    end;
  do _N_=ceil(nobs*.9) to nobs;
    set have point=_N_;
    output bottom10;
    end;
  stop;
run;

I used the CEIL for the first loop so as to get at least one observation, even if there are less than 5 observations in the input. But you may want to use ROUND instead. In the bottom you will always get at least one obs, unless you use something like ROUND((NOBS+1)*.9).

soham_sas
Quartz | Level 8

Hi @ybz12003 .. it will be something like below , 

 

data test;
infile datalines dsd;
input Group RR;
obs=_n_;
datalines;
372, 9.78,
840, 10.2,
320, 10.87,
341, 11.02,
341, 11.02
370, 11.18,
320, 11.61,
322, 11.63,
340, 11.85,
346, 11.97,
343, 12.47,
371, 13.51,
620, 13.89,
830, 16.98,
350, 26.8,
145, 29.9,
830, 32.53,
510, 36.05,
180, 40.64,
181, 46.82,
;
run;

proc sort data= test out=test1;
by descending obs;
run;

data test2;
set test1;
oops1=_n_;
run;

proc sql;
create table first_20 as select group,RR from test
having obs/ Max(obs) <=0.2;
create table last_10 as select group,RR from test2
having oops1/ Max(oops1) <=0.1;
quit;

 

ybz12003
Rhodochrosite | Level 12

Thanks for everyone's great advice.   So far, I have the codes written by Astound and S_lassen are working in my practice.   But one observation is different between Astound and S_lassen's codes.   Though Astound is simpler, S_lassen is more comprehensive to cover the decimal issue.  

In the end, I truly thank everyone spending your precious time answer my question and helping me!

Reeza
Super User

Percentiles don't have a standard definition, kind of like a month, so you can have variability in the answers. 

 


@ybz12003 wrote:

Thanks for everyone's great advice.   So far, I have the codes written by Astound and S_lassen are working in my practice.   But one observation is different between Astound and S_lassen's codes.   Though Astound is simpler, S_lassen is more comprehensive to cover the decimal issue.  

In the end, I truly thank everyone spending your precious time answer my question and helping me!


 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 5979 views
  • 10 likes
  • 8 in conversation