BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

Let's say that I have a data set with 3 columns: rank,min,max(Note- this table was created by proc rank)

I wanted to calculated percentiles 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25

But in the output i didn't get all the percentiles I have asked.

My question;

What is the way to add rows of percentiles that doesn't exist and  they will get values of columns Min,MAX from the upper row.

For example:

There is no row for P6  so we will add row with P6 and it  will get value of min and max from row P5  

There is no row fro P19 till P24 so we will add these rows and min,max values will come from row P18 

 

What is the way to add the missing rows?

 

 


Data have;
infile datalines dlm=',';
input rank min max;
cards;
0, 0,800
1,1000,1901
2,2000,2425
3,2500,3000
4,3001,4000
5,4015,4935
7,5000,5000
9,5001,6000
10,6001,6480
13,6500,6500
16,6501,6934
17,7000,7200
18,7244,7400
25,7500,7500
;
Run;



Data want;
infile datalines dlm=',';
input rank min max;
cards;
0, 0,800
1,1000,1901
2,2000,2425
3,2500,3000
4,3001,4000
5,4015,4935
6,4015,4935
7,5000,5000
8,5000,5000
9,5001,6000
10,6001,6480
11,6001,6480
12,6001,6480
13,6500,6500
14,6500,6500
15,6500,6500
16,6501,6934
17,7000,7200
18,7244,7400
19,7244,7400
20,7244,7400
21,7244,7400
22,7244,7400
23,7244,7400
24,7244,7400
25,7500,7500
;
Run;  

 

9 REPLIES 9
sbxkoenk
SAS Super FREQ

Not sure why you haven't got all percentiles you have asked for ... but here's a way that is waterproof.

proc univariate data=sashelp.cars noprint;
var mpg_city mpg_highway;
output out=UniWidePctls 
       pctlpre=CityP_ HwyP_ 
       pctlpts=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25; 
run; 

proc print data=UniWidePctls noobs; run;
/* end of program */

See :
Output percentiles of multiple variables in a tabular format
By Rick Wicklin on The DO Loop October 23, 2013
https://blogs.sas.com/content/iml/2013/10/23/percentiles-in-a-tabular-format.html

for even more ways to achieve the same.

 

BR, Koen

Ronein
Meteorite | Level 14
Thank you,
However the question was how to add the missing rows and get values from upper row
sbxkoenk
SAS Super FREQ

Hello ,

 

  1. Merge (join) with a dataset that contains all the rows (percentiles) you want. This way you will have the gaps to be filled.
  2. Then look for the numerous topics in these communities that discuss "Last Value Carry Forward" (LVCF), "Last Value Carried Forward" (LVCF), "Last Observation Carried Forward" (LOCF) and "Forward Filling" ... these are all synonyms for the same operation. I always use PROX EXPAND (SAS/ETS) with STEP interpolation for this forward filling if vars are numeric ... and don't forget the extrapolate option.

Koen

PaigeMiller
Diamond | Level 26

@Ronein wrote:
Thank you,
However the question was how to add the missing rows and get values from upper row

So let's see, you have recently posted three different threads on subjects related to coding percentiles. And it feels as if no progress has been made. How about you take a different approach? How about, instead of focusing on how to write specific pieces of code, you go ahead and describe the big problem you are trying to solve.

 

For example:

 

I want to do an analysis/table/chart/graph/report (pick the appropriate one) that performs the task of ____________________ leading to the desired end analysis/table/chart/graph/report. 

 

Note how coding is not yet mentioned. Note how this is a more wholistic description of your problem. If you had provided this (and ideally sample data), instead of getting bogged down in certain specific coding issues (which is where we are now), the people here could go ahead and think up a good way to get to the desired results, and show you example code that gets you there (and that works). But we don't really know where you want to end up, and so we can't get you there.

 

Please read up on the XY Problem and X-Y problem. Please approach asking questions in a more wholistic way, and less in a way asking about specific pieces of code. (Or at the very least, describe the big picture in a wholistic way and show us the code you are having trouble with)

--
Paige Miller
mkeintz
PROC Star

@PaigeMiller wrote:


So let's see, you have recently posted three different threads on subjects related to coding percentiles. And it feels as if no progress has been made. How about you take a different approach? How about, instead of focusing on how to write specific pieces of code, you go ahead and describe the big problem you are trying to solve.

and

If you had provided this (and ideally sample data), instead of getting bogged down in certain specific coding issues (which is where we are now), the people here could go ahead and think up a good way to get to the desired results, and show you example code that gets you there (and that works). But we don't really know where you want to end up, and so we can't get you there.

In short @RoneinHelp us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

There is probably an easier solution for creating the percentile dataset that does not require this post processing.  But if we just look at the task you asked for help with:

 

If you just want to generate all integer ranks between the two values you could use some type of simple look ahead to tell if the next RANK is the current RANK+1 or not.  So something like this:

data have;
  input rank min max;
cards;
 0    0  800
 1 1000 1901
 2 2000 2425
 3 2500 3000
 4 3001 4000
 5 4015 4935
 7 5000 5000
 9 5001 6000
10 6001 6480
13 6500 6500
16 6501 6934
17 7000 7200
18 7244 7400
25 7500 7500
;

data want;
  set have end=eof;
  if not eof then set have(firstobs=2 keep=rank rename=(rank=next_rank));
  output;
  do rank=rank+1 to next_rank-1;
    output;
  end;
  drop next_rank;
run;
Kurt_Bremser
Super User
data want;
merge
  have
  have (firstobs=2 keep=rank rename=(rank=_rank))
;
output;
do rank = rank + 1 to _rank - 1;
  output;
end;
drop _rank;
run;

Untested, posted from my tablet.

ballardw
Super User

I suggest that you go back to your proc rank step, share the code you ran and the input data set.

What are the "min" "max" supposed to represent? Perhaps you need some additional options on the Proc Rank code.

How did you handle ties for example?

 

"missing" percentiles I would generally expect to occur if the  input data set did not have at least 100 non-missing non-tied values.

If you ask for percentiles from an input data set with 3 observations of non-tied values you tend to get percentiles of 25, 50 and 75

 

I would almost certainly if completing "ranks" have duplicate ranges of values, i.e. your copy the previous min max. I might INTERPOLATE values but copy, no.

What is this data set to be used for?

mkeintz
PROC Star

A minor tweak of the code provided by @ballardw, in the pursuit of tighter code:

 

data have;
  input rank min max;
cards;
 0    0  800
 1 1000 1901
 2 2000 2425
 3 2500 3000
 4 3001 4000
 5 4015 4935
 7 5000 5000
 9 5001 6000
10 6001 6480
13 6500 6500
16 6501 6934
17 7000 7200
18 7244 7400
25 7500 7500
run;

data want (drop=nxt:);
  merge have  
        have (firstobs=2 keep=rank rename=(rank=nxt_rank)) ;
  do rank=rank by 1 until (rank=coalesce(nxt_rank,26)-1);
    output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1341 views
  • 6 likes
  • 7 in conversation