BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sujaytalesara
Fluorite | Level 6

I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Take your percentile data and convert it to a format.

data percentiles;
  input  _NAME_ $ Percentile @@ ;
cards;
 P_0 -10  P_1 -5  P_2 0  P_3 5  P_4 10  P_5 15
 P_6 17  P_7 21  P_8 25  P_9 30  P_10 34
;
data formats;
retain fmtname 'rank';
  set percentiles;
  label=_name_;
  end=percentile;
  start=lag(percentile);
  if missing(start) then hlo='L';
  keep fmtname start end label hlo;
run;

proc format cntlin=formats ;
run;

Then use the format to encode your actual data.

data have;
 input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;

data want;
  set have ;
  percentile=put(factor1,rank.);
run;

Results:

Obs    factor1    expected    percentile

 1        10        P_4          P_4
 2        15        P_5          P_5
 3        21        P_7          P_7
 4        -8        P_0          P_1
 5         4        P_3          P_3
 6        -3        P_1          P_2
 7        33        P_10         P_10

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

Without seeing a portion of your actual data, I cannot provide more specific advice.

 

However, PROC RANK can do this. If you use the GROUPS=100 option, each observation is placed into the proper percentile. No arrays or macros needed.

--
Paige Miller
Sujaytalesara
Fluorite | Level 6

Thanks Paige  for your response.

In table 1, I have a column value (Factor 1) which I need to compare from percentile values from table 2 ( which contains)1 to 100 percentile values) and create a column in table 1 to store the percentile bucket (eg. added in below table 1) under which the values falls ?

 

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?

 

Table 1 
Factor 1Percentile Bucket
10P_4
15P_5
21P_7
-8P_0
4P_3
-3P_1
33P_10

 

 

Table 2 
_NAME_Percentile
P_0-10
P_1-5
P_20
P_35
P_410
P_515
P_617
P_721
P_825
P_930
P_1034
PaigeMiller
Diamond | Level 26

Macros and arrays and iteration are not necessary here. Nevertheless, let me ask one more question: if Factor 1 equals 23, you have a tie between P_7 and P_8, what should the answer be in that case?

 

If you are willing to accept a default answer, then @art297 has the answer:

https://communities.sas.com/t5/SAS-Programming/Finding-closest-value/td-p/352655

--
Paige Miller
Sujaytalesara
Fluorite | Level 6

I already have a macro and wanted to add this functionality to the exisiting one.

 

Comming to your question, if the factor_1 has value 23 then it will fall under P_8.

The condition I am using here is -  if (P_7< Factor_1 <= P_8) then P_8 and so will apply for all values.

 

I am not sure how it could be done without array's as for each value in table 1 I am required to iterate over all the values of table 2.

Would be great if you can help me out here ! Thanks.

PaigeMiller
Diamond | Level 26

The answer I linked to still works, you just have to change the condition so that it covers the case where 

if (P_7< Factor_1 <= P_8) then P_8

 

So, untested code:

 

proc sql;
  create table want as
    select b.factor1,a._name_
      from table2 a, table1 b
      group by b.factor1
        having (b.factor1-a.percentile) eq min(b.factor1-a.percentile) and 
             (b.factor1-a.percentile)<=0
  ;
quit;
--
Paige Miller
Sujaytalesara
Fluorite | Level 6

Sorry but this way it does not work.

Inside having condition, the value of  min(b.factor1-a.percentile) cannot be computed corrected as every time during the substraction and with min () condition the factor_1 will be substracted with the highest value ie. of P_100 (value 507) which yeild a value -455 and thus every time P_100 percentile band will be selected.

 Example -

 

Factor 1 P_0P_1P_2P_42P_43P_75P_76P_77P_97P_98P_99P_100
152 -431-246-2092630149154159320346379507
28 -431-246-2092630149154159320346379507

 

In the above table, the correct percentile band of 152 should be P_75 (value 154) and 28 should be P_43 but using your mentioned query it gives

Factor 1_NAME_COL1
152P_100507
28P_100507

 

 

What I need is comparing the value of factor_1 with each percentile value and find the percentile bucket in which the factor_1 value is less than or equal to.

 

Why I have been using array is bcoz I have been used to Object oriented language and with array its easy to compare Factor_1 value iteratively with every single percentile value and saving the index where it matches -- this is how I am doing  and as it is a part of a process chain it needs to be run within pre-defined macro

 

SAS code for calculating the percentile band but does not work under macro as I am unable to figure out arrays inside macro -
data data_band2 ;

set data;

array p{*} P_0 - P_100;

score_perc = 0;

do i = 1 to 100;

if Factor_1 > p{i} and Factor_1 <= p{i+1} then

score_perc = i;

end;

if Factor_1 < P_1 then score_perc=1;

run;

 

 

PaigeMiller
Diamond | Level 26

Ok, at this point you need to provide us your data as a SAS data step, as explained here:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

That way we can work with your exact data, instead of looking at tables that you include in your posts.

 

I still contend that if your data is structured as in your original data table (and not in the latest data table), then the code I provided will work. But until we have your actual data as a SAS data step (and not in any other format), I can't provide code.

 

 

--
Paige Miller
Tom
Super User Tom
Super User

Take your percentile data and convert it to a format.

data percentiles;
  input  _NAME_ $ Percentile @@ ;
cards;
 P_0 -10  P_1 -5  P_2 0  P_3 5  P_4 10  P_5 15
 P_6 17  P_7 21  P_8 25  P_9 30  P_10 34
;
data formats;
retain fmtname 'rank';
  set percentiles;
  label=_name_;
  end=percentile;
  start=lag(percentile);
  if missing(start) then hlo='L';
  keep fmtname start end label hlo;
run;

proc format cntlin=formats ;
run;

Then use the format to encode your actual data.

data have;
 input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;

data want;
  set have ;
  percentile=put(factor1,rank.);
run;

Results:

Obs    factor1    expected    percentile

 1        10        P_4          P_4
 2        15        P_5          P_5
 3        21        P_7          P_7
 4        -8        P_0          P_1
 5         4        P_3          P_3
 6        -3        P_1          P_2
 7        33        P_10         P_10
Sujaytalesara
Fluorite | Level 6

Woow ! Thanks a million Tom. It worked exactly.

Tom
Super User Tom
Super User

@Sujaytalesara wrote:

I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?


Not sure why macro would be needed for any implementation of this.

 

You could just join the tables and figure out the percentile.

You could convert the percentile table into a format and use the format to group the other variable.

 

Where you might get value from macro coding is if you wanted to apply the same functionality to multiple datasets with different variable names.  But before you can work on making a macro make sure you know what SAS code you want it to generate.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1107 views
  • 2 likes
  • 3 in conversation