I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.
Iteratively I have to perform this comparision for all the row in my column/factor 1 ?
Take your percentile data and convert it to a format.
data percentiles;
input _NAME_ $ Percentile @@ ;
cards;
P_0 -10 P_1 -5 P_2 0 P_3 5 P_4 10 P_5 15
P_6 17 P_7 21 P_8 25 P_9 30 P_10 34
;
data formats;
retain fmtname 'rank';
set percentiles;
label=_name_;
end=percentile;
start=lag(percentile);
if missing(start) then hlo='L';
keep fmtname start end label hlo;
run;
proc format cntlin=formats ;
run;
Then use the format to encode your actual data.
data have;
input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;
data want;
set have ;
percentile=put(factor1,rank.);
run;
Results:
Obs factor1 expected percentile 1 10 P_4 P_4 2 15 P_5 P_5 3 21 P_7 P_7 4 -8 P_0 P_1 5 4 P_3 P_3 6 -3 P_1 P_2 7 33 P_10 P_10
Without seeing a portion of your actual data, I cannot provide more specific advice.
However, PROC RANK can do this. If you use the GROUPS=100 option, each observation is placed into the proper percentile. No arrays or macros needed.
Thanks Paige for your response.
In table 1, I have a column value (Factor 1) which I need to compare from percentile values from table 2 ( which contains)1 to 100 percentile values) and create a column in table 1 to store the percentile bucket (eg. added in below table 1) under which the values falls ?
Iteratively I have to perform this comparision for all the row in my column/factor 1 ?
Table 1 | |
Factor 1 | Percentile Bucket |
10 | P_4 |
15 | P_5 |
21 | P_7 |
-8 | P_0 |
4 | P_3 |
-3 | P_1 |
33 | P_10 |
Table 2 | |
_NAME_ | Percentile |
P_0 | -10 |
P_1 | -5 |
P_2 | 0 |
P_3 | 5 |
P_4 | 10 |
P_5 | 15 |
P_6 | 17 |
P_7 | 21 |
P_8 | 25 |
P_9 | 30 |
P_10 | 34 |
Macros and arrays and iteration are not necessary here. Nevertheless, let me ask one more question: if Factor 1 equals 23, you have a tie between P_7 and P_8, what should the answer be in that case?
If you are willing to accept a default answer, then @art297 has the answer:
https://communities.sas.com/t5/SAS-Programming/Finding-closest-value/td-p/352655
I already have a macro and wanted to add this functionality to the exisiting one.
Comming to your question, if the factor_1 has value 23 then it will fall under P_8.
The condition I am using here is - if (P_7< Factor_1 <= P_8) then P_8 and so will apply for all values.
I am not sure how it could be done without array's as for each value in table 1 I am required to iterate over all the values of table 2.
Would be great if you can help me out here ! Thanks.
The answer I linked to still works, you just have to change the condition so that it covers the case where
if (P_7< Factor_1 <= P_8) then P_8
So, untested code:
proc sql;
create table want as
select b.factor1,a._name_
from table2 a, table1 b
group by b.factor1
having (b.factor1-a.percentile) eq min(b.factor1-a.percentile) and
(b.factor1-a.percentile)<=0
;
quit;
Sorry but this way it does not work.
Inside having condition, the value of min(b.factor1-a.percentile) cannot be computed corrected as every time during the substraction and with min () condition the factor_1 will be substracted with the highest value ie. of P_100 (value 507) which yeild a value -455 and thus every time P_100 percentile band will be selected.
Example -
Factor 1 | P_0 | P_1 | P_2 | P_42 | P_43 | P_75 | P_76 | P_77 | P_97 | P_98 | P_99 | P_100 | |
152 | -431 | -246 | -209 | 26 | 30 | 149 | 154 | 159 | 320 | 346 | 379 | 507 | |
28 | -431 | -246 | -209 | 26 | 30 | 149 | 154 | 159 | 320 | 346 | 379 | 507 |
In the above table, the correct percentile band of 152 should be P_75 (value 154) and 28 should be P_43 but using your mentioned query it gives
Factor 1 | _NAME_ | COL1 |
152 | P_100 | 507 |
28 | P_100 | 507 |
What I need is comparing the value of factor_1 with each percentile value and find the percentile bucket in which the factor_1 value is less than or equal to.
Why I have been using array is bcoz I have been used to Object oriented language and with array its easy to compare Factor_1 value iteratively with every single percentile value and saving the index where it matches -- this is how I am doing and as it is a part of a process chain it needs to be run within pre-defined macro
SAS code for calculating the percentile band but does not work under macro as I am unable to figure out arrays inside macro -
data data_band2 ;
set data;
array p{*} P_0 - P_100;
score_perc = 0;
do i = 1 to 100;
if Factor_1 > p{i} and Factor_1 <= p{i+1} then
score_perc = i;
end;
if Factor_1 < P_1 then score_perc=1;
run;
Ok, at this point you need to provide us your data as a SAS data step, as explained here:
That way we can work with your exact data, instead of looking at tables that you include in your posts.
I still contend that if your data is structured as in your original data table (and not in the latest data table), then the code I provided will work. But until we have your actual data as a SAS data step (and not in any other format), I can't provide code.
Take your percentile data and convert it to a format.
data percentiles;
input _NAME_ $ Percentile @@ ;
cards;
P_0 -10 P_1 -5 P_2 0 P_3 5 P_4 10 P_5 15
P_6 17 P_7 21 P_8 25 P_9 30 P_10 34
;
data formats;
retain fmtname 'rank';
set percentiles;
label=_name_;
end=percentile;
start=lag(percentile);
if missing(start) then hlo='L';
keep fmtname start end label hlo;
run;
proc format cntlin=formats ;
run;
Then use the format to encode your actual data.
data have;
input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;
data want;
set have ;
percentile=put(factor1,rank.);
run;
Results:
Obs factor1 expected percentile 1 10 P_4 P_4 2 15 P_5 P_5 3 21 P_7 P_7 4 -8 P_0 P_1 5 4 P_3 P_3 6 -3 P_1 P_2 7 33 P_10 P_10
Woow ! Thanks a million Tom. It worked exactly.
@Sujaytalesara wrote:
I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.
Iteratively I have to perform this comparision for all the row in my column/factor 1 ?
Not sure why macro would be needed for any implementation of this.
You could just join the tables and figure out the percentile.
You could convert the percentile table into a format and use the format to group the other variable.
Where you might get value from macro coding is if you wanted to apply the same functionality to multiple datasets with different variable names. But before you can work on making a macro make sure you know what SAS code you want it to generate.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.