Solved: Re: Can we create array inside a sas macro and use it ?

Sujaytalesara · Posted 05-03-2019 01:58 PM

I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?

Tom · Posted 05-07-2019 11:36 AM

Take your percentile data and convert it to a format.

data percentiles;
  input  _NAME_ $ Percentile @@ ;
cards;
 P_0 -10  P_1 -5  P_2 0  P_3 5  P_4 10  P_5 15
 P_6 17  P_7 21  P_8 25  P_9 30  P_10 34
;
data formats;
retain fmtname 'rank';
  set percentiles;
  label=_name_;
  end=percentile;
  start=lag(percentile);
  if missing(start) then hlo='L';
  keep fmtname start end label hlo;
run;

proc format cntlin=formats ;
run;

Then use the format to encode your actual data.

data have;
 input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;

data want;
  set have ;
  percentile=put(factor1,rank.);
run;

Results:

Obs    factor1    expected    percentile

 1        10        P_4          P_4
 2        15        P_5          P_5
 3        21        P_7          P_7
 4        -8        P_0          P_1
 5         4        P_3          P_3
 6        -3        P_1          P_2
 7        33        P_10         P_10

View solution in original post

PaigeMiller · Posted 05-03-2019 03:31 PM

Without seeing a portion of your actual data, I cannot provide more specific advice.

However, PROC RANK can do this. If you use the GROUPS=100 option, each observation is placed into the proper percentile. No arrays or macros needed.

--
Paige Miller

Sujaytalesara · Posted 05-07-2019 06:23 AM

Thanks Paige for your response.

In table 1, I have a column value (Factor 1) which I need to compare from percentile values from table 2 ( which contains)1 to 100 percentile values) and create a column in table 1 to store the percentile bucket (eg. added in below table 1) under which the values falls ?

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?

Table 1
Factor 1	Percentile Bucket
10	P_4
15	P_5
21	P_7
-8	P_0
4	P_3
-3	P_1
33	P_10

Table 2
_NAME_	Percentile
P_0	-10
P_1	-5
P_2	0
P_3	5
P_4	10
P_5	15
P_6	17
P_7	21
P_8	25
P_9	30
P_10	34

PaigeMiller · Posted 05-07-2019 06:49 AM

Macros and arrays and iteration are not necessary here. Nevertheless, let me ask one more question: if Factor 1 equals 23, you have a tie between P_7 and P_8, what should the answer be in that case?

If you are willing to accept a default answer, then @art297 has the answer:

https://communities.sas.com/t5/SAS-Programming/Finding-closest-value/td-p/352655

--
Paige Miller

Sujaytalesara · Posted 05-07-2019 09:09 AM

I already have a macro and wanted to add this functionality to the exisiting one.

Comming to your question, if the factor_1 has value 23 then it will fall under P_8.

The condition I am using here is - if (P_7< Factor_1 <= P_8) then P_8 and so will apply for all values.

I am not sure how it could be done without array's as for each value in table 1 I am required to iterate over all the values of table 2.

Would be great if you can help me out here ! Thanks.

PaigeMiller · Posted 05-07-2019 09:46 AM

The answer I linked to still works, you just have to change the condition so that it covers the case where

if (P_7< Factor_1 <= P_8) then P_8

So, untested code:

proc sql;
  create table want as
    select b.factor1,a._name_
      from table2 a, table1 b
      group by b.factor1
        having (b.factor1-a.percentile) eq min(b.factor1-a.percentile) and 
             (b.factor1-a.percentile)<=0
  ;
quit;

--
Paige Miller

Sujaytalesara · Posted 05-07-2019 11:16 AM

Sorry but this way it does not work.

Inside having condition, the value of min(b.factor1-a.percentile) cannot be computed corrected as every time during the substraction and with min () condition the factor_1 will be substracted with the highest value ie. of P_100 (value 507) which yeild a value -455 and thus every time P_100 percentile band will be selected.

Example -

Factor 1	P_0	P_1	P_2	P_42	P_43	P_75	P_76	P_77	P_97	P_98	P_99	P_100
152	-431	-246	-209	26	30	149	154	159	320	346	379	507
28	-431	-246	-209	26	30	149	154	159	320	346	379	507

In the above table, the correct percentile band of 152 should be P_75 (value 154) and 28 should be P_43 but using your mentioned query it gives

Factor 1	_NAME_	COL1
152	P_100	507
28	P_100	507

What I need is comparing the value of factor_1 with each percentile value and find the percentile bucket in which the factor_1 value is less than or equal to.

Why I have been using array is bcoz I have been used to Object oriented language and with array its easy to compare Factor_1 value iteratively with every single percentile value and saving the index where it matches -- this is how I am doing and as it is a part of a process chain it needs to be run within pre-defined macro

SAS code for calculating the percentile band but does not work under macro as I am unable to figure out arrays inside macro -
data data_band2 ;

set data;

array p{*} P_0 - P_100;

score_perc = 0;

do i = 1 to 100;

if Factor_1 > p{i} and Factor_1 <= p{i+1} then

score_perc = i;

end;

if Factor_1 < P_1 then score_perc=1;

run;

PaigeMiller · Posted 05-07-2019 11:37 AM

Ok, at this point you need to provide us your data as a SAS data step, as explained here:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

That way we can work with your exact data, instead of looking at tables that you include in your posts.

I still contend that if your data is structured as in your original data table (and not in the latest data table), then the code I provided will work. But until we have your actual data as a SAS data step (and not in any other format), I can't provide code.

--
Paige Miller

Tom · Posted 05-07-2019 11:36 AM

Take your percentile data and convert it to a format.

data percentiles;
  input  _NAME_ $ Percentile @@ ;
cards;
 P_0 -10  P_1 -5  P_2 0  P_3 5  P_4 10  P_5 15
 P_6 17  P_7 21  P_8 25  P_9 30  P_10 34
;
data formats;
retain fmtname 'rank';
  set percentiles;
  label=_name_;
  end=percentile;
  start=lag(percentile);
  if missing(start) then hlo='L';
  keep fmtname start end label hlo;
run;

proc format cntlin=formats ;
run;

Then use the format to encode your actual data.

data have;
 input factor1 expected $ @@ ;
cards;
10 P_4 15 P_5 21 P_7 -8 P_0
4 P_3 -3 P_1 33 P_10
;

data want;
  set have ;
  percentile=put(factor1,rank.);
run;

Results:

Obs    factor1    expected    percentile

 1        10        P_4          P_4
 2        15        P_5          P_5
 3        21        P_7          P_7
 4        -8        P_0          P_1
 5         4        P_3          P_3
 6        -3        P_1          P_2
 7        33        P_10         P_10

Sujaytalesara · Posted 05-08-2019 11:12 AM

Woow ! Thanks a million Tom. It worked exactly.

Tom · Posted 05-07-2019 09:51 AM

@Sujaytalesara wrote:

I have column value (Factor 1) which I need to compare with 1 to 100 percentile values to determine under which bucket it fall.

Iteratively I have to perform this comparision for all the row in my column/factor 1 ?

Not sure why macro would be needed for any implementation of this.

You could just join the tables and figure out the percentile.

You could convert the percentile table into a format and use the format to group the other variable.

Where you might get value from macro coding is if you wanted to apply the same functionality to multiple datasets with different variable names. But before you can work on making a macro make sure you know what SAS code you want it to generate.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away