DATA Step, Macro, Functions and more

using datastep variables to reference other columns

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 105
Accepted Solution

using datastep variables to reference other columns

Hello,

I'm having a hard time explaining what it is I'm trying to do with words, so I'm going to just explain more by example:

I have a dataset that has forecasted interest rates out from DEC2012 to Dec2015, the first two rows, and first few columns of the data looks like this (there are 40+ rates so I will include a few here to illistrute the format of the data):

DateFA_1MoTrFB_1MoTrFS_1MoTrFA_6MoTrFB_6MoTrFS_6MoTrFA_1YrTrFB_1YrTrFS_1YrTrFA_2YrTrFB_2YrTrFS_2YrTrFA_3YrTrFB_3YrTrFS_3YrTrFA_4YrTrFB_4YrTr
31Dec20120.040.040.040.120.120.120.160.160.160.260.260.260.350.350.350.5250.525
31Jan20130.5562276910.0685130980.121803720.8950744060.1491429730.0359408971.0939914570.19876930.011.2421366240.2874627270.011.3492947080.3883054570.0183618911.531171975

0.571382614

I need to loop through all of these rates, within each row of the data set, to ensure that no rates contradict with each other in any given month and interpolate if necessary. Since there are so many rates it would extremely tedious to write nested if-statements for even single rate, what I would like to do is be able to load in an array of all the rates, or an array of macro variables so that I can more generally reference each rate, and actually right a do loop, rather than manually write a bunch of nested if's.

-For example, to check that in the FA scenario that 1 month rates < 6 month rates < 1 year rates < 2 year rates < 3 year rates < 4 year rates < ... < 7 year rates (etc) I would need to manually right something like this:

if FA_1MoTr > FA_6MoTr then do;

  /*Interpolate 1MoTr with _1YrTr */

  if FA_6MoTr < _1YrTr then do;

  end;

  else do;

  /*Interpolate 1MoTr with _2YrTr */

  if FA_6MoTr < _2YrTr then do;

  end;

  else do;

  /*Interpolate 1MoTr with _3YrTr */

  if FA_6MoTr < _3YrTr then do;

  end;

  else do;

  /*Interpolate 1MoTr with _4YrTr */

  if FA_6MoTr < _4YrTr then do;

  end;

  else do;

  /*Interpolate 1MoTr with _5YrTr */

  if FA_6MoTr < _5YrTr then do;

  end;

  else do;

  /*Interpolate 1MoTr with _7YrTr */

  if FA_6MoTr < _7YrTr then do;

end;

  else do;

  /*Interpolate 1MoTr with _10YrTr */

  if FA_1MoTr < _10YrTr then do;

  end;

  else do;

  end;

  end;

  end;

  end;

  end;

  end;

  end;

  end;

--But what I would like to do is be able to just put all of these rates into an array (I already have a 1-D array of the all the rates as such:

rate_1 rate_2 rate_3 rate_4 rate_5 rate_6 rate_7 rate_8 rate_9 rate_10 rate_11 rate_12 rate_13 rate_14 rate_15 rate_16 rate_17 rate_18 rate_19 rate_20 rate_21

_FF _10YrSwp _12MoL _1MoL _1MoTr _1YrTr _2YrSwp _2YrTr _30YrSwp _30YrTr _3MoL _3YrSwp _3YrTr _4YrSwp _4YrTr _5YrSwp _5YrTr _6MoL _6MoTr _7YrSwp _7YrTr

then I would hope to be able to do something like this:

data all_interp_rates;

    merge interpolation_data interp_rateset_tr;

  array rates(21) rate_1-rate_21;

do i = 1 to 21;

     %let curr_Rate = rates;

     %let next_Rate = rates[i+1];

     if FA_&curr_Rate. > FA_&next_Rate. then do;

    /*do work here */

     end;

end;

run;

The reason I assign the array variables to macro variables like that is because using the "&" operator to concatenate variables names like that is the only method that I know of (I was hoping that the above code would ultimately translate that if statement from "if FA_&curr_Rate. > FA_&next_Rate." to "if FA_1MoTr > FA_6MoTr" through resolving the macro variables (and in the next loop iteration "if FA_6MoTr > FA_1YrTr"  Etc.) , rather than manually typing an if statement for each pair of rates. - The issue with this is that it doesn't work: i.e. " FA_&curr_Rate." would resolve to "FA_rates" which of course is useless in this situation. I need to concat that actual value of the variables in the array "rates" using the values not variable names.

- So is this possible? Please let me know if I am being completely unclear on what I am trying to do here, and I will try to explain better!

Thanks so much!!! - Ryan


Accepted Solutions
Solution
‎08-27-2013 07:59 AM
Super Contributor
Posts: 339

Re: using datastep variables to reference other columns

Hi, you can use SASHELP.vcolumn to extract the name of all the columns of your dataset in appropriate order into macro variables and then declare multiple arrays for each set of variables. This way, you can do comparisons over a do loop using the same array index (or i and i+1 etc depending on your intentions).

A generic approach would be something like this:

proc sql noprint;

     create table orderedcolumns as

     select libname, memname, name as varname, type as vartype, length as varlen, varnum

     from sashelp.vcolumn

     where      libname=           /* work or whatever other libname you use, in capital letters quoted */

                    memname=       /* your table names, again in capital letters quoted*/

     order by libname, memname, varnum

;

quit;

proc sql noprint;

     select varname

     into :favarnames separated by " "

     from orderedcolumns

     where libname=

              and  memname=

              and substr(varname, 1, 2) = "FA"

     ;

 

     select varname

     into :fbvarnames separated by " "

     from orderedcolumns

     where libname=

              and memname=

              and substr(varname, 1, 2) = "FB"

     ;

     select varname

     into :fsvarnames separated by " "

     from orderedcolumns

     where libname=

              and memname=

              and substr(varname, 1, 2) = "FS"

     ;

quit;

  data want;

     set have;

     array FA {*} &favarnames;

     array FB {*} &fbvarnames;

     array FS {*} &fsvarnames;

     do i=1 to (dim(FA)-1); /* to prevent array subscript out of range */

         if FA{i} > FA{i+1} then do;

               /* do work here */

          end;

     end;

run;

This is essentially what Ballardw said with the addition to how to generate the 3 arrays without having to hard type all variables names in the proper order (assuming your columns are ordered appropriately in your table obviously).

Vincent

View solution in original post


All Replies
Super User
Posts: 11,343

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

I'm not sure I understand everything you're trying to do but if it is basically to compare each variable with the ones with higher indices in the array then you I think you may only need to nest do loops;

Do i = 1 to 20; (unless you are going to try  to compare element 21 with not existant higher */

     do j= (i+1) to 21;

     if rates > rates then do;

          <your work>

     end;
end;

This will compare rate_1 with rate_2 through rate_21, then rate_2 with rate_3 through rate_21, ending comparing rate_20 with rate_21.

Super Contributor
Posts: 307

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

Have you looked at the VNAME function?

Super User
Posts: 19,849

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

I think you're mixing up arrays and macro variables.

If you're comparing array elements to each other you can use:

array_dec(i)<array_dec(i+1)

From an array you can use the vname function to obtain the actual variable it references.

But the example is a bit unclear Smiley Happy.

Clear questions = clear answers.

Frequent Contributor
Posts: 105

Re: using datastep variables to reference other columns

Hello all,

I am sorry that my question is so unclear, I was afraid of this.

Ok -- I know that you can just use arrays to compare things within a do loop. The issue is, the only reason I was using an array was that so I had a way to reference each column in any observation, without actually manually typing each column name each time.

- See the "rates" array in my example does actually contain any numbers to compare, those values are in my data set each in separate column: for example, for _1MoTr: in the data set there are 3 variables FA_1MoTr, FB_1MoTr, FS_1MoTr (FA/FB/FS represent different scenarios the rate is forecasted for -- i'd like to check all of these.) - The entire dataset follow this pattern, i.e. FA_6MoTr, FB_6MoTr, FS_6MoTr, etc. there are 21 rates, times these 3 scenarios, so 63 columns of data in my dataset, and one observation per month for each month from Dec-2012 to Dec-2015. Longer term rates should also be higher than short term, so 1MoTr (1 month) should be less than 6MoTr (6 Month) should be less than 1YrTr (1 year) etc etc. If there are any unintuitive forecast that violate this, I need to interpolate the bad rates from the next best intuitive forecast (i.e. if   FA_1YrTr < FA_6MoTr <  FA_2YrTr, then I need to overwrite the value for FA_1YrTr by using linear interpolation from FA_6MoTr and FA_2YrTr. I need to check this for each rate, under each scenario (fa/fb/fs), so I could do this by manually writing if-statements and not using any arrays or macro variables at all; however, it would just be very tedious and time consuming because I would be writing hundreds of nested if-else statements, and it would be very hard to read & understand.

-I'm not set on using arrays or macros by any means, all that I am trying to do is essentially loop through every variable in my dataset per each observation, rather than manually writing out if-statements for each variable. I thought that using an array of just the rate names (the array is made from a 1-dimensional dataset that is just a list of: 1MoTr,6MoTr,1YrTr,2YrTr...etc, and the reason that I am assigning each value of the array to a macro variable is not quite because of confusion, but because I need to access the actual names in the dataset, which are preceded with the scenario name ("FA_","FB_",or "FS_"), and i know this can be done with global macro variables such as:

%let myrate = _6MoTr;

So that the code piece: "FA&myrate." would be equal to "FA_6MoTr"

so this way I could do something like:

do i = 1 to 20;

     if FA_&curr_Rate. > FA_&next_Rate. then do;

     if FS_&curr_Rate. > FS_&next_Rate. then do;

     if FB_&curr_Rate. > FB_&next_Rate. then do;

    /*do work here */

     end;

^ And this will have effectively went through and compared all 63 of my variables as expected, without manually writing 63 if-statements -- I don't need to use macro variables, or arrays, just whatever way is best to do this in sas. I have been reading the documentation on the VNAME function and am not sure that can quite apply to my situation but perhaps there is just a logical disconnect because of the lack of clarity in my question, and I deeply apologize for that -- I know it is my responsibility to present my question as clearly as possible for you guys but I just can't think how else to word this to explain what I'm trying to do here.

Please let me know if this makes sense, or perhaps I can try to explain differently and provide more data;/examples.

Super User
Posts: 11,343

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

This looks more like wanting several arrays addressed with a common index, One array each for the FA, FB and FS variables.

Array RateFA <FA variables in order>;

Array RateFB <FB variables in order>;

Array RAteFS <FS variables in order>;

do i=1 to 20; (or dim(RateFA) - 1)

     if RateFA > RateFA[i=1] then do;

     if RateFB > RateFB[i=1] then do;

     if RateFS > RateFS[i=1] then do;

end;

Though there may be some questions about you rule for replacing the out of expected range for the variables. When you are comparing two rates at a time, how do you know which is incorrect? Suppose a series of rates looked like

.04, .0012, .15, .18, .24 as the first 5 values for the FA series. I'm not sure that the first value would be the one that should be corrected even though the first should be less than the second. It think you might want to look at the entire series of 21 and identify which ones may be the problem children and then interpolate around them. Which logic may be fun to determine.

You don't mention if there should be anything going on as far as consisitency between the FA FB and FS series such as should the 6  month values be within some range of each other.

Super Contributor
Posts: 307

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

You could use an approach similiar to BallardW's suggestion but . . .

Do you know if you will be "interpolating" over more than three data points (e.g. RateFA = (RateFA[i+1] - RateFA[i-1])/2 would apply if one rate was lower than the previous, but what if the second and third are both lower?).

What if many subsequent data values for a given series are all lower than "expected"?

What if the last value is lower?

As an aside, it's unfortunate that your variables don't have a consistent naming pattern amenable to convenient array referencing (e.g. ARRAY RateFA {*} FA1-FA20). This forces you to explicity reference all of the variables in each array, as BallardW suggests.

Solution
‎08-27-2013 07:59 AM
Super Contributor
Posts: 339

Re: using datastep variables to reference other columns

Hi, you can use SASHELP.vcolumn to extract the name of all the columns of your dataset in appropriate order into macro variables and then declare multiple arrays for each set of variables. This way, you can do comparisons over a do loop using the same array index (or i and i+1 etc depending on your intentions).

A generic approach would be something like this:

proc sql noprint;

     create table orderedcolumns as

     select libname, memname, name as varname, type as vartype, length as varlen, varnum

     from sashelp.vcolumn

     where      libname=           /* work or whatever other libname you use, in capital letters quoted */

                    memname=       /* your table names, again in capital letters quoted*/

     order by libname, memname, varnum

;

quit;

proc sql noprint;

     select varname

     into :favarnames separated by " "

     from orderedcolumns

     where libname=

              and  memname=

              and substr(varname, 1, 2) = "FA"

     ;

 

     select varname

     into :fbvarnames separated by " "

     from orderedcolumns

     where libname=

              and memname=

              and substr(varname, 1, 2) = "FB"

     ;

     select varname

     into :fsvarnames separated by " "

     from orderedcolumns

     where libname=

              and memname=

              and substr(varname, 1, 2) = "FS"

     ;

quit;

  data want;

     set have;

     array FA {*} &favarnames;

     array FB {*} &fbvarnames;

     array FS {*} &fsvarnames;

     do i=1 to (dim(FA)-1); /* to prevent array subscript out of range */

         if FA{i} > FA{i+1} then do;

               /* do work here */

          end;

     end;

run;

This is essentially what Ballardw said with the addition to how to generate the 3 arrays without having to hard type all variables names in the proper order (assuming your columns are ordered appropriately in your table obviously).

Vincent

Frequent Contributor
Posts: 105

Re: using datastep variables to reference other columns

Posted in reply to Vince28_Statcan

Fugue - No it is not guaranteed that only one value will need to be interpolated, there could be 5-6 or any number of bad forecasts in a row, in which I will follow standard linear interpolation formula and interpolate from the starting point to the first good rate, overwriting all values in between, however many that may be. And yes of course, if the first rate is skewed or the last rate is skewed then linear interpolation is not possible, but this is rarely the case, and we can only do the best we can given our data, it's better than not fixing any of the curve! But thank you for this question, I should have made clearer -- this need to loop within the original loop to find a rate to interpolate from is a primary reason why I sought to solve my problem in this way, if I were to manually write this interpolation program I would essentially be hard-coding nested loops and soon have hundreds of nested if/else blocks for all of the rates.

Vince28@Statcan -- I cannot thank you enough, once again you have provided me quality information leading to an exact solution to my problem. I had never seen that method of populating arrays before! Many thanks, your code answered my question perfectly.

Thank you all for the suggestions and help as well!!

-Ryan

Super Contributor
Posts: 307

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

FYI - you can use a separate array for the regression formula. Alternatively, you can write the regression formula to a macro provided you use characters that can be evaluated by SYSEVALF.

Super User
Posts: 19,849

Re: using datastep variables to reference other columns

Posted in reply to AllSoEasy

Have you tried defining your arrays as follows:

array fa_rates (*) fa:;

array fb_rates(*) fb:;

array fc_rates(*) fc:;

Note the colon before the semicolon.  This assumes that only the rates have this prefix, if you have other variables you'll need to hit the dictionary tables as suggested by

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 402 views
  • 4 likes
  • 5 in conversation