DATA Step, Macro, Functions and more


New Contributor
Posts: 2

large numerical arrays

I have created a basic function to calculate the percentile of a list of numbers, much like the PERCENTILE() function in Excel:

proc fcmp library=sashelp.svrtdist outlib=work.mysubs.percentile;

  function RISKTYPE_percentile(pn, dim, x[*], F[*], Ftype);

    percentile = svrtutil_percentile(pn, dim, x, F, Ftype);

    return (percentile);

  endsub;

quit;

    It accepts as arguments two arrays, x and F, which are the sorted actual values and the sorted empirical distribution percentiles of your list of numbers. I'm finding it difficult, though, to create very large arrays to use with this function. The method that has worked best so far is something like:

    %let actlist=;

    %let emplist=;

    data _null_;

      set temp;

      call symput('actlist', resolve('&actlist')||' '||actualvalues);

      call symput('emplist', resolve('&emplist')||' '||_edf_);

    run;
    The dataset temp has two columns, actualvalues and _edf_, which I want to use as the values for my input arrays. This method only works for about 2,000-3,000 observations, though; beyond that there is no more room in the macro variable to store additional values. What's more, it is very slow. Does anyone know of an efficient way to create large numerical arrays that can be used as input for custom functions such as the one above? The task I'm working on calls for arrays with hundreds of thousands or even millions of values.
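    [Editor's note: one way to sidestep macro variables entirely is to load the two columns into temporary DATA step arrays inside the step that calls the function. A minimal sketch, assuming an illustrative upper bound of 100,000 rows, a hypothetical lookup dataset probs with one probability pn per row, and Ftype=1 as a placeholder EDF type:]

```sas
/* Sketch: load the columns of TEMP into temporary arrays once,
   then call the custom function directly (no macro variables). */
options cmplib=work.mysubs;          /* make the FCMP function visible */

data quantiles;
  array x[100000] _temporary_;       /* sorted actual values           */
  array F[100000] _temporary_;       /* sorted EDF percentiles         */
  if _n_ = 1 then do i = 1 to n;     /* load both columns on the       */
    set temp nobs=n point=i;         /* first iteration only           */
    x[i] = actualvalues;
    F[i] = _edf_;
  end;
  set probs;                         /* hypothetical: one row per      */
                                     /* probability PN to look up      */
  q = RISKTYPE_percentile(pn, n, x, F, 1);
run;
```

    Temporary arrays are retained across iterations and live in memory, so millions of elements are workable as long as MEMSIZE allows it.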

    PROC Star
    Posts: 8,164

    Re: large numerical arrays

    I, unfortunately, am a total novice (as yet) regarding PROC FCMP. However, we do have a couple of strong FCMP posters on both the forum and SAS-L. I suggest that you crosspost to both, but change your title to something like "problem using large arrays with proc fcmp"

    Valued Guide
    Posts: 653

    Re: large numerical arrays

    ArtT has a good suggestion, but I also have a couple of thoughts.

    1. Do you really need the values in a macro variable? Your function is working against DATA step variables so if we can go against those directly it might help.

    2. Your DATA _NULL_ step to create the macro variables is a bit convoluted. A PROC SQL step is more straightforward. Consider:

    proc sql noprint;

      select actualvalues, _edf_

        into :actlist separated by ' ',

             :emplist separated by ' '

        from temp;

    quit;


    %put &actlist;

    %put &emplist;

    3. If you really do need arrays (we have not seen enough of what you are doing yet to know for sure), then you might consider skipping the macro language and going to either DATA step arrays or hash tables; neither has the 64K size limitation of macro variables.

    New Contributor
    Posts: 2

    Re: large numerical arrays

    Thanks for the helpful answers, guys; I'll follow up on your responses. Art, I've already tried PROC SQL, and it accomplishes the same thing, but still with the 64K limitation.

    My problem is as follows:  I have a large set of data which we can say represents empirical, or actual losses. I'd like to generate a large number of random variables from the distribution of these losses. Since this is a distribution which can't be described exactly using some of the built-in parametric distributions available in SAS, I'd like to use something like the svrtutil_percentile utility function.

    I begin by generating a large number of uniform random variables (rand('UNIFORM')), which gives results something like 0.10, 0.70, 0.60, etc. I'd like to translate those numbers into the 10% quantile value, the 70% quantile value, the 60% quantile value, and so on, from the original set of empirical losses. To do this with the percentile function, for each uniform random variable I need to pass the array of actual losses and the array of EDF values (calculated using PROC SEVERITY). This becomes difficult with more than 2,000-3,000 original loss observations, because the size of the macro variable surpasses the 64K limit.

    My SAS knowledge is fairly limited, so maybe there's an easy solution out there. In Matlab, which I am more familiar with, this would be fairly straightforward: just define an entire column of some variable as an input array. In SAS it seems this is not so simple.
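    [Editor's note: because the empirical distribution function is a step function, pushing U ~ Uniform(0,1) through its inverse amounts to sampling the observed losses with replacement, which needs no arrays at all. A minimal sketch, assuming temp is sorted by actualvalues and using an illustrative &simsize:]

```sas
/* Sketch: inverse-EDF sampling as with-replacement resampling.
   Assumes TEMP is sorted ascending by actualvalues.            */
%let simsize = 1000000;

data sim(keep=loss);
  do j = 1 to &simsize;
    u = rand('UNIFORM');
    i = ceil(u * n);                /* index of the order statistic */
    set temp nobs=n point=i;        /* random-access read of row i  */
    loss = actualvalues;
    output;
  end;
  stop;                             /* required when using POINT=   */
run;
```

    Each draw reads a single row directly, so memory use stays flat no matter how many loss observations there are.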

    Super User
    Posts: 23,754

    Re: large numerical arrays

    Second to Art & Data _null_.

    You don't need macros or arrays. You probably need a DATA step or a hash table.

    You may want to see Rick Wicklin's books and investigate IML if you like.

    SAS Press - Rick Wicklin Author Page

    Respected Advisor
    Posts: 3,852

    Re: large numerical arrays

    Art Carpenter wrote:

    3. If you really do need arrays (we have not seen enough of what you are doing yet to know for sure), then you might consider skipping the macro language and going to either DATA step arrays or hash tables; neither has the 64K size limitation of macro variables.


    Super User
    Posts: 6,781

    Re: large numerical arrays

    Well, I'm trying to wrap my head around what really needs to be done here.  Here are a couple of questions.  Maybe your comments will help me figure out what needs to happen.

    First, what is the difference between what you want and a random sampling of the data set (or perhaps a set of random samples)?  PROC SURVEYSELECT will accomplish this rather easily.

    Second, what is the purpose of applying RAND("UNIFORM")?  Are you trying to accomplish something more complex, like generating a random sample of all those values that fall into the 70th percentile?

    Usually, half the battle is figuring out what needs to be solved.  I know you've tried, but a little more explanation would help.
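    [Editor's note: the PROC SURVEYSELECT suggestion above can be sketched briefly. METHOD=URS is unrestricted random sampling, i.e. sampling with replacement, which matches the uniform-to-quantile scheme for a step EDF. The sample size and seed below are illustrative assumptions:]

```sas
/* Sketch: draw 1,000,000 losses with replacement from TEMP.
   OUTHITS writes one output row per selection rather than a
   single row with a frequency count.                         */
proc surveyselect data=temp out=boot
                  method=urs sampsize=1000000
                  outhits seed=12345;
run;
```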

    Discussion stats
    • 6 replies
    • 6 in conversation