BookmarkSubscribeRSS Feed
jbeatty
Calcite | Level 5

I have created a basic function to calculate the percentile of some list of numbers, much like the PERCENTILE() function in excel:

proc fcmp library=sashelp.svrtdist outlib=work.mysubs.percentile;

  function RISKTYPE_percentile(pn,dim,x

  • ,F
  • ,Ftype);
  •   percentile=svrtutil_percentile(pn,dim,x,F,Ftype);

      return (percentile);

      endsub;

    quit;

    It accepts as arguments two arrays x and F which are the sorted actual values and the sorted empirical distribution percentiles of your number list. I'm finding it difficult though to create very large arrays to use this function with.The method that has worked best so far is something like:

    %let actlist=;

    %let emplist=;

    data _null_;

      set temp;

      call symput('actlist',resolve('&actlist')||' '|actualvalues);

      call symput('emplist',resolve('&emplist')||' '||_edf_);

      run;

    This dataset temp has two columns actualvalues and _edf_ which I want to use as the values for my input arrays. This method seems to work for only 2-3000 observations though, and after that there is no longer any memory in the macro variable to store more information. And what's more, this is very slow. Does anyone know of any efficient way to create large numerical arrays that can be used as input for custom functions such as the one above?  The task I'm working on calls for arrays with hundreds of thousands or even millions of values.

    6 REPLIES 6
    art297
    Opal | Level 21

    I, unfortunately, am a total novice (as yet) regarding proc fcmp.  However, we do have a couple of strong FCMP posters on both the forum and sas-l.  I suggest that you crosspost, to both, but change your title to something like "problem using large arrays with proc fcmp"

    ArtC
    Rhodochrosite | Level 12

    ArtT has a good suggestion, but I also have a couple of thoughts.

    1. Do you really need the values in a macro variable? Your function is working against DATA step variables so if we can go against those directly it might help.

    2. your DATA _NULL_  step to create the macro variables is a bit convoluted.  A SQL step is more straightforward.  Consider:

    proc sql noprint;

    select actualvalues,_edf_

       into :actlist separated by ' ',

            :emplist separated by ' '

          from temp;

    quit;

    %put &actlist;

    %put &emplist;

    3.  If you do really need arrays, and we have not seen enough of what you are doing yet to know for sure, then you might consider skipping the macro language and going to either DATA step arrays or hash tables, neither have the size limitations of macro variables (64K).

    jbeatty
    Calcite | Level 5

    Thanks for the helpful answers guys, I'll follow up on your responses. Art, I've already tried using a proc sql, and it seemed to accomplish the same thing, but also still with the 64k limitation.

    My problem is as follows:  I have a large set of data which we can say represents empirical, or actual losses. I'd like to generate a large number of random variables from the distribution of these losses. Since this is a distribution which can't be described exactly using some of the built-in parametric distributions available in SAS, I'd like to use something like the svrtutil_percentile utility function.

    I begin by generating a large number of random uniform variables (rand('UNIFORM')), and will get results something like 0.10, 0.70, 0.60....etc. I'd like to translate those numbers in to the 10% quantile value, 70% quantile value. 60% quantile value, etc from the original set of empirical losses. To do this, at least using the percentile function, for each uniform random variable I need to pass the array of actual losses and array of edf values (calculated using proc severity). This becomes difficult when using something over 2000-3000 original loss observations because the size of the variable surpasses the 64k limit.

    My SAS knowledge is fairly limited so maybe there's some easy solution that is out there. In Matlab, which I am more familiar with, this would be fairly straightforward, just to define an entire column of some variable as an input array. In SAS it seems this is not so simple.

    Reeza
    Super User

    Second to Art & Data _null_.

    You don't need macro's or arrays. You probably need a data step or a hash table.

    You may want to see Rick Wicklin's books and investigate IML if you like.

    SAS Press - Rick Wicklin Author Page

    data_null__
    Jade | Level 19

    Art Carpenter wrote:

    3.  If you do really need arrays, and we have not seen enough of what you are doing yet to know for sure, then you might consider skipping the macro language and going to either DATA step arrays or hash tables, neither have the size limitations of macro variables (64K).

    This!

    Astounding
    PROC Star

    Well, I'm trying to wrap my head around what really needs to be done here.  Here are a couple of questions.  Maybe your comments will help me figure out what needs to happen.

    First, what is the difference between what you want and a random sampling of the data set (or perhaps a set of random samples)?  PROC SURVEYSELECT will accomplish this rather easily.

    Second, what is the purpose of applying RAND("UNIFORM")?  Are you trying to accomplish something more complex, like generating a random sample of all those values that fall into the 70th percentile?

    Usually, half the battle is figuring out what needs to be solved.  I know you've tried, but a little more explanation would help.

    sas-innovate-2024.png

    Don't miss out on SAS Innovate - Register now for the FREE Livestream!

    Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 6 replies
    • 1084 views
    • 6 likes
    • 6 in conversation