BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
MuTwo
Calcite | Level 5

Hi, I am sure I am missing something simple, but I have been spinning my wheels for hours on this and just don't know where to turn besides a forum. This is on SAS Enterprise guide 8.2

I am trying to create a table that will be used to make an MRL plot. Here is my code:

%macro calculate_avg(threshold, data_table);

    data _temp; 
        set &data_table(keep=Trended_Loss);
        where trended_loss > &threshold;
		excess = trended_loss - &threshold;
    run;

    proc sql noprint; 
        select avg(excess) 
		into :AAT 
        from _temp; 
    quit;

%mend calculate_avg;

data work.results;
	set work.percentiles;
	call execute(cats('%calculate_avg(', Percentile_threshold, ',work.Trended)'));
	
	Mean_Excess_Loss = &AAT;
run;

 

work.percentiles is a table is derived from work.Trended, containing the percentiles from 90 to 99.9 with a .1 step. What I am trying to do is pass the threshold based on the percentile into the macro and estimate the average above the threshold using the _temp table and the proc sql statement Since the table contains 100 different thresholds, I initially tried to make a do loop but that didn't seem to work out... If I remove noprint and comment out the Mean_Excess_loss line, I can see the results of the test at various percentiles, so the logic is sound. I just can't seem to get the value being estimated out of the macro and back into my results table. Here is the SAS log:

 

107 Mean_Excess_Loss = &AAT;
                       _
                       22
WARNING: Apparent symbolic reference AAT not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
a missing value, INPUT, PUT.

108 run;

 

Any insights you have will be extremely helpful, thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I think a join could work, if you've got enough memory.  If it blows up, there are other approaches.

 

We will really need sample data to work with.  You haven't shared any, but the code you posted can make some.  So let's use sashelp.cars as a sample dataset.

 

Your code to make the percentiles data is like:

 

proc univariate data=sashelp.cars noprint;
	var mpg_city;
	output out=work.percentiles1 PCTLPTS=0 to 100 by 10 pctlpre=P;
run;

proc transpose data=work.percentiles1 out=work.percentiles2;
run;

data work.percentiles;
	set work.percentiles2;
	drop _label_;
	rename col1 = Percentile_threshold _name_=percentile;
	format col1 9.2;
run;

proc print data=percentiles ;
run ;

Work.perceniles looks like:

                      Percentile_
 Obs    percentile     threshold

   1       P0              10.00
   2       P10             15.00
   3       P20             17.00
   4       P30             18.00
   5       P40             18.00
   6       P50             19.00
   7       P60             20.00
   8       P70             21.00
   9       P80             23.00
  10       P90             26.00
  11       P100            60.00

With that, I can use a SQL join to make a cartesian product, so every row of your percentiles table will get joined to every row of sashelp.cars, and then calculate the mean with a group by.  I think this gives what you want:

proc sql ;
  create table want as
  select percentile,mean(mpg_city-percentile_threshold) as Mean_Excess_Loss
  from percentiles,sashelp.cars
  where mpg_city>percentile_threshold
  group by percentile
 ;
quit ;

proc print data=want ;
run ;

Can you show the SQL code you tried?  It it give the wrong result, or did it error out?

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

CALL EXECUTE puts code into the execution queue, so it will run after the current DATA step has ended. So you cannot expect AAT to be created while your step runs. BTW, AAT would not exist outside of the macro anyway as you omitted to make it global with a %GLOBAL statement.

I think that some kind of join will be a much better option than trying it through macro coding.

Please provide some example data for your datasets, and tell us how large your real datasets are (number of observations)

MuTwo
Calcite | Level 5

Hi, and thanks for the reply! work.percentiles created with the following code:

 

proc univariate data=work.Trended noprint;
	var Trended_loss;
	output out=work.percentiles PCTLPTS=90 to 91 by 1 pctlpre=P;
run;

proc transpose data=work.percentiles out=work.percentiles;
run;

data work.percentiles;
	set work.percentiles;
	drop _label_;
	rename col1 = Percentile_threshold;
	format col1 9.2;
run;

It has two fields, _NAME_ with obs like P90 and P91, and Percentile_Threshold, which are the given percentiles from work.Trended. Right now, I am only creating two thresholds for testing. Once this is working, it will be changed to PCTLPTS=90 to 99.99 by .01 to get more granular.

 

work.Trended has 7 fields and about 4 million observations, with the only field I really care about is Trended_loss, which is built up from the other 6 fields (other fields include ID, state, year, trends, and loss types).

Joining would be one option, but I need to iterate over the work.percentiles table, since I am trying to capture the mean excess loss at each threshold. I attempted using proc SQL within the data work.results step itself, but getting the value generated from the average was also not working.

Kurt_Bremser
Super User
Create an array of 3*1000 elements.
In a DATA step reading your 4 million dataset, at _N_ = 1 read the 1000 percentiles into the {1,i} elements.
For every observation in the main dataset, iterate through the array and cumulate the losses in the {2,i} elements, increasing the counts in the {3,i} elements.
When the main dataset has read the final observation, iterate through the array, calculate the means, and output.
Tom
Super User Tom
Super User

Just think about the order of operations.

Your tried to run this data step:

data work.results;
  set work.percentiles;
  call execute(cats('%calculate_avg(', Percentile_threshold, ',work.Trended)'));
  Mean_Excess_Loss = &AAT;
run;

First the MACRO PROCESSOR will pre-process the text and pass it onto SAS to run.  So assuming that AAT has already been defined with a value of 123 then that means you asked SAS to run:

data work.results;
  set work.percentiles;
  call execute(cats('%calculate_avg(', Percentile_threshold, ',work.Trended)'));
  Mean_Excess_Loss = 123 ;
run;

Now when that data step runs it will push this text like 

%calculate_avg(threshold1,work.Trended)
%calculate_avg(threshold2,work.Trended)
%calculate_avg(threshold3,work.Trended)
...

through CALL EXECUTE().

Again the macro pre-processor will execute those macro calls and the resulting SAS statements will be queued up to run after the data step finishes.

data _temp; 
set work.Trended(keep=Trended_Loss);
where trended_loss > threshold1;
excess = trended_loss - threshold1;
run;
proc sql noprint; 
select avg(excess) into :AAT from _temp; 
quit;
data _temp; 
set work.Trended(keep=Trended_Loss);
where trended_loss > threshold2;
excess = trended_loss - threshold2;
run;
proc sql noprint; 
select avg(excess) into :AAT from _temp; 
quit;
...

Since that code is overwriting the value of the macro variable AAT every time it really only matters value existed in the LAST observation in the dataset PERCENTILES.  So you really did not need to use call execute() and the macro processor to generate code for any of the earlier observations.

 

What is the actual operation you are trying to perform?

If you have the value of PERCENTILE_THESHOLD in an a ACTUAL variable, then why did you try to convert it into text so it could be stored into a MACRO variable and used to generate CODE?  Why not just combine the datasets so you can use the value of the actual variable?

Perhaps something like this?

proc sql;
create table WANT as 
  select mean(a.trended_loss - b.Percentile_threshold) as mean_excess
  from trended a
     , percentiles b
  where a.trended_loss > b.Percentile_threshold
;
quit;

 

MuTwo
Calcite | Level 5

The operation I'm trying to perform is the mean excess loss of the Trended field in work.Trended, at each threshold saved in the work.percentiles table. work.percentiles will have 1000 different thresholds going from 90th percentile to 99.99th percentile (currently only has 90th and 91st percentiles for testing), so a join won't work since I need the average excess at each level. I converted it to text to be called into a macro since my attempts at doing the proc sql directly in the data step didn't work since the threshold value was not being recognized as a value to be compared against in the where trended > threshold step of the proc sql.

Quentin
Super User

I think a join could work, if you've got enough memory.  If it blows up, there are other approaches.

 

We will really need sample data to work with.  You haven't shared any, but the code you posted can make some.  So let's use sashelp.cars as a sample dataset.

 

Your code to make the percentiles data is like:

 

proc univariate data=sashelp.cars noprint;
	var mpg_city;
	output out=work.percentiles1 PCTLPTS=0 to 100 by 10 pctlpre=P;
run;

proc transpose data=work.percentiles1 out=work.percentiles2;
run;

data work.percentiles;
	set work.percentiles2;
	drop _label_;
	rename col1 = Percentile_threshold _name_=percentile;
	format col1 9.2;
run;

proc print data=percentiles ;
run ;

Work.perceniles looks like:

                      Percentile_
 Obs    percentile     threshold

   1       P0              10.00
   2       P10             15.00
   3       P20             17.00
   4       P30             18.00
   5       P40             18.00
   6       P50             19.00
   7       P60             20.00
   8       P70             21.00
   9       P80             23.00
  10       P90             26.00
  11       P100            60.00

With that, I can use a SQL join to make a cartesian product, so every row of your percentiles table will get joined to every row of sashelp.cars, and then calculate the mean with a group by.  I think this gives what you want:

proc sql ;
  create table want as
  select percentile,mean(mpg_city-percentile_threshold) as Mean_Excess_Loss
  from percentiles,sashelp.cars
  where mpg_city>percentile_threshold
  group by percentile
 ;
quit ;

proc print data=want ;
run ;

Can you show the SQL code you tried?  It it give the wrong result, or did it error out?

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
MuTwo
Calcite | Level 5

Sorry it took me a while to get back to you, juggling a few different projects. This approach seems to be the best out of all listed. It can be slow, especially as the number of percentiles being evaluated increases. With 1000 percentiles being evaluated, takes about 3 minutes on my machine. Since I am starting at the highest level of data and will eventually stratify it down to subgroups, this works for me. Thank you for the clear explanation!

Tom
Super User Tom
Super User

You just need a GROUP BY in that situation.  The only reason a join wouldn't work would be if the data was too large to be replicated that many times.  But I doubt you want to do the calculation for more than 100 pertentiles, so it shouldn't get that much larger.

 

Example:

%let dsname=sashelp.class;
%let varname=height;

proc univariate data=&dsname. noprint;
	var &varname.;
	output out=percentiles PCTLPTS=70 to 90 by 10 pctlpre=P;
run;

proc transpose data=percentiles out=cutoffs ;
  var p: ;
run;

proc sql;
create table want as 
select a._name_ as percentile
     , count(*) as nobs
     , mean(b.&varname. - a.col1) as mean_excess_&varname.
from cutoffs a
   , &dsname. b
where b.&varname. > a.col1
group by 1
;
quit;

Results:

                              mean_
                             excess_
Obs    percentile    nobs     height

 1        P70          5     2.90000
 2        P80          3     2.83333
 3        P90          1     3.00000

 

AhmedAl_Attar
Ammonite | Level 13

Hi @MuTwo 

You may want to check out the DOSUBL function https://support.sas.com/resources/papers/proceedings13/032-2013.pdf

hope this helps,

Ahmed

Quentin
Super User

As Ahmed mentioned, DOSUBL can be used for code generation for a problem like this, but there are limitations.  

 

As explained by others, when you use CALL EXECUTE to generate code, your main DATA step generates the code, and the generated code is executed *after* the main step has completed.  This is why the timing doesn't work out in your attempt.

 

DOSUBL provides some *magical* functionality.  When you use DOSUBL to generate code, your main DATA step can pause during execution, then the generated code is executed in a separate magical side-session, then your main DATA step resumes.  On each iteration of your main DATA step, you can pause and execute a separate DATA step, and return the results to the main DATA step. 

 

Take your original macro, and turn it into a function-style macro, which returns the calculated mean:

%macro calculate_avg(data=,var=,threshold=);
  %local rc mean;
  %let rc = %sysfunc(dosubl(%nrstr(
    data _temp; 
      set &data(keep=&var);
      where &var > &threshold;
      excess = &var - &threshold;
    run;

    proc sql noprint; 
      select avg(excess) 
      into :mean trimmed
      from _temp; 
    quit;
  )));
&mean /*return*/
%mend calculate_avg;

You can use that macro like a function, so you could test with stuff like:

%put >>%calculate_avg(data=sashelp.cars,var=mpg_city,threshold=10)<< ;
%put >>%calculate_avg(data=sashelp.class,var=height,threshold=65.3)<< 

If you want to use that macro like CALL EXECUTE, you can use the RESOLVE function to invoke the macro once on each iteration of the DATA step, like:

data work.results;
  set work.percentiles;
  Mean_Excess_Loss =input(
                          resolve('%calculate_avg('
                                                 || ' data= sashelp.cars'
                                                 || ',var= mpg_city '
                                                 || ',threshold= '  || put(percentile_threshold,8.2)
                                                 || ')'
                                  ) 
                         ,8.
                          );

run;

You can see how it looks similar to using CALL EXECUTE.  But the timing works out, because the code is executed in a side session.

That said, there are important limitations to this approach. 

 

One is that your moving non-integer values between data step variables and macro variables, and whenever you do that you need to think carefully about how much precision you need.

 

The bigger limitation is that the price you pay for the magic of DOSUBL is that it is S-L-O-W.  When I made a percentiles dataset with 100 rows, the SQL step completed in 0 seconds. The DOSUBL step took 15 seconds.

 

It makes sense that DOSUBL is slow.  If you call it 100 times it needs to create and destroy a new side-session 100 times.  And in each side session it executes a  DATA step and a SQL step.

 

I changed your macro to a function-style macro, because I tend to use DOSUBL in the macro language setting.  But that is not essential.  You could keep your macro almost like it is, and just add a %GLOBAL statement so that it will generate a global macro variable:

 

%macro calculate_avg(data=,var=,threshold=);
   %global mean ;
   data _temp; 
     set &data(keep=&var);
     where &var > &threshold;
     excess = &var - &threshold;
   run;

   proc sql noprint; 
     select avg(excess) 
     into :mean trimmed
     from _temp; 
   quit;
%mend calculate_avg;

Then you use DOSUBL function like:

data work.results;
  set work.percentiles;
  rc=dosubl('%calculate_avg('
                         || ' data= sashelp.cars'
                         || ',var= mpg_city '
                         || ',threshold= '  || put(percentile_threshold,8.2)
                         || ')'
             );
  Mean_Excess_Loss=symgetn('mean') ;
run;
%symdel mean ;

With that approach on each iteration of the DATA step loop, %calculate_avg() will be invoked and will create a global macro variable named MEAN.  Then if you want to use that macro variable in the same step, you can use the SYMGETN function to retrieve the value.  

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 902 views
  • 0 likes
  • 5 in conversation