BookmarkSubscribeRSS Feed
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Can you start from the step before.  What is the data you are using (and if thats derived from what), as this macro seems to be doing functionality which is already in SAS?  Now we could fix it like below, however its not an idea approach as it leaves SAS to guess what your data is - again, this is a fundamental principal of a strongly typed language the emphasis being on you to know your data, even more so if your writing code that is to be shared around.  I think part of your problem straigh away comes from using the macro variable as text rather than a variable:

case when &VAR. > Variable_max then 1 else 0 end as n_over,
^ note without the quotes around it, so the variable is used rather then text "&VAR."

However I am flying blind here, need test data to provide good working code.

Sir_Highbury
Quartz | Level 8

Dear R9,

 

I changed the code as you suggested (here below)

/* check 02: macro count outliers */
%macro macro_outliers (var=);
proc sql; create table upper_bound as select Variable_max from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table lower_bound as select Variable_min from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table base as select distinct "&VAR" from in.test ;quit;
proc sql; create table base as select a.*, b.*, c.* from base a, upper_bound b, lower_bound c ;quit;
proc sql; create table num_ourliers as select
case when &VAR>Variable_max then 1 else 0 end as n_over,
case when &VAR<Variable_min then 1 else 0 end as n_under

But then I got the following error:

proc sql;
2 +
create table num_ourliers as select case when
3 + ALTERNATIVEBEGIN>Variable_max then 1 else 0 end as n_over, case when
ALTERNATIVEBEGIN<Variable_min then 1 else 0 end as n_under from base ;
ERROR: The following columns were not found in the contributing tables: ALTERNATIVEBEGIN.
from base ;quit;
proc sql; create table num_ourliers as select sum(n_over) as num_over, sum(n_under) as num_under from num_ourliers ;quit;
proc sql; update DC.input_analysis_res set outliers_up = (select num_over FROM num_ourliers) where variable_name="&VAR" ;quit;
proc sql; update DC.input_analysis_res set outliers_down = (select num_under FROM num_ourliers) where variable_name="&VAR" ;quit;
%mend macro_outliers;

 

additional clarification, the variable var works in two ways:

1. search the "variable" as an observation of the variable "variable_name" which is the output of a proc content enhance with some values, in this case the relevant are variable_max and variable_min

table example

variable_name lenght table ... variable_min variable_max

ALTERNATIVEBEGIN ................ 01.01.2015 31.12.2015

gender ..... .............................. - -

age ........................................ 18 40

 

2. search in a table where the variable is really a variable

table example:

id ALTERNATIVEBEGIN gender age

x1 01.01.2014 m 25

x2 02.02.2015 m 50

 

I would the macro to generate the following output, as enhancement in the table described at

variable_name lenght table ... variable_min variable_max outliers_up outliers_down outliers_up

ALTERNATIVEBEGIN ................ 01.01.2015 31.12.2015 1 0 

gender ..... .............................. - - - -

age ........................................ 18 40 0 1

 

Attached you find a screenshot of the output (I got something but it does nto look meaningful) and in the next post the log I got.

Thanks a lot guys!


output.PNG
ballardw
Super User

It may help to consider how your DC.input_analysis_res data set is built. Is this read as the result of another application or generated by SAS? If so how? Since you are playing around with something that apparently has a column named "variable_name" it may be that TRANSPOSING the data would give you columns of the variable name with numeric values.

 

Also if you built the DC.input_analysis_res dataset in SAS then you could inspect the INPUT data set for that using the dictionary.tables to determine the original variable type. Make that table name parameters in the macro, determine the type before going into any of that SQL code. Then as needed you would have the type and could do things like

 

%if vartype= 'N' %then %do;

   <the code for when that variable is expected to be numeric>

%end;

%else %do;

   <what to do when you hope the variable represents numeric values >

 

%end;

 

BTW code such as

case when &VAR>Variable_max then 1 else 0 end as n_over

can be written as

(&VAR>Variable_max ) as n_over

Sir_Highbury
Quartz | Level 8

dear ballardw,

 

thanks a lot for your hint, I am trying to play with it.

Still about the background: I run the proc contents on a table that I would like to analyze, I extendend the proc contents with some values (value lists, default values, number_of_missing, val_max, val_min, outliers_up, outliers_down values_not_in_v_list) taken from an external excel file. Some of these values in the extended proc contents are used merely as input to perform the checks (e.g. value list and default value or val_max and val_min) otherone should be calculated (e.g. outliers_up and outliers_down or values_not_in_v_list). basically I am executing 3 macro (currently only ones run perfectly thanks to rheinard) in order to get all the fields of the extended proc content filled (of course if meaninfull, if a variable has a value list, I will not calculate the outliers_up and viceversa). I will keep you posted. Thanks again.

ballardw
Super User

You can get all of the information available from proc contents using the Dictionary.columns table. Considering how you are using it that might be a better starting point then proc contents.

 

Take a look at this to se what is in the dictionary table:

Proc sql;

   describe table dictionary.columns;

quit;

 

You can get the information about the variables using:

Proc sql;

   create tableinfo as

   select *

   from dictionary.columns

      where libname='LIB' and memname='DATASET';

quit;

The libname and memname values either must be in upper case or use Upcase('value') so that theywill match those stored in the table. If the dataset of interest is in the WORK library it must still be referenced as the query will not default to a value for library.

 

It may very well be that the above table information could be used for Call Execute statements easier than macro logic.

You may also have an issue with your import from Excel if some values are coming is as text but need to be treated as numeric. This seems very likely as not many people spend time generating MAX and MIN values for character values like City names or product codes.

Sir_Highbury
Quartz | Level 8

Thanks a lot! I will try and let you know...

Tom
Super User Tom
Super User

One change you should make that will help prevent other errors, even if it doesn't fix your current problem is to add %NRSTR() around the macro names in your CALL EXECUTE statments.  

call execute('%nrstr(%macro_outliers) (var='||strip(variable_name)||');');

This will prevent SAS from resolving the macro while it pushes the code onto the stack to execute. Instead it will just push the macro call onto the stack.  So instead of seeing this in your SAS log:

1    + proc sql;

You will see 

1    + %macro_outliers(var=....);

This should also make the MPRINT() lines appear at a more apporpriate place in your LOG.

 

Besides making the log easier to read the real benefit is to prevent timing issues when resolving macro variable references.  When the macro runs durign the CALL EXECUTE statement and the generated code is pushed onto the stack (instead of just having the macro call on the stack) any executable steps (like PROC SQL and DATA STEPS) that change macro variable values will have their code generated, but the execution will not happen until after the current data step finishes.  So if the next part of the macro expects to test the generated macro variable or use it to generate more code it will not have the value that would have been generated had the step actually run.

Tom
Super User Tom
Super User

Looks like your macro is still not clear on when it is referencing the name of a variable and when it is referencing the value of a variable (even if the content of that variable is the names of other variables).  For example this line makes no sense.

 

create table base as select distinct "&VAR" from in.test ;

If you want to use the macro variable value as a string literal (since it is in quotes) then you need to tell SQL what name to use for the variable.  

create table base as select distinct "&VAR" as VARNAME from in.test ;

If you want &VAR to mean the name of variable that is in the IN.TEST dataset then remove the quotes.  Or you could add the letter N after the quotes to tell SAS that it is a name literal and not a string literal.

Sir_Highbury
Quartz | Level 8

Brilliant guys! I really appreciate our suggestions for improvement... hopefully by the end of the day I will come back to SAS and I will let you know. SH

Tom
Super User Tom
Super User

What is the INPUT dataset (or datasets?) to this macro?

It looks to me like you have two input datasets (even if they are not listed as parameters to macro).  One that has summary information and one that has actual data.

This bit of code looks a littel convoluted.

proc sql; create table upper_bound as select Variable_max from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table lower_bound as select Variable_min from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table base as select distinct "&VAR" from in.test ;quit;
proc sql; create table base as select a.*, b.*, c.* from base a, upper_bound b, lower_bound c ;quit;
proc sql; create table num_ourliers as select 
case when &VAR>Variable_max then 1 else 0 end as n_over,
case when &VAR<Variable_min then 1 else 0 end as n_under

Did you mean something like this?

 

%macro outlier
(var=
,out=num_outliers
,dsn=IN.TEST
,limits=DC.input_analysis
);
proc sql noprint;
 create table &out as
 select "&var" as variable_name length=32
      , sum(case when a.&var > b.variable_max then 1 else 0 end) as n_over
      , sum(case when a.&var < b.variable_min then 1 else 0 end) as n_under
      , sum(missing(a.&var)) as n_missing
 from &dsn a
    , &limits b
 where b.variable_name="&var"
 ;
quit;
%mend outliers ;

 

Sir_Highbury
Quartz | Level 8

Dear Tom,

 

exactly I have 2 input data set:

1. the main data set with variables and observations

data have;
infile datalines dsd;
input a b c d;
datalines;
xx, 5, 100, 3
xy, 5, 2, 3
xz, 5, 3, 3
xy, 5, 2, 3
run;

 

2. the summary data set obtained running a proc content and extending it with some external data: e.g. the maximum expected value and the minimum expected value

 

data content;
infile datalines dsd;
input variable min max;
datalines;
a, -, -
b, -50, 50
c, -50, 50
d, -50, 50
run;

 

basically I wanna enahcne the summary as if follows:

data content_res;
infile datalines dsd;
input variable min max out_up out_down;
datalines;
a, -, -, -, -
b, -50, 50, 0, 0
c, -50, 50, 1, 0
d, -50, 50, 0, 0
run;

 

The table I have is much bigger, this is just an example to better explain my problem.

Thanks a lot for the suggestion but before to move to an evoluted you I would like step by step to fix the bus in mine and then further improve it:

 

here the current version of my code:

 

proc printto log="W:\03-OUT_outputfolder\SAS_LOG.txt" new;
run;
options mlogic SYMBOLGEN MPRINT;

 

/* check 02: macro count outliers */
%macro macro_outliers (var=);
proc sql; create table upper_bound as select Variable_max from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table lower_bound as select Variable_min from DC.input_analysis_res where variable_name="&VAR" ;quit;
proc sql; create table base as select distinct '&VAR' from in.test ;quit;
proc sql; create table base as select a.*, b.*, c.* from base a, upper_bound b, lower_bound c ;quit;

%if vartype='N' /* vartype(table,1)='N' */ %then %do;
proc sql; create table num_ourliers as select (&VAR>Variable_max) as n_over, (&VAR<Variable_min) as n_under from base ;quit;
proc sql; create table num_ourliers as select sum(n_over) as num_over, sum(n_under) as num_under from num_ourliers ;quit;
proc sql; update DC.input_analysis_res set outliers_up = (select num_over FROM num_ourliers) where variable_name="&VAR" ;quit;
proc sql; update DC.input_analysis_res set outliers_down = (select num_under FROM num_ourliers) where variable_name="&VAR" ;quit;
%end;
%else %do; %end;
%mend macro_outliers;

 

/* call macros */
data _null_;
set DC.input_analysis_res;

/* check 01: unexpected values
if Variable_value_list not in ("") then
call execute('%In_List (var='||strip(variable_name)||',vals='||strip(variable_value_list)||');');*/

/* check 02: count outliers */
if (Variable_min ne 0 and Variable_min ne 0) then
call execute('%macro_outliers (var='||strip(variable_name)||');');

/* check 03: number of missing and default values

 

attached a screen shot of the results I get, I marked in red the wrong one: since the maximum is 20.607 and the variable_max=10.000 I should get in the field outliers_up an integer equal or bigger than 1.
call execute ('%macro_missing (var='||strip(variable_name)||',def_val='||strip(Default_value)||');');*/
run;

proc printto;
run;


data1.PNG
Sir_Highbury
Quartz | Level 8

Here the log. I did not get any error but for the vaible alternativebegin I should get a certain amount of up outlier as above explained.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 27 replies
  • 4552 views
  • 7 likes
  • 5 in conversation