Dear experts,
| Library | Library Member Name | Data Set | Special | Variable Name | Variable | Variable | Variable | Variable | Variable Format | 
| Name | 
 | Label | Data Set | 
 | Type | Length | Number | Label | 
 | 
| 
 | 
 | 
 | Type (From | 
 | 
 | 
 | 
 | 
 | 
 | 
| 
 | 
 | 
 | TYPE=) | 
 | 
 | 
 | 
 | 
 | 
 | 
| IN | AG_201602 | 
 | 
 | AB | 1 | 8 | 43 | 
 | BEST | 
| IN | AG_201602 | 
 | 
 | ACTIF | 2 | 1 | 61 | 
 | $ | 
| IN | AG_201602 | 
 | 
 | ALT_RV_NR | 1 | 8 | 71 | 
 | BEST | 
| IN | AG_201602 | 
 | 
 | ALT_VNR | 1 | 8 | 70 | 
 | BEST | 
proc sql; create table DC.input_analysis as select t.*
,t1.Variable_Name_Original
,t1.Variable_Name_English
,t1.Variable_description
,t1.Variable_value_list
,t1.Variable_category
,t1.Variable_min
,t1.Variable_max
from DC.input_analysis t
left outer join In.Variables_information t1 on (t.variable_name=t1.Variable_Name_Original or t.variable_name=t1.Variable_Name_English)
;run;
proc sql; create table DC.input_analysis_result as select
t.*
,0 as n_missing
,0 as perc_missing
,0 as over_up_bound
,0 as max
,0 as over_dow_bound
,0 as min
,0 as n_not_value_list
," " as unexpected_values
from DC.input_analysis t ;run;
How can I automatize these calculation keep it as simple as possible? Thank a lot in advance.
Hi,
Well, I am not writing it for you. Here is an example using sashelp.class and a input dataset. Basically the idea is from your test metadata, to generate a call out to one of the test macros. You can have as many test macros, or calls to them as you like:
data input_analysis_res;
  length variable_name_original variable_list variable_min variable_max $200;
  variable_name_original="NAME"; output;
  variable_name_original="SEX"; variable_list="M F"; output;
  variable_list="";
  variable_name_original="AGE"; variable_min="14"; variable_max="20"; output;
  variable_name_original="HEIGHT"; variable_min="57"; variable_max="63"; output;
  variable_name_original="WEIGHT"; variable_min="90"; variable_max="120"; output;
run;
proc sql;
  create table RESULTS (VAR char(200),TEST char(200),ID char(200),VAL char(200));
quit;
%macro Check_List (ds=,var=,vlist=);
  proc sql;
    insert into RESULTS 
    select  distinct
            "&VAR.",
            "In List",
            NAME,
            &VAR.
    from    &DS.
    where   &VAR. not in (&VLIST.);
  quit;
%mend Check_List;
%macro Check_Range (ds=,var=,vmin=,vmax=);
  proc sql;
    insert into RESULTS
    select  distinct
            "&VAR.",
            "Outside Range &VMIN. - &VMAX.",
            NAME,
            strip(put(&VAR.,best.))
    from    &DS.
    where   &VAR. not between &VMIN. and &VMAX.;
  quit;
%mend Check_Range;
data _null_;
  set input_analysis_res;
  if variable_list ne "" then 
    call execute(cats('%Check_List (ds=sashelp.class,var=',variable_name_original,',vlist="',tranwrd(strip(variable_list)," ",'" "'),'");'));
  if variable_min ne "" then 
    call execute(cats('%Check_Range (ds=sashelp.class,var=',variable_name_original,',vmin=',variable_min,',vmax=',variable_max,');'));
run;
					
				
			
			
				
			
			
			
				
			
			
			
			
			
		Hi,
What may be easier, is if you can post some test data (in the form of a datastep) and what you want the output to look like. I would imagine a simple datastep with call execute() should be able to achieve what you want. Its a bit difficult to say without seeing the actual data and what DC.input_analysis looks like.
data have;
  ab="ABC"; output;
  ab="DEF"; output;
run;
data test;
  var='AB'; test='ABC';
run;
proc sql;
  create table RESULTS (TEST char(200),RESULT char(200));
quit;
data _null_;
  set test;
  call execute('proc sql; insert into RESULTS set TEST="where '||strip(var)||' ne '||strip(test)||'",
                  RESULT=case when exists(select distinct '||strip(var)||' from HAVE where '||strip(var)||' ne "'||strip(test)||'") then "Fail" else "Pass" end; quit;');
run;
					
				
			
			
				
			
			
			
			
			
			
			
		HI RW9, attached you find:
- DC.input_analysis: input data set (data from the proc contents enhanced)
- test: data set to be tested
- expected results: in green the data that I already get running the proc means below:
proc means
data=test STACKODS n nmiss range min max;
var _numeric_;
ods output summary=stacked
;run;
In yellow the data I would like to get in an easy manner. My idea was: take individually each variable of the data set 'DC.input_analysis' (Macro + loop) and store the result obtained in 'DC.input_analysis'.
Thanks for your support,
Yes, so you can do something along these lines, within a if block, define the code to be checked and update in your master results table:
data _null_;
  set dc.input_analysis;
  if variable_value_list ne "" then do;
    call execute('proc sql; insert into RESULTS select distinct "Value not in list",CATEGORY from HAVE 
                    where CATEGORY not in ("'||strip(tranwrd(variable_value_list,",",'","'))||'");
                  quit;');
  end;
  if variable_min ne "" then do;
    call execute('proc sql; insert into RESULTS select distinct "Not in range",PAYMENT_DATE from HAVE 
                    where PAYMENT_DATE not between input("'||strip(VARIABLE_MIN)||'",ddmmyy10.) and input("'||strip(VARIABLE_MAX)||'",ddmmyy10.);
                  quit;');
  end;
  ...
run;
					
				
			
			
				
			
			
			
			
			
			
			
		thanks a lot for the suggestion... I will check it.
Dear RW9,
I am trying to use your code but I am still doing something wron: here the the current version of my code:
data _null_;
set DC.input_analysis_res;
if Variable_value_list ne "" then do;
call execute('proc sql; insert into DC.input_analysis_res select distinct "Unexpected_values", variable_value_list in.test
where _all_ not in ("'||strip(tranwrd(variable_value_list,",",'","'))||'");
quit;');
end;
/*variable_min ne "" then do;
call execute('proc sql; insert into outliers_down select distinct "Not in range",PAYMENT_DATE from HAVE
where PAYMENT_DATE not between input("'||strip(VARIABLE_MIN)||'",ddmmyy10.) and input("'||strip(VARIABLE_MAX)||'",ddmmyy10.);
quit;');
end;*/
run;
Attached again the input file and the error message I got.
I do not get the role of the variable you described "category". Since I should do the same check for all the the variables I called it _ALL_.
Further help on it is welcome.
log screenshot
data _null_;
  set DC.input_analysis_res;
  if Variable_value_list ne "" then do;
    call execute('proc sql; 
                    insert into DC.input_analysis_res 
                    select distinct "Unexpected_values", variable_value_list in.test
                    where _all_ not in ("'||strip(tranwrd(variable_value_list,",",'","'))||'");
                  quit;');
  end;
run;
Well, I am not sure what:
variable_value_list in.test
 Is meant to be, its not valid SQL syntaxt.  If you take in.test out, add a from statement, and remove the _ALL_ reference it should work.  Oh, one thing you might need to bear in mind, if you put a long string in there, you will get a warning to indicate long string, to avoid this use:
options noquotelenmax;
Before, and afterwards:
options quotelenmax;
One other thing to remember, you can put your actual code in a macro and call that, this may help organisation:
%macro In_list (var=,vals=);
  proc sql; 
    insert into DC.input_analysis_res 
    select distinct "Unexpected_values", variable_value_list in.test
    where  &VAR. not in (&VALS.);
  quit;
%mend In_List;
data _null_;
  set DC.input_analysis_res;
  if Variable_value_list ne "" then call execute('%In_List (var='||strip(var)||',vals=',strip(variable_value_list),');');
run;
Dear RW9,
the structure with the macrois at list for me better organized and I prefer it but still, I do not get what I want. 😞
Based on my understanding:
1. you check for each row of the table DC.input_analysis_res if the variable Variable_value_list is not empty, in case not you go to the next step;
2. using call execute you call the specified macro, defining the required inputs:
2a. var='||strip(var)||;
2b. vals=',strip(variable_value_list);
3. the macro recursively for each row, select the variable (colum) in the table test (e.g. Nationality) extracts all the values assumed (e.g. Jpanese, French...) that are not &vals;
4. the values selected in 3 are stored as variable "Unexpected_values" in DC.input_analysis_res.
Did I understand correctly?
Here the part that I am using for the debug and attached the log.
%macro In_list (var=,vals=);
proc sql;
insert into DC.input_analysis_res
select distinct
"Unexpected_values",
variable_value_list in.test
where &VAR. not in (&VALS.);
quit;
%mend In_List;
data _null_;
set DC.input_analysis_res;
if Variable_value_list ne "" then call execute('%In_List (var='||strip(var)||',vals=',strip(variable_value_list),');');
run;
/* debug:
proc sql; create table aaa as select &var., count(*) as n from in.test where &var. not in (&vals.) group by &var.; run;
proc sql; create table aaa as select segment, count(*) as n from in.test where segment not in (&vals.) group by segment; run;
*/
The error is here:
if Variable_value_list ne "" then call execute('%In_List (var='||strip(var)||',vals=',strip(variable_value_
^ ^
Should be
if Variable_value_list ne "" then call execute('%In_List (var='||strip(var)||',vals='||strip(variable_value_
You use commas when using a function to concatenate, e.g. cat("abc","def"). Your not in this instance so you ues the double bars. Call execute accepts one string. That is what your basically doing here creating a string which then gets pushed out after the datastep stops. That string then becomes code, so is then syntax checked and compiled. If you struggling, check your string looks right first.
Dear RW9,
it gets better and better and when it will be working I can also adapt the same macro for defferent purposes but still not get the data I want. I googled for the error and I tryit to get more information from the debug (Options nomacrogen NoSymbolgen nomlogic nomprint nomfile.) but I do not get why the variable var is not initialized: ERROR: All positional parameters must precede keyword parameters. Attached the log.
The clue here is in the NOTE: Variable var is uninitialised. In your dataset DC.input_analysis_res, look for the variable containing the variable name of the test in question - could it be variable_name_english? The second thing to check, and I can't see it here, is what does variable_value_list contain, is there some values with a comma, e.g. A,B,D? If so then you will need to pass it as a string:
if ... call execute('%In_List (var='||strip(variable_name_english)||',vals="'||strip(variable_value_list)||'");');
Note the double quotes after vals=.
Dear RW9,
thanks for the indication, I now replaced all the "," used to separate the value in the value list with "_" and I do not get the error message anymore. 🙂
Anyway I am still struggling with a syntax error (attachment). Any idea? Thanks again.
Your text is still being generated as:
...select distinct "Unexepected values",variable_value_list in.test where . not in...
^ ^
The indicated part is not valid syntax.
Also, you have warnings about the variable VAR not existing in dataset DC.input_analysis_res, that is also why you have the var uninitialised. I can't see where it is coming from, you would need to show the macro, and the dataset (at least a few lines of it) which feed into it.
Dear RW,
sorry but I am getting confuse, let me do a short recap:
1. DC.Input_analysis_res is created as a result of a proc contents (in this case of the file in.test)
proc contents data=in.test
out=DC.input_analysis noprint
;run;
2. some variables are dropped and other renamed:
proc sql; create table DC.input_analysis as select 
(cats(trim(t.LIBNAME),trim(t.MEMNAME),trim(t.NAME))) as unique_ID_data_checks
,t.LIBNAME as library_name
,t.MEMNAME as data_set_name
,t.NAME as variable_name
,t.FORMAT
,t.LENGTH
,t.NOBS as observation_number
,t.MODATE as Last_Modified_Date from DC.input_analysis t ;run;
3. the following vatiables are added (e.g Variable_value_list) using an external data source:
proc sql; create table DC.input_analysis as select t.* 
,t1.Variable_Name_Original 
,t1.Variable_Name_English 
,t1.Variable_description /* variable description */ 
,t1.Variable_value_list /* possible values (not comma separated!!!!!!) */
,t1.Default_value
,t1.Variable_category /* Classification: date, amount, ID, key */
,t1.Variable_min /* lower bound */
,t1.Variable_max /* upper bound */
from DC.input_analysis t 
left outer join In.Variables_information t1 on (t.variable_name=t1.Variable_Name_Original or t.variable_name=t1.Variable_Name_English)
;run;
4. run the proc means and get already for the numeric variable in test the following information:
proc means
data=in.test STACKODS n nmiss range min max;
var _numeric_;
ods output summary=DC.stacked
;run;
5.Create the table DC.input_analysis_res where: the data from input analysis are taken and enhanced with the ones from DC.stacked and additionally 3 new variable are created (in bold):
proc sql;
create table DC.input_analysis_res as
select t1.*, t2.*,
0 as outliers_up,
0 as outliers_down,
" " as Unexpected_values
from DC.input_analysis t1 left outer join DC.stacked t2 on t1.variable_name=t2.Variable
;quit;
Till the point 5 everything is running smoothly. Now I should perform some checks that the command STACKODS was not able to do. For instance: find the value for each observation in DC.input_analysis_res, on the basis of Variable_value_list I would like to fill the variable Unexpected_values. How?Taking each observation of the variable variable_name in DC.input_analysis_res as variable in in.test and comparing the value of all observation for each variable with the value list given in DC.input_analysis_res (field Variable_value_list). In order to get what I want at the point 5, the macro you suggested seems to be the (almost) the perfect solution:
%macro In_list (var=,vals=);
Options nomacrogen NoSymbolgen nomlogic nomprint nomfile;
proc sql;
insert into DC.input_analysis_res
select distinct
"Unexpected_values",
variable_value_list in.test
where &VAR. not in (&VALS.);
quit;
%mend In_List;
data _null_;
set DC.input_analysis_res;
if Variable_value_list ne "" then call execute('%In_List (var='||strip(var)||',vals='||strip(variable_value_list)||');');
run;
Coming back to your question:
in in.test there is no variable named Variable_value_list (this was the input and it should remanin unchanged, the value list is given for each variable by all the observations).
in DC.input_analysis_res there is the variable Variable_value_list since it was taken as enahncement (left outer join) if the name would be worng, I would not get it in DC.input_analysis_res.
Did we have the same understanding of the same Situation? Thanks a lot RW9.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
