DATA Step, Macro, Functions and more

SAS: enhance proc contents with data quality checks for each variable (macro, loop)

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 127
Accepted Solution

SAS: enhance proc contents with data quality checks for each variable (macro, loop)

Dear experts,

 

  1. I have the following table (input_analysis) as result of a proc content:

 

Library

Library Member Name

Data Set

Special

Variable Name

Variable

Variable

Variable

Variable

Variable Format

Name

 

Label

Data Set

 

Type

Length

Number

Label

 

 

 

 

Type (From

 

 

 

 

 

 

 

 

 

TYPE=)

 

 

 

 

 

 

IN

AG_201602

 

 

AB

1

8

43

 

BEST

IN

AG_201602

 

 

ACTIF

2

1

61

 

$

IN

AG_201602

 

 

ALT_RV_NR

1

8

71

 

BEST

IN

AG_201602

 

 

ALT_VNR

1

8

70

 

BEST

 

  1. I enhance the data with the following additional information:

 

proc sql; create table DC.input_analysis as select t.*

,t1.Variable_Name_Original   

,t1.Variable_Name_English    

,t1.Variable_description     

,t1.Variable_value_list

,t1.Variable_category  

,t1.Variable_min 

,t1.Variable_max

from DC.input_analysis t

left outer join In.Variables_information t1 on (t.variable_name=t1.Variable_Name_Original or t.variable_name=t1.Variable_Name_English)

;run;

 

  1. On the basis of the new variables (from Variables_information t1), I should perform some checks on the data related to each observation of the proc contents (e.g. ACTIF contained in Work.Test).
  2. I enhance the table with some variables that should be calculated for each observation of the proc content (i. variables in Work.Test)

proc sql; create table DC.input_analysis_result as select

t.*

,0 as n_missing

,0 as perc_missing

,0 as over_up_bound

,0 as max

,0 as over_dow_bound

,0 as min

,0 as n_not_value_list

,"                         " as unexpected_values

from DC.input_analysis t ;run;

 

5. I would like to perform some calculations (here below the first one) recursively for each observation of the proc contents (variable name):

 

%let variable_x= ACTIF

proc summary noprint

data=test n nmiss min max;

var &variable_x.;

output out=missing_stat n=num_obs nmiss=num_miss min=min_value max=max_value

;run;

                            6.    And store the results in the table DC.input_analysis_result

 

How can I automatize it and keep it as simple as possible? Thank a lot in advance.


Accepted Solutions
Solution
‎03-29-2016 10:35 AM
Respected Advisor
Posts: 3,799

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

[ Edited ]
Posted in reply to Astounding

Astounding wrote:

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.


 

PROC MEANS using STACKODS makes a much nicer output file structure.  One row for each var and class level with statistics as variables.

View solution in original post


All Replies
Super User
Posts: 5,498

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

Posted in reply to Sir_Highbury

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.

Solution
‎03-29-2016 10:35 AM
Respected Advisor
Posts: 3,799

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

[ Edited ]
Posted in reply to Astounding

Astounding wrote:

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.


 

PROC MEANS using STACKODS makes a much nicer output file structure.  One row for each var and class level with statistics as variables.

Super User
Posts: 5,498

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

Posted in reply to data_null__

It's entirely possible that this would fit better.  In may depend on the ultimate objective of how the statistics will be used.

 

Is the option STATODS or STACKODS?

Respected Advisor
Posts: 3,799

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

[ Edited ]
Posted in reply to Astounding

stackods

 


Astounding wrote:

It's entirely possible that this would fit better.  In may depend on the ultimate objective of how the statistics will be used.

 

Is the option STATODS or STACKODS?


 

It is easier to start with a more normal (narrow) data scructure and make is less normal (wide) than the other way around.

 

I always thought the default OUTPUT data set structure should be enhanced to allow more _STAT_s to be output but they "gave" is STACKODS instead.

Frequent Contributor
Posts: 127

Re: SAS: enhance proc contents with data quality checks for each variable (macro, loop)

Posted in reply to data_null__

Thanks a lot! very appreciated. The output of the proc means with stackods is exactly what I was looking for.

Still one remark: I would like also to have the not numeric variable in the list (of course without min and max) with the amount of missing values. Is possible to integrate it in the calculation or should I do it separately?

 Still 2 statistics that I would like to have with the same output structure:

  1. in the table that I mentioned above (create table DC.input_analysis) there is a variable named Variable_value_list that contains for all the non numeric variables the value lists. E.g. for the Variable Source I have AG, SA. Check requested: running a proc sql for each not numeric variable and grouping by variable in order to get the value list and list all the values that are not in the value list (e.g. I found AB instead of AG and SA put not like in (‘%AG%’,’%SA%’)
  2. in the table that I mentioned above (create table DC.input_analysis) I have also to variables

,t1.Variable_min,t1.Variable_max indicating the upper and lower bound.

Min and max generated within the proc means are already a helpful indication but my aim is also to get the amount of observation for the numeric variables that do not satisfy the boundary conditions.

 

Is there also in this case a compact and easy understandable code?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 373 views
  • 2 likes
  • 3 in conversation