BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sir_Highbury
Quartz | Level 8

Dear experts,

 

  1. I have the following table (input_analysis) as result of a proc content:

 

Library

Library Member Name

Data Set

Special

Variable Name

Variable

Variable

Variable

Variable

Variable Format

Name

 

Label

Data Set

 

Type

Length

Number

Label

 

 

 

 

Type (From

 

 

 

 

 

 

 

 

 

TYPE=)

 

 

 

 

 

 

IN

AG_201602

 

 

AB

1

8

43

 

BEST

IN

AG_201602

 

 

ACTIF

2

1

61

 

$

IN

AG_201602

 

 

ALT_RV_NR

1

8

71

 

BEST

IN

AG_201602

 

 

ALT_VNR

1

8

70

 

BEST

 

  1. I enhance the data with the following additional information:

 

proc sql; create table DC.input_analysis as select t.*

,t1.Variable_Name_Original   

,t1.Variable_Name_English    

,t1.Variable_description     

,t1.Variable_value_list

,t1.Variable_category  

,t1.Variable_min 

,t1.Variable_max

from DC.input_analysis t

left outer join In.Variables_information t1 on (t.variable_name=t1.Variable_Name_Original or t.variable_name=t1.Variable_Name_English)

;run;

 

  1. On the basis of the new variables (from Variables_information t1), I should perform some checks on the data related to each observation of the proc contents (e.g. ACTIF contained in Work.Test).
  2. I enhance the table with some variables that should be calculated for each observation of the proc content (i. variables in Work.Test)

proc sql; create table DC.input_analysis_result as select

t.*

,0 as n_missing

,0 as perc_missing

,0 as over_up_bound

,0 as max

,0 as over_dow_bound

,0 as min

,0 as n_not_value_list

,"                         " as unexpected_values

from DC.input_analysis t ;run;

 

5. I would like to perform some calculations (here below the first one) recursively for each observation of the proc contents (variable name):

 

%let variable_x= ACTIF

proc summary noprint

data=test n nmiss min max;

var &variable_x.;

output out=missing_stat n=num_obs nmiss=num_miss min=min_value max=max_value

;run;

                            6.    And store the results in the table DC.input_analysis_result

 

How can I automatize it and keep it as simple as possible? Thank a lot in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

@Astounding wrote:

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.


 

PROC MEANS using STACKODS makes a much nicer output file structure.  One row for each var and class level with statistics as variables.

View solution in original post

5 REPLIES 5
Astounding
PROC Star

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.

data_null__
Jade | Level 19

@Astounding wrote:

Rather than try to loop through every variable, you could make life much simpler along these lines:

 

proc summary data=test;

var _numeric_;

output out=nmiss (drop=_type_ _freq_) nmiss=;

output out=n (drop=_type_ _freq_) n=;

output out=max (drop=_type_ _freq_) max=;

output out=min (drop=_type_ _freq_) min=;

run;

 

The original variable names get reused to hold the 4 statistics, and you don't need to know the names.


 

PROC MEANS using STACKODS makes a much nicer output file structure.  One row for each var and class level with statistics as variables.

Astounding
PROC Star

It's entirely possible that this would fit better.  In may depend on the ultimate objective of how the statistics will be used.

 

Is the option STATODS or STACKODS?

data_null__
Jade | Level 19

stackods

 


@Astounding wrote:

It's entirely possible that this would fit better.  In may depend on the ultimate objective of how the statistics will be used.

 

Is the option STATODS or STACKODS?


 

It is easier to start with a more normal (narrow) data scructure and make is less normal (wide) than the other way around.

 

I always thought the default OUTPUT data set structure should be enhanced to allow more _STAT_s to be output but they "gave" is STACKODS instead.

Sir_Highbury
Quartz | Level 8

Thanks a lot! very appreciated. The output of the proc means with stackods is exactly what I was looking for.

Still one remark: I would like also to have the not numeric variable in the list (of course without min and max) with the amount of missing values. Is possible to integrate it in the calculation or should I do it separately?

 Still 2 statistics that I would like to have with the same output structure:

  1. in the table that I mentioned above (create table DC.input_analysis) there is a variable named Variable_value_list that contains for all the non numeric variables the value lists. E.g. for the Variable Source I have AG, SA. Check requested: running a proc sql for each not numeric variable and grouping by variable in order to get the value list and list all the values that are not in the value list (e.g. I found AB instead of AG and SA put not like in (‘%AG%’,’%SA%’)
  2. in the table that I mentioned above (create table DC.input_analysis) I have also to variables

,t1.Variable_min,t1.Variable_max indicating the upper and lower bound.

Min and max generated within the proc means are already a helpful indication but my aim is also to get the amount of observation for the numeric variables that do not satisfy the boundary conditions.

 

Is there also in this case a compact and easy understandable code?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2205 views
  • 2 likes
  • 3 in conversation