Dear experts,
Library | Library Member Name | Data Set | Special | Variable Name | Variable | Variable | Variable | Variable | Variable Format |
Name |
| Label | Data Set |
| Type | Length | Number | Label |
|
|
|
| Type (From |
|
|
|
|
|
|
|
|
| TYPE=) |
|
|
|
|
|
|
IN | AG_201602 |
|
| AB | 1 | 8 | 43 |
| BEST |
IN | AG_201602 |
|
| ACTIF | 2 | 1 | 61 |
| $ |
IN | AG_201602 |
|
| ALT_RV_NR | 1 | 8 | 71 |
| BEST |
IN | AG_201602 |
|
| ALT_VNR | 1 | 8 | 70 |
| BEST |
proc sql; create table DC.input_analysis as select t.*
,t1.Variable_Name_Original
,t1.Variable_Name_English
,t1.Variable_description
,t1.Variable_value_list
,t1.Variable_category
,t1.Variable_min
,t1.Variable_max
from DC.input_analysis t
left outer join In.Variables_information t1 on (t.variable_name=t1.Variable_Name_Original or t.variable_name=t1.Variable_Name_English)
;run;
proc sql; create table DC.input_analysis_result as select
t.*
,0 as n_missing
,0 as perc_missing
,0 as over_up_bound
,0 as max
,0 as over_dow_bound
,0 as min
,0 as n_not_value_list
," " as unexpected_values
from DC.input_analysis t ;run;
5. I would like to perform some calculations (here below the first one) recursively for each observation of the proc contents (variable name):
%let variable_x= ACTIF
proc summary noprint
data=test n nmiss min max;
var &variable_x.;
output out=missing_stat n=num_obs nmiss=num_miss min=min_value max=max_value
;run;
6. And store the results in the table DC.input_analysis_result
How can I automatize it and keep it as simple as possible? Thank a lot in advance.
@Astounding wrote:
Rather than try to loop through every variable, you could make life much simpler along these lines:
proc summary data=test;
var _numeric_;
output out=nmiss (drop=_type_ _freq_) nmiss=;
output out=n (drop=_type_ _freq_) n=;
output out=max (drop=_type_ _freq_) max=;
output out=min (drop=_type_ _freq_) min=;
run;
The original variable names get reused to hold the 4 statistics, and you don't need to know the names.
PROC MEANS using STACKODS makes a much nicer output file structure. One row for each var and class level with statistics as variables.
Rather than try to loop through every variable, you could make life much simpler along these lines:
proc summary data=test;
var _numeric_;
output out=nmiss (drop=_type_ _freq_) nmiss=;
output out=n (drop=_type_ _freq_) n=;
output out=max (drop=_type_ _freq_) max=;
output out=min (drop=_type_ _freq_) min=;
run;
The original variable names get reused to hold the 4 statistics, and you don't need to know the names.
@Astounding wrote:
Rather than try to loop through every variable, you could make life much simpler along these lines:
proc summary data=test;
var _numeric_;
output out=nmiss (drop=_type_ _freq_) nmiss=;
output out=n (drop=_type_ _freq_) n=;
output out=max (drop=_type_ _freq_) max=;
output out=min (drop=_type_ _freq_) min=;
run;
The original variable names get reused to hold the 4 statistics, and you don't need to know the names.
PROC MEANS using STACKODS makes a much nicer output file structure. One row for each var and class level with statistics as variables.
It's entirely possible that this would fit better. In may depend on the ultimate objective of how the statistics will be used.
Is the option STATODS or STACKODS?
stackods
@Astounding wrote:
It's entirely possible that this would fit better. In may depend on the ultimate objective of how the statistics will be used.
Is the option STATODS or STACKODS?
It is easier to start with a more normal (narrow) data scructure and make is less normal (wide) than the other way around.
I always thought the default OUTPUT data set structure should be enhanced to allow more _STAT_s to be output but they "gave" is STACKODS instead.
Thanks a lot! very appreciated. The output of the proc means with stackods is exactly what I was looking for.
Still one remark: I would like also to have the not numeric variable in the list (of course without min and max) with the amount of missing values. Is possible to integrate it in the calculation or should I do it separately?
Still 2 statistics that I would like to have with the same output structure:
,t1.Variable_min,t1.Variable_max indicating the upper and lower bound.
Min and max generated within the proc means are already a helpful indication but my aim is also to get the amount of observation for the numeric variables that do not satisfy the boundary conditions.
Is there also in this case a compact and easy understandable code?
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.