- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My goal is to combine these tables into one without having to manually run my macro each time for each column.
The code I currently have with me is the following:
%macro task_Oct(set,col_name);
data _type_;
set &set;
call symputx('col_type', vtype(&col_name));
run;
proc sql;
create table work.oct27 as
select "&col_name" as variable,
"&col_type" as type,
nmiss(&col_name) as missing_val,
count(distinct &col_name) as distinct_val
from &set;
quit;
%mend task_Oct;
%task_Oct(sashelp.cars,Origin)
The above code gives me the following output:
|Var |Type |missing_val|distinct_val|
|Origin|Character|0 | 3 |
But the sashelp.cars data sheet has 15 columns and so I would like to output a new data sheet which has 15 rows with 4 columns.
I would like to get the following combined table as the output of my code:
|Var |Type |missing_val|distinct_val|
|Make |Character|0 | 38 |
|Model |Character|0 | 425 |
|Type |Character|0 | 6 |
|Origin|Character|0 | 3 |
... ... ...
Since I'm using a macro, I could run my code 15 different times by manually entering the names of the columns and then merging the tables into 1; and it wouldn't be a problem. But what if I have a table with 100s of columns? I could use some loop statement but I'm not sure how to go about that in this case. Help would be appreciated. Thank you.
I wish to solve this task using DO loops.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it a requirement to use Do Loops? And why?
Or do you simply want a table with col name, type, the number of missing values and the number of distinct values?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My current code allows me to view the table for one column and I think that if I use DO loops, I can get a table for each column and then if I use PROC Append, I can just stack all the data sets together to get one nice looking table. This way I can also use elements of my current code without them having to go waste and having to rewrite a completely fresh code. Sure, there may be faster ways of doing this (without using DO loops) but I'm currently learning SAS and if there is a way to solve this task with DO loops, I would be interested in that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok. As pointet out by others, I do not recommend using this method. But since you're learning SAS and this is in a learning context, I have expanded your code a bit to help you.
When you say Do Loop, I assume you want to use a macro %do loop in your code and do what you've already written for as many variables available in the input data. See the code below and feel free to ask further
%macro task_Oct(set);
proc sql;
select nvar into :nvar
from dictionary.tables
where libname=upcase(scan("&set.", 1, '.'))
and memname=upcase(scan("&set.", 2, '.'));
quit;
%do i = 1 %to &nvar.;
proc sql;
select name, type
into :name, :type
from dictionary.columns
where libname=upcase(scan("&set.", 1, '.'))
and memname=upcase(scan("&set.", 2, '.'))
and varnum = &i.;
quit;
proc sql;
create table table_&i. as
select distinct "&name" as variable
, "&type" as type
, cmiss(&name) as missing_val
, count(distinct &name) as distinct_val
from &set.;
quit;
%end;
data want;
set table_:;
run;
%mend task_Oct;
%task_Oct(sashelp.cars)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SASuser4321 wrote:
I wish to solve this task using DO loops
In general, it is a mistake to specify the method of solution (in this case a DO loop), rather than to say "This is the output I want". By specifying the method of solution (DO loop), you miss a much simpler solution. Also, DO loops in SAS ought to be the last resort, rather than the first choice.
PROC FREQ with the NLEVELS option does what you want, without any looping.
ods select nlevels;
proc freq nlevels data=sashelp.cars;
tables _all_;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc freq doesn't tell me that there are 2 missing values in the Cylinders column in the sashelp.cars data set. It counts the 2 "." as 1, while I would like the output table of the code to make it clear that there are indeed 2 missing values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Okay, got it. You will have to combine data from several sources (PROC FREQ, PROC SUMMARY, PROC CONTENTS), but again no DO loops needed.
ods select none;
proc freq nlevels data= sashelp.cars;
ods output nlevels=nlevels;
tables _all_;
run;
ods select all;
proc summary data=sashelp.cars;
var _numeric_;
output out=_summary_ nmiss=;
run;
proc transpose data=_summary_(drop=_type_ _freq_) out=_summary_t;
run;
proc sort data=_summary_t;
by _name_;
run;
proc contents noprint data=sashelp.cars out=_contents_;
run;
proc sort data=nlevels;
by tablevar;
run;
proc sort data=_contents_;
by name;
run;
proc format;
value typef 1='Numeric' 2='Character';
run;
data final;
merge nlevels(rename=(tablevar=name)) _contents_(keep=name type) _summary_t(rename=(_name_=name col1=n_missing));
by name;
format type typef.;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller In this code, how did you know that we had to write "_numeric_" and not something else after var?
And what is the purpose of "nmiss="? Thanks.
proc summary data=sashelp.cars; var _numeric_; output out=_summary_ nmiss=; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
_NUMERIC_ allows PROC SUMMARY to operate on all numeric variables, I don't have to list them by name.
NMISS= calculates the number of missing values for each variable in the VAR statement.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Nothing after the equal sign sets the new variable name (the name of the new variable that has the count of number of missings) to be the same as the original variable name. On the other hand if you had NMISS=POTATO, then the new variable name is POTATO. It was fine to use NMISS= in this case.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller Thank you. But when I run your code, I don't see the number of missing values for the character variables in the final table. I think this could be because your summary table (_summary_t) has 10 rows (only numeric) instead of 15?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@SASuser4321 did you try my code above?