How to create a dataset with number of
1) Missing
2) Non-Missing
3) Unique value
in all column of a dataset, irespective if the filed is numeric or char.
column name | Missing | Non-Missing | Unique Values |
Colum1 | 2 | 8 | 3 |
Colum2 | 7 | 3 | 4 |
Colum3 | 10 | 0 | 6 |
Colum4 | 0 | 10 | 3 |
Colum5 | 1 | 9 | 9 |
Hello,
data have;
input x $ y;
cards;
1 .
2 6
. 6
4 8
. 10
6 10
7 .
. 16
9 16
. 20
;
run;
data _NULL_;
set have;
length _NAME $32.;
call execute('proc sql noprint; CREATE TABLE counts AS SELECT ');
do until (_NAME='_NAME');
i+1;
call vnext(_NAME);
if _NAME ne '_NAME' then
call execute(cats('"', _NAME, '" AS var_name',i,', sum(missing(', strip(_NAME),')) AS nmiss', i,
', (count(*)-CALCULATED nmiss', i,') AS nnonmiss', i,
', count(DISTINCT '|| _NAME, ') AS nuniq', i,','));
end;
call execute('1 AS kludge FROM have; quit;');
stop;
run;
data want;
set counts;
keep variable missing nonmissing unique;
array vars (*) var_name:;
array miss (*) nmiss:;
array nonmiss (*) nnonmiss:;
array uni (*) nuniq:;
do i=1 to dim(miss);
variable=vars(i);
missing=miss(i);
nonmissing=nonmiss(i);
unique=uni(i);
output;
end;
run;
Edit : added retrieval of variable names in the sql query.
Use a data step run from sashelp.vcolumn that creates the necessary SQL with call execute:
%let library=SASHELP;
%let memname=CLASS;
data _null_;
set sashelp.vcolumn (where=(libname = "&library." and memname = "&memname.")) end=eof;
if _n_ = 1 then call execute("
proc sql;
create table want as
");
if _n_ > 1 then call execute(" union all ");
call execute("
select
'" !! strip(name) !! "' as column_name,
sum(case when " !! strip(name) !! " is missing then 1 else 0 end) as missing,
sum(case when " !! strip(name) !! " is missing then 0 else 1 end) as non_missing,
count(distinct " !! strip(name) !! ") as unique_values
from sashelp.class
");
if eof then call execute("
;
quit;
");
run;
it keeps running for long time even for a small out. Can't see if this is giving any result or not. Please help...
@Srigyan wrote:
it keeps running for long time even for a small out. Can't see if this is giving any result or not. Please help...
Run it with a small subset (like obs=100) first; keep in mind that the number of columns is important, as it determines the number of passes through the dataset.
My test with sashelp.class took less than a tenth of a second here.
not working...taking too much time
data abc;
set sashelp.class(obs=10);
run;
%let library=work;
%let memname=abc;
data _null_;
set sashelp.vcolumn (where=(libname = "&library." and memname = "&memname.")) end=eof;
if _n_ = 1 then call execute("
proc sql;
create table want as
");
if _n_ > 1 then call execute(" union all ");
call execute("
select
'" !! strip(name) !! "' as column_name,
sum(case when " !! strip(name) !! " is missing then 1 else 0 end) as missing,
sum(case when " !! strip(name) !! " is missing then 0 else 1 end) as non_missing,
count(distinct " !! strip(name) !! ") as unique_values
from sashelp.class
");
if eof then call execute("
;
quit;
");
run;
Keep in mind that library and dataset names are always uppercase. Therefore you should set your macro variables to capitals, just as I did.
Hello,
data have;
input x $ y;
cards;
1 .
2 6
. 6
4 8
. 10
6 10
7 .
. 16
9 16
. 20
;
run;
data _NULL_;
set have;
length _NAME $32.;
call execute('proc sql noprint; CREATE TABLE counts AS SELECT ');
do until (_NAME='_NAME');
i+1;
call vnext(_NAME);
if _NAME ne '_NAME' then
call execute(cats('"', _NAME, '" AS var_name',i,', sum(missing(', strip(_NAME),')) AS nmiss', i,
', (count(*)-CALCULATED nmiss', i,') AS nnonmiss', i,
', count(DISTINCT '|| _NAME, ') AS nuniq', i,','));
end;
call execute('1 AS kludge FROM have; quit;');
stop;
run;
data want;
set counts;
keep variable missing nonmissing unique;
array vars (*) var_name:;
array miss (*) nmiss:;
array nonmiss (*) nnonmiss:;
array uni (*) nuniq:;
do i=1 to dim(miss);
variable=vars(i);
missing=miss(i);
nonmissing=nonmiss(i);
unique=uni(i);
output;
end;
run;
Edit : added retrieval of variable names in the sql query.
%let have=sd.customer;
data _NULL_;
set &have.;
call execute('proc sql noprint; CREATE TABLE counts AS SELECT ');
do until (NAME='NAME');
call vnext(NAME);
if NAME ne 'NAME' then
call execute(cat('sum(missing(', NAME,')) AS nmiss',NAME,', count(*)-CALCULATED nmiss', NAME,' AS nnonmiss', NAME, ', count(DISTINCT ', NAME, ') AS nuniq', NAME,','));
end;
call execute('1 AS kludge FROM &have.; quit;');
stop;
run;
Giving me following error:
NOTE: CALL EXECUTE generated line. 1 + proc sql noprint; 1 + CREATE TABLE counts AS SELECT 2 + sum(missing(iden)) AS nmissiden, count(*)-CALCULATED nmissiden AS nnonmissiden, count(DISTINCT iden) AS nuniqiden, 3 + sum(missing(titl)) AS nmisstitl, count(*)-CALCULATED nmisstitl AS nnonmisstitl, count(DISTINCT titl) AS nuniqtitl, 4 + sum(missing(prin)) AS nmissprin, count(*)-CALCULATED nmissprin AS nnonmissprin, count(DISTINCT prin) AS nuniqprin, 5 + sum(missing(birt)) AS nmissbirt, count(*)-CALCULATED nmissbirt AS nnonmissbirt, count(DISTINCT birt) AS nuniqbirt, 6 + sum(missing(cred)) AS nmisscred, count(*)-CALCULATED nmisscred AS nnonmisscred, count(DISTINCT cred) AS nuniqcred, 7 + sum(missing(stat)) AS nmissstat, count(*)-CALCULATED nmissstat AS nnonmissstat, count(DISTINCT stat) AS nuniqstat, 8 + sum(missing(post)) AS nmisspost, count(*)-CALCULATED nmisspost AS nnonmisspost, count(DISTINCT post) AS nuniqpost, 9 + sum(missing(date)) AS nmissdate, count(*)-CALCULATED nmissdate AS nnonmissdate, count(DISTINCT date) AS nuniqdate, 10 + sum(missing(date)) AS nmissdate, count(*)-CALCULATED nmissdate AS nnonmissdate, count(DISTINCT date) AS nuniqdate, 11 + sum(missing(date)) AS nmissdate, count(*)-CALCULATED nmissdate AS nnonmissdate, count(DISTINCT date) AS nuniqdate, 12 + sum(missing(date)) AS nmissdate, count(*)-CALCULATED nmissdate AS nnonmissdate, count(DISTINCT date) AS nuniqdate, 13 + sum(missing(open)) AS nmissopen, count(*)-CALCULATED nmissopen AS nnonmissopen, count(DISTINCT open) AS nuniqopen, 14 + sum(missing(LTIM)) AS nmissLTIM, count(*)-CALCULATED nmissLTIM AS nnonmissLTIM, count(DISTINCT LTIM) AS nuniqLTIM, 15 + sum(missing(LTIM)) AS nmissLTIM, count(*)-CALCULATED nmissLTIM AS nnonmissLTIM, count(DISTINCT LTIM) AS nuniqLTIM, 16 + sum(missing(LTIM)) AS nmissLTIM, count(*)-CALCULATED nmissLTIM AS nnonmissLTIM, count(DISTINCT LTIM) AS nuniqLTIM, 17 + 1 AS kludge FROM sd.customer; ERROR: The following columns were not found in the contributing tables: birt, cred, date, iden, LTIM, open, post, prin, stat, titl.
Made small change in substr
variable=substr(vname(miss(i)),6,length(vname(miss(i))));
now its working.
I have edited my initial post to avoid problems with long variables names.
Thanks
The easiest way I know to get a count of unique levels of a variable is:
ods select nlevels; proc freq data=have nlevels; ods output nlevels=work.levels; run;
The output dataset work.levels will have the name of the variable in a variable named TableVar and the variable NNonMissLevels will have the number of unique non-missing values. If NNonMissLevels=0 then the variable has missing values for all records.
The NMissLevels variable will indicate if you have more than one "level" of missing when the value is greater than one. That indicates that you have one or more "special missing" values assigned, which may be something you need to consider in your overall project. If the value is 0 then that variable has no missing values.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.