Hi, my data have more than 1,000 variables and I need to check for missing and filling rate for each variable.
My aim is to output a sas table which lists all variables, along with number of missing, non missing and the rate of missing and non missing.
Sample data as per below:
data missingval;
length Country $ 10 TourGuide $ 10 Trip_Num 8;
input Country TourGuide Trip_Num;
datalines;
Japan Yamada 3
. Militello .
Australia Edney 4
Venezuela . 2
Brazil Cardoso .
;run;
Expected output:
Variable_Name | Variable_Type | Fill_Num | Miss_Num | Fill_Rate | Miss_Rate |
Country | Char | 4 | 1 | 0.8 | 0.2 |
TourGuide | Char | 4 | 1 | 0.8 | 0.2 |
Trip_Num | Num | 3 | 2 | 0.6 | 0.4 |
Really appreciate your help on this!
@Kurt_Bremser wrote:
Mind that both methods require you to have enough memory to store a number of variables equal to the number of observations in your original dataset during the transpose.
So, do you think that solution skipping transposing would be better?
That one got a bit lengthy, but i am out of time right now.
proc sql noprint;
select quote(trim(Name)) into :varList separated by ','
from sashelp.vcolumn
where LibName = 'WORK' and MemName = 'MISSINGVAL'
;
quit;
data _null_;
set missingval end=lastObs;
length
Variable_Name $ 32
Variable_Type $ 4
Fill_Num Miss_Num 8
_isMissing 8
;
if _n_ = 1 then do;
declare hash h(ordered:'yes');
declare hiter bob('h');
h.defineKey('Variable_Name');
h.defineData('Variable_Name', 'Variable_Type', 'Fill_Num', 'Miss_Num');
h.defineDone();
do Variable_Name = &varList;
Variable_Type = vtypex(Variable_Name);
Fill_Num = 0;
Miss_Num = 0;
h.add();
end;
end;
do Variable_Name = &varList;
_rc = h.find();
if Variable_Type = 'C' then do;
_isMissing = missing(vvaluex(Variable_Name));
end;
else do;
_isMissing = (strip(vvaluex(Variable_Name)) = '.');
end;
Fill_Num = sum(Fill_Num, (not _isMissing));
Miss_Num = sum(Miss_Num, _isMissing);
h.replace();
end;
if lastObs then do;
h.output(dataset: 'want');
end;
run;
data work.want;
set work.want;
length Fill_Rate Miss_Rate 8;
Fill_Rate = Fill_Num / (Fill_Num + Miss_Num);
Miss_Rate = Miss_Num / (Fill_Num + Miss_Num);
run;
proc print data=work.want;run;
Use a double transpose, followed by a summarizing data step that merges the types in:
data missingval;
length Country $ 10 TourGuide $ 10 Trip_Num 8;
input Country TourGuide Trip_Num;
datalines;
Japan Yamada 3
. Militello .
Australia Edney 4
Venezuela . 2
Brazil Cardoso .
;
proc sql;
create table types as
select name as _name_, type
from dictionary.columns
where libname = 'WORK' and memname = 'MISSINGVAL';
quit;
proc transpose data=missingval out=trans;
var _all_;
run;
proc transpose data=trans out=freq;
by _name_;
var col:;
run;
data want (rename=(_name_=variable_name));
length _name_ $32; * prevents WARNING;
merge
freq
types
;
by _name_;
if first._name_
then do;
fill_num = 0;
miss_num = 0;
end;
if strip(col1) not in (' ','.')
then fill_num + 1;
else miss_num + 1;
if last._name_;
fill_rate = fill_num / (fill_num + miss_num);
miss_rate = miss_num / (fill_num + miss_num);
keep _name_ type fill_num miss_num fill_rate miss_rate;
run;
proc print data=want noobs;
run;
Result:
variable_ fill_ miss_ name type fill_num miss_num rate rate Country char 4 1 0.8 0.2 TourGuide char 4 1 0.8 0.2 Trip_Num num 3 2 0.6 0.4
alternately please try
data missingval;
length Country $ 10 TourGuide $ 10 Trip_Num 8;
input Country TourGuide Trip_Num;
datalines;
Japan Yamada 3
. Militello .
Australia Edney 4
Venezuela . 2
Brazil Cardoso .
;
run;
proc transpose data=missingval out=test;
var Country TourGuide Trip_Num;
run;
data want;
set test;
array cols(*) col1-col5;
do i = 1 to dim(cols);
obs=sum(i);
if compress(cols(i))='.' then cols(i)=' ';
if compress(cols(i))^=' ' then cols(i)='1';
end;
miss_Num=cmiss(of col1-col5);
fill_Num=sum(of col1-col5);
fill_rate=fill_Num/obs;
miss_rate=miss_Num/obs;
drop col1-col5;
run;
Mind that both methods require you to have enough memory to store a number of variables equal to the number of observations in your original dataset during the transpose.
@Kurt_Bremser wrote:
Mind that both methods require you to have enough memory to store a number of variables equal to the number of observations in your original dataset during the transpose.
So, do you think that solution skipping transposing would be better?
That one got a bit lengthy, but i am out of time right now.
proc sql noprint;
select quote(trim(Name)) into :varList separated by ','
from sashelp.vcolumn
where LibName = 'WORK' and MemName = 'MISSINGVAL'
;
quit;
data _null_;
set missingval end=lastObs;
length
Variable_Name $ 32
Variable_Type $ 4
Fill_Num Miss_Num 8
_isMissing 8
;
if _n_ = 1 then do;
declare hash h(ordered:'yes');
declare hiter bob('h');
h.defineKey('Variable_Name');
h.defineData('Variable_Name', 'Variable_Type', 'Fill_Num', 'Miss_Num');
h.defineDone();
do Variable_Name = &varList;
Variable_Type = vtypex(Variable_Name);
Fill_Num = 0;
Miss_Num = 0;
h.add();
end;
end;
do Variable_Name = &varList;
_rc = h.find();
if Variable_Type = 'C' then do;
_isMissing = missing(vvaluex(Variable_Name));
end;
else do;
_isMissing = (strip(vvaluex(Variable_Name)) = '.');
end;
Fill_Num = sum(Fill_Num, (not _isMissing));
Miss_Num = sum(Miss_Num, _isMissing);
h.replace();
end;
if lastObs then do;
h.output(dataset: 'want');
end;
run;
data work.want;
set work.want;
length Fill_Rate Miss_Rate 8;
Fill_Rate = Fill_Num / (Fill_Num + Miss_Num);
Miss_Rate = Miss_Num / (Fill_Num + Miss_Num);
run;
proc print data=work.want;run;
Nice. One of the things that work fine with a hash. And it is equally free of any assumptions as my double transpose code.
And it will outperform the double transpose quite well with increasing dataset size.
It's actually not that lengthy, and because of the hash dynamics can be done in a single step:
data have ;
input (country tourguide) (:$10.) trip_num ;
cards ;
Japan Yamada 3
. Militello .
Australia Edney 4
Venezuela . 2
Brazil Cardoso .
;
run ;
data want (drop = country tourguide trip_num) ;
if _n_ = 1 then do ;
dcl hash h () ;
h.definekey ("variable_name") ;
h.definedata ("variable_name", "variable_type", "fill_num") ;
h.definedone () ;
dcl hiter hi ("h") ;
end ;
set have nobs = n end = z ;
array cc _char_ ;
array nn _numeric_ ;
length variable_name $ 32 variable_type $ 4 ;
call missing (variable_name, variable_type) ;
do over nn ;
if h.find (key:vname(nn)) ne 0 then fill_num = 0 ;
fill_num + not cmiss(nn) ;
h.replace (key:vname(nn), data:vname(nn), data:vtype(nn), data:fill_num) ;
end ;
do over cc ;
if h.find (key:vname(cc)) ne 0 then fill_num = 0 ;
fill_num + not cmiss(cc) ;
h.replace (key:vname(cc), data:vname(cc), data:vtype(cc), data:fill_num) ;
end ;
if z then do while (hi.next() = 0) ;
variable_type = ifc (variable_type eq "N", "Num", "Char") ;
miss_num = n - fill_num ;
miss_rate = divide (miss_num, n) ;
output ;
end ;
run ;
Kind regards
Paul D.
Thanks for your help!
All the codes given by others are working too, and i ran your codes, it works way faster, and I do like it when we dont have to specify length or dropping any variables, because it wont help when working with more than 1,000 variables.
Thank you everyone!
data missingval; length Country $ 10 TourGuide $ 10 Trip_Num 8; input Country TourGuide Trip_Num; datalines; Japan Yamada 3 . Militello . Australia Edney 4 Venezuela . 2 Brazil Cardoso . ;run; data want; set sashelp.vcolumn(keep=libname memname name type where=(lowcase(libname)='work' and lowcase(memname)='missingval')) end=last; if _n_=1 then call execute('proc sql;create table want as'); call execute(catt('select "',name,'" as name length=40,"',type,'" as type length=10, n(',name,') as fill_n,nmiss(',name,') as miss_n, calculated fill_n/count(*) as Fill_Rate,1- calculated Fill_Rate as Miss_Rate from missingval')); if last then call execute(';quit;'); else call execute('union'); run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.