Hi Everyone,
I have a Data set ,it contains character variables .some of the records are missing my requirement is how many records are missing in each variable.i tried with Freq and means.Means and summary giving stats only for numeric variables but i my Data set is contains character variables. is their any alternate method to get the requirement
Example:
DATA _null_;
Input Customer_Id $ Gender $ location $;
Cards;
1001 M XXXX
1002 YYYY
M ZZZZ
F
;
RUN;
My Requirement is :
column_name Miss Nmiss
custom_id 2 2
Gender 1 3
location 1 3
If you don't mind multiple passes through your data the following mIght build the table you seek with PROC SQL
proc contents data = your.data out= conts noprint ;
Run;
proc sql noprint ;
Select ' insert into cStats select quote(!! Name !! '), sum( case when ' !! name !! ' is null then 1 else 0 end ) as miss, sum( case when , !! Name !! ' is null then 0 else 1 end ) as nMiss'
Into :collector separated by ' ; '
From conts
where type = 2
order by name ;
Create table cstats ( column_name char(32), miss integer, nmiss integer) ;
&collector ;
quit ;
Alternately,
PROC FORMAT;
VALUE $miss
" " = "Missing"
OTHER = "Non_Missing";
RUN;
%macro freq(in,out);
ods output OneWayFreqs=OneWayFreqs;
proc freq data=have;
table &in / missing ;
format &in $miss.;
run;
ods output close ;
proc transpose data=OneWayFreqs out=&out(drop=_name_);
by table;
id f_&in ;
var Frequency;
run;
%mend;
%freq(Customer_Id,want1)
%freq(Gender,want2)
%freq(location,want3)
data all;
set want1 want2 want3;
run;
Thanks,
Jag
Should be possible to acquire these stats on a single pass
data stats ;
do rows=0 by 1 while( not end_data ) ;
set your.data end= end_data ;
Array chrs _character_ ;
Array counts(999) _temporary_ ;
do over chrs ;
Counts(_i_) + missing( chr) ;
End ;
End ;
length colunn_name $32 ;
Do over chrs ;
Column_name = vname( chrs) ;
Miss = counts(_i_) ;
Nmiss = rows- miss ;
Output ;
End ;
stop;
KEEP COLUMN_NAME MISS NMISS ;
run;
works a treat! Just a couple of small typos:
- Counts(_i_) + missing( chrS) ;
- length coluMn_name $32 ;
If I correctly understand what you are trying to do, wouldn't something like the following accomplish the task?
data have;
input a b $ c d $ e f $;
cards;
1 . . 4 5 .
1 2 3 4 5 6
. . . . . .
1 2 . . 5 6
. . . . . .
. 2 . 4 5 .
;
data need;
set have;
num_missing=cmiss(of a--f);
run;
proc freq data=need;
tables num_missing;
run;
or, to do it in a single pass:
data want (keep=num_missing);
set have end=last;
num_missing+cmiss(of a--f) eq 6;
if last then output;
run;
proc print;
run;
HI Trinesh..
I know my code not so effictive ,But if you have only 2 or 3 variables ,you may get the desired results with bellow simple datastep code..
Data Test;
input ID $ Name $;
ID_new="ID ";
Name_New="Name";
cards;
100 San
. sant
101 .
102 .
103 ref
;
run;
Data New1;
Set New(Rename=(ID_new=var ID=name) Keep=ID ID_new ) New(Rename=(name_new=var) Keep=name name_new ) ;
run;
Proc Sql;
create table final as
Select Var,Sum(missing(name)) as missing,count(Name) as nonmissing from new1
group by var;
quit;
Thanks,
Sanjeev.K
Here is a cheat's way of doing it - assuming you don't need it to be dynamic in identifying column names. If you do then you need to be clever and add some macro code perhaps.
DATA test;
infile datalines dlm=',' dsd;
Input Customer_Id $ Gender $ location $;
Cards;
1001,M,XXXX
1002,,YYYY
,M,ZZZZ
,F,
;
RUN;
data test2(drop=count_customer count_gender count_location Customer_Id Gender location);
retain count_customer count_gender count_location;
set test end=end;
if _n_ = 1 then
do;
count_customer =0;
count_gender=0;
count_location=0;
end;
if Customer_Id = "" then
count_customer = count_customer+1;
if Gender = "" then
count_gender = count_gender+1;
if location = "" then
count_location = count_location+1;
if end = 1 then
do;
column_name = "Customer_Id";
miss = count_customer;
not_missing = _n_ - count_customer;
output;
column_name = "Gender";
miss = count_gender;
not_missing = _n_ - count_gender;
output;
column_name = "Location";
miss = count_location;
not_missing = _n_ - count_location;
output;
end;
run;
Try this.
%macro missing_rpt(data=);
proc contents data=&data noprint out=_contents;
run;
data _null_;
set _contents end=last;
call symput(cats('var',_n_),strip(name));
call symput(cats('type',_n_),strip(type));
if last then do;
call symput('nvars',strip(_n_));
end;
run;
%let delim=;
proc sql;
create table _missing_counts as
select %do i=1 %to &nvars;
%if &i>1 %then %let delim=%str(,);
%if &&type&i=1 %then %do;
&delim.sum(case when &&var&i=. then 1 else 0 end) as &&var&i
%end;
%else %do;
&delim.sum(case when &&var&i='' then 1 else 0 end) as &&var&i
%end;
%end;
from &data
;quit;
proc transpose data=_missing_counts out=_missing_counts_t;
run;
proc sql;
select _name_ label='Variable',
col1 label='Missing Value Count'
from _missing_counts_t
order by _name_
;quit;
%mend;
%missing_rpt(data=mylib.mydata);
Getting the counts is easy if you apply a format and PROC SUMMARY will do all the variables at once character and numeric. Then it is "just" a matter of formatting the output for the report.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.