- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
I have a Data set ,it contains character variables .some of the records are missing my requirement is how many records are missing in each variable.i tried with Freq and means.Means and summary giving stats only for numeric variables but i my Data set is contains character variables. is their any alternate method to get the requirement
Example:
DATA _null_;
Input Customer_Id $ Gender $ location $;
Cards;
1001 M XXXX
1002 YYYY
M ZZZZ
F
;
RUN;
My Requirement is :
column_name Miss Nmiss
custom_id 2 2
Gender 1 3
location 1 3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you don't mind multiple passes through your data the following mIght build the table you seek with PROC SQL
proc contents data = your.data out= conts noprint ;
Run;
proc sql noprint ;
Select ' insert into cStats select quote(!! Name !! '), sum( case when ' !! name !! ' is null then 1 else 0 end ) as miss, sum( case when , !! Name !! ' is null then 0 else 1 end ) as nMiss'
Into :collector separated by ' ; '
From conts
where type = 2
order by name ;
Create table cstats ( column_name char(32), miss integer, nmiss integer) ;
&collector ;
quit ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Alternately,
PROC FORMAT;
VALUE $miss
" " = "Missing"
OTHER = "Non_Missing";
RUN;
%macro freq(in,out);
ods output OneWayFreqs=OneWayFreqs;
proc freq data=have;
table &in / missing ;
format &in $miss.;
run;
ods output close ;
proc transpose data=OneWayFreqs out=&out(drop=_name_);
by table;
id f_&in ;
var Frequency;
run;
%mend;
%freq(Customer_Id,want1)
%freq(Gender,want2)
%freq(location,want3)
data all;
set want1 want2 want3;
run;
Thanks,
Jag
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Should be possible to acquire these stats on a single pass
data stats ;
do rows=0 by 1 while( not end_data ) ;
set your.data end= end_data ;
Array chrs _character_ ;
Array counts(999) _temporary_ ;
do over chrs ;
Counts(_i_) + missing( chr) ;
End ;
End ;
length colunn_name $32 ;
Do over chrs ;
Column_name = vname( chrs) ;
Miss = counts(_i_) ;
Nmiss = rows- miss ;
Output ;
End ;
stop;
KEEP COLUMN_NAME MISS NMISS ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
works a treat! Just a couple of small typos:
- Counts(_i_) + missing( chrS) ;
- length coluMn_name $32 ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I correctly understand what you are trying to do, wouldn't something like the following accomplish the task?
data have;
input a b $ c d $ e f $;
cards;
1 . . 4 5 .
1 2 3 4 5 6
. . . . . .
1 2 . . 5 6
. . . . . .
. 2 . 4 5 .
;
data need;
set have;
num_missing=cmiss(of a--f);
run;
proc freq data=need;
tables num_missing;
run;
or, to do it in a single pass:
data want (keep=num_missing);
set have end=last;
num_missing+cmiss(of a--f) eq 6;
if last then output;
run;
proc print;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI Trinesh..
I know my code not so effictive ,But if you have only 2 or 3 variables ,you may get the desired results with bellow simple datastep code..
Data Test;
input ID $ Name $;
ID_new="ID ";
Name_New="Name";
cards;
100 San
. sant
101 .
102 .
103 ref
;
run;
Data New1;
Set New(Rename=(ID_new=var ID=name) Keep=ID ID_new ) New(Rename=(name_new=var) Keep=name name_new ) ;
run;
Proc Sql;
create table final as
Select Var,Sum(missing(name)) as missing,count(Name) as nonmissing from new1
group by var;
quit;
Thanks,
Sanjeev.K
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a cheat's way of doing it - assuming you don't need it to be dynamic in identifying column names. If you do then you need to be clever and add some macro code perhaps.
DATA test;
infile datalines dlm=',' dsd;
Input Customer_Id $ Gender $ location $;
Cards;
1001,M,XXXX
1002,,YYYY
,M,ZZZZ
,F,
;
RUN;
data test2(drop=count_customer count_gender count_location Customer_Id Gender location);
retain count_customer count_gender count_location;
set test end=end;
if _n_ = 1 then
do;
count_customer =0;
count_gender=0;
count_location=0;
end;
if Customer_Id = "" then
count_customer = count_customer+1;
if Gender = "" then
count_gender = count_gender+1;
if location = "" then
count_location = count_location+1;
if end = 1 then
do;
column_name = "Customer_Id";
miss = count_customer;
not_missing = _n_ - count_customer;
output;
column_name = "Gender";
miss = count_gender;
not_missing = _n_ - count_gender;
output;
column_name = "Location";
miss = count_location;
not_missing = _n_ - count_location;
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this.
%macro missing_rpt(data=);
proc contents data=&data noprint out=_contents;
run;
data _null_;
set _contents end=last;
call symput(cats('var',_n_),strip(name));
call symput(cats('type',_n_),strip(type));
if last then do;
call symput('nvars',strip(_n_));
end;
run;
%let delim=;
proc sql;
create table _missing_counts as
select %do i=1 %to &nvars;
%if &i>1 %then %let delim=%str(,);
%if &&type&i=1 %then %do;
&delim.sum(case when &&var&i=. then 1 else 0 end) as &&var&i
%end;
%else %do;
&delim.sum(case when &&var&i='' then 1 else 0 end) as &&var&i
%end;
%end;
from &data
;quit;
proc transpose data=_missing_counts out=_missing_counts_t;
run;
proc sql;
select _name_ label='Variable',
col1 label='Missing Value Count'
from _missing_counts_t
order by _name_
;quit;
%mend;
%missing_rpt(data=mylib.mydata);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Getting the counts is easy if you apply a format and PROC SUMMARY will do all the variables at once character and numeric. Then it is "just" a matter of formatting the output for the report.
Input Customer_Id $ Gender $ location $ x y;
Cards;
1001 M XXXX 1 2
1002 . YYYY . 2
. M ZZZZ 1 2
. F . 3 2
;
RUN;
proc format;
value _miss ._-.Z = 'Missing' other='NotMissing';
value $_miss ' ' = 'Missing' other='NotMissing';
run;
proc summary chartype missing;
class _all_ / mlf; *MLF converts all class variables to character;
ways 1;
format _numeric_ _miss. _character_ $_miss.;
output out=count;
run;
data report0(keep=_type_ variable value _freq_);
set count;
array v
length variable $32 value $16;
variable = vname(v[indexc(_type_,'1')]);
value = coalescec(of v
run;
proc print;
run;
proc transpose out=report(drop=_name_);
by _type_ variable;
var _freq_;
id value;
run;
proc print;
run;