- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear All,
I have dataset with ID as key variable and 10 variables A1-A10. I would like create 'flag' which will have all the Variables which have missing value.Flag should have variable names which are missing(Character or numeric). Below is example of the output.
EX:
ID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 FLAG
001 . 1 . TRT TER TEST X A1,A3,,A7,A8,A10
Can someone help with above query. Thanks in Advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if left(vvaluex(_name_)) in (' ','.') then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Arrays should come in handy:
data want;
set have;
array a {10};
length flag $ 30;
do i=1 to 10;
if a{i}=. then do;
if flag=' ' then flag = vname(a{i});
else flag = trim(flag) || ',' || vname(a{i});
end;
end;
run;
Just a side note: I would suggest using a more meaningful name like MISSING_NUMS instead of FLAG. But that's just a matter of style.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You should tell use which variables are actually numeric and which are character. It appears as though A1 - A3 might be numeric and the remainder character but I don't like to make assumptions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To mix NUM and CHAR variables:
data HAVE;
infile cards pad;
input ID $3. (A1 A2 A3) (: 8.) ( A4 - A10 )( : $4.);
call missing(of A7-A9);
cards;
001 . 1 . TRT TER TEST a a a X
run;
data WANT;
set HAVE;
length MISSING_VAR_LIST $16;
do _I=1 to 10;
_VARNAME =cats('A',_I);
_VARVALUE =left(vvaluex(_VARNAME));
_VARTYPE =vtypex(_VARNAME);
_IS_MISSING=ifn(_VARTYPE='C' & _VARVALUE=' ', 1
,ifn(_VARTYPE='N' & _VARVALUE='.', 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
MISSING_VAR_LIST=A1,A3,A7,A8,A9
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your variable names are not A1 to A10...
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
run;
data WANT;
set HAVE;
length MISSING_VAR_LIST _VARNAME $32 _VARVALUE _VARTYPE $1;
do until (_VARNAME='ID');
call vnext(_VARNAME,_VARTYPE);
end;
do _I=1 to 10;
call vnext(_VARNAME,_VARTYPE);
_VARVALUE =left(vvaluex(_VARNAME));
_IS_MISSING=ifn(_VARTYPE='C' & _VARVALUE=' ', 1
,ifn(_VARTYPE='N' & _VARVALUE='.', 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
MISSING_VAR_LIST=Q,E,R,Y,G
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if left(vvaluex(_name_)) in (' ','.') then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Excellent Sir...Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp You removed necessary checks.
If you have a string equal to '.' it will be marked as missing when it is not.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK. Using missing= system option.
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
options missing=' ';
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp No, You now miss all missing numerics.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Chris,
No. I didn't miss. I check the result under UE.
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
X='.';
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
options missing=' ';
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp Ah yes, i missed the
options missing=' ';
Good thinking!
Which highlights the point that this logic is not the best after all.
Using vvaluex() means that any value formatted to ' ' will be shown as missing when it's not.
One has to keep this limitation in mind if using the suggestions in these posts.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Chris,
The safe way is make a format(.z .x. .... also are missing value in SAS)
value fmt
._-.z=' ';
and format all the numeric.
format _numeric_ fmt. ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So we have these 2 solutions now:
%* Solution from KSharp;
proc format;
value nummiss ._-.z=' ';
run;
data want;
set have;
format _NUMERIC_ nummiss.;
format _CHAR_ ;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
putlog flag=;
run;
%* Solution from ChrisNZ;
data WANT;
set HAVE;
length _VARNAME $32 _VARVALUE _VARTYPE $1 MISSING_VAR_LIST $800 ;
format _ALL_ ;
do until( _VARNAME='_VARNAME') ;
call vnext(_VARNAME,_VARTYPE);
_VARVALUE =left(vvaluex(_VARNAME));
_IS_MISSING=ifn(_VARTYPE='C' & missing(_VARVALUE) , 1
,ifn(_VARTYPE='N' & ^anydigit(_VARVALUE), 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content