Dear All,
I have dataset with ID as key variable and 10 variables A1-A10. I would like create 'flag' which will have all the Variables which have missing value.Flag should have variable names which are missing(Character or numeric). Below is example of the output.
EX:
ID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 FLAG
001 . 1 . TRT TER TEST X A1,A3,,A7,A8,A10
Can someone help with above query. Thanks in Advance.
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if left(vvaluex(_name_)) in (' ','.') then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
Arrays should come in handy:
data want;
set have;
array a {10};
length flag $ 30;
do i=1 to 10;
if a{i}=. then do;
if flag=' ' then flag = vname(a{i});
else flag = trim(flag) || ',' || vname(a{i});
end;
end;
run;
Just a side note: I would suggest using a more meaningful name like MISSING_NUMS instead of FLAG. But that's just a matter of style.
You should tell use which variables are actually numeric and which are character. It appears as though A1 - A3 might be numeric and the remainder character but I don't like to make assumptions.
To mix NUM and CHAR variables:
data HAVE;
infile cards pad;
input ID $3. (A1 A2 A3) (: 8.) ( A4 - A10 )( : $4.);
call missing(of A7-A9);
cards;
001 . 1 . TRT TER TEST a a a X
run;
data WANT;
set HAVE;
length MISSING_VAR_LIST $16;
do _I=1 to 10;
_VARNAME =cats('A',_I);
_VARVALUE =left(vvaluex(_VARNAME));
_VARTYPE =vtypex(_VARNAME);
_IS_MISSING=ifn(_VARTYPE='C' & _VARVALUE=' ', 1
,ifn(_VARTYPE='N' & _VARVALUE='.', 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
MISSING_VAR_LIST=A1,A3,A7,A8,A9
If your variable names are not A1 to A10...
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
run;
data WANT;
set HAVE;
length MISSING_VAR_LIST _VARNAME $32 _VARVALUE _VARTYPE $1;
do until (_VARNAME='ID');
call vnext(_VARNAME,_VARTYPE);
end;
do _I=1 to 10;
call vnext(_VARNAME,_VARTYPE);
_VARVALUE =left(vvaluex(_VARNAME));
_IS_MISSING=ifn(_VARTYPE='C' & _VARVALUE=' ', 1
,ifn(_VARTYPE='N' & _VARVALUE='.', 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
MISSING_VAR_LIST=Q,E,R,Y,G
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if left(vvaluex(_name_)) in (' ','.') then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
Excellent Sir...Thank you.
@Ksharp You removed necessary checks.
If you have a string equal to '.' it will be marked as missing when it is not.
OK. Using missing= system option.
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
options missing=' ';
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
@Ksharp No, You now miss all missing numerics.
Chris,
No. I didn't miss. I check the result under UE.
data HAVE;
infile cards pad;
input ID $3. (Q W E) (: 8.) ( X T F R Y G S )( : $4.);
X='.';
call missing(R,Y,G);
cards;
001 . 1 . TRT TER TEST a a a X
;
run;
options missing=' ';
data want;
set have;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
run;
@Ksharp Ah yes, i missed the
options missing=' ';
Good thinking!
Which highlights the point that this logic is not the best after all.
Using vvaluex() means that any value formatted to ' ' will be shown as missing when it's not.
One has to keep this limitation in mind if using the suggestions in these posts.
Chris,
The safe way is make a format(.z .x. .... also are missing value in SAS)
value fmt
._-.z=' ';
and format all the numeric.
format _numeric_ fmt. ;
So we have these 2 solutions now:
%* Solution from KSharp;
proc format;
value nummiss ._-.z=' ';
run;
data want;
set have;
format _NUMERIC_ nummiss.;
format _CHAR_ ;
length _name_ flag $ 200;
do while(_name_ ne '_name_');
call vnext(_name_);
if _name_ ne '_name_' then do;
if missing(vvaluex(_name_)) then flag=catx(',',flag,_name_);
end;
end;
drop _name_;
putlog flag=;
run;
%* Solution from ChrisNZ;
data WANT;
set HAVE;
length _VARNAME $32 _VARVALUE _VARTYPE $1 MISSING_VAR_LIST $800 ;
format _ALL_ ;
do until( _VARNAME='_VARNAME') ;
call vnext(_VARNAME,_VARTYPE);
_VARVALUE =left(vvaluex(_VARNAME));
_IS_MISSING=ifn(_VARTYPE='C' & missing(_VARVALUE) , 1
,ifn(_VARTYPE='N' & ^anydigit(_VARVALUE), 1
, 0));
if _IS_MISSING then
MISSING_VAR_LIST=catx(',', MISSING_VAR_LIST, _VARNAME);
end;
drop _:;
putlog MISSING_VAR_LIST=;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.