@Valeidoscope wrote:
Hi @Reeza, the goal is not quite identifying patterns of missing data; I just need to know which variables are missing for each unique ID. The actual data set has 32 variables so it's difficult to tell manually.
Variables are a mix of Numeric and Character but could be converted to Numeric if necessary. I don't care about the order of the variables in the list.
Thanks
Names of the variables? Since your desired output is apparently the names of the variables concatenated together with a comma delimiter then we need to be able to know the maximum number of characters that could get stuck into that variable or do you want to use the 32*32 (maximum number of characters for 32 variable names)+31 (commas needed to separate?
One way:
data want;
set have;
array c (*) <list names of character variables here>;
array n (*) <list names of numeric variables here>;
length missvars $ 1055;
do i=1 to dim(c);
if missing(c[i]) then missvars=catx(',',missvars,vname(c[i]));
end;
do i=1 to dim(n);
if missing(n[i]) then missvars=catx(',',missvars,vname(n[i]));
end;
drop i;
run;
However if you want the variables in a specific order then there is going to be a lot of extra work involved.
The Vname function will return the name of variable and will accept an array reference as valid source.
The CATX function, if you aren't familiar will concatenate strings placing the first parameter, the comma in this case, between values when there are two or more. The loops process the character and numeric variables separately because arrays must be of the same type variables, all character or all numeric.
Do not place the < > characters in the array statements, just the names of the variables you are interested in determining "missingness".
The Missing function returns 1 (true) if a variable is missing or 0 (false) and is one of the functions that works with both numeric and character variables.
... View more