I want to check whether all variables of a data set are missing except one, without knowing the variable names. (I import Excel files with differing structures and want to get rid of every row that contains only missing values, apart from a row-number column that is always non-missing.)
My first idea was to concatenate all the (possibly missing) values and then check the result, but the concatenation using 'cats(of _all_)' does not work.
Code:
data x;
input x1 x2 x3 x4 x5;
Concat = cats(of _all_);
*Concat = cats(of X1-X5);
*Concat = cats(of X1--X5);
datalines;
1 2 3 4 5
. 2 . 4 5
. . . . .
;
run;
Output:
x1 x2 x3 x4 x5 Concat
1 2 3 4 5 12345
. 2 . 4 5 .
. . . . . .
The versions with 'cats(of X1-X5)' and 'cats(of X1--X5)' work as expected, but as mentioned I do not want to rely on any knowledge of the variable names at this point.
I would appreciate any help on this.
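You need to tell the data step compiler that you want the concat variable(s) to be character strings. Otherwise Concat is first seen as part of the _all_ list, before it has a type, and so it is created as numeric.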
Example:
data x;
input x1-x5;
length concat1-concat3 $30 ;
Concat1 = cats(of _all_);
Concat2 = cats(of X1-X5);
Concat3 = cats(of X1--X5);
datalines;
1 2 3 4 5
. 2 . 4 5
. . . . .
;
Result
Obs  x1  x2  x3  x4  x5  concat1  concat2  concat3
1    1   2   3   4   5   12345    12345    12345
2    .   2   .   4   5   .2.45    .2.45    .2.45
3    .   .   .   .   .   .....    .....    .....
I want to check if all variables of a data set are missing except one, but without knowledge about the variable names.
Use the CMISS() function to count the number of missing values. You will need to know how many variables there are to get your test of "except one". Here is a trick to count the variables dynamically in a data step by defining two arrays. The extra character variable is there to make sure it also works for data sets with zero character variables.
data x;
input x1 x2 x3 x4 x5;
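nmiss=cmiss(of _all_)-1; /* _all_ includes nmiss itself, still missing here */
array _n _numeric_;
array _c $1 _c_ _character_; /* _c_ guarantees at least one character variable */
drop _c_ ;
nvars = dim(_n) + dim(_c) -2 ; /* do not count nmiss and _c_ */
except1 = 1 = (nvars - nmiss); /* 1 when exactly one variable is non-missing */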
datalines;
1 2 3 4 5
. 2 . 4 5
1 2 3 4 .
1 . . . .
. . . . .
;
It might be easier to just add an extra step that finds the number of variables and puts it into a macro variable instead.
data x;
input x1 x2 x3 x4 x5;
datalines;
1 2 3 4 5
. 2 . 4 5
1 2 3 4 .
1 . . . .
. . . . .
;
proc sql noprint;
select count(*) into :nvars trimmed
from dictionary.columns
where libname='WORK' and memname='X'
;
quit;
data want;
set x;
nmiss=cmiss(of _all_)-1;
except1 = 1 = (&nvars - nmiss);
run;
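To tie this back to the original goal of dropping rows, here is a minimal sketch that deletes every row in which only one column is populated. The data set names have and want and the columns rownum, a, b and c are made up for illustration. It assumes the row-number column is never missing, so a row is "empty" when at least &nvars - 1 values are missing.

```sas
/* Hypothetical input: rownum is always populated, the other columns may be missing */
data have;
  input rownum a b c $;
datalines;
1 10 20 abc
2 . . .
3 . 5 .
4 . . .
;

/* Count the columns without naming them */
proc sql noprint;
select count(*) into :nvars trimmed
  from dictionary.columns
  where libname='WORK' and memname='HAVE'
;
quit;

/* No new variable is created before the test, so _all_ is exactly the columns of have */
data want;
  set have;
  if cmiss(of _all_) >= &nvars - 1 then delete;
run;
```

With this sample data, rows 2 and 4 should be dropped. Because the test is done directly in the IF statement, there is no helper variable inside _all_ and no "-1" correction is needed.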