Hello everybody,
Today I have a problem that I hope you can help me with.
Let's say that I have this dataset:
data have;
input A $ B C D $;
datalines;
try 0 57 0
do . 42 something
with 442 . this
;
And I need to return this:
The new variable, N_MISS, counts how many missing values there are in each observation. I know I can do this easily with the function CMISS(OF A--D), but the problem is that I also need to treat 0 as a missing value.
I created a format that maps each value to 0 or 1 depending on whether it is missing, so that I could then sum all the variables of the observation and get the number of non-missing values.
PROC FORMAT LIBRARY = FORMATS;
INVALUE $CMISS_COUNT '0' = 0
' ' = 0
'.' = 0
OTHER = 1;
INVALUE NMISS_COUNT 0 = 0
. = 0
OTHER = 1;
RUN;
But when I apply a format, the underlying value stays the same, so the sum does not change. I could use the function PUT(A, $CMISS_COUNT.), but in the real exercise I don't have only 4 variables but 250, and doing that 250 times is... well.
I hope I have expressed myself clearly; sorry for my beginner's English.
Goodbye, and thanks for everything.
Something like this?
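data want;
  set have;
  /* Declare the arrays first so the counters created below are not included in them */
  array nvars{*} _numeric_;
  array cvars{*} _character_;
  /* Numeric variables: count missing (.) and 0 */
  do i = 1 to dim(nvars);
    n_miss = sum(n_miss, (nvars(i) in (., 0)));
  end;
  /* Character variables: count blank, '.' and '0' */
  do j = 1 to dim(cvars);
    c_miss = sum(c_miss, (cvars(j) in (' ', '.', '0')));
  end;
  all_miss = sum(n_miss, c_miss);
  drop i j;
run;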
You can use the TRANSLATE function inside CMISS, but then you won't be able to use the "of" notation. Alternatively, create a separate array of temporary variables and apply TRANSLATE to them.
data junk;
  x = '0';
  y = '123';
  z = ' ';
  nmiss = cmiss(translate(x, ' ', '0'), y, z);
run;
This approach has a potential issue: it will treat '0000' as missing as well. If that is not desired and such values appear in your data, you may be better off creating an array of temporary variables with IF/THEN logic that matches exactly '0'.
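As a rough sketch of the exact-match idea, the step below skips the temporary array and simply tests each value directly, which is a simpler swap for the TRANSLATE approach rather than the same technique. The names want_exact and n_miss are only illustrative, and it assumes the HAVE dataset from the question.

```sas
data want_exact;  /* hypothetical output name */
  set have;
  /* Declare the arrays before n_miss and i exist so they are not picked up */
  array nvars{*} _numeric_;
  array cvars{*} _character_;
  n_miss = 0;
  do i = 1 to dim(nvars);
    /* missing() also covers special missing values such as .A */
    if missing(nvars{i}) or nvars{i} = 0 then n_miss = n_miss + 1;
  end;
  do i = 1 to dim(cvars);
    /* strip() drops leading and trailing blanks; '0000' is not equal to '0' */
    if missing(cvars{i}) or strip(cvars{i}) = '0' then n_miss = n_miss + 1;
  end;
  drop i;
run;
```

For the three sample rows in the question this should give n_miss values of 2, 1 and 1. Because the comparison is against the whole value, a string such as '0000' or '100' is left alone.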
Excellent, the array solution works. Thank you very much.
I can't test this right at the moment, but what about something like:
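data want;
  set have;
  /* Counts every '.' and '0' character in the concatenated values, so numbers */
  /* such as 100 or 4.5 are over-counted and blank character values add nothing */
  total = countc(cats(of a--d), '.0');
run;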