How to count all the missing values in an entire observation, treating 0 as missing

Occasional Contributor
Posts: 10
Hello everybody,

I have a problem that I hope you can help me with.

Let's say that I have this dataset:

 

 

data have;
input A $ B C D $;
datalines;
try 0 57 0
do . 42 something
with 442 . this
;

 

 

And I need to return this:

[Image: Captura.PNG]

The new variable, N_MISS, counts how many missing values I have in each observation. I know I can do this easily with the function CMISS(OF A--D), but the problem is that I need to treat 0 as a missing value.

I created a format that maps each value to 0 when it is missing and 1 when it is not, so that I can sum all the variables of the observation and get the number of non-missing values.

 

PROC FORMAT LIBRARY = FORMATS;
    INVALUE $CMISS_COUNT '0'   = 0
                         ' '   = 0
                         '.'   = 0
                         OTHER = 1;
    INVALUE NMISS_COUNT  0     = 0
                         .     = 0
                         OTHER = 1;
RUN;

But when I apply the format, the underlying value of the variable stays the same for the sum. I could use the function PUT(A, $CMISS_COUNT.), but in the real exercise I don't have only 4 variables but 250... and doing that 250 times is... well.

I hope I have explained myself clearly; sorry for my beginner's English.

Goodbye, and thanks for everything.

 


All Replies
Super User
Posts: 10,500

You can use the TRANSLATE function inside CMISS, but you won't be able to use the "of" notation. Or you can create a separate array of temporary variables that you apply TRANSLATE to.

 

data junk;
   x = '0';
   y = '123';
   z = ' ';
   nmiss = cmiss(translate(x,' ','0'),y,z);
run;

This approach has a potential issue: because TRANSLATE replaces every '0' character with a blank, it will treat '0000' as missing as well. If that is not desired and the value appears in your data, then you may be better off creating an array of temporary variables with some IF/THEN logic that tests for exactly '0'.
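
A rough sketch of that temporary-array idea, in case it helps. It assumes the `have` dataset from the question, where the character variables are A and D; the array name `tmp`, its size of 2, and the length of $8 are assumptions chosen to match that example and would need adjusting for real data.

```sas
data want;
    set have;
    array cvars{*} _character_;      /* A and D in the example data */
    /* Scratch copies; size 2 and length $8 are assumptions that match HAVE */
    array tmp{2} $8 _temporary_;
    do i = 1 to dim(cvars);
        /* Blank out a value only when it is exactly '0', */
        /* so values such as '0000' or '200' are kept     */
        if cvars{i} = '0' then tmp{i} = ' ';
        else tmp{i} = cvars{i};
    end;
    /* CMISS now counts real blanks plus the values that were exactly '0' */
    c_miss = cmiss(of tmp{*});
    drop i;
run;
```

Every element of `tmp` is overwritten on each observation, so the fact that temporary arrays are retained across observations should not carry stale values forward. The temporary array has to be sized to the number of character variables, otherwise unused blank elements would be counted as missing.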

 

Solution
‎01-10-2017 01:10 PM
PROC Star
Posts: 288

Something like this?

 


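data want;
    set have;
    /* The arrays are defined before n_miss, c_miss, i and j exist, */
    /* so they contain only the variables coming from HAVE          */
    array nvars{*} _numeric_;
    array cvars{*} _character_;
    /* A comparison returns 1 (true) or 0 (false), so summing it counts matches */
    do i = 1 to dim(nvars);
        n_miss = sum(n_miss, (nvars(i) in (., 0)));
    end;
    /* Missing character values are stored as blanks, so test ' ' as well */
    do j = 1 to dim(cvars);
        c_miss = sum(c_miss, (cvars(j) in (' ', '.', '0')));
    end;
    all_miss = sum(n_miss, c_miss);
    drop i j;
run;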
Occasional Contributor
Posts: 10

Excellent, this works. Thank you very much.

Super User
Posts: 7,401

I can't test this right at the moment, but what about something like:

data want;
  set have;
  total=countc(cats(of a--d),'.0');
run;
Occasional Contributor
Posts: 10

Interesting method... But when the data contains a value like '200', the function will count 2 extra missing values because of the two 0s.
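
A small sketch that reproduces the overcount. The values here are made up for illustration and are not from my real data.

```sas
data _null_;
    /* Illustrative values only: c is a true missing, d is a character '0' */
    a = 'try';
    b = 200;
    c = .;
    d = '0';
    /* CATS builds 'try200.0', and COUNTC then counts every '.' and '0' character */
    total = countc(cats(of a--d), '.0');
    put total=;   /* total=4: three '0' characters plus one '.' */
run;
```

Only C and D should count as missing here, so the expected answer is 2, but the two zeros inside 200 are counted as well.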
☑ This topic is SOLVED.
