Hi,
The following example deletes the records that have 10 or more missing values (numeric and character).
In this example, why has the author used j = 1 to dim(nm)? Can't it be written as i = 1 to dim(nm)?
I didn't understand this concept. Could anyone explain?
Thanks
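data complete;
  set maybeokay;
  misscount = 0;                   /* reset the count for every observation */
  array ch(*) _character_ ;        /* all character variables */
  array nm(*) _numeric_ ;          /* all numeric variables (includes MISSCOUNT) */
  do i = 1 to dim(ch);
    if ch(i) = ' ' then misscount + 1;
  end;
  do j = 1 to dim(nm);
    if nm(j) = . then misscount + 1;
  end;
  drop i j ;
  if misscount ge 10 then delete;  /* drop rows with 10 or more missing values */
run;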
I assume the author used J simply because that is what made sense at the time.
You are right that they could have reused I as the loop variable for the second loop. But if the loops were nested, sharing one index variable would cause trouble.
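To see the nesting problem concretely, here is a minimal sketch (the temporary arrays A and B and the counters SEPARATE and SHARED are made up for illustration). When the inner loop reuses I, it leaves I at 3 when it finishes, the outer loop then increments I to 4 and stops after one pass instead of three.

```sas
/* Hypothetical demo: nested DO loops with separate vs. shared index variables */
data loopdemo;
  array a(3) _temporary_ (1 2 3);
  array b(2) _temporary_ (10 20);

  /* Separate index variables: 3 outer passes x 2 inner passes = 6 */
  separate = 0;
  do i = 1 to dim(a);
    do j = 1 to dim(b);
      separate + 1;
    end;
  end;

  /* Shared index variable: the inner loop overwrites I, */
  /* so the outer loop ends after its first pass (2)    */
  shared = 0;
  do i = 1 to dim(a);
    do i = 1 to dim(b);
      shared + 1;
    end;
  end;

  drop i j;
run;
```

With other array sizes a shared index can even produce an infinite loop, which is why separate names are the safer habit.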
In the past I would normally use the DO OVER construct for situations like this, where the index variable is meaningless, and avoid having to worry about dropping it. (Note that it doesn't solve the issue of nested DO loops, because DO OVER uses the automatic _I_ variable for indexing.)
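A sketch of the original program rewritten with DO OVER, assuming a small made-up MAYBEOKAY data set (created in the first step) so that it runs on its own. DO OVER works with implicitly subscripted arrays, i.e. arrays declared without (*). With this sample data the all-missing row (12 missing values) is deleted, and the row with 9 missing values is kept.

```sas
/* Hypothetical sample data: 6 character and 6 numeric variables.      */
/* With list input, a single period is read as missing for both types. */
data maybeokay;
  length c1-c6 $ 8;
  input c1-c6 x1-x6;
  datalines;
a b c d e f 1 2 3 4 5 6
. . . . . . . . . . . .
a b . . . . 1 . . . . .
;

/* Same logic as the original program, using DO OVER */
data complete;
  set maybeokay;
  misscount = 0;
  array ch _character_;   /* implicit array: no (*) */
  array nm _numeric_;
  do over ch;
    if ch = ' ' then misscount + 1;
  end;
  do over nm;
    if nm = . then misscount + 1;
  end;
  if misscount ge 10 then delete;
run;
```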
Note that setting MISSCOUNT to zero is critical in this program: the sum statement (var + value;) implies that MISSCOUNT is retained, so without the reset the count would carry over from one observation to the next.
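A small sketch makes the retain behavior visible (the data set COUNTS and its two character variables are hypothetical). RUNNING_TOTAL is never reset, so it accumulates to 1, 3, 3 across the rows, while PER_ROW is reset first and gives 1, 2, 0.

```sas
/* Hypothetical demo: why the counter must be reset on every row */
data counts;
  length c1 c2 $ 8;
  input c1 c2;
  /* No reset: the sum statement retains RUNNING_TOTAL across rows */
  if c1 = ' ' then running_total + 1;
  if c2 = ' ' then running_total + 1;
  /* Reset first: PER_ROW counts only the current row */
  per_row = 0;
  if c1 = ' ' then per_row + 1;
  if c2 = ' ' then per_row + 1;
  datalines;
a .
. .
a b
;
```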
Now that SAS has added so many new functions over the years, I would use the NMISS() function to count the missing _NUMERIC_ values. You could probably combine COUNTW() and CATX() to count the missing character values, but you would need to find an unused character to serve as the delimiter.
misscount=0;
array ch _character_;
misscount=nmiss(of _numeric_) + dim(ch) - countw(catx('00'x,of ch(*)),'00'x);
Note that setting MISSCOUNT to zero is critical here as well, because MISSCOUNT itself is part of the _NUMERIC_ variable list; if it were missing, NMISS() would count it.
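For reference, here is the NMISS()/COUNTW()/CATX() fragment wrapped in a complete step. This is a sketch that assumes the same made-up MAYBEOKAY sample rows as before, and it uses an explicitly subscripted array, ch(*), so that DIM() and OF ch(*) follow the usual explicit-array syntax. As with the loop version, only the all-missing row is deleted.

```sas
/* Hypothetical sample data: 6 character and 6 numeric variables */
data maybeokay;
  length c1-c6 $ 8;
  input c1-c6 x1-x6;
  datalines;
a b c d e f 1 2 3 4 5 6
. . . . . . . . . . . .
a b . . . . 1 . . . . .
;

data complete;
  set maybeokay;
  misscount = 0;               /* must not be missing: it is in _NUMERIC_ */
  array ch(*) _character_;
  /* missing numerics + (number of character variables - non-missing ones) */
  misscount = nmiss(of _numeric_)
            + dim(ch) - countw(catx('00'x, of ch(*)), '00'x);
  if misscount ge 10 then delete;
run;
```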
Both programs will have trouble when there are no character variables in the input dataset, since you cannot define an array with no elements. You could solve that by creating a dummy character variable.
misscount=0;
retain _ch 'A';
drop _ch;
array ch _character_;
misscount=nmiss(of _numeric_) + dim(ch) - countw(catx('00'x,of ch(*)),'00'x) ;
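A sketch of the dummy-variable trick on a hypothetical numeric-only data set (NUMSONLY is made up for illustration). The dummy _CH is never missing, so it adds 1 to DIM(ch) and 1 to the COUNTW() result, and the two cancel out: the first row gets 0 and the second row gets 3.

```sas
/* Hypothetical data set with numeric variables only */
data numsonly;
  input x1-x4;
  datalines;
1 2 3 4
1 . . .
;

data numsonly_counted;
  set numsonly;
  misscount = 0;
  retain _ch 'A';        /* dummy character variable, never missing */
  drop _ch;
  array ch(*) _character_;
  misscount = nmiss(of _numeric_)
            + dim(ch) - countw(catx('00'x, of ch(*)), '00'x);
run;
```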
Hi,
Thanks for the detailed explanation.
Could you explain what's happening here? It's a little confusing why '00'x was used.
Also, why is the subtraction done?
misscount=nmiss(of _numeric_) + dim(ch) - countw(catx('00'x,of ch(*)),'00'x) ;
Thanks
COUNTW() counts words separated by a delimiter. Here it counts the non-missing character values, hence the need to subtract its result from the number of elements in the array. I use binary zero ('00'x) as the delimiter because it is extremely unlikely to be an actual character in your data. But see the response below, which uses another function I didn't find: CMISS() works for both numeric and character variables.
misscount=cmiss(of _all_);
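To make the COUNTW() subtraction concrete, here is a sketch for one hypothetical row with three character variables, one of them blank. CATX() joins only the two non-blank values, COUNTW() finds 2 words, and DIM(ch) minus 2 gives the 1 missing value.

```sas
/* Hypothetical single row: three character values, one of them blank */
data steps;
  length c1-c3 $ 8;
  c1 = 'a'; c2 = ' '; c3 = 'b';
  array ch(*) c1-c3;
  joined   = catx('00'x, of ch(*));   /* 'a' || '00'x || 'b': blank skipped */
  words    = countw(joined, '00'x);   /* 2 non-missing values               */
  nmissing = dim(ch) - words;         /* 3 - 2 = 1 missing value            */
run;
```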
It would help if SAS were more consistent in its documentation: I looked for a See Also section in the NMISS() documentation and missed that CMISS() was mentioned only in a Comparisons section.
Comparisons
The NMISS function returns the number of missing values, whereas the N function returns the number of nonmissing values. NMISS requires numeric values, whereas CMISS works with both numeric and character values. NMISS works with multiple numeric values, whereas MISSING works with only one value that can be either numeric or character.
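A quick sketch of the four functions from that Comparisons section on made-up values (one missing numeric and one blank character value):

```sas
/* Hypothetical values: one missing numeric, one blank character */
data compare;
  x1 = 1; x2 = .;
  length c1 c2 $ 8;
  c1 = ' '; c2 = 'b';
  n_nmiss = nmiss(x1, x2);           /* missing numerics: 1           */
  n_n     = n(x1, x2);               /* non-missing numerics: 1       */
  n_cmiss = cmiss(x1, x2, c1, c2);   /* missing values of either type: 2 */
  c1_miss = missing(c1);             /* single value: 1 if missing    */
run;
```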
So when you concatenate all the values of all the character variables with '00'x, won't dim(ch) be equal to the COUNTW() result?
Is it that when a character value is missing, '00'x is concatenated directly with '00'x, and since there is no word between the two delimiters, COUNTW() has nothing to count there?
And only when there is '00'xword'00'x does COUNTW() have a word to count?
Does it work like that?
Also, what is the little x doing beside the zeros, and why is it outside the quotes?
Thanks
CATX() will skip the missing values. So catx('|','A',' ',9,.,'C') will be 'A|9|C' .
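A sketch with character values only (the variable RESULT is hypothetical): because CATX() skips blank arguments, no two delimiters ever end up side by side, so the blanks simply disappear from the result rather than leaving empty "words".

```sas
/* Hypothetical demo: CATX() omits blank values and their delimiters */
data catxdemo;
  length result $ 20;
  result = catx('|', 'A', ' ', 'B', ' ', 'C');  /* 'A|B|C' */
  reslen = length(result);                       /* 5       */
run;
```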
Finally, why is the little x in '00'x outside the quotation marks?
That is how to write a literal value using hexadecimal digits. For example, a space is represented by '20'x and a tab by '09'x.
There are many other kinds of literals, such as date ('01JAN1960'd), time ('09:30't), and datetime ('01JAN1960:09:30'dt).
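A sketch showing what those literals evaluate to (the variable names are made up; the hex comments assume an ASCII system). SAS dates count days from 1 January 1960, while times and datetimes count seconds, so 09:30 is 9*3600 + 30*60 = 34200.

```sas
/* Hypothetical demo of hex, date, time and datetime literals */
data literals;
  is_space = ('20'x = ' ');        /* 1: hex 20 is a blank (ASCII)       */
  tab_code = rank('09'x);          /* 9: byte value of the tab character */
  d  = '01JAN1960'd;               /* 0 days                             */
  t  = '09:30't;                   /* 34200 seconds after midnight       */
  dt = '01JAN1960:09:30'dt;        /* 34200 seconds after 01JAN1960:00:00 */
run;
```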
How about:
misscount = 0;
misscount = cmiss(of _all_);
or, if you don't need the number of missing values:
data complete;
set maybeokay;
if cmiss( of _all_) < 10;
run;
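A self-contained sketch of the CMISS() approach, assuming the same made-up MAYBEOKAY sample rows as earlier. It keeps rows with fewer than 10 missing values, which matches the original "if misscount ge 10 then delete", and sets MISSCOUNT to 0 first so that MISSCOUNT itself is not counted as missing by OF _ALL_.

```sas
/* Hypothetical sample data: 6 character and 6 numeric variables */
data maybeokay;
  length c1-c6 $ 8;
  input c1-c6 x1-x6;
  datalines;
a b c d e f 1 2 3 4 5 6
. . . . . . . . . . . .
a b . . . . 1 . . . . .
;

data complete;
  set maybeokay;
  misscount = 0;                 /* non-missing, so CMISS() does not count it */
  misscount = cmiss(of _all_);   /* missing values of both types              */
  if misscount < 10;             /* keep rows with fewer than 10 missing      */
run;
```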
PG
Yes, the author could have used i instead of j for the second loop. The result would have been the same, and the difference in efficiency would have been insignificant. The name of the DO loop variable matters only if you want to use its value after the loop ends. For example:
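do i = 1 to dim(ch);
  if ch(i) = ' ' then leave;   /* i stops at the first blank element */
end;
do j = 1 to dim(nm);
  if nm(j) = . then leave;     /* j stops at the first missing element */
end;
/* a loop that runs to completion leaves its index at dim() + 1 */
hasMissing = i <= dim(ch) or j <= dim(nm);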
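A runnable sketch of this LEAVE technique on made-up data (the data set names and rows are hypothetical). The second row has its only missing value in the last character variable, so I stops at 3; that is why the test uses <= rather than <.

```sas
/* Hypothetical sample data */
data maybeokay;
  length c1-c3 $ 8;
  input c1-c3 x1-x3;
  datalines;
a b c 1 2 3
a b . 1 2 3
;

data flagged;
  set maybeokay;
  array ch(*) _character_;
  array nm(*) _numeric_;
  do i = 1 to dim(ch);
    if ch(i) = ' ' then leave;   /* stop at the first blank */
  end;
  do j = 1 to dim(nm);
    if nm(j) = . then leave;     /* stop at the first missing value */
  end;
  /* without LEAVE the loops finish with i = dim(ch)+1 and j = dim(nm)+1 */
  hasMissing = i <= dim(ch) or j <= dim(nm);
  drop i j;
run;
```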
PG