I know of
Function | Purpose | Arguments (signature) |
NMISS | Count number of missing values | numeric only |
CMISS | Count number of missing values | mix of character and numeric allowed |
N | Count number of non-missing values | numeric only |
??? (should there be CN?) | Count number of non-missing values | mix of character and numeric allowed |
MISSING | Test if any missing values | numeric only |
MISSING | Test if any missing values | character only |
??? (should there be ANYMISSING?) | Test if any missing values | mix of character and numeric allowed |
??? (should there be ALLMISSING?) | Test if all missing value | mix of character and numeric allowed |
If the above table ??? rows is it reasonable to expect a function for the purpose+signature ?
What are the current work arounds (in DATA Step) to perform the role of the ??? functionalities.
You should provide some use cases of exactly how you expect something to be used.
For example there is a function COUNTW that will count "words" in a character value. I put the "word" in quotes because you can specify characters to be the delimiter between words that would typically be considered part of a word.
data example; string="banana"; numwords = countw(string,'n'); run;
Numwords is 3 because the 'n' is used to separate bits. So the "words" were "ba" "a" and "a".
The LENGTH function returns the number of characters in a text value excluding trailing blanks.
COUNTC returns the number of times any of a list of characters appears in a text value.
data example; string="banana"; numlist = countc(string,'anq'); run;
Numlist has a value of 5 because it counted 3 'a's , 2 'n's and 0 'q' in the string variable.
Coutw and CountC have optional parameters that affect matching properties like case, exclude the characters instead of include for counting or delimiting and such.
You can make long strings from multiple variables and count the results.
data example; string="banana"; x=25; y='This is a longer text value'; z=.; numvalues =countw(catx('|',string,x,y),'|'); /* next uses default list of delimiters which includes space*/ numwords = countw(catx(' ',string,x,y)); /* one not reliable way to avoid counting missing numeric values*/ numvalues2 =countw(catx('|',string,x,y,z),'|.'); run;
CAUTION: The CATX and other CAT functions that might be used this way will by default use a BEST function to convert numeric values to text. So a value like 123.4567 would have a decimal in the middle of the value and "123" and "4567" would become words in the numvalues2 example. The numvalues2 example also assumes that "missing" will be represented by ".". A simple statement before the data step: Options missing=' '; could set any single character as what appears in the string for the missing numeric values.
I used the | character to separate values of individual variables because I was sure, in this case, none of the values contained that character. Depending on what you do you may get to play with function options. Look in the documentation for the defaults and options for the functions Countw and Countc.
Best is to use a specific format for converting your numeric values so you know what to expect.
Next I expect "but I have a lot of variables". Variable lists or Arrays may help to define lists of variable.
Another approach could be to build a set of numeric variables using the MISSING function. The function returns a value of 1 for missing and 0 for not missing.
data example; string="banana"; x=25; y='This is a longer text value'; z=.; mstring = missing(string); mx = missing(x); my = missing(y); mz = missing(z); Nummissing = sum(mstring, mx, my,mz); run;
This is where array processing to do similar things to a bunch of like type variables is helpful.
So with any time you talk about counting character values you need to be pretty specific about what is counted.,
Allmissing would be nummiss = number of variables. ( n (mstring,mx,my,mz) )
Anymissing would range(mstring, mx, my, mz) = 1 (Look up the range function
1. N=CN(of VARS[*]) could be written as N=countw(catx('|', of VARS[*]) ,'|')
There might be a better way, this is expensive, and thee are caveats at @ballardw mentioned.
2. NB=ANYMISSING(of VARS[*]) could be written as NB=(MISSING(of VARS[*]) > 0)
3. NB=ALLMISSING(of VARS[*]) could be written as NB=(^lengthn( catx('|', of VARS[*]) ) )
for characters variables,
or NB=( N(of VARS[*]) = 0 )
for numeric variables
[Edited to correct as I misread the functions]
It probably helps to use ARRAY for some of these. Note that a single ARRAY can only have one type of variable.
So count of non-missing characters is a simple subtraction.
array c _character_;
cmiss=cmiss(of c[*]);
cnonmiss = dim(c) - cmiss;
ANYMISS functions could use COALESCE() function.
anymissn = missing(coalesce(of _numeric_));
anymissc = missing(coalescec(of _character_));
But why not just use the current NMISS or CMISS functions?
anymiss = 0 < cmiss(of _all_) ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.