Hi everybody. I'm confused about the n and d modifiers in the COUNTW function. Would someone explain a little bit? Any thoughts would be appreciated!
For the COUNTW function,
1. d or D adds digits to the list of characters.
2. n or N adds digits, an underscore, and English letters (that is, the characters that can appear after the first character in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
Thus, for the code below, I think the result should be:
obs d n
1   3 4
2   1 5
data test;
input string $char60.;
datalines;
'_'2+2=4
\\"Windows" "\Path\Names\Use\Backslashes
;
run;
data test1;
set test;
d=countw(string,,'d');
n=countw(string,,'n');
run;
proc print data=test1;run;
However, the result is different, as shown in the picture below.
Hi,
Remember that character variables are padded with trailing blanks. When you specify the delimiters and do not include a blank as a delimiter, the trailing blanks count as a word. You can use the TRIMN function to remove trailing blanks.
The log below shows the result you expect:
26   data test1;
27   set test;
28   d=countw(trimn(string),,'d');
29   n=countw(trimn(string),,'n');
30
31   put (string d n)(=) ;
32   run;

string='_'2+2=4 d=3 n=4
string=\\"Windows" "\Path\Names\Use\Backslashes d=1 n=5
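To see the padding directly, a small sketch along these lines may help. It assumes the same 60-character length as the INPUT statement in the question; the variable names stored_len, trimmed_len, d_padded, and d_trimmed are only illustrative.

```sas
data _null_;
  length string $60;               /* same length as $char60. in the question */
  string = "'_'2+2=4";             /* 8 characters followed by 52 trailing blanks */

  stored_len  = lengthc(string);   /* storage length, including the padding */
  trimmed_len = length(string);    /* length without trailing blanks */

  /* Digits are the only delimiters here, so the trailing blanks form a word */
  d_padded  = countw(string,,'d');
  /* TRIMN removes the trailing blanks before the words are counted */
  d_trimmed = countw(trimn(string),,'d');

  put stored_len= trimmed_len= d_padded= d_trimmed=;
run;
```

The log should show stored_len=60 and trimmed_len=8, with d_padded one higher than d_trimmed, because the run of trailing blanks after the final digit is counted as an extra word.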
When you read the data into SAS, you specified that the variable STRING was 60 characters long. So if the value assigned to string is less than 60 characters long, it will be padded with blanks.
If you have a variable that is one character long, and you put one character in it, it is not padded with blanks.
In the example below, String1 has a length of one character and String2 has a length of two characters.
data test;
input string1 : $1. string2 : $2.;
datalines;
A 1
1 11
;
run;
data test1;
set test;
d1=countw(string1,,'d');
d2=countw(string2,,'d');
put string1= d1= string2= d2=;
run;
I think the questions you have are actually basic ones: What is a word? What is a delimiter? The answers affect how COUNTW counts, and the number of words that it finds.
For example, in the data lines that you illustrated, should "+" be a delimiter, or should it be a character within a word? By default, it is a delimiter.
Assuming you are working on an ASCII-based system, these characters are delimiters rather than part of a word:
blank ! $ % & ( ) * + , - . / ; < ^ |
You have control over that, however. Using the "n" or "d" modifiers (and there are additional choices as well) switches the meaning of some characters. They now become delimiters, rather than being characters within a word.
You're getting closer.
Adding "n" or "d" does not replace the default set of delimiters. It extends the default set by adding characters that also function as delimiters. So "+" remains a delimiter when you add "n" or "d". If you want to replace the default set of delimiters, you can do that. You would need to supply the delimiters that you want as the second parameter, between what are now the two consecutive commas.
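As a rough illustration of supplying your own delimiter list in the second argument, the sketch below treats only "+" and "=" as delimiters. The choice of those two characters and the variable name w are assumptions made for this example; they are not from the original code.

```sas
data _null_;
  /* Only the plus sign and the equals sign separate words here (illustrative choice) */
  w = countw("'_'2+2=4", '+=');
  put w=;
run;
```

With that delimiter list, the string splits into three words ('_'2, then 2, then 4), so w should be 3. A string literal is used so that trailing blanks do not affect the count.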
Hi @Cecillia_Mao,
I think the default delimiters are not applied if more than one argument is used (see documentation). This includes the case of an empty second argument, as in your examples (or in countw(string,,), where the third argument is also empty, or in countw(string,), where the third argument is not used). That is, with two or three arguments (empty or not), the list of delimiters is constructed from the second argument and, if applicable, modified by the third argument.
In particular, an empty second argument means an empty list of delimiters (i.e., no character serves as a delimiter) until the modification by the third argument, if any, is applied. So, in your examples, the digits ('d') or "name characters" ('n'), respectively, are added to the empty list (not to the list of default delimiters); that is, they end up being the only delimiters.
To add them to the list of default delimiters, you would need to specify the default delimiters explicitly in the second argument, e.g., d=countw(string,' !$%&()*+,-./;<^|','d') in an ASCII environment (resulting in different values for variable d than with the empty second argument).
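The difference can be checked with a short DATA step. This is only a sketch for an ASCII environment; the variable names one_arg, empty_d, and default_d are made up for the illustration.

```sas
data _null_;
  s = "'_'2+2=4";   /* literal, so s has length 8 and no trailing blanks */

  /* One argument: the default delimiters apply */
  one_arg   = countw(s);
  /* Empty second argument plus the d modifier: digits are the only delimiters */
  empty_d   = countw(s,,'d');
  /* Default ASCII delimiters listed explicitly, with digits added by the d modifier */
  default_d = countw(s,' !$%&()*+,-./;<^|','d');

  put one_arg= empty_d= default_d=;
run;
```

Here empty_d should match the d=3 from the log earlier in the thread, while default_d should come out smaller (2), because "+" then acts as a delimiter as well and only '_' and = remain as words.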