04-12-2012 08:12 AM
I'm looking to find a way for SAS to check for leading/trailing spaces (or blanks) for every field within a SAS dataset.
There are many ways to automoatically strip leading/trailing spaces, but i can't find anything that can flag up these instead.
Any ideas/solutions would be greatly appreciated!
04-12-2012 08:30 AM
Not sure if this what you are looking for: ( for multiple variables, use array _character_)
a1=' b ';output;
/* trailing blank*/
if length(a1) < lengthc(a1) then T_flag=1;
/*for leading blanks*/
if first(a1)=' ' then L_flag=1;
04-12-2012 10:07 AM
I'm sure you realize that you can only check the character fields, not all fields.
Since you know how to remove leading or trailing blanks, you can create a flag pretty easily. Here's a one-liner that checks for both leading and trailing at the same time (which may or may not be what you were hoping to find):
flag = '*' || varname || '*' ne '*' || strip(varname) || '*';
I'm sure some of this can be replaced with CATS or similar, but I'm not as familiar with those functions since our shop still uses 9.1
Expanding this to a list of character variables is easy with arrays. The tricky part is automating how you count the number of flags that will be needed.
04-12-2012 10:37 AM
As long as we are pausing for additional thought ...
I think you will need to use LENGTHN instead of LENGTH. LENGTH has a "feature" built in, returning 1 when the string is blank. That could cause incorrect results.
If this is going to be part of a standard data cleaning process, we should realize that LENGTHC(a1) is a constant for the entire data set. An automated solution might first capture the defined lengths of all the character variables, and then use this set of constants instead of repeatedly using LENGTHC on every observation.
04-12-2012 04:00 PM
It's not clear to me whether Stephen wants a separate flag for each character variable, or a single flag per observation (presumably, if any variable has a leading/trailing blank, the flag is set to 1)
If he wants a flag for each variable...that would be a larger solution (*bows out gracefully*).
If he wants a single flag for each observation the following would combine your ideas above and produce "flagcnt" for each observation in dataset "testdata":
data test(drop=i flag);
array c _character_;
do i = 1 to dim(c);
flag = '*' || c(i) || '*' ne '*' || strip(c(i)) || '*';
flagcnt = flagcnt + flag;