Alert for leading/trailing spaces in datasets?

Reply
Occasional Contributor
Posts: 6

Alert for leading/trailing spaces in datasets?

Hello all,

I'm looking to find a way for SAS to check for leading/trailing spaces (or blanks) for every field within a SAS dataset.

There are many ways to automoatically strip leading/trailing spaces, but i can't find anything that can flag up these instead.

Any ideas/solutions would be greatly appreciated!

Many thanks

Stephen


Respected Advisor
Posts: 3,124

Re: Alert for leading/trailing spaces in datasets?

Not sure if this what you are looking for: ( for multiple variables, use array _character_)

data have;

a1='abc';output;

a1=' b ';output;

run;

data want;

set have;

/* trailing  blank*/

if length(a1) < lengthc(a1) then T_flag=1;

/*for leading blanks*/

if first(a1)=' ' then L_flag=1;

run;

proc print;run;

Haikuo

Super User
Posts: 5,081

Re: Alert for leading/trailing spaces in datasets?

I'm sure you realize that you can only check the character fields, not all fields.

Since you know how to remove leading or trailing blanks, you can create a flag pretty easily.  Here's a one-liner that checks for both leading and trailing at the same time (which may or may not be what you were hoping to find):

flag = '*' || varname || '*' ne '*' || strip(varname) || '*';

I'm sure some of this can be replaced with CATS or similar, but I'm not as familiar with those functions since our shop still uses 9.1

Expanding this to a list of character variables is easy with arrays.  The tricky part is automating how you count the number of flags that will be needed.

Good luck.

Respected Advisor
Posts: 3,124

Re: Alert for leading/trailing spaces in datasets?

Inspired by Astounding's logic, another one-line approach could be:

flag=(lengthc(a1)>length(strip(a1)));

Haikuo

Super User
Posts: 5,081

Re: Alert for leading/trailing spaces in datasets?

As long as we are pausing for additional thought ...

I think you will need to use LENGTHN instead of LENGTH.  LENGTH has a "feature" built in, returning 1 when the string is blank.  That could cause incorrect results.

If this is going to be part of a standard data cleaning process, we should realize that LENGTHC(a1) is a constant for the entire data set.  An automated solution might first capture the defined lengths of all the character variables, and then use this set of constants instead of repeatedly using LENGTHC on every observation.

Good luck.

Respected Advisor
Posts: 3,124

Re: Alert for leading/trailing spaces in datasets?

Agreed. So something like if _n_=1 then lc=lengthc() will be used and of course lc will be retained.

Thanks,

Haikuo

Contributor
Posts: 44

Re: Alert for leading/trailing spaces in datasets?

It's not clear to me whether Stephen wants a separate flag for each character variable, or a single flag per observation (presumably, if any variable has a leading/trailing blank, the flag is set to 1)

If he wants a flag for each variable...that would be a larger solution (*bows out gracefully*).

If he wants a single flag for each observation the following would combine your ideas above and produce "flagcnt" for each observation in dataset "testdata":

data test(drop=i flag);

  set testdata;

  flagcnt=0;

  array c  _character_;

  do i = 1 to dim(c);

    flag = '*' || c(i) || '*' ne '*' || strip(c(i)) || '*';

    flagcnt = flagcnt + flag;

  end;

run;


Ask a Question
Discussion stats
  • 6 replies
  • 767 views
  • 0 likes
  • 4 in conversation