BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Babloo
Rhodochrosite | Level 12

Any help with the generic program to remove any special characters (leading/trailing spaces, CR, LF) other than alphabets, underscores and numbers in each observations for all the variables? 

 

I know we can accomplish this with COMPRESS function but I'm not certain how to make it work for all the variables in the dataset. I want to add this step in the macro program which already in place. Input dataset is dynamic and hence the variables. E.g. Dataset A can have ID and EMPLOYEE variables and Dataset B can have ID and AGE variables. Any dataset can be input for my generic program where I want to remove the special characters.

 

I don't want to create any new variables as part of this step.

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
data want;
    set have;
    array var _character_;
    do i=1 to dim(var);
        var(i)=compress(var(i),.........);
    end;
run;
--
Paige Miller

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26
data want;
    set have;
    array var _character_;
    do i=1 to dim(var);
        var(i)=compress(var(i),.........);
    end;
run;
--
Paige Miller
Babloo
Rhodochrosite | Level 12

@PaigeMiller  I don't want to create any new variables as part of this step.  How to rename the variable 'i' to actual variable name?

PaigeMiller
Diamond | Level 26

Please try the code and see if there are new variables created. The only new variable created is I, which you can get rid of.

--
Paige Miller
Babloo
Rhodochrosite | Level 12

@PaigeMiller Already tried the below program and it's not working as excepted.  After removing the special characters we have to rename it to the actual variable name. If there are two variables in the dataset then I have to do the same in those two variables.

 

data have;
input id $;
datalines;
 124 5 
6 7 
$1k09~  
     
;
run;

data want;
    set have;
    array var _character_;
    do i=1 to dim(var);
        var(i)=compress(var(i),'090A0C0DA0'x);
    end;
run;

Excepted results:

 

1245
67
PaigeMiller
Diamond | Level 26

Please look at data set named HAVE. The data set is not being created properly and so of course after you use COMPRESS on it, you don't get the proper results. 

 

I add that this simple activity of LOOKING AT your own data ought to be the first step in debugging, you don't need people here in the SAS Communities to tell you that your data set is not being created properly.

--
Paige Miller
Babloo
Rhodochrosite | Level 12

@Kurt_Bremser @PaigeMiller  Can I ignore this NOTE in the log? 

 

NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses following this name are treated as
array references and not function references.

 

 

60   data  want;
61       set want;
62       array var _character_;
63       do i=1 to dim(var);
64           var(i)=compress(var(i),'090A0C0DA0'x);
NOTE: The array var has the same name as a SAS-supplied or user-defined function.  Parentheses following this name are treated as 
      array references and not function references.
65       end;
66       drop i;
67   run;

 

 

How to tweak the COMPRESS function for NOT to remove the spaces within the string? E.g. 'Hello World' should be same as  'Hello World' and NOT  'HelloWorld'

PaigeMiller
Diamond | Level 26

My mistake. To remove that NOTE from the log, I should not have told you to use an array named VAR. If you use an array named VARNAME (for example) then the NOTE does not appear in the log.

 

data want;
    set have;
    array varnames _character_;
    do i=1 to dim(varnames);
        varnames(i)=compress(varnames(i),'090A0C0DA0'x);
    end;
run;

 

About how to handle the case of "Hello World": if you want to remove leading and trailing blanks, but not remove blanks in the middle of the string, you can use the TRIM function and the LEFT function.

 

data want;
    set have;
    array varnames _character_;
    do i=1 to dim(varnames);
        varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x)));
    end;
run;

 

--
Paige Miller
Babloo
Rhodochrosite | Level 12

@PaigeMiller I tried your code with trim and left function but it's not producing the desired results.

 

data have;
input id $;
datalines;
1234 
 234
123 456
!123
;
run;

data want;
    set have;
    array varnames _character_;
    do i=1 to dim(varnames);
        varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x)));
    end;
drop i;
run;

Excepted Results:

 

1234
234
123 456
!23

With your code, I could see only '123' in the third observation instead of '123 456'

PaigeMiller
Diamond | Level 26

Repeating my earlier statements:

 

Please look at data set named HAVE. The data set is not being created properly and so of course after you use COMPRESS on it, you don't get the proper results. 

 

I add that this simple activity of LOOKING AT your own data ought to be the first step in debugging, you don't need people here in the SAS Communities to tell you that your data set is not being created properly.

 

Honestly, @Babloo you need to check to see if you have the right data every time you are not getting the right answer. Don't make us check for you.

--
Paige Miller
Babloo
Rhodochrosite | Level 12

@PaigeMiller  Sample data which I shown in data step is correct and align with the real life data. Hence I looking for guidance and clarification.

Kurt_Bremser
Super User

@Babloo wrote:

@PaigeMiller  Sample data which I shown in data step is correct and align with the real life data. Hence I looking for guidance and clarification.


A clear and resounding NO. The way you read the DATALINES, only "123" will be read into the variable, as the blank acts as a delimiter.

You need to fix that first, then you will see that @PaigeMiller 's code works.

Babloo
Rhodochrosite | Level 12

I want to remove CR, LF and leading and trailing spaces in the string. With this code, I could see only '123' in the third observation instead of '123 456'.

 

I don't want to remove the spaces between the words  or string.

 

 

data have;
input id $;
datalines;
1234 
 234
123 456
123
;
run;

data want;
    set have;
    array varnames _character_;
    do i=1 to dim(varnames);
        varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x)));
    end;
drop i;
run;

Excepted result:

 

 

 

1234
234
123 456
123

 

 

Also I'd like to know how to achieve this using PRXCHANGE or any other similar function.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2047 views
  • 6 likes
  • 3 in conversation