Any help with the generic program to remove any special characters (leading/trailing spaces, CR, LF) other than alphabets, underscores and numbers in each observations for all the variables?
I know we can accomplish this with COMPRESS function but I'm not certain how to make it work for all the variables in the dataset. I want to add this step in the macro program which already in place. Input dataset is dynamic and hence the variables. E.g. Dataset A can have ID and EMPLOYEE variables and Dataset B can have ID and AGE variables. Any dataset can be input for my generic program where I want to remove the special characters.
I don't want to create any new variables as part of this step.
data want;
set have;
array var _character_;
do i=1 to dim(var);
var(i)=compress(var(i),.........);
end;
run;
data want;
set have;
array var _character_;
do i=1 to dim(var);
var(i)=compress(var(i),.........);
end;
run;
@PaigeMiller I don't want to create any new variables as part of this step. How to rename the variable 'i' to actual variable name?
Please try the code and see if there are new variables created. The only new variable created is I, which you can get rid of.
@PaigeMiller Already tried the below program and it's not working as excepted. After removing the special characters we have to rename it to the actual variable name. If there are two variables in the dataset then I have to do the same in those two variables.
data have; input id $; datalines; 124 5 6 7 $1k09~ ; run; data want; set have; array var _character_; do i=1 to dim(var); var(i)=compress(var(i),'090A0C0DA0'x); end; run;
Excepted results:
1245 67
Please look at data set named HAVE. The data set is not being created properly and so of course after you use COMPRESS on it, you don't get the proper results.
I add that this simple activity of LOOKING AT your own data ought to be the first step in debugging, you don't need people here in the SAS Communities to tell you that your data set is not being created properly.
You don't "rename" the helper variable i. Just get rid of it with a DROP statement or DROP= dataset option.
@Kurt_Bremser @PaigeMiller Can I ignore this NOTE in the log?
NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses following this name are treated as
array references and not function references.
60 data want; 61 set want; 62 array var _character_; 63 do i=1 to dim(var); 64 var(i)=compress(var(i),'090A0C0DA0'x); NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses following this name are treated as array references and not function references. 65 end; 66 drop i; 67 run;
How to tweak the COMPRESS function for NOT to remove the spaces within the string? E.g. 'Hello World' should be same as 'Hello World' and NOT 'HelloWorld'
My mistake. To remove that NOTE from the log, I should not have told you to use an array named VAR. If you use an array named VARNAME (for example) then the NOTE does not appear in the log.
data want;
set have;
array varnames _character_;
do i=1 to dim(varnames);
varnames(i)=compress(varnames(i),'090A0C0DA0'x);
end;
run;
About how to handle the case of "Hello World": if you want to remove leading and trailing blanks, but not remove blanks in the middle of the string, you can use the TRIM function and the LEFT function.
data want;
set have;
array varnames _character_;
do i=1 to dim(varnames);
varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x)));
end;
run;
@PaigeMiller I tried your code with trim and left function but it's not producing the desired results.
data have; input id $; datalines; 1234 234 123 456 !123 ; run; data want; set have; array varnames _character_; do i=1 to dim(varnames); varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x))); end; drop i; run;
Excepted Results:
1234 234 123 456 !23
With your code, I could see only '123' in the third observation instead of '123 456'
Repeating my earlier statements:
Please look at data set named HAVE. The data set is not being created properly and so of course after you use COMPRESS on it, you don't get the proper results.
I add that this simple activity of LOOKING AT your own data ought to be the first step in debugging, you don't need people here in the SAS Communities to tell you that your data set is not being created properly.
Honestly, @Babloo you need to check to see if you have the right data every time you are not getting the right answer. Don't make us check for you.
@PaigeMiller Sample data which I shown in data step is correct and align with the real life data. Hence I looking for guidance and clarification.
@Babloo wrote:
@PaigeMiller Sample data which I shown in data step is correct and align with the real life data. Hence I looking for guidance and clarification.
A clear and resounding NO. The way you read the DATALINES, only "123" will be read into the variable, as the blank acts as a delimiter.
You need to fix that first, then you will see that @PaigeMiller 's code works.
You should also make it a habit to not use round brackets for array references. It is common practice to use curly quotes for this.
I want to remove CR, LF and leading and trailing spaces in the string. With this code, I could see only '123' in the third observation instead of '123 456'.
I don't want to remove the spaces between the words or string.
data have; input id $; datalines; 1234 234 123 456 123 ; run; data want; set have; array varnames _character_; do i=1 to dim(varnames); varnames(i)=trim(left(compress(varnames(i),'090A0C0DA0'x))); end; drop i; run;
Excepted result:
1234 234 123 456 123
Also I'd like to know how to achieve this using PRXCHANGE or any other similar function.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.