Hello,
I'm trying to figure out a technique of removing a specific character value from a string where the character value is at a specific position in the string. For example, I have a variable like the following: "N,N,N,Y". And say I want to remove the value in the second position of that string (in this case it would be the N in bold), so that the new value of the variable is now: "N,N,Y".
I've tried using the technique of combining multiple procedures (substr and tranwrd); however, I have not had any luck. Below is an example of how I used the combination:
newvariable=tranwrd(substr(oldvariable,position,1),"N,","");
Unfortunately, the tranwrd function finds all "N," and removes them so I get the new variable value equal to "Y". The similar issue I get is if the variable is equal to "N,N,N" and I want to remove the second position of that string again (in this case it would again be the N in bold), so that the new value is equal to "N,N". However, using the combination of functions previously shown, I get the value "N".
Is there a specific character function (or combination of functions) I could use that can remove/replace a value at a specific position within a string? Thank you in advance for any advice or help!
My program works as you want:
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
length Wanted $20;
do i=1 to countw(have);
if i=position then continue;
wanted=cats(wanted,scan(have,i),',');
end;
wanted=substr(wanted,1,length(wanted)-1);
drop i;
run;
CTorres
Here's one way (untested):
first_comma = find(old_variable, ',');
second_comma = find(old_variable, ',' , first_comma+1);
newvariable = substr(old_variable, 1, first_comma), || substr(old_variable, second_comma+1);
This suits your example but I'm not sure how dynamic your process has to be. Let me know if this helps:
data have;
infile cards dsd;
input string $;
cards;
nnny
;
run;
data want;
set have;
new_string = substr(string,1,1)||substr(string,3,2);
run;
Thank you both for your inputs, I appreciate your help with this. Unfortunately, the value of the string variable could be many different values i.e. N,Y or N,N,Y or N,Y,Y,N or N,N,N,N,Y, etc...
I was hoping there was some kind of function, similar to the tranwrd, that could replace a value at a specific position within the string. So, if I needed the 3rd value removed from the last example given: "N,N,N,N,Y" (the N in bold), I could get the resulting value of the new variable equal to "N,N,N,Y". But, what is dynamic is that the position of the value I want to remove can change by observation; as well as, the value of the actual string variable I'm wanting to replace and change.
Thanks again!
In that case, you might have to define the problem a bit more. For example, would POSITION change from one observation to the next? Is it a variable in the incoming data set, or a macro variable that applies to every observation? Are your character strings always one character long, so that the commas are always in the even-numbered positions?
For what it is worth, the answer I posted always removes the second piece of the string, regardless of the lengths between the commas.
Sorry for being vague. The position is a variable in the data set extracted from another variable and in the string, the values are all 1 character with commas separating them. Here would be an example of the data set:
ID Position Have Wanted
1000 2 N,N,Y N,Y
1001 2 Y,N,N,N,Y Y,N,N,Y
1002 4 N,N,N,N,N N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N N,Y,Y,N,Y,N
1004 5 N,Y,N,N,N,Y,N N,Y,N,N,Y,N
1005 3 Y,N,Y Y,N
My program works as you want:
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
length Wanted $20;
do i=1 to countw(have);
if i=position then continue;
wanted=cats(wanted,scan(have,i),',');
end;
wanted=substr(wanted,1,length(wanted)-1);
drop i;
run;
CTorres
It does. Thank you very much for your help with this!
OK, this might do the trick then.
wanted = have || ' ';
if position = 1 then wanted = substr(wanted, 3);
else wanted = substr(wanted, 1, 2*position - 3) || substr(wanted, 2*position);
The purpose of appending a blank on the first line is to guarantee that SUBSTR never tries to read past the end of the original string (such as when POSITION indicates that the last Y/N should be removed).
If the items between commas vary in length, more complex solutions are needed. (See, for example, the CTorres suggestion.)
Good luck.
When you say the position in the string is dynamic I assume that this value is known in every observation.
The following program removes the text found in different positions (del variable) in the text separated by commas:
data have ;
length del 8 text $ 20;
input del text;
cards;
2 N,N,Y,Y,N,XX,G
4 YX,N,Y,N,N,Y,AA
1 N,X,T,Y,Y,A,F
3 Y,XX,N,YY,DF
3 M,N,N,Y,YY,XX,Z
8 Y,N,Y,Y,Y,Y,Y,D
;
Run;
data want;
set have;
length newtext $20;
do i=1 to countw(text);
if i=del then continue;
newtext=cats(newtext,scan(text,i),',');
end;
newtext=substr(newtext,1,length(newtext)-1);
drop i;
run;
I hope this is what you want,
CTorres
Why not LEFT substr() ?
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
substr(have,ifn(position-1=0,1,position-1)*2,2)='';
run;
Xia Keshan
Yes, It works OK but I would improve the program with a compress at the end to remove the blanks:
data want;
set have;
want=have;
substr(want,ifn(position-1=0,1,position-1)*2,2)='';
want=compress(want);
run;
CTorres
Here is one more way to skin the cat, using PRX.
data have;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 1,2,3
1001 3 1,2,3,4,5
1002 4 1,2,3,4,5
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 1 Y,N,Y
;
RUN;
DATA WANT;
SET HAVE;
IF POSITION=1 THEN
WANT=SUBSTR(HAVE,3);
ELSE WANT=PRXCHANGE('s/(^\S{'||cats(position*2-3)||'})(\S{2})(.+$)/$1$3/',-1,have);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.