- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm trying to figure out a technique of removing a specific character value from a string where the character value is at a specific position in the string. For example, I have a variable like the following: "N,N,N,Y". And say I want to remove the value in the second position of that string (in this case it would be the N in bold), so that the new value of the variable is now: "N,N,Y".
I've tried using the technique of combining multiple procedures (substr and tranwrd); however, I have not had any luck. Below is an example of how I used the combination:
newvariable=tranwrd(substr(oldvariable,position,1),"N,","");
Unfortunately, the tranwrd function finds all "N," and removes them so I get the new variable value equal to "Y". The similar issue I get is if the variable is equal to "N,N,N" and I want to remove the second position of that string again (in this case it would again be the N in bold), so that the new value is equal to "N,N". However, using the combination of functions previously shown, I get the value "N".
Is there a specific character function (or combination of functions) I could use that can remove/replace a value at a specific position within a string? Thank you in advance for any advice or help!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My program works as you want:
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
length Wanted $20;
do i=1 to countw(have);
if i=position then continue;
wanted=cats(wanted,scan(have,i),',');
end;
wanted=substr(wanted,1,length(wanted)-1);
drop i;
run;
CTorres
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's one way (untested):
first_comma = find(old_variable, ',');
second_comma = find(old_variable, ',' , first_comma+1);
newvariable = substr(old_variable, 1, first_comma), || substr(old_variable, second_comma+1);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This suits your example but I'm not sure how dynamic your process has to be. Let me know if this helps:
data have;
infile cards dsd;
input string $;
cards;
nnny
;
run;
data want;
set have;
new_string = substr(string,1,1)||substr(string,3,2);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you both for your inputs, I appreciate your help with this. Unfortunately, the value of the string variable could be many different values i.e. N,Y or N,N,Y or N,Y,Y,N or N,N,N,N,Y, etc...
I was hoping there was some kind of function, similar to the tranwrd, that could replace a value at a specific position within the string. So, if I needed the 3rd value removed from the last example given: "N,N,N,N,Y" (the N in bold), I could get the resulting value of the new variable equal to "N,N,N,Y". But, what is dynamic is that the position of the value I want to remove can change by observation; as well as, the value of the actual string variable I'm wanting to replace and change.
Thanks again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In that case, you might have to define the problem a bit more. For example, would POSITION change from one observation to the next? Is it a variable in the incoming data set, or a macro variable that applies to every observation? Are your character strings always one character long, so that the commas are always in the even-numbered positions?
For what it is worth, the answer I posted always removes the second piece of the string, regardless of the lengths between the commas.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry for being vague. The position is a variable in the data set extracted from another variable and in the string, the values are all 1 character with commas separating them. Here would be an example of the data set:
ID Position Have Wanted
1000 2 N,N,Y N,Y
1001 2 Y,N,N,N,Y Y,N,N,Y
1002 4 N,N,N,N,N N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N N,Y,Y,N,Y,N
1004 5 N,Y,N,N,N,Y,N N,Y,N,N,Y,N
1005 3 Y,N,Y Y,N
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My program works as you want:
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
length Wanted $20;
do i=1 to countw(have);
if i=position then continue;
wanted=cats(wanted,scan(have,i),',');
end;
wanted=substr(wanted,1,length(wanted)-1);
drop i;
run;
CTorres
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It does. Thank you very much for your help with this!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK, this might do the trick then.
wanted = have || ' ';
if position = 1 then wanted = substr(wanted, 3);
else wanted = substr(wanted, 1, 2*position - 3) || substr(wanted, 2*position);
The purpose of appending a blank on the first line is to guarantee that SUBSTR never tries to read past the end of the original string (such as when POSITION indicates that the last Y/N should be removed).
If the items between commas vary in length, more complex solutions are needed. (See, for example, the CTorres suggestion.)
Good luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When you say the position in the string is dynamic I assume that this value is known in every observation.
The following program removes the text found in different positions (del variable) in the text separated by commas:
data have ;
length del 8 text $ 20;
input del text;
cards;
2 N,N,Y,Y,N,XX,G
4 YX,N,Y,N,N,Y,AA
1 N,X,T,Y,Y,A,F
3 Y,XX,N,YY,DF
3 M,N,N,Y,YY,XX,Z
8 Y,N,Y,Y,Y,Y,Y,D
;
Run;
data want;
set have;
length newtext $20;
do i=1 to countw(text);
if i=del then continue;
newtext=cats(newtext,scan(text,i),',');
end;
newtext=substr(newtext,1,length(newtext)-1);
drop i;
run;
I hope this is what you want,
CTorres
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why not LEFT substr() ?
Code: Program
data have ;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 N,N,Y
1001 2 Y,N,N,N,Y
1002 4 N,N,N,N,N
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y
;
RUN;
data want;
set have;
substr(have,ifn(position-1=0,1,position-1)*2,2)='';
run;
Xia Keshan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, It works OK but I would improve the program with a compress at the end to remove the blanks:
data want;
set have;
want=have;
substr(want,ifn(position-1=0,1,position-1)*2,2)='';
want=compress(want);
run;
CTorres
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is one more way to skin the cat, using PRX.
data have;
length ID position 8 Have $ 20;
input ID position have;
cards;
1000 2 1,2,3
1001 3 1,2,3,4,5
1002 4 1,2,3,4,5
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 1 Y,N,Y
;
RUN;
DATA WANT;
SET HAVE;
IF POSITION=1 THEN
WANT=SUBSTR(HAVE,3);
ELSE WANT=PRXCHANGE('s/(^\S{'||cats(position*2-3)||'})(\S{2})(.+$)/$1$3/',-1,have);
run;