BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dcruik
Lapis Lazuli | Level 10

Hello,

I'm trying to figure out a technique of removing a specific character value from a string where the character value is at a specific position in the string.  For example, I have a variable like the following:  "N,N,N,Y".  And say I want to remove the value in the second position of that string (in this case it would be the N in bold), so that the new value of the variable is now:  "N,N,Y".

I've tried using the technique of combining multiple procedures (substr and tranwrd); however, I have not had any luck.  Below is an example of how I used the combination:

newvariable=tranwrd(substr(oldvariable,position,1),"N,","");

Unfortunately, the tranwrd function finds all "N," and removes them so I get the new variable value equal to "Y".  The similar issue I get is if the variable is equal to "N,N,N" and I want to remove the second position of that string again (in this case it would again be the N in bold), so that the new value is equal to "N,N".  However, using the combination of functions previously shown, I get the value "N".

Is there a specific character function (or combination of functions) I could use that can remove/replace a value at a specific position within a string?  Thank you in advance for any advice or help!

1 ACCEPTED SOLUTION

Accepted Solutions
CTorres
Quartz | Level 8

My program works as you want:

data have ;

  length ID position 8 Have $ 20;

  input ID position have;

cards;

1000 2  N,N,Y         

1001 2  Y,N,N,N,Y     

1002 4  N,N,N,N,N     

1003 6  N,Y,Y,N,Y,Y,N 

1004 5  N,Y,N,N,N,Y,N 

1005 3  Y,N,Y          

;

RUN;

data want;

  set have;

  length Wanted $20;

  do i=1 to countw(have);

    if i=position then continue;

    wanted=cats(wanted,scan(have,i),',');

  end;

  wanted=substr(wanted,1,length(wanted)-1);

  drop i;

run;

CTorres

View solution in original post

12 REPLIES 12
Astounding
PROC Star

Here's one way (untested):

first_comma = find(old_variable, ',');

second_comma = find(old_variable, ',' , first_comma+1);

newvariable = substr(old_variable, 1, first_comma), || substr(old_variable, second_comma+1);


Steelers_In_DC
Barite | Level 11

This suits your example but I'm not sure how dynamic your process has to be.  Let me know if this helps:

data have;

infile cards dsd;

input string $;

cards;

nnny

;

run;

data want;

set have;

new_string = substr(string,1,1)||substr(string,3,2);

run;

dcruik
Lapis Lazuli | Level 10

Thank you both for your inputs, I appreciate your help with this.  Unfortunately, the value of the string variable could be many different values i.e. N,Y or N,N,Y or N,Y,Y,N or N,N,N,N,Y, etc...

I was hoping there was some kind of function, similar to the tranwrd, that could replace a value at a specific position within the string.  So, if I needed the 3rd value removed from the last example given: "N,N,N,N,Y" (the N in bold), I could get the resulting value of the new variable equal to "N,N,N,Y".  But, what is dynamic is that the position of the value I want to remove can change by observation; as well as, the value of the actual string variable I'm wanting to replace and change.

Thanks again!

Astounding
PROC Star

In that case, you might have to define the problem a bit more.  For example, would POSITION change from one observation to the next?  Is it a variable in the incoming data set, or a macro variable that applies to every observation?  Are your character strings always one character long, so that the commas are always in the even-numbered positions?

For what it is worth, the answer I posted always removes the second piece of the string, regardless of the lengths between the commas.

dcruik
Lapis Lazuli | Level 10

Sorry for being vague.  The position is a variable in the data set extracted from another variable and in the string, the values are all 1 character with commas separating them.  Here would be an example of the data set:

ID              Position             Have                         Wanted

1000               2                    N,N,Y                          N,Y

1001               2                    Y,N,N,N,Y                   Y,N,N,Y

1002               4                    N,N,N,N,N                   N,N,N,N

1003               6                    N,Y,Y,N,Y,Y,N            N,Y,Y,N,Y,N

1004               5                    N,Y,N,N,N,Y,N            N,Y,N,N,Y,N

1005               3                    Y,N,Y                          Y,N  

CTorres
Quartz | Level 8

My program works as you want:

data have ;

  length ID position 8 Have $ 20;

  input ID position have;

cards;

1000 2  N,N,Y         

1001 2  Y,N,N,N,Y     

1002 4  N,N,N,N,N     

1003 6  N,Y,Y,N,Y,Y,N 

1004 5  N,Y,N,N,N,Y,N 

1005 3  Y,N,Y          

;

RUN;

data want;

  set have;

  length Wanted $20;

  do i=1 to countw(have);

    if i=position then continue;

    wanted=cats(wanted,scan(have,i),',');

  end;

  wanted=substr(wanted,1,length(wanted)-1);

  drop i;

run;

CTorres

dcruik
Lapis Lazuli | Level 10

It does.  Thank you very much for your help with this!

Astounding
PROC Star

OK, this might do the trick then.

wanted = have || ' ';

if position = 1 then wanted = substr(wanted, 3);

else wanted = substr(wanted, 1, 2*position - 3) || substr(wanted, 2*position);

The purpose of appending a blank on the first line is to guarantee that SUBSTR never tries to read past the end of the original string (such as when POSITION indicates that the last Y/N should be removed).

If the items between commas vary in length, more complex solutions are needed.  (See, for example, the CTorres suggestion.)

Good luck.

CTorres
Quartz | Level 8

When you say the position in the string is dynamic I assume that this value is known in every observation.

The following program removes the text found in different positions (del variable) in the text separated by commas:

data have ;

  length del 8 text $ 20;

  input del text;

cards;

2 N,N,Y,Y,N,XX,G

4 YX,N,Y,N,N,Y,AA

1 N,X,T,Y,Y,A,F

3 Y,XX,N,YY,DF

3 M,N,N,Y,YY,XX,Z

8 Y,N,Y,Y,Y,Y,Y,D

;

Run;

data want;

  set have;

  length newtext $20;

  do i=1 to countw(text);

    if i=del then continue;

    newtext=cats(newtext,scan(text,i),',');

  end;

  newtext=substr(newtext,1,length(newtext)-1);

  drop i;

run;

I hope this is what you want,

CTorres

Ksharp
Super User

Why not LEFT substr() ?

Code: Program

data have ;
  length ID position 8 Have $ 20;
  input ID position have;
cards;
1000 2 N,N,Y 
1001 2 Y,N,N,N,Y 
1002 4 N,N,N,N,N 
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y 
;
RUN;

data want;
set have;
substr(have,ifn(position-1=0,1,position-1)*2,2)='';
run;

Xia Keshan

CTorres
Quartz | Level 8

Yes, It works OK but I would improve the program with a compress at the end to remove the blanks:

data want;

set have;

want=have;

substr(want,ifn(position-1=0,1,position-1)*2,2)='';

want=compress(want);

run;

CTorres

Haikuo
Onyx | Level 15

Here is one more way to skin the cat, using PRX.

data have;

     length ID position 8 Have $ 20;

     input ID position have;

     cards;

1000 2 1,2,3

1001 3 1,2,3,4,5

1002 4 1,2,3,4,5

1003 6 N,Y,Y,N,Y,Y,N

1004 5 N,Y,N,N,N,Y,N

1005 1 Y,N,Y

;

RUN;

DATA WANT;

     SET HAVE;

     IF POSITION=1 THEN

           WANT=SUBSTR(HAVE,3);

     ELSE WANT=PRXCHANGE('s/(^\S{'||cats(position*2-3)||'})(\S{2})(.+$)/$1$3/',-1,have);

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 9387 views
  • 4 likes
  • 6 in conversation