BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dcruik
Lapis Lazuli | Level 10

Hello,

I'm trying to figure out a technique of removing a specific character value from a string where the character value is at a specific position in the string.  For example, I have a variable like the following:  "N,N,N,Y".  And say I want to remove the value in the second position of that string (in this case it would be the N in bold), so that the new value of the variable is now:  "N,N,Y".

I've tried using the technique of combining multiple procedures (substr and tranwrd); however, I have not had any luck.  Below is an example of how I used the combination:

newvariable=tranwrd(substr(oldvariable,position,1),"N,","");

Unfortunately, the tranwrd function finds all "N," and removes them so I get the new variable value equal to "Y".  The similar issue I get is if the variable is equal to "N,N,N" and I want to remove the second position of that string again (in this case it would again be the N in bold), so that the new value is equal to "N,N".  However, using the combination of functions previously shown, I get the value "N".

Is there a specific character function (or combination of functions) I could use that can remove/replace a value at a specific position within a string?  Thank you in advance for any advice or help!

1 ACCEPTED SOLUTION

Accepted Solutions
CTorres
Quartz | Level 8

My program works as you want:

data have ;

  length ID position 8 Have $ 20;

  input ID position have;

cards;

1000 2  N,N,Y         

1001 2  Y,N,N,N,Y     

1002 4  N,N,N,N,N     

1003 6  N,Y,Y,N,Y,Y,N 

1004 5  N,Y,N,N,N,Y,N 

1005 3  Y,N,Y          

;

RUN;

data want;

  set have;

  length Wanted $20;

  do i=1 to countw(have);

    if i=position then continue;

    wanted=cats(wanted,scan(have,i),',');

  end;

  wanted=substr(wanted,1,length(wanted)-1);

  drop i;

run;

CTorres

View solution in original post

12 REPLIES 12
Astounding
PROC Star

Here's one way (untested):

first_comma = find(old_variable, ',');

second_comma = find(old_variable, ',' , first_comma+1);

newvariable = substr(old_variable, 1, first_comma), || substr(old_variable, second_comma+1);


Steelers_In_DC
Barite | Level 11

This suits your example but I'm not sure how dynamic your process has to be.  Let me know if this helps:

data have;

infile cards dsd;

input string $;

cards;

nnny

;

run;

data want;

set have;

new_string = substr(string,1,1)||substr(string,3,2);

run;

dcruik
Lapis Lazuli | Level 10

Thank you both for your inputs, I appreciate your help with this.  Unfortunately, the value of the string variable could be many different values i.e. N,Y or N,N,Y or N,Y,Y,N or N,N,N,N,Y, etc...

I was hoping there was some kind of function, similar to the tranwrd, that could replace a value at a specific position within the string.  So, if I needed the 3rd value removed from the last example given: "N,N,N,N,Y" (the N in bold), I could get the resulting value of the new variable equal to "N,N,N,Y".  But, what is dynamic is that the position of the value I want to remove can change by observation; as well as, the value of the actual string variable I'm wanting to replace and change.

Thanks again!

Astounding
PROC Star

In that case, you might have to define the problem a bit more.  For example, would POSITION change from one observation to the next?  Is it a variable in the incoming data set, or a macro variable that applies to every observation?  Are your character strings always one character long, so that the commas are always in the even-numbered positions?

For what it is worth, the answer I posted always removes the second piece of the string, regardless of the lengths between the commas.

dcruik
Lapis Lazuli | Level 10

Sorry for being vague.  The position is a variable in the data set extracted from another variable and in the string, the values are all 1 character with commas separating them.  Here would be an example of the data set:

ID              Position             Have                         Wanted

1000               2                    N,N,Y                          N,Y

1001               2                    Y,N,N,N,Y                   Y,N,N,Y

1002               4                    N,N,N,N,N                   N,N,N,N

1003               6                    N,Y,Y,N,Y,Y,N            N,Y,Y,N,Y,N

1004               5                    N,Y,N,N,N,Y,N            N,Y,N,N,Y,N

1005               3                    Y,N,Y                          Y,N  

CTorres
Quartz | Level 8

My program works as you want:

data have ;

  length ID position 8 Have $ 20;

  input ID position have;

cards;

1000 2  N,N,Y         

1001 2  Y,N,N,N,Y     

1002 4  N,N,N,N,N     

1003 6  N,Y,Y,N,Y,Y,N 

1004 5  N,Y,N,N,N,Y,N 

1005 3  Y,N,Y          

;

RUN;

data want;

  set have;

  length Wanted $20;

  do i=1 to countw(have);

    if i=position then continue;

    wanted=cats(wanted,scan(have,i),',');

  end;

  wanted=substr(wanted,1,length(wanted)-1);

  drop i;

run;

CTorres

dcruik
Lapis Lazuli | Level 10

It does.  Thank you very much for your help with this!

Astounding
PROC Star

OK, this might do the trick then.

wanted = have || ' ';

if position = 1 then wanted = substr(wanted, 3);

else wanted = substr(wanted, 1, 2*position - 3) || substr(wanted, 2*position);

The purpose of appending a blank on the first line is to guarantee that SUBSTR never tries to read past the end of the original string (such as when POSITION indicates that the last Y/N should be removed).

If the items between commas vary in length, more complex solutions are needed.  (See, for example, the CTorres suggestion.)

Good luck.

CTorres
Quartz | Level 8

When you say the position in the string is dynamic I assume that this value is known in every observation.

The following program removes the text found in different positions (del variable) in the text separated by commas:

data have ;

  length del 8 text $ 20;

  input del text;

cards;

2 N,N,Y,Y,N,XX,G

4 YX,N,Y,N,N,Y,AA

1 N,X,T,Y,Y,A,F

3 Y,XX,N,YY,DF

3 M,N,N,Y,YY,XX,Z

8 Y,N,Y,Y,Y,Y,Y,D

;

Run;

data want;

  set have;

  length newtext $20;

  do i=1 to countw(text);

    if i=del then continue;

    newtext=cats(newtext,scan(text,i),',');

  end;

  newtext=substr(newtext,1,length(newtext)-1);

  drop i;

run;

I hope this is what you want,

CTorres

Ksharp
Super User

Why not LEFT substr() ?

Code: Program

data have ;
  length ID position 8 Have $ 20;
  input ID position have;
cards;
1000 2 N,N,Y 
1001 2 Y,N,N,N,Y 
1002 4 N,N,N,N,N 
1003 6 N,Y,Y,N,Y,Y,N
1004 5 N,Y,N,N,N,Y,N
1005 3 Y,N,Y 
;
RUN;

data want;
set have;
substr(have,ifn(position-1=0,1,position-1)*2,2)='';
run;

Xia Keshan

CTorres
Quartz | Level 8

Yes, It works OK but I would improve the program with a compress at the end to remove the blanks:

data want;

set have;

want=have;

substr(want,ifn(position-1=0,1,position-1)*2,2)='';

want=compress(want);

run;

CTorres

Haikuo
Onyx | Level 15

Here is one more way to skin the cat, using PRX.

data have;

     length ID position 8 Have $ 20;

     input ID position have;

     cards;

1000 2 1,2,3

1001 3 1,2,3,4,5

1002 4 1,2,3,4,5

1003 6 N,Y,Y,N,Y,Y,N

1004 5 N,Y,N,N,N,Y,N

1005 1 Y,N,Y

;

RUN;

DATA WANT;

     SET HAVE;

     IF POSITION=1 THEN

           WANT=SUBSTR(HAVE,3);

     ELSE WANT=PRXCHANGE('s/(^\S{'||cats(position*2-3)||'})(\S{2})(.+$)/$1$3/',-1,have);

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 11427 views
  • 4 likes
  • 6 in conversation