BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mmhxc5
Quartz | Level 8

Hi all,

I would like to remove a specific letter and the number just following it, but not a similar number from a character format.

data have;

A 606R1   3

A 607R2   1

A 607R3   1

A 607R1   1

A 607R     1

 

data want;

A 606   3

A 607   1

A 607   1

A 607   1

A 607   1

 

The data set is large. The R stands for revision and the number just following it the revision number. I want to remove the R and the number following it (not replacing it with space). There might be numbers following the revision number which I want to keep.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@mmhxc5 wrote:

Hi @Patrick , Yes, I tried @Ksharp Suggestion. it works fine, but for characters starting with an R that I want to keep, the code removes those Rs.


That's a simple change. Here you go.

data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;

data want;
  set have;
  length want $20;
  want=prxchange('s/(?<=\d)R\d*\b//',1,string);
run;

proc print;
run;

 

Patrick_0-1593057544290.png

 

View solution in original post

15 REPLIES 15
PaigeMiller
Diamond | Level 26

UNTESTED CODE

data want;
    set have;
    where = find(variablename,'R');
    variablename = substr(variablename,1,where-1);
run;

Side issue:

we normally ask for working SAS code to be presented. Simply sticking data have; above your data does not provide working SAS code. So I have not tested the above code. If you want a tested solution, we need working SAS code.

--
Paige Miller
mmhxc5
Quartz | Level 8

Hi,

I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks

ballardw
Super User

@mmhxc5 wrote:

Hi,

I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks


Provide REPRESENTATIVE data. Your example all showed only 1 character following the R. So the implication is that there is only one character following.

mmhxc5
Quartz | Level 8

You are right. The revision numbers are single digits just shown in the example.

PaigeMiller
Diamond | Level 26

@mmhxc5 wrote:

Hi,

I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks


I didn't ask for your confidential data. I wanted the data you DID show as working SAS data step code. As shown, it appears that the number you want is a different variable (different column) which would be unaffected by SUBSTR(). So, to avoid this type of confusion, from now on you need to provide data as working SAS data step code.

 

 

--
Paige Miller
mmhxc5
Quartz | Level 8

Oh, Excuse me for the confusion my data made. The data I showed in my original post is just one column of a character variable not several variables.

ballardw
Super User

@mmhxc5 wrote:

Oh, Excuse me for the confusion my data made. The data I showed in my original post is just one column of a character variable not several variables.


Which is one of the reasons we say that pictures of data are not completely descriptive of the values shown. Data step code is.

Make some dummy data that looks like your data. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Tom
Super User Tom
Super User

Here is how to present example data.  Does this represent the data you are trying to change?

data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R    1
;

data want;
  input string $20.;
cards;
A 606   3
A 607   1
A 607   1
A 607   1
A 607   1
;

If so then there is a space after the R in the last observation that you also want to remove so you can just remove the R and the following character.

data test;
  set have ;
  r = indexc(string,'R');
  if r then string=substr(string,1,r-1)||substrn(string,r+2);
  drop r;
run;

If in reality there is not that extra space when the R is not followed by a digit then include examples for that case.

Also if there are cases where R does not exist then include those.

What about cases where R exists but it is not followed by a digit (or a space)?  What do you want to do in those cases?  Include them in the example data.

mmhxc5
Quartz | Level 8

Thank you, @Tom for your detailed description. you have pointed great issues for the date set.I updated the data set to include the cases you mentioned.

data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;

data want;
  input string $20.;
cards;
A 606   3
A 607   1
A 607   1
A 607   1
A 607   1
A 607   1
400080
A 200
Z 400
R 186   1
R 186   1
R 186   1
;

Yes, I would like to keep the values without an R in my want data set. For the case where an R exist but not followed by a digit, I just want to remove the R. Also, I found that there are cases where the character starts with an R which I would like to keep. I would be grateful if you revise your code to account for the new cases I explained. Thank you,

Patrick
Opal | Level 21

@mmhxc5 

Have you tried what @Ksharp proposed? That should do the trick.

mmhxc5
Quartz | Level 8

Hi @Patrick , Yes, I tried @Ksharp Suggestion. it works fine, but for characters starting with an R that I want to keep, the code removes those Rs.

Patrick
Opal | Level 21

@mmhxc5 wrote:

Hi @Patrick , Yes, I tried @Ksharp Suggestion. it works fine, but for characters starting with an R that I want to keep, the code removes those Rs.


That's a simple change. Here you go.

data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;

data want;
  set have;
  length want $20;
  want=prxchange('s/(?<=\d)R\d*\b//',1,string);
run;

proc print;
run;

 

Patrick_0-1593057544290.png

 

Tom
Super User Tom
Super User

Your examples don't seem that clear, some of the ones without any 'R' to remove still have different output from the input.

Code:

data test;
  infile cards dsd truncover ;
  input (have want) ( :$20.);
  r=findc(have,'R',,2);
  if r then try=substr(have,1,r-1)||substrn(have,r+1+findc(substrn(have,r+1,1),'0123456789'));
  else try=have;
  match=try=want;
cards;
A 606R1   3 ,A 606   3
A 607R2   1 ,A 607   1
A 607R3   1 ,A 607   1
A 607R1   1 ,A 607   1
A 607R    1 ,A 607   1
A607      1 ,A 607   1
400080      ,400080 
A 200       ,A 200
Z 400R      ,Z 400
R 186     1 ,R 186   1
R 186R   1  ,R 186   1
R 186R1  1  ,R 186   1
;

proc print;
run;

Result:

Obs    have           want         r    try            match

  1    A 606R1   3    A 606   3    6    A 606   3        1
  2    A 607R2   1    A 607   1    6    A 607   1        1
  3    A 607R3   1    A 607   1    6    A 607   1        1
  4    A 607R1   1    A 607   1    6    A 607   1        1
  5    A 607R    1    A 607   1    6    A 607    1       0
  6    A607      1    A 607   1    0    A607      1      0
  7    400080         400080       0    400080           1
  8    A 200          A 200        0    A 200            1
  9    Z 400R         Z 400        6    Z 400            1
 10    R 186     1    R 186   1    0    R 186     1      0
 11    R 186R   1     R 186   1    6    R 186   1        1
 12    R 186R1  1     R 186   1    6    R 186  1         0

Issues:

In OBS 6 and 10 there is no R to remove, but the two strings provided do not match.

And in OBS 5 you removed the space after the R but you didn't in OBS 11.

And in OBS 12 you added an extra space, so you replaced the R1 with a space instead of just removing it.

 

Is there a different rule you are tying to implement?  Like always placing that extra digit (when present) in the 9th column, no matter how long the first part is?

mmhxc5
Quartz | Level 8

@Tom, I really appreciate your help.  I would like to attach the original data for you. If you scroll the data you will find the R and the digits following it. Could you please check your code for this original data? Thanks,

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 2099 views
  • 3 likes
  • 6 in conversation