Hi all,
I would like to remove a specific letter and the number just following it, but not a similar number from a character format.
data have;
A 606R1 3
A 607R2 1
A 607R3 1
A 607R1 1
A 607R 1
data want;
A 606 3
A 607 1
A 607 1
A 607 1
A 607 1
The data set is large. The R stands for revision and the number just following it the revision number. I want to remove the R and the number following it (not replacing it with space). There might be numbers following the revision number which I want to keep.
@mmhxc5 wrote:
Hi @Patrick , Yes, I tried @Ksharp Suggestion. it works fine, but for characters starting with an R that I want to keep, the code removes those Rs.
That's a simple change. Here you go.
data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;
data want;
  set have;
  length want $20;
  want=prxchange('s/(?<=\d)R\d*\b//',1,string);
run;
proc print;
run;
UNTESTED CODE
data want;
    set have;
    where = find(variablename,'R');
    variablename = substr(variablename,1,where-1);
run;Side issue:
we normally ask for working SAS code to be presented. Simply sticking data have; above your data does not provide working SAS code. So I have not tested the above code. If you want a tested solution, we need working SAS code.
Hi,
I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks
@mmhxc5 wrote:
Hi,
I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks
Provide REPRESENTATIVE data. Your example all showed only 1 character following the R. So the implication is that there is only one character following.
You are right. The revision numbers are single digits just shown in the example.
@mmhxc5 wrote:
Hi,
I can not share the data due to confidentiality. I tested your code and it removes everything after R. I just want to remove the revision number beside R but not the numbers after the revision number as shown in my example. Thanks
I didn't ask for your confidential data. I wanted the data you DID show as working SAS data step code. As shown, it appears that the number you want is a different variable (different column) which would be unaffected by SUBSTR(). So, to avoid this type of confusion, from now on you need to provide data as working SAS data step code.
Oh, Excuse me for the confusion my data made. The data I showed in my original post is just one column of a character variable not several variables.
@mmhxc5 wrote:
Oh, Excuse me for the confusion my data made. The data I showed in my original post is just one column of a character variable not several variables.
Which is one of the reasons we say that pictures of data are not completely descriptive of the values shown. Data step code is.
Make some dummy data that looks like your data. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Here is how to present example data. Does this represent the data you are trying to change?
data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R    1
;
data want;
  input string $20.;
cards;
A 606   3
A 607   1
A 607   1
A 607   1
A 607   1
;If so then there is a space after the R in the last observation that you also want to remove so you can just remove the R and the following character.
data test;
  set have ;
  r = indexc(string,'R');
  if r then string=substr(string,1,r-1)||substrn(string,r+2);
  drop r;
run;If in reality there is not that extra space when the R is not followed by a digit then include examples for that case.
Also if there are cases where R does not exist then include those.
What about cases where R exists but it is not followed by a digit (or a space)? What do you want to do in those cases? Include them in the example data.
Thank you, @Tom for your detailed description. you have pointed great issues for the date set.I updated the data set to include the cases you mentioned.
data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;
data want;
  input string $20.;
cards;
A 606   3
A 607   1
A 607   1
A 607   1
A 607   1
A 607   1
400080
A 200
Z 400
R 186   1
R 186   1
R 186   1
;Yes, I would like to keep the values without an R in my want data set. For the case where an R exist but not followed by a digit, I just want to remove the R. Also, I found that there are cases where the character starts with an R which I would like to keep. I would be grateful if you revise your code to account for the new cases I explained. Thank you,
@mmhxc5 wrote:
Hi @Patrick , Yes, I tried @Ksharp Suggestion. it works fine, but for characters starting with an R that I want to keep, the code removes those Rs.
That's a simple change. Here you go.
data have;
  input string $20.;
cards;
A 606R1   3
A 607R2   1
A 607R3   1
A 607R1   1
A 607R   1
A 607   1
400080
A 200
Z 400R
R 186   1
R 186R   1 
R 186R1   1
;
data want;
  set have;
  length want $20;
  want=prxchange('s/(?<=\d)R\d*\b//',1,string);
run;
proc print;
run;
Your examples don't seem that clear, some of the ones without any 'R' to remove still have different output from the input.
Code:
data test;
  infile cards dsd truncover ;
  input (have want) ( :$20.);
  r=findc(have,'R',,2);
  if r then try=substr(have,1,r-1)||substrn(have,r+1+findc(substrn(have,r+1,1),'0123456789'));
  else try=have;
  match=try=want;
cards;
A 606R1   3 ,A 606   3
A 607R2   1 ,A 607   1
A 607R3   1 ,A 607   1
A 607R1   1 ,A 607   1
A 607R    1 ,A 607   1
A607      1 ,A 607   1
400080      ,400080 
A 200       ,A 200
Z 400R      ,Z 400
R 186     1 ,R 186   1
R 186R   1  ,R 186   1
R 186R1  1  ,R 186   1
;
proc print;
run;Result:
Obs have want r try match 1 A 606R1 3 A 606 3 6 A 606 3 1 2 A 607R2 1 A 607 1 6 A 607 1 1 3 A 607R3 1 A 607 1 6 A 607 1 1 4 A 607R1 1 A 607 1 6 A 607 1 1 5 A 607R 1 A 607 1 6 A 607 1 0 6 A607 1 A 607 1 0 A607 1 0 7 400080 400080 0 400080 1 8 A 200 A 200 0 A 200 1 9 Z 400R Z 400 6 Z 400 1 10 R 186 1 R 186 1 0 R 186 1 0 11 R 186R 1 R 186 1 6 R 186 1 1 12 R 186R1 1 R 186 1 6 R 186 1 0
Issues:
In OBS 6 and 10 there is no R to remove, but the two strings provided do not match.
And in OBS 5 you removed the space after the R but you didn't in OBS 11.
And in OBS 12 you added an extra space, so you replaced the R1 with a space instead of just removing it.
Is there a different rule you are tying to implement? Like always placing that extra digit (when present) in the 9th column, no matter how long the first part is?
@Tom, I really appreciate your help. I would like to attach the original data for you. If you scroll the data you will find the R and the digits following it. Could you please check your code for this original data? Thanks,
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
