BookmarkSubscribeRSS Feed
mitchell_keener
Obsidian | Level 7

Hi, 

 

I am using SAS 9.4 (TS1M5). I am trying to use the SUBSTR function to redact text when given the positions that the text lies and what to replace the text with. I cannot use TRANWRD as I do not want to replace all occurrences of the word, only the text at the location provided. My problem is that occasionally the text that I need to replace with is longer than the text being replaced which results in my new text being truncated. I have provided a code which has a very basic example of the problem I am having and what the output looks like now in the image below. I would like 'Kim' to be replaced with the entire string '[Name]' instead of '[Na'. If you know how to use Pearl functions to solve this problem, that could work. I just have never used Pearl functions before but this could be a good opportunity to learn if that is the best solution here. In my example, there are two different pieces of text that need redacted. In reality this could be up to 12 instances from a single piece of text but solving this example should suffice. You can see the method does fine when the text being redacted is longer than the text being inserted, just not the other way around. Please ignore if my datalines is sloppy, it works here and I really do not use this method of data entry often. 

 

My name is Kim.PNG

data example;
infile datalines truncover dlm=',';
   input text $40. begin1 begin2 end1 end2 type1 $5. type2 $5.;
   datalines;
My name is Kim and my birthday is 6/5/90,12,35,3,6,Name Date
;
run;

data test;
set example;
	array type {*} type:;
	array begin {*} begin:;
	array end {*} end:;
	new_text=text;
	do i = 1 to dim(type);
		if type{i}^='' then do;
			BEGIN_POS=begin{i};
			END_POS=end{i};
			substr(new_text, BEGIN_POS, END_POS)='['||strip(type{i})||']'; 
		end;
	end;
run;

 

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Clearly SUBSTR() to the left of the equal sign will not work in the case where the replacement string is a different length than the original string.

 

You probably want to do a concatenation, something like this

 

new_text=trim(substr(new_text,1,begin_pos-1))||strip(type{i})||trim(substr(new_text,end_pos+1));

 

 

--
Paige Miller
mitchell_keener
Obsidian | Level 7

Thanks for the reply. I am going to have to tinker with this more and possibly revisit the question once I explore more. I realized the solution I have was only working as well as it was because it either 1) was replacing the old text using part of the new text which allowed the length of the old text to remain intact for subsequent redaction (if the old text was shorter than the new text), or 2) if the old text was longer than the new text, it was leaving all of the blanks where the old, now blank, characters had been. This is an important component that I had not fully considered until I tried your approach. Once the original length of the text gets altered, the additional redaction needed is impossible to predict using the begin and end variables. 

 

 

PaigeMiller
Diamond | Level 26

You can add a length statement to the data step as the first line under DATA to make the length of the variable longer, and then no truncation on the end.

--
Paige Miller
KachiM
Rhodochrosite | Level 12

Hi @mitchell_keener ,

 

It is easy to find the length of the new string (newL) you want to have. It is equal to

 

   newL = length(text)-end1+length(type1)-end2+length(type2);

Then you can have a new_text with the length of newL to do the concatenation in steps as:

 

[1] new_text = 'My name is', type1;
[2] new_text = new_text, 'and my birthday is';
[3] new_text = new_text, type2; 

You can modify your program using this logic. Actually you can combine the 3 statements as one.

 

Here is one simple way for you. This assumes that TEXT ends with the last string beginning with END2 (35).

 

data want;
   set example;
   array type  type:;
   array begin begin:;
   array end end:;
   length new_text $ &newL;
   call catx(' ', new_text, substr(text,1,begin1-2),type1);
   call catx(' ', new_text, substr(text, begin1+end1, begin2-1-begin1-end1));
   call catx(' ', new_text, type2);
keep new_text;
run;

 

Edited:

 

The newL can be obtained as part of your Input Data Set which is shown under:

 

data example;
infile datalines truncover dlm=',';
   input text $40. begin1 begin2 end1 end2 type1 $5. type2 $5.;
   newL = length(text)-end1+length(type1)-end2+length(type2);
   if _N_ = 1 then call symputx('newL', newL);
   datalines;
My name is Kim and my birthday is 6/5/90,12,35,3,6,Name Date
;
run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1052 views
  • 0 likes
  • 3 in conversation