Rohit_1990
Hi Experts,

In my dataset, some identical words are stuck together with no delimiter between them.
I want to separate them by inserting a space.
For example, if the value is 'SteveSteve is readingreading', then I want the output to be 'Steve Steve is reading reading'.

Also, I want to extract values if they match a particular pattern.

Say a substring is preceded by 'abc' and followed by a 5-digit number: then I want to extract that substring along with 'abc'.

So, for example, in the string
' 123 srt abc wer 12345'
I want to extract ' abc wer', as it matches the condition above.

Regards,
Rohit

PGStats

Please give some more examples of input and output.

PG
Rohit_1990
Hi,

Thanks for your r…
Rohit_1990
Well, in the first case:
Let's say I have a dataset HAVE with column C1.
C1
'he is walkingwalking'
'ron was sleeping at thatthat time'

So basically I want to see whether two identical words have got stuck together and, if so, separate them.

So my output would look like this:

C1
'he is walking walking'
'ron was sleeping at that that time'

Here 'walking' and 'walking' were stuck together but are identical, so I want them to be separated.


For my second case:
I want to check whether a pattern is matched and, if so, pull out that string.

So my pattern is 'abc' followed by any word followed by a 5-digit number.

In dataset B, column A1 has these values:
A1
' srt abc- rty 50987'
' ftu ght abc trying 76543'

So I want to extract:
abc- rty 50987
abc trying 76543

I hope this helps.

Thanks a ton!


Shmuel

You can loop across the words like this:

data _null_;
  c1 = "he is walkingwalking";
  do i=1 to countw(c1);
    word = scan(c1,i);
    put i= word=;
  end;
run;

Then, for each word, check whether it has an even number of characters and, if so, whether its first half is the same as its second half, using the LENGTH() and SUBSTR() functions.

But what about "aa": is it a doubled "a"? Or is "walkingwalkimg" a doubled word with a typo?
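For illustration only, here is the even-length / first-half-equals-second-half check prototyped in Python rather than SAS. The function name split_doubled_words and the min_half threshold are hypothetical; the threshold is one possible answer to the "aa" question, leaving halves shorter than two characters alone. Splitting on whitespace and rejoining also collapses runs of blanks to a single blank.

```python
def split_doubled_words(text: str, min_half: int = 2) -> str:
    """Insert a blank inside any word that is its first half written twice.

    min_half is a hypothetical threshold: halves shorter than this are left
    alone, so "aa" is not split into "a a".
    """
    result = []
    # Splits on whitespace only (SAS SCAN's default delimiter list is wider).
    for word in text.split():
        half = len(word) // 2
        # Even length, long enough, and first half equals second half
        # (the LENGTH()/SUBSTR() check described above).
        if len(word) % 2 == 0 and half >= min_half and word[:half] == word[half:]:
            result.append(word[:half] + " " + word[half:])
        else:
            result.append(word)
    return " ".join(result)


print(split_doubled_words("he is walkingwalking"))
print(split_doubled_words("ron was sleeping at thatthat time"))
print(split_doubled_words("aa walkingwalkimg"))
```

A typo such as walkingwalkimg is not split, because its two halves differ.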

As for the second request, you need a check like this:

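data _null_;
  length word tmp num_str wanted $ 200;
  c1 = 'srt abc- rty 50987';
  do i=1 to countw(c1);
    word = scan(c1,i);               /* '-' is a default delimiter, so 'abc-' gives 'abc' */
    if word='abc' then do;
      tmp = compress(c1,' ','kd');   /* keep only blanks and digits */
      if tmp ne ' ' then
        do j=1 to countw(tmp);
          num_str = scan(tmp,j);
          /* strip trailing blanks before searching for the number */
          if length(num_str) = 5 and
             index(c1,strip(num_str)) > index(c1,'abc')
          then wanted = substr(c1,index(c1,'abc'));
        end;
    end;
  end;
  put wanted=;
run;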

Try to build test code from the above. If you run into issues, post
your code and the log, and point out what problems you have.
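Outside SAS, the token-by-token idea can be prototyped in Python. This sketch follows Rohit's clarified rule (a token starting with abc, then exactly one word, then a 5-digit number) rather than checking for any later 5-digit number; extract_abc_pattern is a hypothetical name, and it returns None when nothing matches.

```python
import re
from typing import Optional


def extract_abc_pattern(text: str) -> Optional[str]:
    """Return 'abc<...> word NNNNN' from text, or None if absent.

    Hypothetical helper: inspects every run of three consecutive
    whitespace-separated tokens.
    """
    tokens = text.split()
    for i in range(len(tokens) - 2):
        first, middle, number = tokens[i:i + 3]
        # First token starts with "abc" (so "abc-" also qualifies);
        # third token is exactly five ASCII digits.
        if first.startswith("abc") and re.fullmatch(r"[0-9]{5}", number):
            return " ".join((first, middle, number))
    return None


print(extract_abc_pattern(" srt abc- rty 50987"))
print(extract_abc_pattern(" ftu ght abc trying 76543"))
```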


Patrick

@Rohit_1990 

Regular Expressions are what you're looking for when it comes to dealing with text patterns.

I wasn't really sure about your second pattern: does the first word have to be exactly abc, or can it be any string that starts with abc? Given the dash in your first sample, I went for the second option: any string starting with abc.
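To see why the dash matters, the two readings can be compared with Python's re module, assuming it treats \b, \s, \S, \w and \d the same way as the Perl-style syntax of the SAS PRX functions for plain ASCII text. STRICT is a hypothetical pattern for "the word must be exactly abc"; LOOSE is the abc\S* form used in the code below.

```python
import re

# Reading 1 (hypothetical): "abc" must be followed directly by whitespace.
STRICT = re.compile(r"\babc\s+\w+\s+\d{5}\b")
# Reading 2: any token that starts with "abc" (abc, abc-, abcdef, ...).
LOOSE = re.compile(r"\babc\S*\s+\w+\s+\d{5}\b")

for s in ["srt abc- rty 50987", "ftu ght abc trying 76543"]:
    strict = STRICT.search(s)
    loose = LOOSE.search(s)
    print(repr(s),
          strict.group(0) if strict else None,
          loose.group(0) if loose else None)
```

Only the loose form matches the first sample, because "abc-" is not followed directly by whitespace.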

data have;
  infile datalines truncover;
  input str $255.;
  datalines;
he is walkingwalking ww walkingwalkings
ron was sleeping at thatthat time
srt abc- rty 50987
ftu ght abc trying 76543
;
run;

data want;
  set have;
  str_new=str;
  /* add blank between repeated string of at least two characters per repetition */
  str_new=prxchange('s/\b(\w{2,99})(\1)\b/\1 \2/oi',-1,strip(str_new));
  /* extract pattern */
  str_new=prxchange('s/^.*(\babc\S*\s+\w+\s+\d{5}\b).*$/\1/oi',-1,strip(str_new));
run;
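Before running this in SAS, the two patterns can be sanity-checked with Python's re module. This assumes that, for the constructs used here (\b, \w, \S, \s, \d, {m,n} and the \1 backreference) on ASCII text, Python behaves like the Perl-style engine behind PRXCHANGE. The SAS-specific o flag has no Python counterpart, i becomes re.IGNORECASE, and PRXCHANGE's -1 (replace all occurrences) is re.sub's default. The name transform is hypothetical.

```python
import re

# Same patterns as the two PRXCHANGE calls above.
DOUBLED = re.compile(r"\b(\w{2,99})(\1)\b", re.IGNORECASE)
ABC_SPAN = re.compile(r"^.*(\babc\S*\s+\w+\s+\d{5}\b).*$", re.IGNORECASE)


def transform(s: str) -> str:
    """Mimic the DATA step: split doubled words, then extract the abc span."""
    s = DOUBLED.sub(r"\1 \2", s.strip())  # like prxchange(..., -1, strip(str_new))
    return ABC_SPAN.sub(r"\1", s.strip())


rows = [
    "he is walkingwalking ww walkingwalkings",
    "ron was sleeping at thatthat time",
    "srt abc- rty 50987",
    "ftu ght abc trying 76543",
]
for row in rows:
    print(transform(row))
```

Only rows containing an abc token are reduced to the extracted span. In the first row, ww and walkingwalkings stay intact: ww is too short for two 2-character repetitions, and walkingwalkings is not an exact repeat.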

The metacharacters used in the RegEx are documented here:

http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0s9ilagexm...

PGStats

I would simplify to:

data want;
  set have;
  /* add blank between repeated string of at least two characters per repetition */
  str_new=prxchange('s/\b(\w{2,})\1\b/\1 \1/oi', -1, str);
run;
PG
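One visible difference between the two versions is the replacement string. With the i flag, the backreference matches regardless of case, so \1 \2 keeps each copy as typed, while \1 \1 writes the first copy twice. A Python sketch, assuming SAS's Perl-compatible engine behaves like Python's re module here (the mixed-case input is made up):

```python
import re

PATRICK = re.compile(r"\b(\w{2,99})(\1)\b", re.IGNORECASE)  # used with \1 \2
PG = re.compile(r"\b(\w{2,})\1\b", re.IGNORECASE)           # used with \1 \1

s = "SteveSTEVE is readingreading"  # made-up mixed-case input
print(PATRICK.sub(r"\1 \2", s))
print(PG.sub(r"\1 \1", s))
```

The first call keeps STEVE as typed; the second prints Steve twice. For lower-case data like the samples in this thread, both give the same result.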
