- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am trying to see if there is match between the first 6 characters of any of my rate center names. I thought that I was using the \w character correctly, but it doesn't seem to be working. For example, if there was a rate center name XXXXXXYY and a ratecenter named XXXXXXXY, I would want a match returned. However, when I type my code, I am getting a dataset that returns all of the rate centers, and no matches.
data work.perl;
set work.ratecenter;
if _n_=1 then do;
retain re;
re=prxparse("/\w\w\w\w\w\w/");
end;
if prxmatch(re, ratecenter);
run;
If anyone could let me know what my issue is, and why I am not seeing a return of the rate center names that match the first 6 letters, please let me know.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
PRX functions would not be an appropriate choice of function for what you are explaining you want to do.
To provide you with some better help it would be good for you to include some example of your data and structure of 'work.ratecenter' You can use 'substr(ratecenter,1,6)' to get the first 6 chars and then use that to perform your comparison for other 'ratecenter'
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you! This explains at least part of the issue that I am having. Also, to provide some example, every name in my ratecenter variable is 8 characters long, and I want to see if there are any matches between the first 6 characters (out of 😎 for any of the rate center names.
For example, ratecenter takes on the following form:
ratecenter
XXXXXXYY
XYZXYZXY
YYYYYYYY
YYYYYYZZ
My hopes is that my function will return 'YYYYYYYY' and 'YYYYYYZZ' as ratecenter names that have a match for the first 6 characters.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I should also note that I'm not searching for a particular string - I need to know if there is a string that match any other strings. I have over 900 rate center names. I'm not sure if I can use substr() unless I have a particular string in mind that I am searching for, which I do not have.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like this might get you started. This will tend to create duplicates because if center a matches center b then b matches a.
proc sql;
create table matches as
select distrinct a.ratecenter as center1, b.ratecenter as center2
from (select distict ratecenter from have) as a left join
(select distict ratecenter from have) as b
on substr(a.ratecenter,1,6) = substr(b.ratecenter,1,6)
where a.ratecenter ne b.ratecenter;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Would this perhaps be easier:
proc freq data=ratecenter;
tables ratecenter;
format ratecenter $6.;
run;
That will give you a table with counts of how many times each six-character value appears in the data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To compliment the method from Astounding, which is the simplist method I can come up with, the following will provide you all the duplicates with the matches attached
data have;
input ratecenter $8.;
cards;
XXXXXXYY
XYZXYZXY
YYYYYYYY
YYYYYYZZ
;
run;
data v_have / view=v_have;
set have;
common6= substr(ratecenter,1,6);
proc sort data=v_have out=_want nounikey; by common6;
run;
proc transpose data=_want out=want(drop=_:) prefix=ratecenter;
var ratecenter;
by common6;
run;
Obs common6 ratecenter1 ratecenter2 1
YYYYYY | YYYYYYYY | YYYYYYZZ |