Hi,
I am trying to see if there is match between the first 6 characters of any of my rate center names. I thought that I was using the \w character correctly, but it doesn't seem to be working. For example, if there was a rate center name XXXXXXYY and a ratecenter named XXXXXXXY, I would want a match returned. However, when I type my code, I am getting a dataset that returns all of the rate centers, and no matches.
data work.perl;
set work.ratecenter;
if _n_=1 then do;
retain re;
re=prxparse("/\w\w\w\w\w\w/");
end;
if prxmatch(re, ratecenter);
run;
If anyone could let me know what my issue is, and why I am not seeing a return of the rate center names that match the first 6 letters, please let me know.
Thanks!
Thank you! This explains at least part of the issue that I am having. Also, to provide some example, every name in my ratecenter variable is 8 characters long, and I want to see if there are any matches between the first 6 characters (out of 😎 for any of the rate center names.
For example, ratecenter takes on the following form:
ratecenter
XXXXXXYY
XYZXYZXY
YYYYYYYY
YYYYYYZZ
My hopes is that my function will return 'YYYYYYYY' and 'YYYYYYZZ' as ratecenter names that have a match for the first 6 characters.
I should also note that I'm not searching for a particular string - I need to know if there is a string that match any other strings. I have over 900 rate center names. I'm not sure if I can use substr() unless I have a particular string in mind that I am searching for, which I do not have.
Something like this might get you started. This will tend to create duplicates because if center a matches center b then b matches a.
proc sql;
create table matches as
select distrinct a.ratecenter as center1, b.ratecenter as center2
from (select distict ratecenter from have) as a left join
(select distict ratecenter from have) as b
on substr(a.ratecenter,1,6) = substr(b.ratecenter,1,6)
where a.ratecenter ne b.ratecenter;
quit;
Would this perhaps be easier:
proc freq data=ratecenter;
tables ratecenter;
format ratecenter $6.;
run;
That will give you a table with counts of how many times each six-character value appears in the data.
To compliment the method from Astounding, which is the simplist method I can come up with, the following will provide you all the duplicates with the matches attached
data have;
input ratecenter $8.;
cards;
XXXXXXYY
XYZXYZXY
YYYYYYYY
YYYYYYZZ
;
run;
data v_have / view=v_have;
set have;
common6= substr(ratecenter,1,6);
proc sort data=v_have out=_want nounikey; by common6;
run;
proc transpose data=_want out=want(drop=_:) prefix=ratecenter;
var ratecenter;
by common6;
run;
Obs common6 ratecenter1 ratecenter2 1
YYYYYY | YYYYYYYY | YYYYYYZZ |
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.