03-08-2016 02:06 PM - edited 03-08-2016 02:33 PM
I am trying to see if there is match between the first 6 characters of any of my rate center names. I thought that I was using the \w character correctly, but it doesn't seem to be working. For example, if there was a rate center name XXXXXXYY and a ratecenter named XXXXXXXY, I would want a match returned. However, when I type my code, I am getting a dataset that returns all of the rate centers, and no matches.
if _n_=1 then do;
if prxmatch(re, ratecenter);
If anyone could let me know what my issue is, and why I am not seeing a return of the rate center names that match the first 6 letters, please let me know.
03-08-2016 02:33 PM
03-08-2016 02:39 PM
Thank you! This explains at least part of the issue that I am having. Also, to provide some example, every name in my ratecenter variable is 8 characters long, and I want to see if there are any matches between the first 6 characters (out of 8) for any of the rate center names.
For example, ratecenter takes on the following form:
My hopes is that my function will return 'YYYYYYYY' and 'YYYYYYZZ' as ratecenter names that have a match for the first 6 characters.
03-08-2016 02:58 PM
I should also note that I'm not searching for a particular string - I need to know if there is a string that match any other strings. I have over 900 rate center names. I'm not sure if I can use substr() unless I have a particular string in mind that I am searching for, which I do not have.
03-08-2016 03:12 PM
Something like this might get you started. This will tend to create duplicates because if center a matches center b then b matches a.
proc sql; create table matches as select distrinct a.ratecenter as center1, b.ratecenter as center2 from (select distict ratecenter from have) as a left join (select distict ratecenter from have) as b on substr(a.ratecenter,1,6) = substr(b.ratecenter,1,6) where a.ratecenter ne b.ratecenter; quit;
03-08-2016 03:07 PM
Would this perhaps be easier:
proc freq data=ratecenter;
format ratecenter $6.;
That will give you a table with counts of how many times each six-character value appears in the data.
03-08-2016 03:11 PM - edited 03-08-2016 03:12 PM
To compliment the method from Astounding, which is the simplist method I can come up with, the following will provide you all the duplicates with the matches attached
data have; input ratecenter $8.; cards; XXXXXXYY XYZXYZXY YYYYYYYY YYYYYYZZ ; run; data v_have / view=v_have; set have; common6= substr(ratecenter,1,6); proc sort data=v_have out=_want nounikey; by common6; run; proc transpose data=_want out=want(drop=_:) prefix=ratecenter; var ratecenter; by common6; run;
Obs common6 ratecenter1 ratecenter2 1