Hello,
I can't find any examples that help me solve my prxchange problem where I want to remove a string up to including the search term. For example, I've got "Good Investments Trust 12 Moon St London" but I want it to search on the word "Trust" and remove everything up to and including Trust, so that I just have "12 Moon St London" remaining. My attempt below returns "Trust 12 Moon St London" so I've only been successful at removing the words before the word Trust, but it still returns the search term itself. Note that the trust names vary in length, so it might be named "Bad Trust" or "A trust with a very long name Trust." Thanks for your help.
data address; infile datalines dlm=',' dsd truncover; input Mail_addr1 :$30. City :$10. ; datalines; Good Investments Trust 12 Moon St, London Smith Family Trust 3 Hotel St, Paris ; data prx; set address; Address=cats(Mail_addr1, City); corrected_address =prxchange("s/.*?(Trust.*)/$1/",1, Address); drop Mail_addr1 City; run;
You only need to match the substring that you want to change:
corrected_address = prxchange("s/.*\bTrust\s*//i", 1, Address);
Note The "i" suffix requests case insensitive match.
Why not use next code:
corrected_address = substr(address,find(address,"trust","i")+5);
You are right, @Ksharp , I have edited my proposed code.
@Buzzy_Bee: is it now and for all times secured that the word "trust" appears only once per observation? What should happen, if there is
Trustowsky Trust 42 Trust Street, Little Rock
in your data?
Good point Andreas. I should have mentioned that trusts tend to choose sensible names where the term 'trust' only appears once in the name, such as "Brown Family Trust" rather than "A trust with a long name Trust."
However, there are some cases where an address is entered as "C/o The Trustees, Brown Family Trust" so it is definitely possible to see the word trust appear twice in the address.
Can you clarify if "Trust" and "trust" is a different word?
e.g.
Case:If "trust" is a word that has to be replaced.
Before: After
Good Investments trust 12 Moon St London ---> 12 Moon St London
Case:If "trust" is not
Before: After
Good Investments trust 12 Moon St London ---> (nothing appears)
If you insist using prx functions, for your specific example, check next code:
data _null_;
adrs = "Good Investments Trust 12 Moon St London";
ad1 = substr(adrs,findw(adrs,"trust","i")+5);
ad2 = prxchange("s/.*?(Trust.*)/$1/",1, adrs);
prx_id= prxparse("/.*?(Trust.*)/");
ad3 = prxchange("s/Trust/ /",1,ad2);
put ad1 = / ad2= / ad3=;
run;
and be aware of all previous given remarks.
Next code is not complete but is worth to run and check:
data test;
input adrs $50.;
cards;
Good Investments Trust 12 Moon St, London
Smith Family Trust 3 Hotel St, Paris
Trustowsky Trust 42 Trust Street, Little Rock
Good Investments trust 12 Moon St London
;
run;
data check;
obs = _N_;
set test;
ad2 = prxchange("s/.*([T]|[t]rust.){1}/$1/",1,adrs);
prx_id= prxparse("/.*(.{5})/");
ad3 = prxchange("s/.{5}/ /",1,ad2);
ad4 = prxchange("s/(.*)(?=\bTrust)//i",1, adrs);
drop prx_id;
run;
data address; infile datalines dlm=',' dsd truncover; input Mail_addr1 :$30. City :$10. ; datalines; Good Investments Trust 12 Moon St, London Smith Family Trust 3 Hotel St, Paris ; data prx; set address; Address=cats(Mail_addr1, City); corrected_address =prxchange("s/(.*)(?=\bTrust)//i",1, Address); drop Mail_addr1 City; run;
Hi KSharp, your version produces the same problems as mine when I run it.
The result is: "Trust 12 Moon St London" so it still has Trust at the front.
You only need to match the substring that you want to change:
corrected_address = prxchange("s/.*\bTrust\s*//i", 1, Address);
Note The "i" suffix requests case insensitive match.
Thank you - that is the right prxchange formula that I was trying to create.
I've got thousands of companies with names like "Very Amazing Investments Trust" and "Serious Lawyers Limited" so this prxchange is exactly what I needed as I can just change the term 'Trust' to 'Limited' and then change it to any of the other company suffixes that appear in the list. Your method means I only need to change the search term once in the formula.
Hi, @PGStats , may I ask how this "s/.*\bTrust\s*//i" works? Is it just like "s/.*\bTrust\s(.*?)//i" ?
Sure, here is how it reads:
s/
.* Match anything, any length
\bTrust Match a word starting with Trust
\s* Match any number of spaces, including none
// Replace the whole matched substring (if any) with nothing
i Make the matching case-insensitive
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.