How can I use a regular expression in SAS to extract the website name from a URL and display it?
Examples:
A1 = google.com, the result should be: google
A2 = http://twitter.com/Marko_met_een_K/status/1725797169897021653, the result should be: twitter
A3 = https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/, the result should be: regioonline
A4 = https://www.aa.com/en/how-to-regex?id=123, the result should be: aa
The following code was generated by ChatGPT using the prompt: Using SAS code <copy/paste your question>
The code ChatGPT returned needed only one small fix to make it work.
data websites;
input url :$100.;
datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
data extracted_names;
    set websites;
    length website_name $100;
    /* Compile the regex once and retain its ID across rows */
    retain pattern;
    if _N_ = 1 then pattern = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');
    /* On a match, capture group 1 holds the website name */
    if prxmatch(pattern, url) then website_name = prxposn(pattern, 1, url);
    /* Keep only the relevant columns */
    keep url website_name;
run;
proc print data=extracted_names noobs;
title "Extracted Website Names";
run;
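As a possible variation on the PRX approach above, the same extraction can be sketched as a single PRXCHANGE call that replaces the whole URL with the captured name. This is only a sketch: it assumes the websites data set from this thread already exists, the output data set name extracted_names_alt is made up for illustration, and a URL that does not match the pattern is returned unchanged.

```sas
data extracted_names_alt;   /* hypothetical data set name */
    set websites;
    length website_name $100;
    /* Replace the entire URL with capture group 1 (the website name).
       If the pattern does not match, PRXCHANGE returns the input unchanged. */
    website_name = prxchange('s/^(?:https?:\/\/)?(?:www\.)?([^\/.]+)\..*$/$1/', 1, strip(url));
run;
```

The anchored pattern (^ ... $) is a design choice here so that the substitution consumes the full string and leaves only the name behind.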
Thank you for your quick response. I really appreciate it.
Why do you have to use PRX? Using classic SAS functions would be a lot easier.
data websites;
input url :$100.;
datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
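data want;
    set websites;
    /* Start at '//' when a scheme is present, then take the host part before the next '/' */
    temp=scan(substrn(url,find(url,'//')),1,'/');
    /* Skip a leading 'www' label; otherwise the first label is the name */
    if scan(temp,1,'.')='www' then want=scan(temp,2,'.');
    else want=scan(temp,1,'.');
run;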
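Since the original question also asks to display the result, a minimal sketch for printing the output of the step above could look like this. It assumes the want data set was created as shown; the title text is just an illustration.

```sas
/* Print each URL next to the extracted name from the WANT data set */
proc print data=want noobs;
    var url want;
    title "Website Names Extracted with SCAN";
run;
```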
Thank you for your response.
That is very nice of you.