How can you use a regular expression in SAS to extract the website name from a URL and display it?
Examples:
A1 = google.com the result should be equal to: google
A2 = http://twitter.com/Marko_met_een_K/status/1725797169897021653 the result should be equal to: twitter
A3 = https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/ the result should be equal to: regioonline
A4 = https://www.aa.com/en/how-to-regex?id=123 the result should be equal to: aa
The following code was generated by ChatGPT using the prompt: Using SAS code <copy/paste your question>
The code ChatGPT returned required only one small fix to make it work.
data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
data extracted_names;
    set websites;
    length website_name $100;
    /* Compile the regex once and retain its ID across rows */
    retain pattern;
    if _N_ = 1 then pattern = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');
    /* Apply the regex to the url and store capture group 1 in website_name */
    if prxmatch(pattern, url) then website_name = prxposn(pattern, 1, url);
    /* Keep only the relevant columns */
    keep url website_name;
run;
proc print data=extracted_names noobs;
    title "Extracted Website Names";
run;
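As a possible shorter variant of the step above, the same pattern can be anchored and passed to PRXCHANGE, which returns the captured group directly. This is only a sketch: it assumes the websites data set created above, and the output data set name extracted_names2 is made up for illustration. If a value does not match the pattern (for example, a host with no dot), PRXCHANGE returns the input unchanged, so such rows would need separate handling.

```sas
/* Sketch only: assumes the websites data set from the step above. */
data extracted_names2;   /* hypothetical output data set name */
    set websites;
    length website_name $100;
    /* Replace the whole URL with capture group 1 (the label before the first dot). */
    /* STRIP removes the trailing blanks that pad the $100 variable.               */
    website_name = prxchange(
        's/^(?:https?:\/\/)?(?:www\.)?([^\/.]+)\..*$/$1/', 1, strip(url));
    keep url website_name;
run;
```

With the four sample URLs, this should give the same names as the PRXMATCH/PRXPOSN version, without the RETAIN and PRXPARSE bookkeeping.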
Thank you for your quick response; I really appreciate it.
Why do you have to use PRX? Using classic SAS functions would be a lot easier.
data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
data want;
    set websites;
    /* Start at the '//' (if present) and take the host: the first '/'-delimited word */
    temp = scan(substrn(url, find(url, '//')), 1, '/');
    /* Skip a leading 'www' label; otherwise take the first label of the host */
    if scan(temp, 1, '.') = 'www' then want = scan(temp, 2, '.');
    else want = scan(temp, 1, '.');
run;
Thank you for your response.
That is very nice of you.