How can you use a regular expression in SAS to select the website name from a URL in the data and display it?
Examples:
A1 = google.com the result should be equal to: google
A2 = http://twitter.com/Marko_met_een_K/status/1725797169897021653 the result should be equal to: twitter
A3 = https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/ the result should be equal to: regioonline
A4 = https://www.aa.com/en/how-to-regex?id=123 the result should be equal to: aa
The following code was generated by ChatGPT using the prompt: Using SAS code <copy/paste your question>
The code ChatGPT returned required only one small fix to make it work.
data websites;
input url :$100.;
datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
data extracted_names;
  set websites;
  length website_name $100;
  /* Compile the regex once and retain its ID across observations */
  retain pattern;
  if _N_ = 1 then pattern = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');
  /* PRXMATCH sets the capture buffers, so PRXPOSN can read group 1 directly */
  if prxmatch(pattern, url) then website_name = prxposn(pattern, 1, url);
  /* Keep only the relevant columns */
  keep url website_name;
run;
proc print data=extracted_names noobs;
title "Extracted Website Names";
run;
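For comparison, the same pattern can also be applied in a single call with PRXCHANGE, which replaces the whole URL with the first capture group. This is only a sketch under assumptions: the dataset name names_prxchange is made up for illustration, and the pattern is anchored with ^ and .*$ so that the match consumes the entire value, trailing blanks included. A URL that does not match the pattern comes back unchanged.

```sas
data websites;
input url :$100.;
datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;

data names_prxchange;   /* hypothetical dataset name, used only in this sketch */
  set websites;
  length website_name $100;
  /* Replace the entire URL with capture group 1 (the name before the first dot). */
  /* A constant pattern is compiled only once, so no RETAIN or PRXPARSE is needed. */
  website_name = prxchange('s/^(?:https?:\/\/)?(?:www\.)?([^\/.]+)\..*$/$1/', 1, url);
  keep url website_name;
run;

proc print data=names_prxchange noobs;
run;
```

The design trade-off is that PRXCHANGE is shorter, while the PRXMATCH/PRXPOSN version makes it easier to tell a non-matching URL apart from a matching one.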
Thank you for your quick response, I really appreciate it.
Why do you have to use PRX? Using classic SAS functions would be a lot easier.
data websites;
input url :$100.;
datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
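data want;
  set websites;
  /* Start at '//' when a scheme is present (FIND returns 0 otherwise, so the  */
  /* whole URL is used), then take the host: the first '/'-delimited token     */
  temp=scan(substrn(url,find(url,'//')),1,'/');
  /* Skip a leading 'www' label; otherwise the first label is the name */
  if scan(temp,1,'.')='www' then want=scan(temp,2,'.');
  else want=scan(temp,1,'.');
run;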
Thank you for your response.
That is very nice of you.