melassiri
Fluorite | Level 6

How can I use a regular expression in SAS to extract the website name from a URL in my data and display it?

Examples:

A1 = google.com, the result should be: google

A2 = http://twitter.com/Marko_met_een_K/status/1725797169897021653, the result should be: twitter

A3 = https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/, the result should be: regioonline

A4 = https://www.aa.com/en/how-to-regex?id=123, the result should be: aa

ACCEPTED SOLUTION
Patrick
Opal | Level 21

The following code was generated by ChatGPT using the prompt: Using SAS code <copy/paste your question>

The code ChatGPT returned required only one small fix to make it work.

data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;

data extracted_names;
    set websites;
    length website_name $100;
    /* Compile the regex once; capture group 1 is the website name */
    retain pattern;
    if _N_ = 1 then pattern = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');

    /* PRXMATCH finds the match; PRXPOSN then returns capture group 1 */
    if prxmatch(pattern, url) then website_name = prxposn(pattern, 1, url);

    /* Keep only the relevant columns */
    keep url website_name;
run;

proc print data=extracted_names noobs;
    title "Extracted Website Names";
run;
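
For comparison, here is a minimal sketch of a one-step variant that uses PRXCHANGE with a substitution pattern instead of PRXMATCH plus PRXPOSN. It reuses the same regex idea as above, anchored to the whole value. The data set names websites_demo and extracted_demo are made up for illustration, and the /i flag is an added assumption for case-insensitive matching.

```sas
/* Sketch only: data set names are illustrative, not from the original post */
data websites_demo;
    input url :$100.;
    datalines;
google.com
https://www.aa.com/en/how-to-regex?id=123
;
run;

data extracted_demo;
    set websites_demo;
    length website_name $100;
    /* Replace the whole value with capture group 1 (the name before the first dot). */
    /* If the pattern does not match, PRXCHANGE returns the value unchanged.         */
    website_name = prxchange('s/^(?:https?:\/\/)?(?:www\.)?([^\/.]+)\..*$/$1/i', 1, strip(url));
run;
```

Because an unmatched value comes back unchanged, a PRXMATCH guard like the one above may still be preferable when the input can contain values that are not URLs.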


4 REPLIES
melassiri
Fluorite | Level 6

Thank you for your quick response. I really appreciate it.

Ksharp
Super User

Why do you have to use PRX? Using classic SAS functions would be a lot easier.

 

data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;
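data want;
    set websites;
    /* Host part: text after '//' (or the whole value when there is no '//'), up to the next '/' */
    temp = scan(substrn(url, find(url, '//')), 1, '/');
    /* Skip a leading 'www' label if present */
    if scan(temp, 1, '.') = 'www' then want = scan(temp, 2, '.');
    else want = scan(temp, 1, '.');
run;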
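
To check that the PRX and SCAN approaches agree, a small sketch like the following computes both in one DATA step and flags whether they match. The data set name check and the variables name_prx, name_scan, host, and same are hypothetical names chosen for this illustration.

```sas
/* Sketch only: variable and data set names are illustrative */
data check;
    input url :$100.;
    length name_prx name_scan host $100;
    retain rx;
    if _n_ = 1 then rx = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');

    /* PRX approach: capture group 1 is the name before the first dot */
    if prxmatch(rx, url) then name_prx = prxposn(rx, 1, url);

    /* SCAN approach: isolate the host, then skip a leading 'www' label */
    host = scan(substrn(url, find(url, '//')), 1, '/');
    if scan(host, 1, '.') = 'www' then name_scan = scan(host, 2, '.');
    else name_scan = scan(host, 1, '.');

    /* 1 when both approaches return the same name, 0 otherwise */
    same = (name_prx = name_scan);
    drop rx;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;

proc print data=check noobs;
    var url name_prx name_scan same;
run;
```

Rows where same is 0 would point to inputs that the two approaches treat differently, for example an uppercase WWW prefix, which neither version handles as written.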
melassiri
Fluorite | Level 6

Thank you for your response.

That is very nice of you.

 
