melassiri
Fluorite | Level 6

How can you use a regular expression in SAS to extract the website name from a URL and display it?

Examples:

A1 = google.com   the result should be equal to:  google

A2 = http://twitter.com/Marko_met_een_K/status/1725797169897021653  the result should be equal to:  twitter

A3 = https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/   the result should be equal to:  regioonline

A4 = https://www.aa.com/en/how-to-regex?id=123   the result should be equal to:   aa

Accepted Solution
Patrick
Opal | Level 21

The following code was generated by ChatGPT using the prompt "Using SAS code <copy/paste your question>".

The code ChatGPT returned needed only one small fix to make it work.

data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;

data extracted_names;
    set websites;
    /* Compile the regex once (first iteration) and RETAIN its ID.    */
    /* Pattern: optional http:// or https://, optional "www.", then   */
    /* capture the first run of characters other than "/" or "."      */
    /* that is followed by a dot.                                     */
    retain pattern;
    if _N_ = 1 then pattern = prxparse('/(?:https?:\/\/)?(?:www\.)?([^\/\.]+)\./');

    /* If the URL matches, store capture group 1 in website_name */
    if prxmatch(pattern, url) then
        website_name = prxposn(pattern, 1, url);

    /* Keep only the relevant columns */
    keep url website_name;
run;

proc print data=extracted_names noobs;
    title "Extracted Website Names";
run;
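A more compact variant (a sketch, not part of the accepted answer) replaces the PRXPARSE/PRXMATCH/PRXPOSN sequence with a single PRXCHANGE call that rewrites the whole URL to capture group 1. It assumes the same four sample URLs; the data set name extracted_names2 is illustrative only. One behavioral difference: if a URL does not match the pattern, PRXCHANGE returns it unchanged instead of leaving website_name blank.

```sas
/* Sample URLs from the question */
data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
run;

/* extracted_names2 is an illustrative name */
data extracted_names2;
    set websites;
    length website_name $100;
    /* Anchor at the start: optional scheme, optional "www.", capture */
    /* the first label before a dot, then consume the rest (.*$).     */
    /* The whole match is replaced by capture group 1 ($1).           */
    website_name = prxchange('s/^(?:https?:\/\/)?(?:www\.)?([^\/.]+)\..*$/$1/', 1, strip(url));
run;

proc print data=extracted_names2 noobs;
run;
```

Because the regex is a constant, SAS compiles it only once, so no RETAIN/_N_ = 1 logic is needed.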


4 Replies

melassiri
Fluorite | Level 6

Thank you for your quick response, I really appreciate it.

Ksharp
Super User

Why do you have to use PRX? Using classic SAS functions would be a lot easier.

 

data websites;
    input url :$100.;
    datalines;
google.com
http://twitter.com/Marko_met_een_K/status/1725797169897021653
https://regioonline.nl/regio-den-bosch/schade-aan-stuw-lith/
https://www.aa.com/en/how-to-regex?id=123
;
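run;

data want;
    set websites;
    /* Host part: skip anything up to "//" (if present), */
    /* then take the text up to the first "/"            */
    temp = scan(substrn(url, find(url, '//')), 1, '/');
    /* Skip a leading "www" label, otherwise take the first label */
    if scan(temp, 1, '.') = 'www' then want = scan(temp, 2, '.');
    else want = scan(temp, 1, '.');
run;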
melassiri
Fluorite | Level 6

Thank you for your response.

That is very nice of you.

 
