ilikesas
Barite | Level 11

Hi,

 

Suppose I have a data set with datalines like the following:

 

the first site is www.abc.com

www.123.com is the second website.

 

I am trying to figure out how to extract the websites, that is, the part of the string between (and including) "www." and ".com".

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
mohamed_zaki
Barite | Level 11

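One way:

data have;
  input x $37.;   /* read the first 37 characters of each line, blanks included */
  cards;
the first site is www.abc.com
www.187878723.com is the second website.
www.computer.com is the second website.
;
run;

data want;
  set have;
  /* position of the first "www." in x (0 if absent) */
  www = index(x, "www.");
  /* position of ".com", counted from the character right after "www." */
  com = index(substr(x, www + 4), ".com");
  /* com + 7 is the length from the first "w" through the "m" of ".com" */
  if www and com then website = substr(x, www, com + 7);
  drop www com;
run;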
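
To see where the com+7 comes from, here is a small trace of the first dataline. It is only an illustrative sketch: the variable len and the PUT statement are additions for the example and are not part of the solution above.

```sas
data _null_;
  x = "the first site is www.abc.com";
  www = index(x, "www.");                   /* 19: where "www." starts            */
  com = index(substr(x, www + 4), ".com");  /* 4: ".com" inside "abc.com"         */
  len = com + 7;                            /* 11 = 4 ("www.") + 3 ("abc") + 4 (".com") */
  website = substr(x, www, len);
  put www= com= len= website=;              /* writes the values to the log */
run;
```

The log should show www=19 com=4 len=11 website=www.abc.com. Searching from www+4 rather than from www is what lets a line such as www.computer.com work, because the ".com" that closes the address is looked for only after the "www." prefix.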


7 REPLIES
mohamed_zaki
Barite | Level 11

Updated to handle more cases.

ilikesas
Barite | Level 11

Hi Mohamed,

 

thank you for answering my question, everything works nicely!

 

Just as a side note, if I add a dataline " a pseudo site www. name .com", the code will still select "www. name .com" into want, but it isn't a real website because of the space after www. and before .com.

So is there a way to avoid it by specifying that right after www. and right before .com there should be a character?

 

Thank you! 

mohamed_zaki
Barite | Level 11
if www and com then website=compress(substr(x,www,com+7));
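
As a quick illustration of what that COMPRESS call does to the pseudo site, here is a stand-alone sketch. The LENGTH statement and the PUT are added only for the example; with a single argument, COMPRESS removes blanks from the string.

```sas
data _null_;
  length x website $37;
  x = "a pseudo site www. name .com";
  www = index(x, "www.");                    /* 15 */
  com = index(substr(x, www + 4), ".com");   /* 7: ".com" inside " name .com" */
  /* substr gives "www. name .com"; compress strips the two blanks */
  if www and com then website = compress(substr(x, www, com + 7));
  put website=;
run;
```

The log should show website=www.name.com, that is, the blanks are squeezed out and the string is kept rather than rejected.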
mohamed_zaki
Barite | Level 11

Do you still want to extract it correctly, or to treat it as wrong and ignore it?

ilikesas
Barite | Level 11

I ran the code with your new input and understand that it corrects it.

 

Could you please also show me the option to neglect such a case?

 

Thank you!

mohamed_zaki
Barite | Level 11
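data want;
  set have;
  www = index(x, "www.");
  com = index(substr(x, www + 4), ".com");
  /* keep only rows where both "www." and ".com" were found; doing this
     before SUBSTR avoids an invalid-argument note when www is 0 */
  if www and com;
  website = substr(x, www, com + 7);
  /* a blank inside the extracted text means it is not a real website */
  if index(trim(website), ' ') > 0 then website = "";
  drop www com;
run;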
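
The same "ignore it" rule could also be expressed with the SAS Perl regular expression functions (PRXPARSE, PRXMATCH, PRXPOSN). This is a different technique from the INDEX/SUBSTR replies above, offered only as a sketch: the data set names have2 and want_prx, the variable re, the $40. width and the pattern itself are assumptions made for the example.

```sas
data have2;
  input x $40.;
  cards;
the first site is www.abc.com
www.computer.com is the second website.
a pseudo site www. name .com
;
run;

data want_prx;
  set have2;
  length website $40;
  retain re;
  /* compile once: "www." + one or more non-blank characters (lazy) + ".com" */
  if _n_ = 1 then re = prxparse('/www\.\S+?\.com/');
  /* prxmatch returns the start of the match, or 0 when there is none;
     capture buffer 0 of prxposn is the whole matched text */
  if prxmatch(re, x) then website = prxposn(re, 0, x);
  drop re;
run;
```

Because \S does not match a blank, "www. name .com" is not matched and website stays empty for that row. Unlike the subsetting IF in the step above, this sketch keeps every input row, so add a condition on website if rows without a match should be dropped.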

