ilikesas
Barite | Level 11

Hi,

Suppose I have a data set with datalines like the following:

the first site is www.abc.com

www.123.com is the second website.

I am trying to figure out how to extract the websites, that is, the part of the string between (and including) "www." and ".com".

Thank you!

1 ACCEPTED SOLUTION

mohamed_zaki
Barite | Level 11

One way:

data have;
input x $37.;
cards;
the first site is www.abc.com
www.187878723.com is the second website.
www.computer.com is the second website.
;
run;

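data want ;
set have;
/* position of the first "www." in x (0 if absent) */
www=index(x, "www.");
/* position of ".com" counted from just after "www." (0 if absent) */
com=index(substr(x,www+4), ".com");
/* length = 4 for "www." + (com-1) for the name + 4 for ".com" = com+7 */
if www and com then website=substr(x,www,com+7);
drop www com;
run;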
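
To see where the com+7 comes from, here is a small trace of the offsets for the first sample row. It is only an illustrative sketch: the variable len is a name introduced here for the trace and is not part of the solution above.

```sas
/* Trace the offsets for one sample row. */
data _null_;
  length x $37;
  x = "the first site is www.abc.com";
  www = index(x, "www.");                 /* 19: where "www." starts */
  com = index(substr(x, www+4), ".com");  /* 4: ".com" counted from just after "www." */
  len = com + 7;                          /* 4 ("www.") + com-1 (name) + 4 (".com") */
  website = substr(x, www, len);          /* www.abc.com */
  put www= com= len= website=;            /* writes the values to the log */
run;
```

The name "abc" occupies com-1 = 3 characters, so the whole site is 4 + 3 + 4 = 11 characters long, which is com+7.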


7 REPLIES
mohamed_zaki
Barite | Level 11

Updated to handle more cases.

ilikesas
Barite | Level 11

Hi Mohamed,

Thank you for answering my question, everything works nicely!

Just as a side note, if I add a dataline " a pseudo site www. name .com", the code will still select "www. name .com" into want, but it isn't a real website because of the space after "www." and before ".com".

So is there a way to avoid this by specifying that there should be a non-space character right after "www." and right before ".com"?

Thank you!

mohamed_zaki
Barite | Level 11
if www and com then website=compress(substr(x,www,com+7));
mohamed_zaki
Barite | Level 11

Do you still want to extract it correctly, or to consider it wrong and neglect it?

ilikesas
Barite | Level 11

I ran the code with your new input and understand that it corrects it.

Could you please also show me the option to neglect such a case?

Thank you!

mohamed_zaki
Barite | Level 11
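data want ;
set have;
www=index(x, "www.");
com=index(substr(x,www+4), ".com");
/* keep only rows with both markers, so SUBSTR is never called with www=0 */
if www and com;
website=substr(x,www,com+7);
/* blank out matches with an embedded space, such as "www. name .com" */
if index(trim(website),' ')> 0 then website="";
drop www com;
run;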
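
As a possible alternative, the "non-space character right after www. and right before .com" rule can be written as a Perl regular expression with the PRX functions. This is a minimal sketch, not the method used in the replies above: the names want_rx and rx are made up for the example, and it assumes the have data set from the accepted solution.

```sas
data want_rx;
  set have;
  length website $37;
  retain rx;
  /* compile the pattern once: "www.", one or more non-space characters, ".com" */
  if _n_ = 1 then rx = prxparse('/www\.\S+?\.com/');
  /* capture buffer 0 is the whole match; website stays blank when nothing matches */
  if prxmatch(rx, x) then website = prxposn(rx, 0, x);
  drop rx;
run;
```

Because \S matches only non-whitespace characters, a line such as " a pseudo site www. name .com" produces no match and website is left blank. Unlike the step with the subsetting IF, this version keeps every input row.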
