Help using Base SAS procedures

how can I translate html embedded in strings?

All,

I have a data set with HTML embedded in strings. Can I use a SAS function to translate that HTML?

Data set example:

data posts;
   length id $5 username $10 post $250;
   infile datalines delimiter=',';
   input id $ username $ post $;
   datalines;
338,"jessykate",<a class=\"mention\" href=\"/users/tim\">@tim<\/a> AHA!! thank you! i will try that Smiley Very Happy
223,"chris",Hi All <\/p>\n\n<p>Quick update on discourse - File uploads now work - Upload away!
1017,"ralfe",Hi <\/p>\n\n<p>I quite like <a class=\"mention\" href=\"/users/ahnjune\">@ahnjune<\/a> 's suggestion. If you go to <a href=\"https://p2pu.org\">https://p2pu.org<\/a>  then click on the drop-down arrow to the right of your profile
;

As a concrete example: after applying some SAS magic, I would like the string in the first observation, "<a class=\"mention\" href=\"/users/tim\">@tim<\/a>", to be translated to "@tim".

I read about the OLEDB LIBNAME engine, but I did not get very far.

Can I borrow some code to quickly fix my data set?

Thanks,

Miguel


Accepted Solutions
Solution
‎04-27-2015 11:12 AM
SAS Super FREQ
Posts: 708

Re: how can I translate html embedded in strings?

Posted in reply to M_Maldonado

Hi Miguel

You can use the Perl regular expression functions to do what you want. For the case you are interested in, you can use the PRXCHANGE function; see the sample below. The HTML tags are replaced by "*". As I am not a regex expert, I search for a pattern that does what I want and then adapt it to the appropriate SAS PRX... function.

data posts;
  /* Declare post2 explicitly; a variable first created by PRXCHANGE defaults to length 200 */
  length id $5 username $10 post post2 $250;
  infile datalines delimiter=',';
  input id $ username $ post $;
  /* Replace every HTML tag (anything from < to the next >) with "*"; -1 replaces all matches */
  post2 = prxchange('s/<[^>]*>/*/', -1, post);
datalines4;
338,"jessykate",<a class=\"mention\" href=\"/users/tim\">@tim<\/a> AHA!! thank you! i will try that Smiley Very Happy.
223,"chris",Hi All <\/p>\n\n<p>Quick update on discourse - File uploads now work - Upload away!
1017,"ralfe",Hi <\/p>\n\n<p>I quite like <a class=\"mention\" href=\"/users/ahnjune\">@ahnjune<\/a> 's suggestion. If you go to <a href=\"https://p2pu.org\">https://p2pu.org<\/a>  then click on the drop-down arrow to the right of your profile
;;;;

proc print data=posts;
run;

All Replies
SAS Super FREQ
Posts: 8,866

Re: how can I translate html embedded in strings?

Posted in reply to M_Maldonado

Hi:

    You explained what you want for the first row. Can you explain what you would expect to get for the 2nd and 3rd rows, too? I do not believe you really need OLEDB. If all you are doing is extracting the string before the brackets, then you can probably use the SCAN or PRX functions.

338,"jessykate",<a class=\"mention\" href=\"/users/tim\">@tim<\/a> AHA!! thank you! i will try that Smiley Very Happy.

223,"chris",Hi All <\/p>\n\n<p>Quick update on discourse - File uploads now work - Upload away!

1017,"ralfe",Hi <\/p>\n\n<p>I quite like <a class=\"mention\" href=\"/users/ahnjune\">@ahnjune<\/a> 's suggestion. If you go to <a href=\"https://p2pu.org\">https://p2pu.org<\/a>  then click on the drop-down arrow to the right of your profile

Would you want @ahnjune for the 3rd row? And what about the 2nd row of data?
Cynthia
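
If the goal were only to pull the first @mention out of each post, the PRX route suggested above might look something like the following. This is a minimal sketch, not code from the thread: the data set name mentions, the variables mention and re, and the widened username length are assumptions made for illustration, and the two sample rows are copied from the question.

```sas
data mentions;
  /* username is widened to 12 here so the surrounding quotes are not truncated (assumption) */
  length id $5 username $12 post $250 mention $50;
  infile datalines delimiter=',';
  input id $ username $ post $;
  /* Compile the pattern once: \@ is a literal @, and the parentheses capture @ plus word characters */
  retain re;
  if _n_ = 1 then re = prxparse('/(\@\w+)/');
  /* PRXMATCH returns the match position, or 0 when nothing matches */
  if prxmatch(re, post) > 0 then mention = prxposn(re, 1, post);
  drop re;
datalines4;
338,"jessykate",<a class=\"mention\" href=\"/users/tim\">@tim<\/a> AHA!! thank you! i will try that
223,"chris",Hi All <\/p>\n\n<p>Quick update on discourse - File uploads now work - Upload away!
;;;;

proc print data=mentions;
  var id mention;
run;
```

Only the first mention in each post is captured, and rows without a mention (such as the second one) are left with a blank value, so this does not answer what should happen to the rest of the text.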

Super Contributor
Posts: 337

Re: how can I translate html embedded in strings?

Posted in reply to Cynthia_sas

Hi Cynthia,

Ideally, I want to avoid writing a regular expression or string-handling code for each case.

I was wondering if we have some way to translate html text directly. Otherwise I need to come up with all these rules myself. This data set is quite large...

Examples of rules that I would need to come up with, but there are too many to even try:

String                                                Translates to
<a class=\"mention\" href=\"/users/FOO\">@FOO<\/a>    @FOO
<\/p>\n\n<p>                                          " "
<a href=\"https://FOO.ORG\">https://FOO.org<\/a>      FOO.org

thanks,

M

Super Contributor
Posts: 337

Re: how can I translate html embedded in strings?

Posted in reply to Bruno_SAS

Hi Bruno,

I could not find a proc that handles HTML gracefully, but the regular expressions were not as bad as I thought.

You get full credit for your regex! It is way better than mine, and it does 99% of the job. I still get stray strings like \n\n, but they are easy to remove with some SAS code.

Thanks again!

Miguel
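
For the leftover \n\n strings mentioned above, one possible cleanup is sketched below. It is an illustration under assumptions rather than the code Miguel actually used: the names posts_clean and post_clean are hypothetical, tags are replaced with a blank instead of "*", and the single sample row is copied from the question.

```sas
data posts_clean;
  /* username is widened to 12 here so the surrounding quotes are not truncated (assumption) */
  length id $5 username $12 post post_clean $250;
  infile datalines delimiter=',';
  input id $ username $ post $;
  /* Replace every HTML tag with a blank; -1 replaces all matches */
  post_clean = prxchange('s/<[^>]*>/ /', -1, post);
  /* Replace the literal two-character sequence \n with a blank (SAS does not treat \ as an escape in a quoted string) */
  post_clean = tranwrd(post_clean, '\n', ' ');
  /* Collapse runs of blanks to one and drop leading and trailing blanks */
  post_clean = strip(compbl(post_clean));
datalines4;
223,"chris",Hi All <\/p>\n\n<p>Quick update on discourse - File uploads now work - Upload away!
;;;;

proc print data=posts_clean;
  var id post_clean;
run;
```

TRANWRD is used for the \n step because the target is a fixed string; a second PRXCHANGE call would also work, but the backslash would then need escaping inside the pattern.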
