ChadAtkinson
Calcite | Level 5

Hello All,

I am working with data that contains comments from American college students.  There are a large number of emoticons in these comments, which I initially removed ( comment1=compress(comment,,'p'); ).  The problem is that this appears to lead to a reasonably large loss of information.  Does anyone have any code that changes emoticons to words?

Thanks,

Chad Atkinson

6 REPLIES 6
art297
Opal | Level 21

The list shown here might give you a start: List of emoticons - Wikipedia, the free encyclopedia

ChadAtkinson
Calcite | Level 5

I have been looking at the tranwrd function as a starting point for this.  The longer term goal is to generate a macro that has a set of translations.

/*input test data*/

data comments;

input f1 & $1000.;

datalines4;

We went through everything very quickly and I feel like I'm ready to make changes to my paper. Smiley Happy /*there is a blank, colon, right parens here but it gets converted by the bbs*/

Thank you so much B*****Smiley Happy

she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****

Friendly 😃

Smiley Happy

;;;;

run;

/*use tranwrd to replace emoticon strings with words*/

data comments_test;

set comments;

length em_comment $ 1000;

em_comment=f1;

em_comment=tranwrd(f1,":)", " smile");

em_comment=tranwrd(f1,"=)", " smile");

em_comment=tranwrd(f1,";)", " smile");

run;

The issue that I have is that when I run the code, only the fifth record is changed (semicolon, right parenthesis becomes 'smile').  The others are not altered.

Any suggestions?

ballardw
Super User

try

 

em_comment=tranwrd(em_comment,":)", " smile");

em_comment=tranwrd(em_comment,"=)", " smile");

em_comment=tranwrd(em_comment,";)", " smile");
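The reason this works: in the original step every statement read from f1 and overwrote em_comment, so only the result of the last call (the ";)" replacement) survived. Feeding em_comment back in lets each replacement build on the previous one. A minimal sketch of the same idea written as a loop, assuming the comments data set and f1 variable from the post above (the array name emo and the index i are made up for illustration):

```sas
data comments_test;
    set comments;
    length em_comment $ 1000;

    /* hypothetical lookup list: the three emoticons from the question */
    array emo[3] $ 2 _temporary_ (':)' '=)' ';)');

    /* start from the raw comment, then keep updating the same variable */
    em_comment = f1;
    do i = 1 to dim(emo);
        em_comment = tranwrd(em_comment, emo[i], ' smile');
    end;

    drop i;
run;
```

Adding another emoticon then only means extending the array, not adding another statement.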


art297
Opal | Level 21

I agree that you have to write your statements so that they don't conflict with each other, but I think the problem you are running into goes well beyond that.

The emoticons you are trying to capture are GIF images, not text, so they don't copy and paste when you try to bring them into your code.

Hopefully someone comes up with a better solution but, at least, the following would work.  In editing your post, using the forum's advanced editor feature, I was able to convert the post to its html format.  The results are shown in the following datastep.  The lines will probably wrap, unfortunately:

/*input test data*/

data comments;

input @'sans-serif;">' f1 & $1000.;

datalines4;

<body><p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">We went through everything very quickly and I feel like I'm ready to make changes to my paper. <img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Thank you so much B*****<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****</span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Friendly =)</span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;"><img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p></body>

;;;;

run;

data comments_test;

set comments;

length em_comment $ 1000;

em_comment=TRANWRD(F1,"</span></p>","");

em_comment=TRANWRD(em_comment,"</body>","");

em_comment=tranwrd(em_comment,":)", " smile");

em_comment=tranwrd(em_comment,"=)", " smile");

em_comment=tranwrd(em_comment,";)", " smile");

em_comment=tranwrd(em_comment,'<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" />', " smile");

run;

ChadAtkinson
Calcite | Level 5

Thanks for the comments.

I should clarify that I am only concerned with processing text with this project.  (The SAS board is where the gifs are being produced.)

Thanks to Edward Ballard's comment, I was able to obtain a solution to the problem:

/*

emoticon: a macro to replace common emoticons with text.  It is designed to process text comments

prior to text/sentiment analysis.

To call the macro:

%emoticon(comment=name_of_variable_containing_comments, set= name_of_dataset_containing_comments);

*/

%macro emoticon (comment=, set=);

data work.transformed_&set.;

set &set;

&comment.=tranwrd(&comment.,">:]"," smile");

&comment.=tranwrd(&comment.,"(-:"," smile");

&comment.=tranwrd(&comment.,":-)"," smile");

&comment.=tranwrd(&comment.,":)"," smile");

&comment.=tranwrd(&comment.,":o)"," smile");

&comment.=tranwrd(&comment.,":]"," smile");

&comment.=tranwrd(&comment.,":3"," smile");

&comment.=tranwrd(&comment.,":c)"," smile");

&comment.=tranwrd(&comment.,":>"," smile");

&comment.=tranwrd(&comment.,"=]"," smile");

&comment.=tranwrd(&comment.,"8)"," smile");

&comment.=tranwrd(&comment.,"=)"," smile");

&comment.=tranwrd(&comment.,":}"," smile");

&comment.=tranwrd(&comment.,":^)"," smile");

&comment.=tranwrd(&comment.,":-)="," smile");

&comment.=tranwrd(&comment.,"#:-)"," smile");

&comment.=tranwrd(&comment.,":-)8"," smile");

&comment.=tranwrd(&comment.,"d:-)"," smile");

&comment.=tranwrd(&comment.,"&:-)"," smile");

&comment.=tranwrd(&comment.,"8-)"," smile");

&comment.=tranwrd(&comment.,"{:-)"," smile");

&comment.=tranwrd(&comment.,"(:-)"," smile");

&comment.=tranwrd(&comment.,":-( )"," smile");

&comment.=tranwrd(&comment.,"C|:-)"," smile");

&comment.=tranwrd(&comment.,"[:-)"," smile");

&comment.=tranwrd(&comment.,">:D"," laugh");

&comment.=tranwrd(&comment.,":-D"," laugh");

&comment.=tranwrd(&comment.,":D"," laugh");

&comment.=tranwrd(&comment.,"8-D"," laugh");

&comment.=tranwrd(&comment.,"8D"," laugh");

&comment.=tranwrd(&comment.,"x-D"," laugh");

&comment.=tranwrd(&comment.,"xD"," laugh");

&comment.=tranwrd(&comment.,"X-D"," laugh");

&comment.=tranwrd(&comment.,"XD"," laugh");

&comment.=tranwrd(&comment.,"=-D"," laugh");

&comment.=tranwrd(&comment.,"=D"," laugh");

&comment.=tranwrd(&comment.,"=-3"," laugh");

&comment.=tranwrd(&comment.,"=3"," laugh");

&comment.=tranwrd(&comment.,"B^D"," laugh");

&comment.=tranwrd(&comment.,":-))"," happy");

&comment.=tranwrd(&comment.,">:["," sad");

&comment.=tranwrd(&comment.,":-("," sad");

&comment.=tranwrd(&comment.,":("," sad");

&comment.=tranwrd(&comment.,":-c"," sad");

&comment.=tranwrd(&comment.,":c"," sad");

&comment.=tranwrd(&comment.,":-<"," sad");

&comment.=tranwrd(&comment.,":<"," sad");

&comment.=tranwrd(&comment.,":-["," sad");

&comment.=tranwrd(&comment.,":["," sad");

&comment.=tranwrd(&comment.,":{"," sad");

&comment.=tranwrd(&comment.,":-||"," angry");

&comment.=tranwrd(&comment.,":@"," angry");

&comment.=tranwrd(&comment.,">:-("," angry");

&comment.=tranwrd(&comment.,"QQ"," crying");

&comment.=tranwrd(&comment.,"D:<"," disgust or sadness");

&comment.=tranwrd(&comment.,"D:"," disgust or sadness");

&comment.=tranwrd(&comment.,"D8"," disgust or sadness");

&comment.=tranwrd(&comment.,"D;"," disgust or sadness");

&comment.=tranwrd(&comment.,"D="," disgust or sadness");

&comment.=tranwrd(&comment.,"DX"," disgust or sadness");

&comment.=tranwrd(&comment.,"v.v"," disgust or sadness");

&comment.=tranwrd(&comment.,"D-':"," disgust or sadness");

&comment.=tranwrd(&comment.,">:o"," surprise");

&comment.=tranwrd(&comment.,">:O"," surprise");

&comment.=tranwrd(&comment.,":-O"," surprise");

&comment.=tranwrd(&comment.,":O"," surprise");

&comment.=tranwrd(&comment.,"°o°"," surprise");

&comment.=tranwrd(&comment.,"°O°"," surprise");

&comment.=tranwrd(&comment.,":O"," surprise");

&comment.=tranwrd(&comment.,"o_O"," surprise");

&comment.=tranwrd(&comment.,"o_0"," surprise");

&comment.=tranwrd(&comment.,"o.O"," surprise");

&comment.=tranwrd(&comment.,"8-0"," surprise");

&comment.=tranwrd(&comment.,":-<>"," surprise");

&comment.=tranwrd(&comment.,":*"," kiss");

&comment.=tranwrd(&comment.,":^*"," kiss");

&comment.=tranwrd(&comment.,">;]"," wink");

&comment.=tranwrd(&comment.,";-)"," wink");

&comment.=tranwrd(&comment.,";)"," wink");

&comment.=tranwrd(&comment.,"*-)"," wink");

&comment.=tranwrd(&comment.,"*)"," wink");

&comment.=tranwrd(&comment.,";-]"," wink");

&comment.=tranwrd(&comment.,";]"," wink");

&comment.=tranwrd(&comment.,";D"," wink");

&comment.=tranwrd(&comment.,";^)"," wink");

&comment.=tranwrd(&comment.,":-,"," wink");

&comment.=tranwrd(&comment.,">:P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"X-P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"x-p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.," xp "," playfully sticking out tongue");

&comment.=tranwrd(&comment.," XP "," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"=p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-b"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":b"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,">:\"," annoyed");

&comment.=tranwrd(&comment.,">:/"," annoyed");

&comment.=tranwrd(&comment.,":-\"," annoyed");

&comment.=tranwrd(&comment.,":-/"," annoyed");

&comment.=tranwrd(&comment.,":-."," annoyed");

&comment.=tranwrd(&comment.,":/"," annoyed");

&comment.=tranwrd(&comment.,":\"," annoyed");

&comment.=tranwrd(&comment.,"=/"," annoyed");

&comment.=tranwrd(&comment.,"=\"," annoyed");

&comment.=tranwrd(&comment.,":S"," annoyed");

&comment.=tranwrd(&comment.,">.<"," annoyed");

&comment.=tranwrd(&comment.,":-|"," determined or straight face");

&comment.=tranwrd(&comment.,":$"," embarrassed");

&comment.=tranwrd(&comment.,">:X"," not speaking");

&comment.=tranwrd(&comment.,":-X"," not speaking");

&comment.=tranwrd(&comment.,":X"," not speaking");

&comment.=tranwrd(&comment.,"O:-)"," angel or innocent");

&comment.=tranwrd(&comment.,"0:-3"," angel or innocent");

&comment.=tranwrd(&comment.,"0:3"," angel or innocent");

&comment.=tranwrd(&comment.,"0:-)"," angel or innocent");

&comment.=tranwrd(&comment.,"0:)"," angel or innocent");

&comment.=tranwrd(&comment.,"0;^)"," angel or innocent");

&comment.=tranwrd(&comment.,">:)"," evil");

&comment.=tranwrd(&comment.,">;)"," evil");

&comment.=tranwrd(&comment.,">:-)"," evil");

&comment.=tranwrd(&comment.,"}:-)"," devilish");

&comment.=tranwrd(&comment.,"}:)"," devilish");

&comment.=tranwrd(&comment.,"3:-)"," devilish");

&comment.=tranwrd(&comment.,"3:)"," devilish");

&comment.=tranwrd(&comment.,"o/\o"," high five");

&comment.=tranwrd(&comment.,"^5"," high five");

&comment.=tranwrd(&comment.,"|-O"," bored");

&comment.=tranwrd(&comment.,":-&"," tongue-tied");

&comment.=tranwrd(&comment.,":&"," tongue-tied");

&comment.=tranwrd(&comment.,"#-)"," confused");

&comment.=tranwrd(&comment.,"%-)"," confused");

&comment.=tranwrd(&comment.,"%)"," confused");

&comment.=tranwrd(&comment.,":-###.."," sick");

&comment.=tranwrd(&comment.,":###.."," sick");

&comment.=tranwrd(&comment.,"<:-|"," dunce");

&comment.=tranwrd(&comment.,"<*)))-{"," fish");

&comment.=tranwrd(&comment.,"><(((*>"," fish");

&comment.=tranwrd(&comment.,"><> "," fish");

&comment.=tranwrd(&comment.,"*\0/*"," cheerleader");

&comment.=tranwrd(&comment.,"@}-;-'---"," rose");

&comment.=tranwrd(&comment.,"@>-->--"," rose");

&comment.=tranwrd(&comment.,"<3"," heart");

&comment.=tranwrd(&comment.,"</3"," broken heart");

&comment.=tranwrd(&comment.,":-o zz"," bored");

&comment.=tranwrd(&comment.,": @"," shouting");

&comment.=tranwrd(&comment.,":-(0)"," shouting");

&comment.=tranwrd(&comment.,"(-.-)"," sleeping");

&comment.=tranwrd(&comment.,"|-I"," sleeping");

&comment.=tranwrd(&comment.,"|-O"," snoring");

&comment.=tranwrd(&comment.,":-v"," talking");

run;

%mend emoticon;
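One caveat with a long run of tranwrd calls is order: ":-)" and ":)" are replaced before the longer forms that contain them (":-))", ">:-)", "O:-)", ">:)" and so on) are reached, so those longer forms can never match; "O:-)", for example, ends up as "O smile". Short targets such as "8)", "XD" or "D:" can also match inside ordinary text, and because each replacement is longer than the emoticon, a comment that nearly fills its variable may be truncated. A rough sketch of one way around the ordering problem, assuming the comments data set and f1 variable from earlier in the thread: keep the translations in a lookup table and apply the longest patterns first. The names emoticon_map, pattern, word, plen and comments_mapped, and the array bound of 200, are placeholders for illustration only.

```sas
/* hypothetical lookup table: one row per emoticon, pattern|word */
data emoticon_map;
    infile datalines dlm='|' truncover;
    length pattern $ 12 word $ 32;
    input pattern $ word $;
    plen = length(pattern);   /* used only to order the rows */
datalines;
:-)|smile
:-))|happy
>:-)|evil
O:-)|angel or innocent
:)|smile
;
run;

/* longest patterns first, so ":-)" cannot pre-empt ":-))" or ">:-)" */
proc sort data=emoticon_map;
    by descending plen;
run;

data comments_mapped;
    /* assumes the table has at most 200 rows */
    array pat[200] $ 12 _temporary_;
    array wrd[200] $ 32 _temporary_;

    /* load the lookup table into memory once */
    if _n_ = 1 then do p = 1 to n_rows;
        set emoticon_map(keep=pattern word) point=p nobs=n_rows;
        pat[p] = pattern;
        wrd[p] = word;
    end;

    set comments;
    length em_comment $ 1000;
    em_comment = f1;

    /* apply every translation in sorted order */
    do i = 1 to n_rows;
        em_comment = tranwrd(em_comment, trim(pat[i]), ' ' || trim(wrd[i]));
    end;

    drop i pattern word;
run;
```

Patterns that contain a semicolon would need datalines4 (or an external file) instead of datalines.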

I also received some local assistance that used perl:

data comments_test;

set comments;

em_comment=f1;

x=compress(compress(em_comment,'','kp'),' ');

y=prxparse('s/[:;=]-?[D)]/Smile/');

   call prxchange(y,-1,x);

run;


The issues in that case were:

A) I am a novice SAS programmer and a neophyte Perl coder

B) how to handle emoticons of different lengths (i.e. when the first buffer can be 1 to n characters and the second one can be 1 to n as well...)
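On point B, a regular expression can absorb the varying lengths with optional parts and quantifiers, so one pattern covers ":)", ":-)", "=)" and ":-))" alike. This is only a sketch, assuming the comments data set and f1 variable from above; the character classes are a guess at a reasonable subset rather than a complete list, and the output name comments_regex is made up.

```sas
data comments_regex;
    set comments;
    length em_comment $ 1000;

    /* eyes [:;=], an optional nose [-o^], then one or more closing mouths */
    em_comment = prxchange('s/[:;=][-o^]?[)\]}]+/ smile /', -1, f1);

    /* same shape for frowns: one or more opening mouths */
    em_comment = prxchange('s/[:;=][-o^]?[(\[{]+/ sad /', -1, em_comment);
run;
```

Here `?` makes the nose optional and `+` lets the mouth repeat. Unlike the step above, this works on the full comment rather than on a punctuation-only copy, so the surrounding words are kept.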


Thanks again for the assistance.


Chad

art297
Opal | Level 21

Glad to hear that you got what you needed, but I'd still be interested to hear if someone has an easier solution for dealing with embedded GIF emoticons.
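One possible approach, offered only as a sketch: the emoticon name is already present in the image tag's ___jive_emoticon_name attribute, so a regular expression can pull it out and then strip whatever tags remain. This assumes the HTML version of the comments data set shown earlier (f1 holding the text after the span opening tag) and uses the attribute value itself ("happy") as the replacement word; the output name comments_html is made up.

```sas
data comments_html;
    set comments;
    length em_comment $ 1000;

    /* replace each emoticon image tag with the name stored in its attribute */
    em_comment = prxchange(
        's/<img[^>]*___jive_emoticon_name="(\w+)"[^>]*>/ $1 /', -1, f1);

    /* remove any remaining HTML tags such as the closing span, p and body */
    em_comment = prxchange('s/<[^>]+>//', -1, em_comment);
run;
```

This avoids hard-coding the full image tag for every emoticon, though it does depend on the forum's markup keeping that attribute.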
