Processing emoticons in textual data?

Occasional Contributor
Posts: 5

Processing emoticons in textual data?

Hello All,

I am working with data that contains comments from American college students.  There are a large number of emoticons in these comments, which I initially removed ( comment1=compress(comment,,'p'); ).  The problem is that this appears to lead to a reasonably large loss of information.  Does anyone have any code that changes emoticons to words?

Thanks,

Chad Atkinson

PROC Star
Posts: 7,363

Re: Processing emoticons in textual data?

The list shown here might give you a start: List of emoticons - Wikipedia, the free encyclopedia

Occasional Contributor
Posts: 5

Re: Processing emoticons in textual data?

I have been looking at the tranwrd function as a starting point for this.  The longer term goal is to generate a macro that has a set of translations.

/*input test data*/

data comments;

input f1 & $1000.;

datalines4;

We went through everything very quickly and I feel like I'm ready to make changes to my paper. Smiley Happy /*there is a blank, colon, right parens here but it gets converted by the bbs*/

Thank you so much B*****Smiley Happy

she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****

Friendly =)

Smiley Happy

;;;;

run;

/*use tranwrd to replace emoticon strings with words*/

data comments_test;

set comments;

length em_comment $ 1000;

em_comment=f1;

em_comment=tranwrd(f1,"Smiley Happy", " smile");

em_comment=tranwrd(f1,"=)", " smile");

em_comment=tranwrd(f1,"Smiley Wink", " smile");

run;

The issue that I have is that when I run the code, only the fifth record is changed (semicolon, right parenthesis becomes 'smile').  The others are not altered.

Any suggestions?

Super User
Posts: 10,500

Re: Processing emoticons in textual data?

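Each of your statements reads f1 again, so every assignment overwrites the previous result and only the last replacement survives. Apply each tranwrd to em_comment instead: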

em_comment=tranwrd(em_comment,"Smiley Happy", " smile");

em_comment=tranwrd(em_comment,"=)", " smile");

em_comment=tranwrd(em_comment,"Smiley Wink", " smile");


PROC Star
Posts: 7,363

Re: Processing emoticons in textual data?

I agree that you have to write your statements so that they don't conflict with each other, but I think the problem you are running into goes well beyond that.

The emoticons you are trying to capture are GIFs, not text, so they don't copy and paste when you try to bring them into your code.

Hopefully someone comes up with a better solution but, at least, the following would work.  While editing your post with the forum's advanced editor, I was able to convert the post to its HTML format.  The results are shown in the following DATA step.  The lines will probably wrap, unfortunately:

/*input test data*/

data comments;

input @'sans-serif;">' f1 & $1000.;

datalines4;

<body><p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">We went through everything very quickly and I feel like I'm ready to make changes to my paper. <img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Thank you so much B*****<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****</span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Friendly =)</span></p>

<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;"><img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p></body>

;;;;

run;

data comments_test;

set comments;

length em_comment $ 1000;

em_comment=TRANWRD(F1,"</span></p>","");

em_comment=TRANWRD(em_comment,"</body>","");

em_comment=tranwrd(em_comment,"Smiley Happy", " smile");

em_comment=tranwrd(em_comment,"=)", " smile");

em_comment=tranwrd(em_comment,"Smiley Wink", " smile");

em_comment=tranwrd(em_comment,'<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" />', " smile");

run;

Occasional Contributor
Posts: 5

Re: Processing emoticons in textual data?

Thanks for the comments.

I should clarify that I am only concerned with processing text with this project.  (The SAS board is where the gifs are being produced.)

Thanks to Edward Ballard's comment, I was able to obtain a solution to the problem:

/*

emoticon: a macro to replace common emoticons with text.  It is designed to process text comments

prior to text/sentiment analysis.

To call the macro:

%emoticon(comment=name_of_variable_containing_comments, set= name_of_dataset_containing_comments);

*/

%macro emoticon (comment=, set=);

data work.transformed_&set.;

set &set;

&comment.=tranwrd(&comment.,">:]"," smile");

&comment.=tranwrd(&comment.,"(-:"," smile");

&comment.=tranwrd(&comment.,":-)"," smile");

&comment.=tranwrd(&comment.,"Smiley Happy"," smile");

&comment.=tranwrd(&comment.,"Smiley Surprised)"," smile");

&comment.=tranwrd(&comment.,":]"," smile");

&comment.=tranwrd(&comment.,":3"," smile");

&comment.=tranwrd(&comment.,":c)"," smile");

&comment.=tranwrd(&comment.,"Smiley Embarassed"," smile");

&comment.=tranwrd(&comment.,"=]"," smile");

&comment.=tranwrd(&comment.,"8)"," smile");

&comment.=tranwrd(&comment.,"=)"," smile");

&comment.=tranwrd(&comment.,":}"," smile");

&comment.=tranwrd(&comment.,":^)"," smile");

&comment.=tranwrd(&comment.,":-)="," smile");

&comment.=tranwrd(&comment.,"#:-)"," smile");

&comment.=tranwrd(&comment.,":-)8"," smile");

&comment.=tranwrd(&comment.,"d:-)"," smile");

&comment.=tranwrd(&comment.,"&:-)"," smile");

&comment.=tranwrd(&comment.,"8-)"," smile");

&comment.=tranwrd(&comment.,"{:-)"," smile");

&comment.=tranwrd(&comment.,"(:-)"," smile");

&comment.=tranwrd(&comment.,":-( )"," smile");

&comment.=tranwrd(&comment.,"C|:-)"," smile");

&comment.=tranwrd(&comment.,"[:-)"," smile");

&comment.=tranwrd(&comment.,">Smiley Very Happy"," laugh");

&comment.=tranwrd(&comment.,":-D"," laugh");

&comment.=tranwrd(&comment.,"Smiley Very Happy"," laugh");

&comment.=tranwrd(&comment.,"8-D"," laugh");

&comment.=tranwrd(&comment.,"8D"," laugh");

&comment.=tranwrd(&comment.,"x-D"," laugh");

&comment.=tranwrd(&comment.,"xD"," laugh");

&comment.=tranwrd(&comment.,"X-D"," laugh");

&comment.=tranwrd(&comment.,"XD"," laugh");

&comment.=tranwrd(&comment.,"=-D"," laugh");

&comment.=tranwrd(&comment.,"=D"," laugh");

&comment.=tranwrd(&comment.,"=-3"," laugh");

&comment.=tranwrd(&comment.,"=3"," laugh");

&comment.=tranwrd(&comment.,"B^D"," laugh");

&comment.=tranwrd(&comment.,":-))"," happy");

&comment.=tranwrd(&comment.,">:["," sad");

&comment.=tranwrd(&comment.,":-("," sad");

&comment.=tranwrd(&comment.,"Smiley Sad"," sad");

&comment.=tranwrd(&comment.,":-c"," sad");

&comment.=tranwrd(&comment.,":c"," sad");

&comment.=tranwrd(&comment.,":-<"," sad");

&comment.=tranwrd(&comment.,":<"," sad");

&comment.=tranwrd(&comment.,":-["," sad");

&comment.=tranwrd(&comment.,":["," sad");

&comment.=tranwrd(&comment.,":{"," sad");

&comment.=tranwrd(&comment.,":-||"," angry");

&comment.=tranwrd(&comment.,":@"," angry");

&comment.=tranwrd(&comment.,">:-("," angry");

&comment.=tranwrd(&comment.,"QQ"," crying");

&comment.=tranwrd(&comment.,"D:<"," disgust or sadness");

&comment.=tranwrd(&comment.,"D:"," disgust or sadness");

&comment.=tranwrd(&comment.,"D8"," disgust or sadness");

&comment.=tranwrd(&comment.,"D;"," disgust or sadness");

&comment.=tranwrd(&comment.,"D="," disgust or sadness");

&comment.=tranwrd(&comment.,"DX"," disgust or sadness");

&comment.=tranwrd(&comment.,"v.v"," disgust or sadness");

&comment.=tranwrd(&comment.,"D-':"," disgust or sadness");

&comment.=tranwrd(&comment.,">Smiley Surprised"," surprise");

&comment.=tranwrd(&comment.,">Smiley Surprised"," surprise");

&comment.=tranwrd(&comment.,":-O"," surprise");

&comment.=tranwrd(&comment.,"Smiley Surprised"," surprise");

&comment.=tranwrd(&comment.,"°o°"," surprise");

&comment.=tranwrd(&comment.,"°O°"," surprise");

&comment.=tranwrd(&comment.,"Smiley Surprised"," surprise");

&comment.=tranwrd(&comment.,"o_O"," surprise");

&comment.=tranwrd(&comment.,"o_0"," surprise");

&comment.=tranwrd(&comment.,"o.O"," surprise");

&comment.=tranwrd(&comment.,"8-0"," surprise");

&comment.=tranwrd(&comment.,":-<>"," surprise");

&comment.=tranwrd(&comment.,":*"," kiss");

&comment.=tranwrd(&comment.,":^*"," kiss");

&comment.=tranwrd(&comment.,">;]"," wink");

&comment.=tranwrd(&comment.,";-)"," wink");

&comment.=tranwrd(&comment.,"Smiley Wink"," wink");

&comment.=tranwrd(&comment.,"*-)"," wink");

&comment.=tranwrd(&comment.,"*)"," wink");

&comment.=tranwrd(&comment.,";-]"," wink");

&comment.=tranwrd(&comment.,";]"," wink");

&comment.=tranwrd(&comment.,";D"," wink");

&comment.=tranwrd(&comment.,";^)"," wink");

&comment.=tranwrd(&comment.,":-,"," wink");

&comment.=tranwrd(&comment.,">Smiley Tongue"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"Smiley Tongue"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"X-P"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"x-p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.," xp "," playfully sticking out tongue");

&comment.=tranwrd(&comment.," XP "," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"Smiley Tongue"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,"=p"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":-b"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,":b"," playfully sticking out tongue");

&comment.=tranwrd(&comment.,">:\"," annoyed");

&comment.=tranwrd(&comment.,">:/"," annoyed");

&comment.=tranwrd(&comment.,":-\"," annoyed");

&comment.=tranwrd(&comment.,":-/"," annoyed");

&comment.=tranwrd(&comment.,":-."," annoyed");

&comment.=tranwrd(&comment.,":/"," annoyed");

&comment.=tranwrd(&comment.,":\"," annoyed");

&comment.=tranwrd(&comment.,"=/"," annoyed");

&comment.=tranwrd(&comment.,"=\"," annoyed");

&comment.=tranwrd(&comment.,"Smiley Frustrated"," annoyed");

&comment.=tranwrd(&comment.,">.<"," annoyed");

&comment.=tranwrd(&comment.,":-|"," determined or straight face");

&comment.=tranwrd(&comment.,":$"," embarrassed");

&comment.=tranwrd(&comment.,">:X"," not speaking");

&comment.=tranwrd(&comment.,":-X"," not speaking");

&comment.=tranwrd(&comment.,":X"," not speaking");

&comment.=tranwrd(&comment.,"O:-)"," angel or innocent");

&comment.=tranwrd(&comment.,"0:-3"," angel or innocent");

&comment.=tranwrd(&comment.,"0:3"," angel or innocent");

&comment.=tranwrd(&comment.,"0:-)"," angel or innocent");

&comment.=tranwrd(&comment.,"0Smiley Happy"," angel or innocent");

&comment.=tranwrd(&comment.,"0;^)"," angel or innocent");

&comment.=tranwrd(&comment.,">Smiley Happy"," evil");

&comment.=tranwrd(&comment.,">Smiley Wink"," evil");

&comment.=tranwrd(&comment.,">:-)"," evil");

&comment.=tranwrd(&comment.,"}:-)"," devilish");

&comment.=tranwrd(&comment.,"}Smiley Happy"," devilish");

&comment.=tranwrd(&comment.,"3:-)"," devilish");

&comment.=tranwrd(&comment.,"3Smiley Happy"," devilish");

&comment.=tranwrd(&comment.,"o/\o"," high five");

&comment.=tranwrd(&comment.,"^5"," high five");

&comment.=tranwrd(&comment.,"|-O"," bored");

&comment.=tranwrd(&comment.,":-&"," tongue-tied");

&comment.=tranwrd(&comment.,":&"," tongue-tied");

&comment.=tranwrd(&comment.,"#-)"," confused");

&comment.=tranwrd(&comment.,"%-)"," confused");

&comment.=tranwrd(&comment.,"%)"," confused");

&comment.=tranwrd(&comment.,":-###.."," sick");

&comment.=tranwrd(&comment.,":###.."," sick");

&comment.=tranwrd(&comment.,"<:-|"," dunce");

&comment.=tranwrd(&comment.,"<*)))-{"," fish");

&comment.=tranwrd(&comment.,"><(((*>"," fish");

&comment.=tranwrd(&comment.,"><> "," fish");

&comment.=tranwrd(&comment.,"*\0/*"," cheerleader");

&comment.=tranwrd(&comment.,"@}-;-'---"," rose");

&comment.=tranwrd(&comment.,"@>-->--"," rose");

&comment.=tranwrd(&comment.,"<3"," heart");

&comment.=tranwrd(&comment.,"</3"," broken heart");

&comment.=tranwrd(&comment.,":-o zz"," bored");

&comment.=tranwrd(&comment.,": @"," shouting");

&comment.=tranwrd(&comment.,":-(0)"," shouting");

&comment.=tranwrd(&comment.,"(-.-)"," sleeping");

&comment.=tranwrd(&comment.,"|-I"," sleeping");

&comment.=tranwrd(&comment.,"|-O"," snoring");

&comment.=tranwrd(&comment.,":-v"," talking");

run;

%mend emoticon;
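
One caveat about the macro above: the tranwrd calls run in order, so a short pattern that is replaced early (for example ":-)") can consume part of a longer emoticon (">:-)", ":-))", "O:-)") before the longer pattern gets a chance to match. A minimal sketch of one way around that, assuming a hypothetical lookup data set emoticon_map (the names pattern, word, plen and n_map are made up for this example, and only a handful of pairs are shown) that is sorted so the longest patterns are applied first:

```sas
/* Hypothetical lookup table: one emoticon pattern and its replacement per row. */
/* datalines4 is used because some emoticons contain a semicolon.               */
data emoticon_map;
  infile datalines truncover;
  input pattern :$12. word & $30.;
  plen = length(pattern);   /* pattern length, used only for sorting */
datalines4;
:-) smile
>:-) evil
:-)) happy
;-) wink
:-( sad
>:-( angry
O:-) angel or innocent
;;;;
run;

/* Longest patterns first, so ">:-)" is handled before ":-)" */
proc sort data=emoticon_map;
  by descending plen;
run;

data comments_test;
  /* Assumption: the lookup table has at most 200 rows */
  array pat[200] $ 12 _temporary_;
  array wrd[200] $ 30 _temporary_;
  retain n_map 0;

  /* Load the lookup table into memory once, on the first iteration */
  if _n_ = 1 then do until (map_end);
    set emoticon_map end=map_end;
    n_map + 1;
    pat[n_map] = pattern;
    wrd[n_map] = word;
  end;

  set comments;
  length em_comment $ 1000;
  em_comment = f1;

  /* Apply every replacement in sorted order to the same variable */
  do i = 1 to n_map;
    em_comment = tranwrd(em_comment, trim(pat[i]), ' ' || trim(wrd[i]));
  end;

  drop pattern word plen i n_map;
run;
```

Keeping the pairs in a data set also means new emoticons can be added without editing the macro. Longest-first ordering does not protect against ordinary text that happens to contain a pattern (for example "8)" at the end of a numbered item), so the output still needs a spot check.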

I also received some local assistance that used Perl regular expressions:

data comments_test;

set comments;

em_comment=f1;

x=compress(compress(em_comment,'','kp'),' ');

y=prxparse('s/[:;=]-?[D)]/Smile/');

   call prxchange(y,-1,x);

run;


The issues in that case were:

A) I am a novice SAS programmer and a neophyte Perl coder

B) I was unsure how to handle emoticons of different lengths (i.e. when the first buffer can be 1 to n characters and the second one can be 1 to n as well...)
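
On point B, a Perl regular expression can describe variable-length emoticons with quantifiers: ? makes a piece optional and + lets it repeat, so one pattern covers both a two-character smiley and a longer one with a nose and several mouths. A minimal sketch, assuming the comments data set and the f1 variable from earlier in the thread; the data set name comments_regex is made up, and the three patterns are illustrative only, far from a complete emoticon list:

```sas
data comments_regex;
  set comments;
  length em_comment $ 1000;
  em_comment = f1;

  /* Eyes (: or =), optional nose (-), one or more closing mouths */
  em_comment = prxchange('s/[:=]-?[)\]]+/ smile/', -1, em_comment);

  /* Same shape, with a semicolon for the eyes */
  em_comment = prxchange('s/;-?[)\]]+/ wink/', -1, em_comment);

  /* Same shape, with one or more opening mouths */
  em_comment = prxchange('s/[:=]-?[(\[]+/ sad/', -1, em_comment);
run;
```

The -1 argument asks prxchange to replace every match in the string. Patterns this loose will also fire on ordinary punctuation, such as a colon directly before a closing parenthesis, so they may need tightening for real data.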


Thanks again for the assistance.


Chad

PROC Star
Posts: 7,363

Re: Processing emoticons in textual data?

Glad to hear that you got what you needed, but I'd still be interested to hear if someone has an easier solution for dealing with embedded gif emoticons.
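
One possible direction, offered only as a sketch: if the comments are available as HTML like the test data above, a regular expression can pull the emoticon name out of each img tag instead of matching the full tag text. This assumes every emoticon image carries a ___jive_emoticon_name attribute, as the sample rows do, and that f1 was read as in the earlier DATA step; the data set name comments_html is made up for the example.

```sas
data comments_html;
  set comments;
  length em_comment $ 1000;

  /* Replace each emoticon img tag with a blank plus the captured name */
  em_comment = prxchange(
    's/<img[^>]*___jive_emoticon_name="([^"]+)"[^>]*>/ $1/', -1, f1);

  /* Remove whatever HTML tags are left over */
  em_comment = prxchange('s/<[^>]+>//', -1, em_comment);
run;
```

Because the name is captured rather than hard-coded, the same statement would cover other emoticon images that follow this markup, without one tranwrd call per image.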
