Hello All,
I am working with data that contains comments from American college students. There are a large number of emoticons in these comments, which I initially removed (comment1=compress(comment,,'p');). The problem is that this appears to cause a fairly large loss of information. Does anyone have code that converts emoticons to words?
Thanks,
Chad Atkinson
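To show where the information goes, here is a small sketch of what COMPRESS with the 'p' modifier does to a comment that contains emoticons. The data set name demo and the sample comment are hypothetical.

```sas
/* Hypothetical one-row example: what COMPRESS with the 'p' modifier does */
data demo;
  length comment $ 40;
  comment = "Thanks :) see you =)";
  /* null second argument + 'p' modifier: remove punctuation, keep blanks */
  comment1 = compress(comment, , 'p');
  put comment= / comment1=;
run;
```

Both emoticons disappear and only the words remain ("Thanks  see you"), so whatever sentiment the emoticons carried is lost.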
The Wikipedia article "List of emoticons" might give you a start.
I have been looking at the TRANWRD function as a starting point for this. The longer-term goal is to write a macro that applies a set of translations.
/*input test data*/
data comments;
input f1 & $1000.;
datalines4;
We went through everything very quickly and I feel like I'm ready to make changes to my paper. :)
Thank you so much B*****
she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****
Friendly =)
;)
;;;;
run;
/*use tranwrd to replace emoticon strings with words*/
data comments_test;
set comments;
length em_comment $ 1000;
em_comment=f1;
em_comment=tranwrd(f1,":)", " smile");
em_comment=tranwrd(f1,"=)", " smile");
em_comment=tranwrd(f1,";)", " smile");
run;
The issue that I have is that when I run the code, only the fifth record is changed (semicolon, right parenthesis becomes 'smile'). The others are not altered.
Any suggestions?
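Try applying each TRANWRD call to em_comment, which holds the result of the previous call, instead of to f1: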
em_comment=tranwrd(em_comment,":)", " smile");
em_comment=tranwrd(em_comment,"=)", " smile");
em_comment=tranwrd(em_comment,";)", " smile");
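Put together, the question's step with that change might look like the following sketch. The three sample rows are hypothetical stand-ins for the comments data set.

```sas
/* Hypothetical sample rows standing in for the comments data set */
data comments;
  length f1 $ 100;
  f1 = "I feel ready to make changes to my paper. :)"; output;
  f1 = "Friendly =)"; output;
  f1 = ";)"; output;
run;

/* Each TRANWRD call works on em_comment, i.e. on the result of the
   previous call, so all three replacements are kept */
data comments_test;
  set comments;
  length em_comment $ 1000;
  em_comment = f1;
  em_comment = tranwrd(em_comment, ":)", " smile");
  em_comment = tranwrd(em_comment, "=)", " smile");
  em_comment = tranwrd(em_comment, ";)", " smile");
run;
```

In the original step every call started again from f1, so each assignment overwrote the one before it. Only the ";)" replacement survived, which is why only the fifth record changed.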
I agree that you have to write your statements so that they don't conflict with each other, but I think the problem you are running into goes well beyond that.
The emoticons you are trying to capture are GIF images, not text, so they don't come through when you copy and paste them into your code.
Hopefully someone comes up with a better solution but, at least, the following would work. By editing your post in the forum's advanced editor, I was able to view it in its HTML form. The results are shown in the following DATA step (the lines will probably wrap, unfortunately):
/*input test data*/
data comments;
input @'sans-serif;">' f1 & $1000.;
datalines4;
<body><p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">We went through everything very quickly and I feel like I'm ready to make changes to my paper. <img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>
<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Thank you so much B*****<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p>
<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">she was wonderful, i has been here since 2:00P.M and it is now 7:10. Wow, please pray for me Ms. R*****</span></p>
<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;">Friendly =)</span></p>
<p style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"><span style="font-family: arial, helvetica, sans-serif;"><img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" /></span></p></body>
;;;;
run;
data comments_test;
set comments;
length em_comment $ 1000;
em_comment=TRANWRD(F1,"</span></p>","");
em_comment=TRANWRD(em_comment,"</body>","");
em_comment=tranwrd(em_comment,":)", " smile");
em_comment=tranwrd(em_comment,"=)", " smile");
em_comment=tranwrd(em_comment,";)", " smile");
em_comment=tranwrd(em_comment,'<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" />', " smile");
run;
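A possibly more general alternative to spelling out the whole img tag in TRANWRD is a PRXCHANGE substitution. It replaces any Jive emoticon tag with the name stored in its ___jive_emoticon_name attribute. This is a sketch that assumes every emoticon tag has the same shape as the ones in the HTML above. The data set name html_test and the sample string are hypothetical.

```sas
/* Hypothetical sample: text followed by one Jive emoticon tag */
data html_test;
  length f1 em_comment $ 300;
  f1 = 'Thank you so much<img class="jive_macro jive_emote" src="/5.0.2/images/emoticons/happy.gif" jivemacro="emoticon" ___jive_emoticon_name="happy" />';
  /* (\w+) captures the emoticon name; $1 writes it back as a plain word.
     -1 replaces every matching tag in the string. */
  em_comment = prxchange('s/<img[^>]*___jive_emoticon_name="(\w+)"[^>]*\/>/ $1/', -1, f1);
run;
```

For the sample row this gives "Thank you so much happy". Other emoticon names (for example a wink GIF) would come through as their own names, without a separate TRANWRD call for each tag.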
Thanks for the comments.
I should clarify that I am only concerned with processing text in this project. (The SAS board is what turns the emoticons into GIFs.)
Thanks to Edward Ballard's comment, I was able to obtain a solution to the problem:
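/*
emoticon: a macro to replace common emoticons with text. It is designed to process text comments
prior to text/sentiment analysis. The comment variable is modified in place, and the result is
written to WORK.TRANSFORMED_<set>.
Note: the replacements run top to bottom, so a short pattern listed early (e.g. ":)" or ":-)")
also rewrites longer patterns listed later that contain it (e.g. ">:)" or "O:-)").
To call the macro:
%emoticon(comment=name_of_variable_containing_comments, set=name_of_dataset_containing_comments);
*/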
%macro emoticon (comment=, set=);
data work.transformed_&set.;
set &set;
&comment.=tranwrd(&comment.,">:]"," smile");
&comment.=tranwrd(&comment.,"(-:"," smile");
&comment.=tranwrd(&comment.,":-)"," smile");
&comment.=tranwrd(&comment.,":)"," smile");
&comment.=tranwrd(&comment.,":o)"," smile");
&comment.=tranwrd(&comment.,":]"," smile");
&comment.=tranwrd(&comment.,":3"," smile");
&comment.=tranwrd(&comment.,":c)"," smile");
&comment.=tranwrd(&comment.,":>"," smile");
&comment.=tranwrd(&comment.,"=]"," smile");
&comment.=tranwrd(&comment.,"8)"," smile");
&comment.=tranwrd(&comment.,"=)"," smile");
&comment.=tranwrd(&comment.,":}"," smile");
&comment.=tranwrd(&comment.,":^)"," smile");
&comment.=tranwrd(&comment.,":-)="," smile");
&comment.=tranwrd(&comment.,"#:-)"," smile");
&comment.=tranwrd(&comment.,":-)8"," smile");
&comment.=tranwrd(&comment.,"d:-)"," smile");
&comment.=tranwrd(&comment.,"&:-)"," smile");
&comment.=tranwrd(&comment.,"8-)"," smile");
&comment.=tranwrd(&comment.,"{:-)"," smile");
&comment.=tranwrd(&comment.,"(:-)"," smile");
&comment.=tranwrd(&comment.,":-( )"," smile");
&comment.=tranwrd(&comment.,"C|:-)"," smile");
&comment.=tranwrd(&comment.,"[:-)"," smile");
&comment.=tranwrd(&comment.,">:D"," laugh");
&comment.=tranwrd(&comment.,":-D"," laugh");
&comment.=tranwrd(&comment.,":D"," laugh");
&comment.=tranwrd(&comment.,"8-D"," laugh");
&comment.=tranwrd(&comment.,"8D"," laugh");
&comment.=tranwrd(&comment.,"x-D"," laugh");
&comment.=tranwrd(&comment.,"xD"," laugh");
&comment.=tranwrd(&comment.,"X-D"," laugh");
&comment.=tranwrd(&comment.,"XD"," laugh");
&comment.=tranwrd(&comment.,"=-D"," laugh");
&comment.=tranwrd(&comment.,"=D"," laugh");
&comment.=tranwrd(&comment.,"=-3"," laugh");
&comment.=tranwrd(&comment.,"=3"," laugh");
&comment.=tranwrd(&comment.,"B^D"," laugh");
&comment.=tranwrd(&comment.,":-))"," happy");
&comment.=tranwrd(&comment.,">:["," sad");
&comment.=tranwrd(&comment.,":-("," sad");
&comment.=tranwrd(&comment.,":("," sad");
&comment.=tranwrd(&comment.,":-c"," sad");
&comment.=tranwrd(&comment.,":c"," sad");
&comment.=tranwrd(&comment.,":-<"," sad");
&comment.=tranwrd(&comment.,":<"," sad");
&comment.=tranwrd(&comment.,":-["," sad");
&comment.=tranwrd(&comment.,":["," sad");
&comment.=tranwrd(&comment.,":{"," sad");
&comment.=tranwrd(&comment.,":-||"," angry");
&comment.=tranwrd(&comment.,":@"," angry");
&comment.=tranwrd(&comment.,">:-("," angry");
&comment.=tranwrd(&comment.,"QQ"," crying");
&comment.=tranwrd(&comment.,"D:<"," disgust or sadness");
&comment.=tranwrd(&comment.,"D:"," disgust or sadness");
&comment.=tranwrd(&comment.,"D8"," disgust or sadness");
&comment.=tranwrd(&comment.,"D;"," disgust or sadness");
&comment.=tranwrd(&comment.,"D="," disgust or sadness");
&comment.=tranwrd(&comment.,"DX"," disgust or sadness");
&comment.=tranwrd(&comment.,"v.v"," disgust or sadness");
&comment.=tranwrd(&comment.,"D-':"," disgust or sadness");
&comment.=tranwrd(&comment.,">:o"," surprise");
&comment.=tranwrd(&comment.,">:O"," surprise");
&comment.=tranwrd(&comment.,":-O"," surprise");
&comment.=tranwrd(&comment.,":O"," surprise");
&comment.=tranwrd(&comment.,"°o°"," surprise");
&comment.=tranwrd(&comment.,"°O°"," surprise");
&comment.=tranwrd(&comment.,":O"," surprise");
&comment.=tranwrd(&comment.,"o_O"," surprise");
&comment.=tranwrd(&comment.,"o_0"," surprise");
&comment.=tranwrd(&comment.,"o.O"," surprise");
&comment.=tranwrd(&comment.,"8-0"," surprise");
&comment.=tranwrd(&comment.,":-<>"," surprise");
&comment.=tranwrd(&comment.,":*"," kiss");
&comment.=tranwrd(&comment.,":^*"," kiss");
&comment.=tranwrd(&comment.,">;]"," wink");
&comment.=tranwrd(&comment.,";-)"," wink");
&comment.=tranwrd(&comment.,";)"," wink");
&comment.=tranwrd(&comment.,"*-)"," wink");
&comment.=tranwrd(&comment.,"*)"," wink");
&comment.=tranwrd(&comment.,";-]"," wink");
&comment.=tranwrd(&comment.,";]"," wink");
&comment.=tranwrd(&comment.,";D"," wink");
&comment.=tranwrd(&comment.,";^)"," wink");
&comment.=tranwrd(&comment.,":-,"," wink");
&comment.=tranwrd(&comment.,">:P"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":-P"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":P"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,"X-P"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,"x-p"," playfully sticking out tongue");
&comment.=tranwrd(&comment.," xp "," playfully sticking out tongue");
&comment.=tranwrd(&comment.," XP "," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":-p"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":p"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,"=p"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":-b"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,":b"," playfully sticking out tongue");
&comment.=tranwrd(&comment.,">:\"," annoyed");
&comment.=tranwrd(&comment.,">:/"," annoyed");
&comment.=tranwrd(&comment.,":-\"," annoyed");
&comment.=tranwrd(&comment.,":-/"," annoyed");
&comment.=tranwrd(&comment.,":-."," annoyed");
&comment.=tranwrd(&comment.,":/"," annoyed");
&comment.=tranwrd(&comment.,":\"," annoyed");
&comment.=tranwrd(&comment.,"=/"," annoyed");
&comment.=tranwrd(&comment.,"=\"," annoyed");
&comment.=tranwrd(&comment.,":S"," annoyed");
&comment.=tranwrd(&comment.,">.<"," annoyed");
&comment.=tranwrd(&comment.,":-|"," determined or straight face");
&comment.=tranwrd(&comment.,":$"," embarrassed");
&comment.=tranwrd(&comment.,">:X"," not speaking");
&comment.=tranwrd(&comment.,":-X"," not speaking");
&comment.=tranwrd(&comment.,":X"," not speaking");
&comment.=tranwrd(&comment.,"O:-)"," angel or innocent");
&comment.=tranwrd(&comment.,"0:-3"," angel or innocent");
&comment.=tranwrd(&comment.,"0:3"," angel or innocent");
&comment.=tranwrd(&comment.,"0:-)"," angel or innocent");
&comment.=tranwrd(&comment.,"0:)"," angel or innocent");
&comment.=tranwrd(&comment.,"0;^)"," angel or innocent");
&comment.=tranwrd(&comment.,">:)"," evil");
&comment.=tranwrd(&comment.,">;)"," evil");
&comment.=tranwrd(&comment.,">:-)"," evil");
&comment.=tranwrd(&comment.,"}:-)"," devilish");
&comment.=tranwrd(&comment.,"}:)"," devilish");
&comment.=tranwrd(&comment.,"3:-)"," devilish");
&comment.=tranwrd(&comment.,"3:)"," devilish");
&comment.=tranwrd(&comment.,"o/\o"," high five");
&comment.=tranwrd(&comment.,"^5"," high five");
&comment.=tranwrd(&comment.,"|-O"," bored");
&comment.=tranwrd(&comment.,":-&"," tongue-tied");
&comment.=tranwrd(&comment.,":&"," tongue-tied");
&comment.=tranwrd(&comment.,"#-)"," confused");
&comment.=tranwrd(&comment.,"%-)"," confused");
&comment.=tranwrd(&comment.,"%)"," confused");
&comment.=tranwrd(&comment.,":-###.."," sick");
&comment.=tranwrd(&comment.,":###.."," sick");
&comment.=tranwrd(&comment.,"<:-|"," dunce");
&comment.=tranwrd(&comment.,"<*)))-{"," fish");
&comment.=tranwrd(&comment.,"><(((*>"," fish");
&comment.=tranwrd(&comment.,"><> "," fish");
&comment.=tranwrd(&comment.,"*\0/*"," cheerleader");
&comment.=tranwrd(&comment.,"@}-;-'---"," rose");
&comment.=tranwrd(&comment.,"@>-->--"," rose");
&comment.=tranwrd(&comment.,"<3"," heart");
&comment.=tranwrd(&comment.,"</3"," broken heart");
&comment.=tranwrd(&comment.,":-o zz"," bored");
&comment.=tranwrd(&comment.,": @"," shouting");
&comment.=tranwrd(&comment.,":-(0)"," shouting");
&comment.=tranwrd(&comment.,"(-.-)"," sleeping");
&comment.=tranwrd(&comment.,"|-I"," sleeping");
&comment.=tranwrd(&comment.,"|-O"," snoring");
&comment.=tranwrd(&comment.,":-v"," talking");
run;
%mend emoticon;
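Because the TRANWRD calls in the macro run top to bottom, a short pattern listed early, such as ":-)", rewrites part of a longer pattern listed later, such as "O:-)", before the longer one can match. One way around this is sketched below. It assumes a small hand-picked subset of the macro's patterns is enough to show the idea. The patterns are kept in an array ordered longest first, and a loop applies them with TRANWRD. The data set names comments and transformed and the sample row are hypothetical.

```sas
/* Hypothetical sample row: emoticons that contain shorter emoticons */
data comments;
  length f1 $ 200;
  f1 = "Nice >:) and O:-) and :-))";
run;

data transformed;
  set comments;
  length em_comment $ 1000;
  /* Subset of the macro's patterns, listed longest first (order matters) */
  array pat[6] $ 8  _temporary_ ("O:-)" ":-))" ">:-)" ">:)" ":-)" ":)");
  array wrd[6] $ 20 _temporary_ ("angel or innocent" "happy" "evil" "evil" "smile" "smile");
  em_comment = f1;
  do i = 1 to dim(pat);
    /* TRIM drops the padding blanks that fixed-length array elements carry */
    em_comment = tranwrd(em_comment, trim(pat[i]), " " || trim(wrd[i]));
  end;
  drop i;
run;
```

With the macro's order, ":-)" is replaced first and turns "O:-)" into "O smile". Here each emoticon maps to its intended word. For the full list, a lookup data set sorted by descending LENGTH(pattern) would probably be easier to maintain than typing the arrays by hand.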
I also received some local assistance that used Perl regular expressions:
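data comments_test;
set comments;
em_comment=f1;
/* keep punctuation (the 'k' and 'p' modifiers) and drop blanks, so x holds only
   the punctuation characters; the words of the comment are not kept */
x=compress(compress(em_comment,'','kp'),' ');
/* eyes [:;=], optional nose -, mouth D or ) */
y=prxparse('s/[:;=]-?[D)]/Smile/');
/* -1 replaces every match in x */
call prxchange(y,-1,x);
run;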
The issues in that case were:
A) I am a novice SAS programmer and a neophyte Perl coder.
B) How to handle emoticons of different lengths (i.e., when the first part can be 1 to n characters and the second part can be 1 to n characters as well).
Thanks again for the assistance.
Chad
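On point B, regular-expression quantifiers let one pattern cover emoticons of different lengths: + means "one or more" and ? means "zero or one". The sketch below extends the regex from the local help in that way. It assumes that any run of eyes [:;=], an optional nose -, and a run of mouths ), ] or D should all map to "smile". It works on the full comment rather than on a punctuation-only copy, so the words are kept. The data set name smiles and the sample string are hypothetical.

```sas
data smiles;
  length comment em_comment $ 100;
  comment = "great :) really great :-)) see you =D";
  /* [:;=]+   one or more "eye" characters
     -?       an optional "nose"
     [)\]D]+  one or more "mouth" characters
     -1 replaces every match */
  em_comment = prxchange('s/[:;=]+-?[)\]D]+/ smile/', -1, comment);
run;
```

All three emoticons, of lengths 2, 4 and 2, are replaced by the same pattern. As in the original regex, the semicolon is treated as eyes, so winks also become "smile".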
Glad to hear that you got what you needed, but I'd still be interested to hear whether anyone has an easier solution for dealing with embedded GIF emoticons.