DATA Step, Macro, Functions and more

How to read specific information from Tweet data

Accepted Solution Solved
Contributor
Posts: 29
Accepted Solution


Note: each raw data line is a separate Twitter comment. I need to extract only the words that begin with a special character (@ or #), together with the characters that follow it, as shown in the output below. If a comment does not contain any such word, the output should be null.

Raw data:

@nice#Greatwork@fake

very nice portal ncs

Good portal @hatsoff

Output needed:

@nice

#Greatwork

@fake

null

@hatsoff


Accepted Solutions
Solution
08-08-2016 02:13 PM
Contributor
Posts: 33

Re: How to read specific information from Tweet data

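A not-too-complex regular expression combined with CALL PRXNEXT can solve the problem:

data work.have;
   length lot $ 100;
   /* Read each raw line as-is into the character variable LOT */
   input;

   lot = trim(_infile_);

   datalines;
@nice#Greatwork@fake
very nice portal ncs 
Good portal @hatsoff
;
run;


data work.want;
   set work.have;
   length 
      word $ 100
      rx start p l 8
   ;
   retain rx;

   /* Compile the pattern once: @ or # followed by one or more word characters */
   if _n_ = 1 then do;
      rx= prxparse('/([@#]\w+)/i');
   end;

   /* Start searching each comment at its first character */
   start = 1;

   /* CALL PRXNEXT returns the position (p) and length (l) of the next match; p = 0 means no more matches */
   call prxnext(rx, start, -1, lot, p, l);
   do while (p > 0);
      word = substr(lot, p, l);
      output;   /* one observation per @ or # word */
      call prxnext(rx, start, -1, lot, p, l);
   end;

   /* WORD is reset to missing for every observation, so it is still missing only when nothing matched */
   if missing(word) then do;
      word = 'null';
      output;
   end;

   keep word;
run;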



All Replies
Super User
Posts: 17,829

Re: How to read specific information from Tweet data

Look at the SCAN() function, including its third parameter, which allows you to specify the delimiters.

Contributor
Posts: 29

Re: How to read specific information from Tweet data

Can you show me the exact code?
Super User
Posts: 17,829

Re: How to read specific information from Tweet data

Here's a fully worked example. Specify your delimiters as needed, and make sure to include a space among them if required.

http://blogs.sas.com/content/iml/2016/07/11/break-sentence-into-words-sas.html

I don't think this approach will give you a NULL value, but if you really need one, add a check: if the number of words is 0, output a null record.
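The SCAN suggestion, including the zero-words check, could look roughly like this. This is a minimal sketch, not code from the thread: it reads the sample lines into LOT the same way as the accepted solution, and the helper variables N_WORDS, I, POS, LEN, SIGN and FOUND are hypothetical names. Because SCAN drops the delimiters it splits on, the sketch uses CALL SCAN to get each word's starting position and then inspects the character just before it to decide whether the word followed an @ or a #.

```sas
/* Sample data: one tweet per line, read as-is into LOT */
data work.have;
   length lot $ 100;
   input;
   lot = trim(_infile_);
   datalines;
@nice#Greatwork@fake
very nice portal ncs
Good portal @hatsoff
;
run;

/* SCAN-based alternative (sketch); helper variable names are hypothetical */
data work.want_scan;
   set work.have;
   length word $ 100 sign $ 1 pos len 8;

   found = 0;
   /* Count words, treating blank, @ and # as delimiters */
   n_words = countw(lot, ' @#');

   do i = 1 to n_words;
      /* CALL SCAN returns where the i-th word starts (POS) and its length (LEN) */
      call scan(lot, i, pos, len, ' @#');

      /* SCAN drops delimiters, so inspect the character just before the word */
      if pos > 1 then do;
         sign = substr(lot, pos - 1, 1);
         if sign in ('@', '#') then do;
            word = cats(sign, substr(lot, pos, len));
            found = found + 1;
            output;
         end;
      end;
   end;

   /* No @ or # word in this comment: write the literal text 'null' */
   if found = 0 then do;
      word = 'null';
      output;
   end;

   keep word;
run;
```

One design difference from the regular-expression solution: here a word ends only at a blank, @ or #, so trailing punctuation (for example in "@hatsoff!") would be kept, whereas the \w+ pattern stops before it.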

☑ This topic is SOLVED.


Discussion stats
  • 4 replies
  • 342 views
  • 1 like
  • 3 in conversation