BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jonathanzz
Obsidian | Level 7

I want to split a text variable into different new variables but only keeping words that I want as follow:

 

Original dataset:

review_ID    review

0001           I am satisfied with the staff and also the restaurant

0002           The swimming pool, spa and bar are amazing

...

 

Keywords that I want to keep:

staff

restaurant

swimming pool

spa

bar

 

Dataset that I want:

review_ID    review1                  review2              review3

0001           staff                       restaurant            -

0002           swimming pool       spa                      bar

...

 

Can anyone help me out please?

1 ACCEPTED SOLUTION

Accepted Solutions
gamotte
Rhodochrosite | Level 12

Hello,

 

For further processing and since you don't know in advance how many keywords you will find for each ID, it is better to use a vertical data storage.

 

data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
    length keyword $20.;
    input keyword;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;

proc sql noprint;
    SELECT count(keyword) INTO :nk
    FROM keywords;
quit;

data want (rename=(_review=review));
    set have;
    length _review $20.;

    array kw(&nk.) $20.;
    retain kw:;

    if _N_=1 then do i=1 to dim(kw);
        set keywords;
        kw(i)=keyword;
    end;

    do i=1 to dim(kw);
        if findw(review,strip(kw(i))) then do;
            _review=kw(i);
            output;
        end;
    end;

    if _review=" " then output;

    keep review_ID _review;

run;

data _NULL_;
    set want;
    by review_ID;

    if first.review_ID then put review_ID@;

    if last.review_ID then put review;
    else put review@;
run;

View solution in original post

4 REPLIES 4
gamotte
Rhodochrosite | Level 12

Hello,

 

For further processing and since you don't know in advance how many keywords you will find for each ID, it is better to use a vertical data storage.

 

data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
    length keyword $20.;
    input keyword;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;

proc sql noprint;
    SELECT count(keyword) INTO :nk
    FROM keywords;
quit;

data want (rename=(_review=review));
    set have;
    length _review $20.;

    array kw(&nk.) $20.;
    retain kw:;

    if _N_=1 then do i=1 to dim(kw);
        set keywords;
        kw(i)=keyword;
    end;

    do i=1 to dim(kw);
        if findw(review,strip(kw(i))) then do;
            _review=kw(i);
            output;
        end;
    end;

    if _review=" " then output;

    keep review_ID _review;

run;

data _NULL_;
    set want;
    by review_ID;

    if first.review_ID then put review_ID@;

    if last.review_ID then put review;
    else put review@;
run;
Jonathanzz
Obsidian | Level 7
Thanks!!! You coding works!
Ksharp
Super User
data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
input keyword $20.;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;


proc sql;
create table temp as
select review_id,keyword
 from have as a,keywords as b
  where find(review,strip(keyword),'i')
   order by review_id;
quit;
proc transpose data=temp out=want prefix=review;
by review_id;
var keyword;
run;
Jonathanzz
Obsidian | Level 7
Thanks!!! Your method can do exactly what I want!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3664 views
  • 2 likes
  • 3 in conversation