BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jonathanzz
Obsidian | Level 7

I want to split a text variable into different new variables but only keeping words that I want as follow:

 

Original dataset:

review_ID    review

0001           I am satisfied with the staff and also the restaurant

0002           The swimming pool, spa and bar are amazing

...

 

Keywords that I want to keep:

staff

restaurant

swimming pool

spa

bar

 

Dataset that I want:

review_ID    review1                  review2              review3

0001           staff                       restaurant            -

0002           swimming pool       spa                      bar

...

 

Can anyone help me out please?

1 ACCEPTED SOLUTION

Accepted Solutions
gamotte
Rhodochrosite | Level 12

Hello,

 

For further processing and since you don't know in advance how many keywords you will find for each ID, it is better to use a vertical data storage.

 

data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
    length keyword $20.;
    input keyword;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;

proc sql noprint;
    SELECT count(keyword) INTO :nk
    FROM keywords;
quit;

data want (rename=(_review=review));
    set have;
    length _review $20.;

    array kw(&nk.) $20.;
    retain kw:;

    if _N_=1 then do i=1 to dim(kw);
        set keywords;
        kw(i)=keyword;
    end;

    do i=1 to dim(kw);
        if findw(review,strip(kw(i))) then do;
            _review=kw(i);
            output;
        end;
    end;

    if _review=" " then output;

    keep review_ID _review;

run;

data _NULL_;
    set want;
    by review_ID;

    if first.review_ID then put review_ID@;

    if last.review_ID then put review;
    else put review@;
run;

View solution in original post

4 REPLIES 4
gamotte
Rhodochrosite | Level 12

Hello,

 

For further processing and since you don't know in advance how many keywords you will find for each ID, it is better to use a vertical data storage.

 

data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
    length keyword $20.;
    input keyword;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;

proc sql noprint;
    SELECT count(keyword) INTO :nk
    FROM keywords;
quit;

data want (rename=(_review=review));
    set have;
    length _review $20.;

    array kw(&nk.) $20.;
    retain kw:;

    if _N_=1 then do i=1 to dim(kw);
        set keywords;
        kw(i)=keyword;
    end;

    do i=1 to dim(kw);
        if findw(review,strip(kw(i))) then do;
            _review=kw(i);
            output;
        end;
    end;

    if _review=" " then output;

    keep review_ID _review;

run;

data _NULL_;
    set want;
    by review_ID;

    if first.review_ID then put review_ID@;

    if last.review_ID then put review;
    else put review@;
run;
Jonathanzz
Obsidian | Level 7
Thanks!!! You coding works!
Ksharp
Super User
data have;
    infile cards dlm='+' dsd;
    format review_ID z4. review $80.;
    input review_ID review;
    cards;
0001+I am satisfied with the staff and also the restaurant
0002+The swimming pool, spa and bar are amazing
;
run;

data keywords;
input keyword $20.;
    cards;
staff
restaurant
swimming pool
spa
bar
;
run;


proc sql;
create table temp as
select review_id,keyword
 from have as a,keywords as b
  where find(review,strip(keyword),'i')
   order by review_id;
quit;
proc transpose data=temp out=want prefix=review;
by review_id;
var keyword;
run;
Jonathanzz
Obsidian | Level 7
Thanks!!! Your method can do exactly what I want!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3582 views
  • 2 likes
  • 3 in conversation