BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jl13491
Fluorite | Level 6

Hi everyone, 

 

I'm about to cry because of SAS! I feel like I'm going crazy! Here's my issue: I want to create variables to flag specified keywords that are found in a TOPIC variable. Here's the code that I have: 

 

 

data test ;
    format topics $char256. ; 
    infile datalines dsd dlm=":";
    input topics $ ;
    datalines;
PHYSICAL COMPUTING  BASIC COMPUTER TEACHING  PROJECTS USING WORD PROCESSING AND DATA ANALYZING  POWERPOINT
GAME DESIGN
WEB DESIGN AND DEVELOPMENT MOBILE APP DEVELOPMENT
PHYSICAL COMPUTING MAKER ACTIVITIES WEB DESIGN AND DEVELOPMENT TEXT-BASED PROGRAMMING
;
run;

%macro test() ; 
DATA test ;
    SET test ;

    array keywords [6] $ t1-t6 ('PHYSICAL', 'ACTIVITES', 'GAME', 'DATA', 'CODING', 'MOBILE') ;
    array varnames [6] physcomp makeract gamedesign datasci coding mobiledev ; 

    %do i = 1 %to 6 ;
        if findw(topics, keywords[&i]) > 0 then varnames[&i] = 1 ;
        else varnames[&i] = 0 ;
    %end ; 

RUN ;
%mend ; 

%test() ; 

which produces the following data: 

 

PHYSICAL COMPUTING  BASIC COMPUTER TEACHING  PROJECTS USING WORD PROCESSING AND DATA ANALYZING  POWERPOINT	
1	0	0	0	0	0
GAME DESIGN	
0	0	0	0	0	0
WEB DESIGN AND DEVELOPMENT MOBILE APP DEVELOPMENT	
0	0	0	0	0	0
PHYSICAL COMPUTING MAKER ACTIVITIES WEB DESIGN AND DEVELOPMENT TEXT-BASED PROGRAMMING	
1	0	0	0	0	0

Huh? Why isn't GAME in the second observation being recognized? Or MOBILE in the third observation? Why is PHYSICAL the only keyword being recognized? Please help, I'm at my wit's end!!! 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I can't test and explore right now, but here's an idea to examine.

 

When you create character variables with an array the default length is 8.  So "GAME" is not the value "GAME    " is the value.  I'm not sure how FINDW handles that, but you can easily find out.  Also note that some of your keywords are longer than 8 characters long, so the values are being chopped off.

View solution in original post

15 REPLIES 15
jl13491
Fluorite | Level 6

Also, the order of the keyword/varnames list doesn't matter -- even if I put game design first, it still only recognizes physical computing. 

Astounding
PROC Star

I can't test and explore right now, but here's an idea to examine.

 

When you create character variables with an array the default length is 8.  So "GAME" is not the value "GAME    " is the value.  I'm not sure how FINDW handles that, but you can easily find out.  Also note that some of your keywords are longer than 8 characters long, so the values are being chopped off.

jl13491
Fluorite | Level 6
I LOVE YOU ASTOUNDING! Thank you for your big brain, you have cured me! Using TRIM(keywords[&i]) solved the issue!
FreelanceReinh
Jade | Level 19

Hi @jl13491 and welcome to the SAS Support Communities!

 

Don't worry, FINDW can really be confusing. Only minutes before I read your post I struggled with that function in a different context ...

 

I see three issues:

  1. Length of array keywords is the default 8, so will truncate longer initial values. Use 
     array keywords [6] $10 ...
  2. Typo "ACTIVITES" in the second initial value.
  3. The trailing blanks of the keywords must be trimmed. Otherwise they become part of the search term (which is a problem for all keywords that are shorter than the defined length -- and the reason why "PHYSICAL" worked well with length 8). Use
    findw(topics, trim(keywords[&i]))
jl13491
Fluorite | Level 6
Thank you, Freelance, for the extremely thorough solution! You, like Outstanding, have saved me from SAS's horrific clutches! ❤️
PaigeMiller
Diamond | Level 26

In addition to the above correct comments, you probably want to avoid using macros and macro variables when they are not necessary. Arrays contain all the features you need to do this, without macros.

 

DATA test1;
    SET test ;
    array keywords [6] $ 10 _temporary_ ('PHYSICAL', 'ACTIVITIES', 'GAME', 'DATA', 'CODING', 'MOBILE') ;
    array varnames [6] physcomp makeract gamedesign datasci coding mobiledev; 
    do i = 1 to 6 ; 
        if findw(topics, trim(keywords[i]))> 0 then varnames[i] = 1 ; else varnames[i]=0;   
    end ; 
    drop i;
RUN ;
--
Paige Miller
JackHamilton
Lapis Lazuli | Level 10
Unnecessarily putting code inside a macro is so common that there must be a flaw in how the macro language is taught.

If you had dozens or hundreds of keywords instead of half a dozen, it might be better to put the keywords and corresponding variable names into a SAS data set, and use a macro to generate the two ARRAY statements. DOSUBL might be even easier.

Also, you might want to look at the R or T modifier in the FINDW function as an alternative to using the TRIM function.
PaigeMiller
Diamond | Level 26

@JackHamilton said:

 

If you had dozens or hundreds of keywords instead of half a dozen, it might be better to put the keywords and corresponding variable names into a SAS data set, and use a macro to generate the two ARRAY statements. DOSUBL might be even easier.

 

Why do you say that?

 

Also, you might want to look at the R or T modifier in the FINDW function as an alternative to using the TRIM function.

 

I should have thought of that!

--
Paige Miller
JackHamilton
Lapis Lazuli | Level 10
Personally, I find it easier to maintain a long narrow list than a short wide list, and if the list is coming from an external source it's usually easier to read it in a data step than in a macro. Data-driven instead of manually maintained array statements.

Using DOSUBL lets me do the processing right where the list of values is needed, so I don't have to create global macro variables. Depending on how much parameterization is needed, it might or might not be better than a macro.
JackHamilton
Lapis Lazuli | Level 10
/* Create a SAS data set containing words to be looked for and  */
/* their associated variables.                                  */

data mylist;
    input @1 lookfor $32.
         @34 varname $32. ;
datalines4;
PHYSICAL                         physcomp
ACTIVITIES                       makeract
GAME                             gamedesign
DATA                             datasci 
CODING                           coding
MOBILE                           mobiledev 
;;;;

/*** Method 1 ***/
/* Create global macro variables containing expanded parts of array statements.  */
%global lookfor_list varname_list list_length;

proc sql noprint;
    select 
        quote(trim(lookfor), "'") ,
        trim(varname) 
    into
        :LOOKFOR_LIST separated by ' ' ,
        :VARNAME_LIST separated by ' ' 
    from 
        mylist;
    %let list_length = &SQLOBS. ;
quit;

%put INFO: &=LOOKFOR_LIST ;
%put INFO: &=VARNAME_LIST ;
%put INFO: &=LIST_LENGTH  ;

data _null_; 
    array keywords [&LIST_LENGTH.] $32 _temporary_ (&LOOKFOR_LIST.) ;
    array varnames [&LIST_LENGTH.] 4 &VARNAME_LIST. ; 
    put keywords[1]=;
run;

%symdel lookfor_list varname_list list_length;

/*** Method 2 ***/
/* create global macro variables containing complete array statements.  */

%macro make_code();

    %global lookfor_code varname_code;
    %local lookfor_list varname_list list_length;
    
    proc sql noprint;
        select 
            quote(trim(lookfor), "'") ,
            trim(varname) 
        into
            :LOOKFOR_LIST separated by ' ' ,
            :VARNAME_LIST separated by ' ' 
        from 
            mylist;
        %let list_length = &SQLOBS. ;
    quit;
        
    %let lookfor_code = array keywords [&LIST_LENGTH.] $32 _temporary_ (&LOOKFOR_LIST.) ;
    %let varname_code = array varnames [&LIST_LENGTH.] 4 &VARNAME_LIST. ; 
    
    %put INFO: &=LOOKFOR_CODE;
    %put INFO: &=VARNAME_CODE;

%mend make_code;
%make_code();

data _null_; 
    &LOOKFOR_CODE. ;
    &VARNAME_CODE. ; 
    put keywords[1]=;
run;

%symdel lookfor_code varname_code;    
   
   
/*** Method 3 ***/
/* Create a macro that inserts code generated from SAS data set,    */
/* without causing a section break.  In real life, this would be    */
/* parameter-driven.                                                */

%macro use_dosub();

    %local lookfor_code varname_code;
    %local lookfor_list varname_list list_length;

    %let dosubl_rc = %sysfunc(dosubl(%nrstr(    
        proc sql noprint;
            select 
                quote(trim(lookfor), "'") ,
                trim(varname) 
            into
                :LOOKFOR_LIST separated by ' ' ,
                :VARNAME_LIST separated by ' ' 
            from 
                mylist;
            %let list_length = &SQLOBS. ;
        quit;
            
        %let lookfor_code = array keywords [&LIST_LENGTH.] $32 _temporary_ (&LOOKFOR_LIST.) ;
        %let varname_code = array varnames [&LIST_LENGTH.] 4 &VARNAME_LIST. ; 
    )));
    %put INFO: RC from &=DOSUBL_RC;

    %put INFO: &=LOOKFOR_CODE;
    %put INFO: &=VARNAME_CODE;    
    
    &LOOKFOR_CODE;
    &VARNAME_CODE;

%mend use_dosub;


data _null_; 
    %use_dosub(); 
    put keywords[1]=;
run;

%put _user_;

This is using a sledgehammer for a penny nail, but sometimes the ability to create code inline without generating a section break is very useful. 

PaigeMiller
Diamond | Level 26

@JackHamilton wrote:
Personally, I find it easier to maintain a long narrow list than a short wide list, and if the list is coming from an external source it's usually easier to read it in a data step than in a macro. Data-driven instead of manually maintained array statements.

Good explanation! Thanks!

--
Paige Miller
Tom
Super User Tom
Super User

Here is example that remove macros.

First create some sample datasets.  One with the list of strings to search and a second with the list of words to find.

data test ;
  infile datalines truncover;
  row+1;
  input topics $256. ;
datalines;
PHYSICAL COMPUTING  BASIC COMPUTER TEACHING  PROJECTS USING WORD PROCESSING AND DATA ANALYZING  POWERPOINT
GAME DESIGN
WEB DESIGN AND DEVELOPMENT MOBILE APP DEVELOPMENT
PHYSICAL COMPUTING MAKER ACTIVITIES WEB DESIGN AND DEVELOPMENT TEXT-BASED PROGRAMMING
;

data keywords;
  key+1;
  input keyword $30.;
datalines;
PHYSICAL
ACTIVITIES
GAME
DATA
CODING
MOBILE
;

Then do a cross join to match every keyword to every string and then using FINDW() to check if the keyword is in the string.

If you want the wide format then use PROC TRANSPOSE to create it.  If the goal is the wide structure then make the join a VIEW.

proc sql ;
create view tall as 
  select a.*,b.*
       , 0<findw(topics,keyword,' ','sit') as found
  from test a
     , keywords b
  order by row,key
;
quit;

proc transpose data=tall out=want(drop=_name_) prefix=keyword;
  by row topics;
  id key;
  idlabel keyword;
  var found;
run;

Let's check the results:

proc print;
 var row key: ;
run;

proc print label;
 var row key: ;
run;

Screenshot 2022-01-26 190925.jpg

Tom
Super User Tom
Super User

I usually find that I need to train FINDW() to SIT for it be useful.

 

i or I

ignores the case of the characters.

s or S

adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.

t or T

trims trailing blanks from the string, word, and character arguments.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2363 views
  • 12 likes
  • 7 in conversation