BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kendrew
Calcite | Level 5
Hi all,

I have been working on a task, and i need to find those variables that contain the words of a list.
Lets say

ITEM WORDS
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE

COMPANY PROD1 PROD2 PROD3
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL

And i want to have the result like

COMPANY PROD1 PROD2 PROD3
C1 FAAAC FQQQ FPPP FZZZ
C3 FOOOC FBBB FPPP FMMM

as the product name in company C1 and C3 hit the words of the list.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data words;
input dummy WORDS $;
cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;
data have;
input (COMPANY PROD1 PROD2 PROD3) ($);
cards;
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

options noquotelenmax;
proc sql noprint;
select words into :words separated by '|' from words;
quit;
data want;
 set have;
 if prxmatch("/&words/",catx('|',of prod:));
run;

proc print;run;

View solution in original post

12 REPLIES 12
Kendrew
Calcite | Level 5
And supposed that there are 50 words on the list
Ksharp
Super User
data words;
input dummy WORDS $;
cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;
data have;
input (COMPANY PROD1 PROD2 PROD3) ($);
cards;
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

options noquotelenmax;
proc sql noprint;
select words into :words separated by '|' from words;
quit;
data want;
 set have;
 if prxmatch("/&words/",catx('|',of prod:));
run;

proc print;run;
andreas_lds
Jade | Level 19

Your company-dataset seems to consist of four variables, but the subsequent lines have five values. So please post the data you have in usable form.

 

ChrisNZ
Tourmaline | Level 20

Similar to @Ksharp 's but no macro variable.

data WORDS;
  input ITEM WORD $;
cards;  
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
run;

data COMPANIES;
  input COMPANY $ PROD1 $ PROD2 $ PROD3 $;
cards;  
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
run;

data WANT;
  length WORDS $800;
  retain WORDS PRX;
  if _N_=1 then do;
    do I= 1 by 1 until(LASTOBS);
      set WORDS end=LASTOBS;
      WORDS=catx('|',WORDS,WORD);
    end;
    PRX=prxparse(catt('/',WORDS,'/'));
  end;
  set COMPANIES;
  if prxmatch(PRX,catx('|',of PROD1-PROD3));
  drop I WORD: ;
run;  

NOTE: The data set WORK.WANT has 2 observations and 6 variables.

 

 

novinosrin
Tourmaline | Level 20

Hi @Kendrew  yet another similar one alike others using FIND function

 



data words;
input dummy WORDS $;
cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;
data have;
input (COMPANY PROD1 PROD2 PROD3) ($);
cards;
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

data want;
 if _n_=1 then do;
  if 0 then set words;
   dcl hash H (dataset: "words") ;
   h.definekey  ("words") ;
   h.definedone () ;
   dcl hiter hi('h');
 end;
 set have;
 length temp $500;
 temp=catx(' ',of prod1-prod3);
 do while(hi.next()=0);
  if find(temp,strip(words)) then _n_=0;
 end;
 if _n_=0;
 keep company prod1-prod3;
run;

 
 
COMPANY PROD1 PROD2 PROD3
C1 FAAAC FQQQ FPPP
C3 FOOOC FBBB FPPP

 

 

 

Patrick
Opal | Level 21

And yet another way for doing this creating a RegEx informat from your Words table.

/* source data */
data words;
  input dummy WORDS $;
  cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;

data have;
  input (COMPANY PROD1 PROD2 PROD3) ($);
  cards;
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

/* create informat using regex */
data strings_fmt;
  retain fmtname 'match_strings';
  retain default 10;
  retain type 'I';
  retain HLO 'SP';
  length start $20;
  retain label 1;
  set words(keep=words) end=last;
  start=cats('/',words,'/i');
  output;
  if last then
    do;
      hlo='SO';
      label=0;
      output;
    end;
run;

proc format cntlin=strings_fmt;
run;


/* select rows with matching strings */
data want;
  set have;
  array _prod prod1-prod3;
  do over _prod;
    /* string found if informat returns 1 */
    if input(_prod,match_strings.) = 1 then
      do;
        output;
        leave;
      end;
  end;
run;

proc print;
run;

Patrick_0-1585137446452.png

 

novinosrin
Tourmaline | Level 20

data words;
input dummy WORDS $;
cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;
data have;
input (COMPANY PROD1 PROD2 PROD3) ($);
cards;
C1 FAAAC FQQQ FPPP FZZZ
C2 FTTTW FUUU FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

proc sql;
 create table want as
 select a.*
 from have a, words
 where  index(catx(prod1,prod2,prod3),strip(words))>0;
quit;
COMPANY PROD1 PROD2 PROD3
C1 FAAAC FQQQ FPPP
C3 FOOOC FBBB FPPP
Patrick
Opal | Level 21

@novinosrin 

I believe your SQL needs below two amendments.

Patrick_0-1585139008201.png

 

novinosrin
Tourmaline | Level 20

Thank you sir @Patrick  though catx without delimiter in a SQL is kinda tolerant to work like cats 🙂

 

But like you pointed better to be safe than sorry, and with a tweak to handle multiple matches(if present)

 


proc sql;
 create table want as
 select distinct a.*
 from have a, words
 group by COMPANY,PROD1, PROD2, PROD3
 having max(index(catx('|',prod1,prod2,prod3),strip(words)));
quit;

 

 

Patrick
Opal | Level 21

@novinosrin wrote:

Thank you sir @Patrick  though catx without delimiter in a SQL is kinda tolerant to work like cats 🙂

@novinosrin 

You sure about this? Then can you explain below result?

data words;
  input dummy WORDS $;
  cards;
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
;

data have;
  input (COMPANY PROD1 PROD2 PROD3) ($);
  cards;
C1 FAAAC FBBB FPPP FZZZ
C2 DTTTA AADD FPPP FLLL
C3 FOOOC FBBB FPPP FMMM
C4 FIIIIC FOOO FPPP FLLL
;

proc sql;
/*  create table want as*/
  select 
    a.*,
    catx(prod1,prod2,prod3) as str1,
    catx('|',prod1,prod2,prod3) as str2
  from have a, words
  group by COMPANY,PROD1, PROD2, PROD3
  having max(index(catx(prod1,prod2,prod3),strip(words)));
quit;

 Patrick_0-1585139958085.png

 

novinosrin
Tourmaline | Level 20

Right I see. Sorry for my misconception/misunderstanding. I take back my words. My apologies. 

 

I sit corrected. Thank you!

Patrick
Opal | Level 21

It was a nice one. Made me realize that I can use a variable as delimiter. Can't think of a use case right now - but who knows.

data test;
  set sashelp.class;
  string=catx(sex,age,height,weight);
run;
proc print data=test;
run;

Patrick_0-1585140460823.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2737 views
  • 6 likes
  • 6 in conversation