BookmarkSubscribeRSS Feed
chcody
SAS Employee

Using: SAS Studio 3.8, SAS 9

 

I want to check one column of each row of a dataset to see if it contains any words from a specified list. There will only be one match, and if there is no match then I want the entry to be indicated as missing. What I want to do is create a new column in the largedataset that will have the word that was matched to. 

 

Input data: 

 

list.csv: dog, cat, cow

 

largedataset.csv: 

col1 col2 colToCheck

x       x     abcd/dog/123

x       x     zxy123cat

x       x     /123cowab

x       x     abcdog

 

Output data: 

 

largedataset

col1 col2 colToCheck      newCol

x       x     abcd/dog/123   dog

x       x     zxy123cat         cat

x       x     /123cowab        cow

x       x     abcdog             dog

x       x     abcdefg             . 

 

I'm new to SAS and having trouble locating what I need through documentation and other forum answers.However, the "DO OVER" proc seems promising, but also seems to be unsupported?

 

 

data newdata;
	set largedata;
		do over list;
		if find (colToCheck, list, 'i') then
			newCol=list;
		else   /*else there is no match*/
			newCol=.;
		end;
run;quit;

 

 

4 REPLIES 4
Shmuel
Garnet | Level 18

 How long is the list - how many words ?

 

For a very short list you can use an array like in next code:

data want;
  set have;
        array lst 'white' 'black' 'yellow' ...;

       flag='absent';
       do i=1 to dim(lst);
            if index(<col_name>, lst(i) then flag='exist';
       end;
run; 

For a long list you can use next code:

 

proc format lib=work
   value $list
      'white' = 'exist'
      'black' = 'exist'
      'yellow' = 'exist'
     otherwise = 'absent'
; run;

data want;
 set have;
       flag = put(<col_name> , $list.);
run;

if you have the list in a dataset you can create the format from it by:

data cntl;
  set list;
       retain fmtname '$list';
       start = <col_list>;
       end = start;
       type = 'C';   /* or maybe CHAR */
run;

proc format lib=work cntlin=cntl <noprint>;
run;

/***
  next step as before, see above 
***/
       

More complicate method is using hash method, which I'm not expert with it.

chcody
SAS Employee

The list contains 70+ words and could become larger over time. The main issue is that I need to be able to identify which word was matched to.  E.g. if the observation is 'abcdogdef' then the new column should contain 'dog' (instead of 'exist'). 

Shmuel
Garnet | Level 18

As the word may be a part of a string the format method will not be applicable;

 

The best way will be to use the hash method.

Ksharp
Super User
data list;
input key $;
cards;
dog 
cat 
cow
;

data have;
input col1 $ col2 $ Check : $20.;
cards;
x       x     abcd/dog/123
x       x     zxy123cat
x       x     /123cowab
x       x     abcdog
;

proc sql;
select *
 from have as a left join list as b
  on Check contains strip(key);
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1491 views
  • 0 likes
  • 3 in conversation