BookmarkSubscribeRSS Feed
chcody
SAS Employee

Using: SAS Studio 3.8, SAS 9

 

I want to check one column of each row of a dataset to see if it contains any words from a specified list. There will only be one match, and if there is no match then I want the entry to be indicated as missing. What I want to do is create a new column in the largedataset that will have the word that was matched to. 

 

Input data: 

 

list.csv: dog, cat, cow

 

largedataset.csv: 

col1 col2 colToCheck

x       x     abcd/dog/123

x       x     zxy123cat

x       x     /123cowab

x       x     abcdog

 

Output data: 

 

largedataset

col1 col2 colToCheck      newCol

x       x     abcd/dog/123   dog

x       x     zxy123cat         cat

x       x     /123cowab        cow

x       x     abcdog             dog

x       x     abcdefg             . 

 

I'm new to SAS and having trouble locating what I need through documentation and other forum answers.However, the "DO OVER" proc seems promising, but also seems to be unsupported?

 

 

data newdata;
	set largedata;
		do over list;
		if find (colToCheck, list, 'i') then
			newCol=list;
		else   /*else there is no match*/
			newCol=.;
		end;
run;quit;

 

 

4 REPLIES 4
Shmuel
Garnet | Level 18

 How long is the list - how many words ?

 

For a very short list you can use an array like in next code:

data want;
  set have;
        array lst 'white' 'black' 'yellow' ...;

       flag='absent';
       do i=1 to dim(lst);
            if index(<col_name>, lst(i) then flag='exist';
       end;
run; 

For a long list you can use next code:

 

proc format lib=work
   value $list
      'white' = 'exist'
      'black' = 'exist'
      'yellow' = 'exist'
     otherwise = 'absent'
; run;

data want;
 set have;
       flag = put(<col_name> , $list.);
run;

if you have the list in a dataset you can create the format from it by:

data cntl;
  set list;
       retain fmtname '$list';
       start = <col_list>;
       end = start;
       type = 'C';   /* or maybe CHAR */
run;

proc format lib=work cntlin=cntl <noprint>;
run;

/***
  next step as before, see above 
***/
       

More complicate method is using hash method, which I'm not expert with it.

chcody
SAS Employee

The list contains 70+ words and could become larger over time. The main issue is that I need to be able to identify which word was matched to.  E.g. if the observation is 'abcdogdef' then the new column should contain 'dog' (instead of 'exist'). 

Shmuel
Garnet | Level 18

As the word may be a part of a string the format method will not be applicable;

 

The best way will be to use the hash method.

Ksharp
Super User
data list;
input key $;
cards;
dog 
cat 
cow
;

data have;
input col1 $ col2 $ Check : $20.;
cards;
x       x     abcd/dog/123
x       x     zxy123cat
x       x     /123cowab
x       x     abcdog
;

proc sql;
select *
 from have as a left join list as b
  on Check contains strip(key);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 666 views
  • 0 likes
  • 3 in conversation