- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using: SAS Studio 3.8, SAS 9
I want to check one column of each row of a dataset to see if it contains any words from a specified list. There will only be one match, and if there is no match then I want the entry to be indicated as missing. What I want to do is create a new column in the largedataset that will have the word that was matched to.
Input data:
list.csv: dog, cat, cow
largedataset.csv:
col1 col2 colToCheck
x x abcd/dog/123
x x zxy123cat
x x /123cowab
x x abcdog
Output data:
largedataset
col1 col2 colToCheck newCol
x x abcd/dog/123 dog
x x zxy123cat cat
x x /123cowab cow
x x abcdog dog
x x abcdefg .
I'm new to SAS and having trouble locating what I need through documentation and other forum answers.However, the "DO OVER" proc seems promising, but also seems to be unsupported?
data newdata;
set largedata;
do over list;
if find (colToCheck, list, 'i') then
newCol=list;
else /*else there is no match*/
newCol=.;
end;
run;quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How long is the list - how many words ?
For a very short list you can use an array like in next code:
data want;
set have;
array lst 'white' 'black' 'yellow' ...;
flag='absent';
do i=1 to dim(lst);
if index(<col_name>, lst(i) then flag='exist';
end;
run;
For a long list you can use next code:
proc format lib=work
value $list
'white' = 'exist'
'black' = 'exist'
'yellow' = 'exist'
otherwise = 'absent'
; run;
data want;
set have;
flag = put(<col_name> , $list.);
run;
if you have the list in a dataset you can create the format from it by:
data cntl;
set list;
retain fmtname '$list';
start = <col_list>;
end = start;
type = 'C'; /* or maybe CHAR */
run;
proc format lib=work cntlin=cntl <noprint>;
run;
/***
next step as before, see above
***/
More complicate method is using hash method, which I'm not expert with it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The list contains 70+ words and could become larger over time. The main issue is that I need to be able to identify which word was matched to. E.g. if the observation is 'abcdogdef' then the new column should contain 'dog' (instead of 'exist').
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As the word may be a part of a string the format method will not be applicable;
The best way will be to use the hash method.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data list;
input key $;
cards;
dog
cat
cow
;
data have;
input col1 $ col2 $ Check : $20.;
cards;
x x abcd/dog/123
x x zxy123cat
x x /123cowab
x x abcdog
;
proc sql;
select *
from have as a left join list as b
on Check contains strip(key);
run;