turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2017 09:21 AM

I came accross this post since I am confronted with a similar issue. I can easily transscript this to alphanumerical as well as having column 2 of matrix two containing the values I like to know.

proc iml;

one = {a, c, e, h};

two = {a aa, b bb, c cc, d dd, e ee, f ff, g gg, h hh};

idx = loc (element(two, one));

v = two(idx);

do i = 1 to ncol(idx);

n = idx;

newmat = two[loc(one=v),2];

end;

print newmat;

quit;

However, my real word looks different:

- 'one' contains strings, e.g.:

one = {hat, cap, bottle, bottleholder};

- 'two' contains even longer strings and multiple matches, e.g.:

two = {'mouse with glasses' '223 567', 'house with bottleholder in black' '345 987', 'baseballcap in blue' '678 912', 'pink baseballcap' 345 123', 'cap from plant material' '678 123' };

Now I would like to find in 'two' all the "number strings" connected to the "word strings" which match the strings in 'one', including "baseballcap ..." such that the result would look something like

newmat_result = {'hat' '' '' '', 'cap' '678 912' '345 123' '678 123', bottle '' '' '', bottleholder '345 987' '' ''};

I worked with regular expressions before, but those were handwritten and not extracted from one matrix to match another matrix.

After days of searching and trying, I hope that I find somebody with an idea to solve this issue.

Thank you for any pointers!

Regards from Germany

Gerit

Accepted Solutions

Solution

06-30-2017
05:07 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

06-27-2017 09:00 AM

If you have many key value need to match.

Try SQL's cartesian product .

```
data k;
input k : $20.;
cards;
hat
cap
bottle
bottleholder
;
run;
data have;
input x $40.;
cards;
'mouse with glasses' '223 567'
'house with bottleholder in black' '345 987'
'baseballcap in blue' '678 912'
'pink baseballcap' '345 123'
'cap from plant material' '678 123'
;
run;
proc sql;
select *
from have,k
where prxmatch(cats('/\b',k,'\b/i'),x);
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

06-23-2017 02:51 PM

I don't understand your rules for what is a match, but the general idea is to loop over the search terms in the ONE array and find all the matching terms in the strings in TWO[,1]. You can use regular expression or the FIND function to find the match. Hopefully the following snippet will get you started:

```
proc iml;
one = {hat, cap, bottle, bottleholder};
two = {'mouse with glasses' '223 567', 'house with bottleholder in black' '345 987',
'baseballcap in blue' '678 912', 'pink baseballcap' '345 123',
'cap from plant material' '678 123' };
target = two[,1];
do i = 1 to nrow(one);
s = find(target, one[i], 'it');
idx = loc(s>0);
if ncol(idx)>0 then do;
numbers = two[idx,2];
print (one[i]) numbers;
end;
end;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

06-27-2017 01:05 AM - edited 06-27-2017 01:27 AM

Thank you for your answer. Now I tried hard to extract the results into one matrix / data set which should look something like "newmat_result". I tried with producing a matrix each loop and save it and the latest try was to create a data set and append the information gained. With such a data set (newmat_result) I could merge it with the original (one) which subsequently needs to be worked on in a different programm. For this purpose I added ID-numbers (1 to 4 in this case).

Here is my slow progress on which I would really appreciate further help:

proc iml;

one = {'1' hat, '2' cap, '3' bottle, '4' bottleholder};

two = {'mouse with glasses' '223 567', 'house with bottleholder in black' '345 987',

'baseballcap in blue' '678 912', 'pink baseballcap' '345 123',

'cap from plant material' '678 123' };

target = two[,1];

goal = one || J(nrow(one),10,'.'); *Better, if it would generate the number of columns on its own.;

*create result ; *Doesn't work;

do i = 1 to nrow(one);

s = find(target, one[i,2], 'it');

idx = loc(s>0);

if ncol(idx)>0 then do;

numbers = T(two[idx,2]);

numb = (one[i,2]) || numbers;

cdim = ncol(numb);

rest = ncol(goal) - 1 - cdim;

cols = numb || J(1,rest,'.');

print cdim rest cols;

end;

*append from cols;*Doesn't work;

end;

*close result;*Doesn't work;

*print result;*Doesn't work;

newmat_result = {'hat' '' '' '', 'cap' '678 912' '345 123' '678 123', 'bottle' '' '' '', 'bottleholder' '345 987' '' ''};

print newmat_result;

quit;

Thank you again.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

06-24-2017 09:28 AM - edited 06-24-2017 09:30 AM

It is better for Perl Regular Expression, no need of IML .

```
data have;
input x $40.;
found=prxmatch('/\b(hat|cap|bottle|bottleholder)\b/i',x);
cards;
'mouse with glasses' '223 567'
'house with bottleholder in black' '345 987'
'baseballcap in blue' '678 912'
'pink baseballcap' '345 123'
'cap from plant material' '678 123'
;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

06-27-2017 01:26 AM

Thank you for your answer. It looks fancy to me. However, I have issues how to transfer this to my two matrices / data sets. At the moment I have 2.700+ words in matrix "one" and roughly 50.000 strings in matrix "two". I am not capable of transfering your code to my task:

- read regular expressions from matrix one, search them in matrix two

- if found transfer the entry in column 2 of matrix two to the regular expression from matrix one; none or multiple entries are possible

- result would be like

data result;

input id $char2. term $ foundCode1 $char10. foundCode2 $char10. foundCode3 $char10.;

datalines;

1 hat

2 cap '678 912' '345 123' '678 123'

3 bottle

4 bottleholder '345 987'

;

run;

Is this possible?

Thank you again!

Solution

06-30-2017
05:07 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

06-27-2017 09:00 AM

If you have many key value need to match.

Try SQL's cartesian product .

```
data k;
input k : $20.;
cards;
hat
cap
bottle
bottleholder
;
run;
data have;
input x $40.;
cards;
'mouse with glasses' '223 567'
'house with bottleholder in black' '345 987'
'baseballcap in blue' '678 912'
'pink baseballcap' '345 123'
'cap from plant material' '678 123'
;
run;
proc sql;
select *
from have,k
where prxmatch(cats('/\b',k,'\b/i'),x);
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

06-30-2017 05:11 AM

Thank your for the repeated help.

My final version builds on the following code. This gives me two lines per result. This has the nice spin-off that aside from the number code the text is included in the output which makes the following step of deleting non-relevant codes more easy.

data words;

input id $char2. term $20.;

cards;

1 hat

2 cap

3 bottle

4 bottleholder

;

run;

data strings;

length x $ 40;

infile datalines delimiter=',';

input x $ code $;

datalines;

mouse with glasses,223 567

house with bottleholder in black,345 987

baseballcap in blue,678 912

pink baseballcap,345 123

cap from plant material,678 123

;

run;

proc sql;

create table interim as

select *

from strings,words

where prxmatch(cats('/',strip(term),'/i'),x);

run;

*Added clean-up;

proc transpose data=interim out=tpinterim;

by id ;

var id code term x;

run;

data tpinterim;

merge tpinterim words;

by id;

run;

data result;

retain id term;

set tpinterim;

if _NAME_ = 'id' then delete;

if _NAME_ = 'term' then delete;

run;