Hi all,
I need your help for the below problem.
I have 2 data sets.For example data1 and data2.The data1 data set has codes like :
data1:
code1
code2
code3
code4
code5
and the data 2 data set has combinations of the above codes like:
code2code5
code1code5code4
etc.
My question is exist some way to create one new column in data2 data set which it contains how many codes from the data1 appear in each row of data2?
Thank you in advance,
Burney 1998.
First, I'll create some test data:
/* Create the test data */
data codes;
drop _:;
do _i=1 to 5;
code=cats('code',_i);
output;
end;
run;
data have;
call streaminit(12345);
length id 8 text $100;
/* Drops all variables with _ as the first character of the name */
drop _:;
do id=1 to 5;
do _i=1 to rand('integer',1,4);
Text=cats(Text,'code',rand('integer',1,5),'other random text');
end;
output;
call missing(text);
end;
run;
It looks like this:
CODES
code |
---|
code1 |
code2 |
code3 |
code4 |
code5 |
id | text |
---|---|
1 | code5other random textcode3other random textcode5other random text |
2 | code2other random textcode4other random textcode2other random textcode3other random text |
3 | code3other random textcode5other random textcode2other random textcode5other random text |
4 | code2other random text |
5 | code4other random text |
Now to do the work:
/* Find out how may codes there are, write number to macro variable */
proc sql noprint;
select count(*) into :dimension
from codes
;
quit;
data want;
drop _: code;
/* Create a temporary array to hold al the code values */
array lookup[&dimension] $100 _temporary_;
/* Load the array with codes from your codes table */
if _n_ = 1 then do;
do _i=1 to &dimension;
set codes end=EndOfLookup;
lookup[_i]=code;
end;
end;
/* read the data you want to test */
set have2;
/* reset the count for each row of input data */
call missing (count);
/* Add up the matches for each of the codes */
do _i=1 to dim(lookup);
count+count(text,lookup[_i],'it');
end;
run;
And the result:
id | text | count |
---|---|---|
1 | code5other random textcode3other random textcode5other random text | 3 |
2 | code2other random textcode4other random textcode2other random textcode3other random text | 4 |
3 | code3other random textcode5other random textcode2other random textcode5other random text | 4 |
4 | code2other random text | 1 |
5 | code4other random text | 1 |
First, I'll create some test data:
/* Create the test data */
data codes;
drop _:;
do _i=1 to 5;
code=cats('code',_i);
output;
end;
run;
data have;
call streaminit(12345);
length id 8 text $100;
/* Drops all variables with _ as the first character of the name */
drop _:;
do id=1 to 5;
do _i=1 to rand('integer',1,4);
Text=cats(Text,'code',rand('integer',1,5),'other random text');
end;
output;
call missing(text);
end;
run;
It looks like this:
CODES
code |
---|
code1 |
code2 |
code3 |
code4 |
code5 |
id | text |
---|---|
1 | code5other random textcode3other random textcode5other random text |
2 | code2other random textcode4other random textcode2other random textcode3other random text |
3 | code3other random textcode5other random textcode2other random textcode5other random text |
4 | code2other random text |
5 | code4other random text |
Now to do the work:
/* Find out how may codes there are, write number to macro variable */
proc sql noprint;
select count(*) into :dimension
from codes
;
quit;
data want;
drop _: code;
/* Create a temporary array to hold al the code values */
array lookup[&dimension] $100 _temporary_;
/* Load the array with codes from your codes table */
if _n_ = 1 then do;
do _i=1 to &dimension;
set codes end=EndOfLookup;
lookup[_i]=code;
end;
end;
/* read the data you want to test */
set have2;
/* reset the count for each row of input data */
call missing (count);
/* Add up the matches for each of the codes */
do _i=1 to dim(lookup);
count+count(text,lookup[_i],'it');
end;
run;
And the result:
id | text | count |
---|---|---|
1 | code5other random textcode3other random textcode5other random text | 3 |
2 | code2other random textcode4other random textcode2other random textcode3other random text | 4 |
3 | code3other random textcode5other random textcode2other random textcode5other random text | 4 |
4 | code2other random text | 1 |
5 | code4other random text | 1 |
We need names for the datasets and the variables to write code.
So lets call the first dataset LIST and the variable CODE.
And lets call the second dataset HAVE and the variable STRING.
And lets call the output dataset WANT and the new variable COUNT.
So just join the two and sum up the number of codes that appear at least once.
proc sql;
create table want as
select a.string
, sum(0<index(a.string,b.code)) as count
from have a
, list b
group by a.string
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.