BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Barney1998
Obsidian | Level 7

Hi all,

I need your help for the below problem.

I have 2 data sets.For example data1 and data2.The data1 data set has codes like :

data1:

code1

code2

code3

code4

code5

 

and the data 2 data set has combinations of the above codes like:

code2code5

code1code5code4

etc.

My question is exist some way to create one new column in data2 data set which it contains how many codes from the data1 appear in each row of data2?

Thank you in advance,

Burney 1998.

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASJedi
SAS Super FREQ

First, I'll create some test data:

/* Create the test data */
data codes;
	drop _:;
	do _i=1 to 5;
		code=cats('code',_i);
		output;
	end;
run;

data have;
	call streaminit(12345);
	length id 8 text $100;
	/* Drops all variables with _ as the first character of the name */
	drop _:;
	do id=1 to 5;
		do _i=1 to rand('integer',1,4);
			Text=cats(Text,'code',rand('integer',1,5),'other random text');
		end;
		output;
		call missing(text);
	end;
run;

It looks like this:

CODES

code
code1
code2
code3
code4
code5

 

HAVE
id text
1 code5other random textcode3other random textcode5other random text
2 code2other random textcode4other random textcode2other random textcode3other random text
3 code3other random textcode5other random textcode2other random textcode5other random text
4 code2other random text
5 code4other random text

 

Now to do the work:

/* Find out how may codes there are, write number to macro variable */
proc sql noprint;
select count(*) into :dimension
   from codes
;
quit;

data want;
	drop _: code;
	/* Create a temporary array to hold al the code values */
	array lookup[&dimension] $100 _temporary_;
	/* Load the array with codes from your codes table */
   if _n_ = 1 then do;
      do _i=1 to &dimension;
         set codes end=EndOfLookup;
         lookup[_i]=code;
         end;
      end;
	/* read the data you want to test */
	set have2;
	/* reset the count for each row of input data */
	call missing (count);
	/* Add up the matches for each of the codes */
	do _i=1 to dim(lookup);
	   count+count(text,lookup[_i],'it');
   end;
run;

And the result:

id text count
1 code5other random textcode3other random textcode5other random text 3
2 code2other random textcode4other random textcode2other random textcode3other random text 4
3 code3other random textcode5other random textcode2other random textcode5other random text 4
4 code2other random text 1
5 code4other random text 1
Check out my Jedi SAS Tricks for SAS Users

View solution in original post

3 REPLIES 3
SASJedi
SAS Super FREQ

First, I'll create some test data:

/* Create the test data */
data codes;
	drop _:;
	do _i=1 to 5;
		code=cats('code',_i);
		output;
	end;
run;

data have;
	call streaminit(12345);
	length id 8 text $100;
	/* Drops all variables with _ as the first character of the name */
	drop _:;
	do id=1 to 5;
		do _i=1 to rand('integer',1,4);
			Text=cats(Text,'code',rand('integer',1,5),'other random text');
		end;
		output;
		call missing(text);
	end;
run;

It looks like this:

CODES

code
code1
code2
code3
code4
code5

 

HAVE
id text
1 code5other random textcode3other random textcode5other random text
2 code2other random textcode4other random textcode2other random textcode3other random text
3 code3other random textcode5other random textcode2other random textcode5other random text
4 code2other random text
5 code4other random text

 

Now to do the work:

/* Find out how may codes there are, write number to macro variable */
proc sql noprint;
select count(*) into :dimension
   from codes
;
quit;

data want;
	drop _: code;
	/* Create a temporary array to hold al the code values */
	array lookup[&dimension] $100 _temporary_;
	/* Load the array with codes from your codes table */
   if _n_ = 1 then do;
      do _i=1 to &dimension;
         set codes end=EndOfLookup;
         lookup[_i]=code;
         end;
      end;
	/* read the data you want to test */
	set have2;
	/* reset the count for each row of input data */
	call missing (count);
	/* Add up the matches for each of the codes */
	do _i=1 to dim(lookup);
	   count+count(text,lookup[_i],'it');
   end;
run;

And the result:

id text count
1 code5other random textcode3other random textcode5other random text 3
2 code2other random textcode4other random textcode2other random textcode3other random text 4
3 code3other random textcode5other random textcode2other random textcode5other random text 4
4 code2other random text 1
5 code4other random text 1
Check out my Jedi SAS Tricks for SAS Users
Tom
Super User Tom
Super User

We need names for the datasets and the variables to write code.

So lets call the first dataset LIST and the variable CODE.

And lets call the second dataset HAVE and the variable STRING.

And lets call the output dataset WANT and the new variable COUNT.

 

So just join the two and sum up the number of codes that appear at least once.

proc sql;
create table want as 
  select a.string
       , sum(0<index(a.string,b.code)) as count
  from have a
     , list b
  group by a.string
;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 378 views
  • 4 likes
  • 3 in conversation