Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Counting the number of times values are together

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

Counting the number of times values are together

HI All,

 

I'm trying to count the number of times that one code is claimed at the same time another code is claimed. The original dataset is like this:

 

 

group1	code1
group1	code2
group1	code3
group2	code2
group2	code3
group3	code1
group3	code2
group3	code3
group3	code4
group3	code5

I've transposed it (for another purpose) like this:

 

 

 

group1	code1	code2	code3		
group2	code2	code3			
group3	code1	code2	code3	code4	code5

 

What I am trying to acheive is results like this:

 

first codesecond codecount claimed together
code1code22
code1code31
code1code41
code1code51
code2code12
code2code33
code2code42
code2code51
code3code12
code3code23
code3code41
code3code51

 

I know that the dataset will be large since I have more than 8000 unique codes but I am trying to figure out a way to be able to know what codes are billed together the most.

 

I appreciate any input.

 

Thanks! 


Accepted Solutions
Solution
‎08-08-2016 02:01 PM
Super User
Posts: 5,085

Re: Counting the number of times values are together

The transposed data will be more useful here.  For illustration purposes, I'm assuming that you have no more than 8 codes per group but the program easily expands if you have more than that.

 

data pairs;

set transposed_data;

array codes {8} code1-code8;

do _m_ = 1 to 7;

   do _n_=_m_+1 to 8;

      if codes{_m_} > ' ' and codes{_n_} > ' ' then do;

         if codes{_m_} < codes{_n_} then code_pair = catx(' ', codes{_m_}, codes{_n_});

         else code_pair = catx(' ', codes{_n_}, codes{_m_});

         output;

      end;

   end;

end;

run;

 

proc freq data=pairs;

   tables code_pair  / out=pair_counts;

run;

 

The output data set is optional, but might come in handy.

View solution in original post


All Replies
Solution
‎08-08-2016 02:01 PM
Super User
Posts: 5,085

Re: Counting the number of times values are together

The transposed data will be more useful here.  For illustration purposes, I'm assuming that you have no more than 8 codes per group but the program easily expands if you have more than that.

 

data pairs;

set transposed_data;

array codes {8} code1-code8;

do _m_ = 1 to 7;

   do _n_=_m_+1 to 8;

      if codes{_m_} > ' ' and codes{_n_} > ' ' then do;

         if codes{_m_} < codes{_n_} then code_pair = catx(' ', codes{_m_}, codes{_n_});

         else code_pair = catx(' ', codes{_n_}, codes{_m_});

         output;

      end;

   end;

end;

run;

 

proc freq data=pairs;

   tables code_pair  / out=pair_counts;

run;

 

The output data set is optional, but might come in handy.

Occasional Contributor
Posts: 17

Re: Counting the number of times values are together

Thank you for your help! I was able to process 300m+ observations in under 20 minutes. Impressive.

 

I have another dataset that's further aggregated (20m observations). Rather than counting each occurence, could I sum the count?

 

GroupCountclaim1claim2claim3claim4claim5claim6claim7claim8
group146code1       
group21code1code2code3code4code5   
group31code1code3code4     
group41code1code3code4code5code6code7  
group56code1code2code3     
group625code1code4code5     

 

 

Super User
Posts: 5,085

Re: Counting the number of times values are together

Not quite sure what you're asking for here, but it's possible to do most anything.  Do you want this:

 

code1  80  (sum of 46, 1, 1, 1, 6, 25)

 

code2  7  (sum of 6, 1)

 

code3  9  (sum of 1, 1, 1, 6)

 

Are you now processing individual codes instead of pairs?  (If you're still processing pairs, how do you handle the first line that only has one code in it?)

Occasional Contributor
Posts: 17

Re: Counting the number of times values are together

Sorry I wasn't clear. I'm still looking to get the same information but rather than counting the occurrences of the pairs, is it possible to maintain a running total of the count for each pair. The output would be something like this:

 

PairSumCount
code1 code2773
Super User
Posts: 5,085

Re: Counting the number of times values are together

I'm not seeing where the numeric values of 77 and 3 come from here.

 

But to answer your question in general, I think it would be necessary to use a hash table to make this happen ... not my area of strength.  You code certainly could define a hash table that is based on two keys (the minimum code in a pair and the maximum code in a pair), and maintain two data fields based on those two keys as you move through the data.  But someone else would have to program that for you ... it's a medium complexity task for someone who knows hash tables.

Respected Advisor
Posts: 3,124

Re: Counting the number of times values are together

Some SQL plus some data step should give what you need:

 

data have;
	infile cards expandtabs;
	input group$	code$;
	cards;
group1 code1
group1	code2
group1	code3
group2	code2
group2	code3
group3	code1
group3	code2
group3	code3
group3	code4
group3	code5
;

/*generate pairs combination within the same group*/
proc sql;
	create table _step1 as
		select a.*, b.code as _code
			from have a
				left join have b
					on a.group=b.group and a.code ne b.code
	;
quit;

/*sort the pair so for dedup purpose*/
data _step2;
	set _step1;
	array cd code _code;
	call sortc(of cd(*));
	_code_group = catx('-',code,_code);
run;

/*count the group number by unique pair, this is the essentially the count you need*/
proc sql;
	create table want  as 
		select distinct  code as first_code, _code as second_code, count (distinct group) as count from _step2 
			group by _code_group
	;
quit;
Super User
Posts: 9,681

Re: Counting the number of times values are together

data have;
	infile cards expandtabs;
	input group$	code$;
	cards;
group1 code1
group1	code2
group1	code3
group2	code2
group2	code3
group3	code1
group3	code2
group3	code3
group3	code4
group3	code5
;
run;
proc transpose data=have out=temp(drop=_:);
 by group;
 var code;
run;
data temp1;
 set temp;
 array x{*} col:;
 do i=1 to dim(x)-1;
  do j=i+1 to dim(x);
   if not missing(x{i}) and not missing(x{j}) then do;
    first=x{i};second=x{j};output;
    first=x{j};second=x{i};output;
   end;
  end;
 end;
 drop i j;
run;
proc freq data=temp1 noprint;
 table first*second/out=want list;
run;


Occasional Contributor
Posts: 17

Re: Counting the number of times values are together

Thank you for your help. Is there a method to limit the variables in the transpose step? say to 25? I know that I could clean up my data would wouldn't require the limit. I'm just curious. Thanks again!
Super User
Posts: 9,681

Re: Counting the number of times values are together

"o limit the variables in the transpose step? "
I don't understand what you mean.
You want control the output variable ?

proc transpose data=have out=temp(keep=id col1-col25)


Occasional Contributor
Posts: 17

Re: Counting the number of times values are together

Sorry. I was wondering if it was possible to limit the number of observations per group. Maybe by putting in a counter to move to the next group after 25 variables are read. If I drop anything past variable 25, the observations will still be read which would increase processing times.

 

If it's not an option to add a counter by group then it's probably best that I clean my initial dataset first.

 

Thanks again for your input!

Super User
Posts: 9,681

Re: Counting the number of times values are together

Make a count variable:

data want;
 set have;
 by id;
 if first.id then count=0;
 count+1;
 if count le 25;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 795 views
  • 3 likes
  • 4 in conversation