SAS Programming

abhi309 · Posted 12-16-2024 07:20 PM

Hello All,

I have claims dataset and want to see most frequent Health Plan of the year for each patient who have health plans assigned for each month. In case if some one have two or more health plans with same frequency e.g. patient patient ID 3 in my example I would like to assign the latest health plan.

data have;

input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 hp_11 $ 43-45 hp_12 $ 47-49;

datalines;

1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO

2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO

3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO

4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP

5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP

6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO

7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO

8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET

9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO

;

Thank you for help in advance.

Patrick · Posted 12-16-2024 08:09 PM

Below one way that should work.

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;

data long;
	set have;
	array hp {*} hp_01 - hp_12;
	do _i=1 to dim(hp);
		month_num=input(scan(vname(hp[_i]),-1,'_'),best32.);
		health_plan=hp[_i];
		output;
	end;
	keep id month_num health_plan;
run;

proc sql;
	create table counts as
	select
		id
		,health_plan
		,max(month_num) as max_month_num
		,count(*) as n_occurences
	from long
	group by id, health_plan
	order by id, n_occurences, max_month_num
	;
quit;

data want;
	set counts;
	by id;
	if last.id;
run;

proc print data=want;
run;

View solution in original post

Patrick · Posted 12-16-2024 08:09 PM

Below one way that should work.

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;

data long;
	set have;
	array hp {*} hp_01 - hp_12;
	do _i=1 to dim(hp);
		month_num=input(scan(vname(hp[_i]),-1,'_'),best32.);
		health_plan=hp[_i];
		output;
	end;
	keep id month_num health_plan;
run;

proc sql;
	create table counts as
	select
		id
		,health_plan
		,max(month_num) as max_month_num
		,count(*) as n_occurences
	from long
	group by id, health_plan
	order by id, n_occurences, max_month_num
	;
quit;

data want;
	set counts;
	by id;
	if last.id;
run;

proc print data=want;
run;

abhi309 · Posted 12-17-2024 10:04 AM

Thank you @Patrick for the solution it worked. I was trying to transpose the data and doing the same but since the dataset was too big it was not working for me. The array method worked.

Tom · Posted 12-17-2024 02:53 PM

I never understand where this pattern of using BEST as if it was the name of a INFORMAT originated from. But I see it more and more often (probably just because it triggers a reaction). The BEST format gets its name from the idea that it will try to find the best way to display a number within the limited number of characters requested. But there is no corresponding "best" way to store a number. There is only one way to store a number in SAS. If you use BEST as the name of an informat then SAS just treats it as an alias for the normal numeric informat. So input(xx,best32.) just means input(xx,32.).

If you feel a need to use a "name" for the informat instead of just using 32. then why not use one of its aliases like F that does not imply that it is doing something that is impossible.

whymath · Posted 12-16-2024 09:44 PM

You can use arrays to manually calculate the mode:
1. Calculate the frequency of each variable value;
2. Compare the frequency of each variable value. When the frequency becomes larger, the variable value who have the larger frequency is the mode;

data want;
  set have;

  array _par_[12]hp_01-hp_12;
  array _cnt_[12]_temporary_;
  call missing(of _cnt_[*]);

  do i=1 to dim(_par_);
    do j=1 to dim(_par_);
      _cnt_[i]+(_par_[i]=_par_[j]);
    end;
    if frequency<=_cnt_[i] then do;
      frequency=_cnt_[i];
      most_frequent_value=_par_[i];
    end;
  end;
run;

Since you ask for the latest mode, I use "frequency<=_cnt_[i]" instead of "frequency<_cnt_[i]".

abhi309 · Posted 12-17-2024 10:05 AM

Thank you @whymath for the reply. I tried your code and it worked as well.

Ksharp · Posted 12-17-2024 01:14 AM

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;
proc transpose data=have out=temp;
by id;
var hp_:;
run;
proc freq data=temp noprint;
table id*col1/out=temp2;
run;
proc sort data=temp2;
by id descending count;
run;
data temp3;
 set temp2;
 by id count notsorted;
 if first.id then n=0;
 n+first.count;
 if n=1;
run;
proc transpose data=temp3 out=temp4(drop=_name_ _label_) prefix=_;
by id;
var count;
id col1;
run;
data want;
 merge have temp4;
 by id;
 array x{*} hp_:;
 do i=dim(x) to 1 by -1;
  if not missing(input(vvaluex(cats('_',x{i})),best.)) then do;want=x{i}; leave;end;
 end;
 drop _: i;
run;

abhi309 · Posted 12-17-2024 10:07 AM

Thank you @Ksharp . Proc transpose didn't work for me as the dataset was too big and for some reason SAS was giving error that it ran out of resources but when I used array which was kind of doing the same worked for me.

Ksharp · Posted 12-17-2024 08:39 PM

OK. If you have a BIG table, try the following Hash Table:

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;
data want;
if _n_=1 then do;
 length key $ 80;
 declare hash h();
 declare hiter hi('h');
 h.definekey('key');
 h.definedata('key','count','idx');
 h.definedone();
end;
set have;
h.clear();
array x{*} hp_01-hp_12;
do i=1 to dim(x);
if not missing(x{i}) then do;
 key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 idx=i;
 h.replace();
end;
end;
do while(hi.next()=0);
 if want_count<count then do;want_key=key;want_count=count;want_idx=idx;end;
  else if want_count=count and want_idx<idx then do;want_key=key;want_idx=idx;end;
end;
drop i key count idx;
run;

mkeintz · Posted 12-21-2024 11:32 PM

You effectively want the most frequent word in a set of words, all of which are in the same observation. This can also be done using a single character variable containing that list of words:

data want (drop=_: i);
  set have;

  length _word_list $100  _word $3;

  _word_list=catx(' ',of hp_:);  

  do while (_word_list^=' ');
    _word=scan(_word_list,-1);
    _freq=0;
    do i=1 to countw(_word_list);
      if scan(_word_list,i)=_word then _freq=_freq+1;
    end;
    if _freq > mode_freq then do;
      mode_freq=_freq;
      mode=_word;
    end;
    _word_list=tranwrd(_word_list,trim(_word),' ');
  end;
run;

Note that the assignment of a value to _WORD using

    _word=scan(_word_list,-1);

continually takes the rightmost available value (i.e. the latest) instead of the leftmost. Then, if there is a tie in the frequency of two words, the subsequent statement

    if _freq > mode_freq then do;

guarantees that the most recent (rightmost) word with the tied frequency is assigned, per the request.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

Re: Getting most frequent value across columns

assign value to new column

How to replace a value from a column with the value of column from ano...

Using column value as condition

Select columns containing a specific value

Get back first non-null value for multiple columns grouped by some oth...

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...