SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
abhi309
Obsidian | Level 7

Hello All,

 

I have claims dataset and want to see most frequent Health Plan of the year for each patient who have health plans assigned for each month. In case if some one have two or more health plans with same frequency e.g. patient patient ID 3 in my example I would like to assign the latest health plan.

 

data have;

input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 hp_11 $ 43-45 hp_12 $ 47-49;

datalines;

1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  

2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO

3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO

4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP

5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP

6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO

7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO

8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET

9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO

;

 

Thank you for help in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below one way that should work.

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;

data long;
	set have;
	array hp {*} hp_01 - hp_12;
	do _i=1 to dim(hp);
		month_num=input(scan(vname(hp[_i]),-1,'_'),best32.);
		health_plan=hp[_i];
		output;
	end;
	keep id month_num health_plan;
run;

proc sql;
	create table counts as
	select
		id
		,health_plan
		,max(month_num) as max_month_num
		,count(*) as n_occurences
	from long
	group by id, health_plan
	order by id, n_occurences, max_month_num
	;
quit;

data want;
	set counts;
	by id;
	if last.id;
run;

proc print data=want;
run;

View solution in original post

9 REPLIES 9
Patrick
Opal | Level 21

Below one way that should work.

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;

data long;
	set have;
	array hp {*} hp_01 - hp_12;
	do _i=1 to dim(hp);
		month_num=input(scan(vname(hp[_i]),-1,'_'),best32.);
		health_plan=hp[_i];
		output;
	end;
	keep id month_num health_plan;
run;

proc sql;
	create table counts as
	select
		id
		,health_plan
		,max(month_num) as max_month_num
		,count(*) as n_occurences
	from long
	group by id, health_plan
	order by id, n_occurences, max_month_num
	;
quit;

data want;
	set counts;
	by id;
	if last.id;
run;

proc print data=want;
run;
abhi309
Obsidian | Level 7

Thank you @Patrick  for the solution it worked. I was trying to transpose the data and doing the same but since the dataset was too big it was not working for me. The array method worked.

Tom
Super User Tom
Super User

I never understand where this pattern of using BEST as if it was the name of a INFORMAT originated from.  But I see it more and more often (probably just because it triggers a reaction).  The BEST format gets its name from the idea that it will try to find the best way to display a number within the limited number of characters requested.  But there is no corresponding "best" way to store a number.  There is only one way to store a number in SAS.  If you use BEST as the name of an informat then SAS just treats it as an alias for the normal numeric informat.  So input(xx,best32.) just means input(xx,32.).

 

If you feel a need to use a "name" for the informat instead of just using 32. then why not use one of its aliases like F that does not imply that it is doing something that is impossible.

whymath
Lapis Lazuli | Level 10

You can use arrays to manually calculate the mode:
1. Calculate the frequency of each variable value;
2. Compare the frequency of each variable value. When the frequency becomes larger, the variable value who have the larger frequency is the mode;

data want;
  set have;

  array _par_[12]hp_01-hp_12;
  array _cnt_[12]_temporary_;
  call missing(of _cnt_[*]);

  do i=1 to dim(_par_);
    do j=1 to dim(_par_);
      _cnt_[i]+(_par_[i]=_par_[j]);
    end;
    if frequency<=_cnt_[i] then do;
      frequency=_cnt_[i];
      most_frequent_value=_par_[i];
    end;
  end;
run;

Since you ask for the latest mode, I use "frequency<=_cnt_[i]" instead of "frequency<_cnt_[i]".

abhi309
Obsidian | Level 7

Thank you @whymath for the reply. I tried your code and it worked as well.

Ksharp
Super User
data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;
proc transpose data=have out=temp;
by id;
var hp_:;
run;
proc freq data=temp noprint;
table id*col1/out=temp2;
run;
proc sort data=temp2;
by id descending count;
run;
data temp3;
 set temp2;
 by id count notsorted;
 if first.id then n=0;
 n+first.count;
 if n=1;
run;
proc transpose data=temp3 out=temp4(drop=_name_ _label_) prefix=_;
by id;
var count;
id col1;
run;
data want;
 merge have temp4;
 by id;
 array x{*} hp_:;
 do i=dim(x) to 1 by -1;
  if not missing(input(vvaluex(cats('_',x{i})),best.)) then do;want=x{i}; leave;end;
 end;
 drop _: i;
run;
abhi309
Obsidian | Level 7

Thank you @Ksharp . Proc transpose didn't work for me as the dataset was too big and for some reason SAS was giving error that it ran out of resources but when I used array which was kind of doing the same worked for me.

 

 

Ksharp
Super User

OK. If you have a BIG table, try the following Hash Table:

 

data have;
	input id 1 hp_01 $ 3-5 hp_02 $ 7-9 hp_03 $ 11-13 hp_04 $ 15-17 hp_05 $ 19-21 
		hp_06 $ 23-25 hp_07 $ 27-29 hp_08 $ 31-33 hp_09 $ 35-37 hp_10 $ 39-41 
		hp_11 $ 43-45 hp_12 $ 47-49;
	datalines;
1 MCO MCO MCO PPO PPO PPO PPO PPO PPO PPO PPO PPO  
2 PDP PDP PDP PDP PDP PDP PDP PDP PDP PDP MCO MCO
3 EDP EDP EDP EDP EDP EDP PPO PPO PPO PPO PPO PPO
4 MCO MCO MCO MCO MCO MCO MCO MCO MCO EDP EDP EDP
5 EDP EDP EDP EDP MCO MCO PPO PPO PPO PPO EDP EDP
6 MCO NET NET NET EDP EDP NET NET MCO MCO MCO MCO
7 PPO PPO PPO NET NET NET NET NET PPO PPO MCO MCO
8 EDP MCO MCO MCO PPO PPO PPO PPO PPO NET NET NET
9 NET NET NET NET PPO PPO PPO PPO PPO PPO PPO PPO
;
data want;
if _n_=1 then do;
 length key $ 80;
 declare hash h();
 declare hiter hi('h');
 h.definekey('key');
 h.definedata('key','count','idx');
 h.definedone();
end;
set have;
h.clear();
array x{*} hp_01-hp_12;
do i=1 to dim(x);
if not missing(x{i}) then do;
 key=x{i};
 if h.find()=0 then count=count+1;
  else count=1;
 idx=i;
 h.replace();
end;
end;
do while(hi.next()=0);
 if want_count<count then do;want_key=key;want_count=count;want_idx=idx;end;
  else if want_count=count and want_idx<idx then do;want_key=key;want_idx=idx;end;
end;
drop i key count idx;
run;
mkeintz
PROC Star

You effectively want the most frequent word in a set of words, all of which are in the same observation.  This can also be done using a single character variable containing that list of words:

 

data want (drop=_: i);
  set have;

  length _word_list $100  _word $3;

  _word_list=catx(' ',of hp_:);  

  do while (_word_list^=' ');
    _word=scan(_word_list,-1);
    _freq=0;
    do i=1 to countw(_word_list);
      if scan(_word_list,i)=_word then _freq=_freq+1;
    end;
    if _freq > mode_freq then do;
      mode_freq=_freq;
      mode=_word;
    end;
    _word_list=tranwrd(_word_list,trim(_word),' ');
  end;
run;

 Note that the assignment of a value to _WORD using

    _word=scan(_word_list,-1);

continually takes the rightmost available value (i.e. the latest) instead of the leftmost.  Then, if there is a tie in the frequency of two words, the subsequent  statement

    if _freq > mode_freq then do;

guarantees that the most recent (rightmost) word with the tied frequency is assigned, per the request.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Join us for our biggest event of the year!

Four days of inspiring keynotes, product reveals, hands-on learning opportunities, deep-dive demos, and peer-led breakouts. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1197 views
  • 3 likes
  • 6 in conversation