DATA Step, Macro, Functions and more

Summarizing data-membership numbers etc

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 87
Accepted Solution

Summarizing data-membership numbers etc

Merry Xmas to all you talented people! I have the data in following format below-

 

I'm looking to extract following information-

1. Number of members each year who have claims, number of those claims and claim amounts (I solved this part by using simple sas code, but had to do for each year seperately and then merge all six datasets-im sure there is more efficient way to do it)

 

2. Member progression each year- new distinct members each year who enter and exit the system- many members have claims in multiple years, but i want to find out how many new members enter the system each year or exit the system permanently, list of those members by gender, county type etc. The tricky part here is that member could have claim in 2011 and then again in 2014 and 2016, so those members should not be counted as new members, but still be identified.

 

 

Person ID Unencrypted Gender Date of Birth County Claims Paid_2011 Payment Amount Prof & Fac_2011 Claims Paid_2012 Payment Amount Prof & Fac_2012 Claims Paid_2013 Payment Amount Prof & Fac_2013 Claims Paid_2014 Payment Amount Prof & Fac_2014 Claims Paid_2015 Payment Amount Prof & Fac_2015 Claims Paid_2016 Payment Amount Prof & Fac_2016
0001 Female 09/13/1991 Aroostook     1 $57.50                
0002 Female 10/12/1959 Aroostook 6 $141.31 17 $169.56 18 $632.27 11 $355.29 10 $367.13 6 $298.35
0003 Female 05/22/1960 Aroostook             3 $0.00 7 $92.40 3 $28.57
0004 Male 12/07/1965 Aroostook             1 $71.52        
0005 Male 10/23/1964 Somerset     6 $689.52                
0006 Male 05/04/1942 Cumberland     1 $48.44 5 $44.04 4 $0.00 1 $71.70 3 $71.29

Accepted Solutions
Solution
‎12-27-2016 11:42 AM
Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

Merry Christmas

 

Here's a solution that tested. It's based on the idea of transposing your data to a more workable format first - a long format, and adding indicator variables for first and last years. You don't mention how the lastYear is identified, as I assume it has to be more than the last record.  I'll leave that logic up to you. 

 

*Read in sample data;

data WORK.TEST;
	infile datalines dsd truncover;
	input Person_ID_Unencrypted:$9. Gender:$8. Date_of_Birth:$10. County:$14. 
		Claims_Paid_2011:32. Payment_Amount_Prof___Fac_2011:DOLLAR21.2 
		Claims_Paid_2012:32. Payment_Amount_Prof___Fac_2012:DOLLAR21.2 
		Claims_Paid_2013:32. Payment_Amount_Prof___Fac_2013:DOLLAR21.2 
		Claims_Paid_2014:32. Payment_Amount_Prof___Fac_2014:DOLLAR21.2 
		Claims_Paid_2015:32. Payment_Amount_Prof___Fac_2015:DOLLAR21.2 
		Claims_Paid_2016:32. Payment_Amount_Prof___Fac_2016:DOLLAR21.2;
	datalines4;
94604050A,Female,06/25/1973,Androscoggin,,,,,,,,,6,"$4,247.89",23,"$7,327.37"
94603100A,Male,06/20/1961,Cumberland,6,$67.47,7,$65.24,16,$311.07,13,"$2,795.79",17,$201.22,,
94602853A,Male,06/24/1949,Oxford,,,,,,,16,"$65,874.18",26,"$3,625.06",30,"$2,946.30"
94602600A,Male,11/19/1960,Kennebec,13,$777.11,51,"$8,004.18",29,"$39,693.57",25,"$8,292.15",50,"$13,131.50",29,"$22,569.97"
94602570A,Male,01/18/1960,Penobscot,18,"$1,355.37",25,"$2,313.65",19,"$2,472.84",41,"$16,849.72",21,"$5,170.58",17,"$1,885.47"
94601310A,Male,01/05/1960,Cumberland,,,8,$993.41,7,$842.01,12,"$1,226.75",2,$22.30,,
94599480A,Male,09/30/1958,Cumberland,,,4,$471.54,7,$1.23,3,$29.75,16,$288.53,27,"$2,411.29"
94598333A,Male,12/13/1972,Cumberland,,,,,,,1,$83.80,,,,
94597070A,Female,04/04/1959,Piscataquis,8,"$2,027.79",11,"$3,132.89",11,$688.19,25,"$2,672.37",14,$93.44,21,"$-1,386.13"
94596570A,Male,09/21/1959,Piscataquis,6,$657.30,10,"$1,276.86",8,$861.97,1,$-4.69,,,,
;;;;
	
	
*Transpose data to a more friendly format;
data flipped;
	set test;
	array claims(*) claims_:;
	array payment(*) payment_:;

	do i=1 to dim(claims);
		claim_paid=claims(i);
		year=input(scan(vname(claims(I)), 3, "_"), 8.);
		payments_total=payment(i);

		if payments_total ne . then
			output;
	end;
	drop claims_: payment_: i;
run;

*Sort data for BY group processing;

proc sort data=flipped;
	by person_id_Unencrypted;
run;

*Identify first and last year per customer;

data flipped2;
	set flipped;
	by person_id_unencrypted;
	firstYear=first.person_id_unencrypted;
	lastYear=last.person_id_unencrypted;
run;

/*Summary steps*/
proc means data=flipped2 sum noprint nway;
	class year;
	output out=summary_by_year (drop=_type_ _freq_) n(claim_paid)=num_members 
		sum(claim_paid)=total_num_claims sum(payments_total)=total_payments 
		sum(firstYear)=newMembers sum(lastYear)=leavingMembers;
run;

 

View solution in original post


All Replies
Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

Fortunately, all your issues have a solution that uses the same idea, BY GROUP processing.

 

You can identify a BY GROUP by several different variables. For example, in your first question, your BY group would be YEAR. If you post your current code, we can suggest ways to add in the BY YEAR concept. 

 

For you second it would be the person. You don't post what you want as output, but I would recommend Person, StartYear, EndYear.

You could also do this via a PROC MEANS or SUMMARY recognizing that the smallest (MIN) year is the start and the highest (MAX) is the end year. Then you can summarize your flow over a year by using PROC FREQ on the StartYear and EndYear variables. 

 

 

 

 

 

 

Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifp...

 

A quick read of the Understanding BY Group section would help, and then the rest of the section.

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

Thanks Reeza. I used the following code to get first year numbers and then have to do it 5 more times to get other year numbers. I understand the group by processing, but im getting confused for membership numbers when they go in and out of system.

Also, i see my example data is not posted fully. I remember there is a post which helps you convert sas dataset into code for demonstration purpose. I cant find that on forum. I will then post what i want for output.

 

proc sql;
create table diab2011 as select count(person_id_unencrypted) as members2011, sum(Claims_Paid_2011) as claims2011, sum(Payment_Amount_Prof___Fac_2011) as total_payment_2011
from diabetes where Claims_Paid_2011 <> . and Payment_Amount_Prof___Fac_2011 <> . ;
quit;

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

There are additional 10 columns in my data  from 2012 till 2016 respectively for number of claims and claim amounts for each member. Below is what I'm looking for final output-

 

Year  county gender no_of_members  no_of_New_members Members_lost no_of_claims claim_amounts

2011  A   M

2011  A   F

2011 B   M

2011 B   F

2012 A  M

2012 A F

Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

Thanks. Here is the sample data-

 

data WORK.TEST;
infile datalines dsd truncover;
input Person_ID_Unencrypted:$9. Gender:$8. Date_of_Birth:$10. County:$14. Claims_Paid_2011:32. Payment_Amount_Prof___Fac_2011:DOLLAR21.2 Claims_Paid_2012:32. Payment_Amount_Prof___Fac_2012:DOLLAR21.2 Claims_Paid_2013:32. Payment_Amount_Prof___Fac_2013:DOLLAR21.2 Claims_Paid_2014:32. Payment_Amount_Prof___Fac_2014:DOLLAR21.2 Claims_Paid_2015:32. Payment_Amount_Prof___Fac_2015:DOLLAR21.2 Claims_Paid_2016:32. Payment_Amount_Prof___Fac_2016:DOLLAR21.2;
datalines4;
94604050A,Female,06/25/1973,Androscoggin,,,,,,,,,6,"$4,247.89",23,"$7,327.37"
94603100A,Male,06/20/1961,Cumberland,6,$67.47,7,$65.24,16,$311.07,13,"$2,795.79",17,$201.22,,
94602853A,Male,06/24/1949,Oxford,,,,,,,16,"$65,874.18",26,"$3,625.06",30,"$2,946.30"
94602600A,Male,11/19/1960,Kennebec,13,$777.11,51,"$8,004.18",29,"$39,693.57",25,"$8,292.15",50,"$13,131.50",29,"$22,569.97"
94602570A,Male,01/18/1960,Penobscot,18,"$1,355.37",25,"$2,313.65",19,"$2,472.84",41,"$16,849.72",21,"$5,170.58",17,"$1,885.47"
94601310A,Male,01/05/1960,Cumberland,,,8,$993.41,7,$842.01,12,"$1,226.75",2,$22.30,,
94599480A,Male,09/30/1958,Cumberland,,,4,$471.54,7,$1.23,3,$29.75,16,$288.53,27,"$2,411.29"
94598333A,Male,12/13/1972,Cumberland,,,,,,,1,$83.80,,,,
94597070A,Female,04/04/1959,Piscataquis,8,"$2,027.79",11,"$3,132.89",11,$688.19,25,"$2,672.37",14,$93.44,21,"$-1,386.13"
94596570A,Male,09/21/1959,Piscataquis,6,$657.30,10,"$1,276.86",8,$861.97,1,$-4.69,,,,
;;;;
Solution
‎12-27-2016 11:42 AM
Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

Merry Christmas

 

Here's a solution that tested. It's based on the idea of transposing your data to a more workable format first - a long format, and adding indicator variables for first and last years. You don't mention how the lastYear is identified, as I assume it has to be more than the last record.  I'll leave that logic up to you. 

 

*Read in sample data;

data WORK.TEST;
	infile datalines dsd truncover;
	input Person_ID_Unencrypted:$9. Gender:$8. Date_of_Birth:$10. County:$14. 
		Claims_Paid_2011:32. Payment_Amount_Prof___Fac_2011:DOLLAR21.2 
		Claims_Paid_2012:32. Payment_Amount_Prof___Fac_2012:DOLLAR21.2 
		Claims_Paid_2013:32. Payment_Amount_Prof___Fac_2013:DOLLAR21.2 
		Claims_Paid_2014:32. Payment_Amount_Prof___Fac_2014:DOLLAR21.2 
		Claims_Paid_2015:32. Payment_Amount_Prof___Fac_2015:DOLLAR21.2 
		Claims_Paid_2016:32. Payment_Amount_Prof___Fac_2016:DOLLAR21.2;
	datalines4;
94604050A,Female,06/25/1973,Androscoggin,,,,,,,,,6,"$4,247.89",23,"$7,327.37"
94603100A,Male,06/20/1961,Cumberland,6,$67.47,7,$65.24,16,$311.07,13,"$2,795.79",17,$201.22,,
94602853A,Male,06/24/1949,Oxford,,,,,,,16,"$65,874.18",26,"$3,625.06",30,"$2,946.30"
94602600A,Male,11/19/1960,Kennebec,13,$777.11,51,"$8,004.18",29,"$39,693.57",25,"$8,292.15",50,"$13,131.50",29,"$22,569.97"
94602570A,Male,01/18/1960,Penobscot,18,"$1,355.37",25,"$2,313.65",19,"$2,472.84",41,"$16,849.72",21,"$5,170.58",17,"$1,885.47"
94601310A,Male,01/05/1960,Cumberland,,,8,$993.41,7,$842.01,12,"$1,226.75",2,$22.30,,
94599480A,Male,09/30/1958,Cumberland,,,4,$471.54,7,$1.23,3,$29.75,16,$288.53,27,"$2,411.29"
94598333A,Male,12/13/1972,Cumberland,,,,,,,1,$83.80,,,,
94597070A,Female,04/04/1959,Piscataquis,8,"$2,027.79",11,"$3,132.89",11,$688.19,25,"$2,672.37",14,$93.44,21,"$-1,386.13"
94596570A,Male,09/21/1959,Piscataquis,6,$657.30,10,"$1,276.86",8,$861.97,1,$-4.69,,,,
;;;;
	
	
*Transpose data to a more friendly format;
data flipped;
	set test;
	array claims(*) claims_:;
	array payment(*) payment_:;

	do i=1 to dim(claims);
		claim_paid=claims(i);
		year=input(scan(vname(claims(I)), 3, "_"), 8.);
		payments_total=payment(i);

		if payments_total ne . then
			output;
	end;
	drop claims_: payment_: i;
run;

*Sort data for BY group processing;

proc sort data=flipped;
	by person_id_Unencrypted;
run;

*Identify first and last year per customer;

data flipped2;
	set flipped;
	by person_id_unencrypted;
	firstYear=first.person_id_unencrypted;
	lastYear=last.person_id_unencrypted;
run;

/*Summary steps*/
proc means data=flipped2 sum noprint nway;
	class year;
	output out=summary_by_year (drop=_type_ _freq_) n(claim_paid)=num_members 
		sum(claim_paid)=total_num_claims sum(payments_total)=total_payments 
		sum(firstYear)=newMembers sum(lastYear)=leavingMembers;
run;

 

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

Thanks Reeza. I'm out now, but will run the code soon. Last year is basically the year after which they have no claims.  So, a person could have claim in 2013 and then again in 2016, the last year will be 2016. If only in 2012 and 2013, then 2013 is last year.

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

Reeza, the code above has worked very well. I just realized that there are some cases where a member can have multiple records in the same year as they change counties. Now this doesnt change any other calculations in your code, except the number of members each year. Because in cases where there are multiple claims for same year, those members are counted twice using this below in your code.

n(claim_paid)=num_members 

 I tried to solve this problem by flagging such cases in flipped2 dataset so that I can subtract these cases later on. I used the following code, but its not giving me the desired result. instead it is flagging those cases where members have only one record.

 

data flagged;
set flipped2;
by person_id_unencrypted year;
if first.person_id_unencrypted then do;
if year = lag(year) then flag =1 ;
end;
run;

 

Sorry for troubling you again. Thanks so much as always!

Super User
Posts: 17,912

Re: Summarizing data-membership numbers etc

Your on the right track. If you're using BY group processing and see this type of code, know that you're on the wrong track

 

if year=lag(year) 

 

This means you need to add year to your BY statement. 

 

The idea here, is flag the first occurence for each ID, and add a record that is 1 for the first and then 0 otherwise for each person/year. Then for the number of users, add up the variable. 

There are three changes below, with comments Change#

 

 

*Sort data for BY group processing;

proc sort data=flipped;
	by person_id_Unencrypted year; /*Change1 - add year to sort/BY*/
run;

*Identify first and last year per customer;

data flipped2;
	set flipped;
	by person_id_unencrypted year; /*Change1 - add year to sort/BY*/
	firstYear=first.person_id_unencrypted;
	lastYear=last.person_id_unencrypted;
    if first.year then person_count=1; else person_count=0; /* Change2 - Add flag for first record per person per year*/
run;

/*Summary steps*/
proc means data=flipped2 sum noprint nway;
	class year;
	output out=summary_by_year (drop=_type_ _freq_) sum(person_count)=num_members /*Change3 - Sum the new variable created as number of members*/ 
		sum(claim_paid)=total_num_claims sum(payments_total)=total_payments 
		sum(firstYear)=newMembers sum(lastYear)=leavingMembers;
run;

 

Frequent Contributor
Posts: 87

Re: Summarizing data-membership numbers etc

Thanks a lot, Reeza! 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 310 views
  • 1 like
  • 2 in conversation