BookmarkSubscribeRSS Feed
bknitch
Quartz | Level 8

I am trying to rollup ENRL counts in scenarios where cid's and pid's consolidate to new pid's in the following year. This is represented as CID_15 = 2015 and PID_15= 2015 and CID_16= 2016 and PID_16=2016. In this case my code works the way I want it to. An example is CID_15= 'H0028' and PID_15 in ('001','002','003') rolls up to CID_16= 'H0028' and PID_16 = '001'. The code currently summarizes the counts of varying ENRL fields and populates a total. This is how most CID's should rollup, they stay the same from '15 to '16 and only the PID changes. 

 

My problem is the code does not work for scenarios where the cid_15 and pid_15 change at both the CID and PID level from year to year but are still deemed as change or consolidation. This does not happen frequently, but it is a scenario where a CID will change from one year to the next, as will the PID. 

 

A perfect example in my data is CID_15 = 'H0108', this changes to CID_16= 'H5619' the following year and as you can see the PID's in 'H0108' consolidate. An example here is CID_15 = 'H0108' and PID_15 in ('021','022') consolidate to CID_16= 'H5619' and PID_15 = '026'.

 

data have;
	infile datalines dlm="09"x;
	input CID_15 $ PID_15 $ CID_16 $ PID_16 $ CONSOLIDATED
 NEW_PLAN TERMED RENEWAL RENEWAL_SAR RENEWAL_SAE ENROLLMENT
BASE_ENRL RENEW_ENRL CONSOL_ENRL RENEW_CONSOL TERMED_ENRL ;
 datalines;
H0028	001	H0028	001	1	0	0	0	0	0	3215	3215	3215	0	0	0
H0028	002	H0028	001	1	0	0	0	0	0	1452	1452	1452	1452	0	0
H0028	003	H0028	001	1	0	0	0	0	0	198	198	198	198	0	0
H0028	004	H0028	004	0	0	0	1	0	0	5783	5783	5783	0	0	0
H0028	801	H0028	801	0	0	0	0	1	0	.	.	.	0	0	0
H0028	802	H0028	802	0	0	0	0	1	0	.	.	.	0	0	0
H0108	004	H5619	012	0	0	0	0	0	1	6093	6093	6093	0	0	0
H0108	005	H5619	019	0	0	0	1	0	0	13640	13640	13640	0	0	0
H0108	006	H5619	020	0	0	0	1	0	0	1256	1256	1256	0	0	0
H0108	008	H5619	016	0	0	0	1	0	0	3411	3411	3411	0	0	0
H0108	009	H5619	017	0	0	0	1	0	0	634	634	634	0	0	0
H0108	010	H5619	014	0	0	0	0	0	1	540	540	540	0	0	0
H0108	011	H5619	021	0	0	0	0	1	0	10530	10530	10530	0	0	0
H0108	021	H5619	026	1	0	0	0	0	0	1563	1563	1563	0	0	0
H0108	022	H5619	026	1	0	0	0	0	0	772	772	772	0	0	0
H0108	023	H5619	029	1	0	0	0	0	0	3357	3357	3357	0	0	0
H0108	024	H5619	029	1	0	0	0	0	0	1971	1971	1971	0	0	0
H0108	029	H5619	018	0	0	0	0	1	0	233	233	233	0	0	0
H0108	034	H5619	025	1	0	0	0	0	0	423	423	423	0	0	0
H0108	035	H5619	032	0	0	0	1	0	0	277	277	277	0	0	0
H0108	037	H5619	025	1	0	0	0	0	0	349	349	349	0	0	0
H0108	038	H5619	023	0	0	0	1	0	0	2153	2153	2153	0	0	0
H0108	039	H5619	031	0	0	0	1	0	0	339	339	339	0	0	0
H0108	041	H5619	024	0	0	0	1	0	0	2654	2654	2654	0	0	0
H0108	045	H5619	030	0	0	0	1	0	0	1496	1496	1496	0	0	0
H0108	046	H5619	015	0	0	0	0	1	0	3745	3745	3745	0	0	0
H0108	047	H5619	022	0	0	0	0	1	0	300	300	300	0	0	0
H0108	048	H5619	038	0	0	0	1	0	0	1457	1457	1457	0	0	0
H0108	050	H5619	037	0	0	0	0	1	0	2306	2306	2306	0	0	0
H0108	054	H5619	035	1	0	0	0	0	0	55	55	55	0	0	0
H0108	055	H5619	035	1	0	0	0	0	0	466	466	466	0	0	0
H0108	056	H5619	034	0	0	0	1	0	0	238	238	238	0	0	0
H0108	057	H5619	033	0	0	0	1	0	0	1433	1433	1433	0	0	0
H0108	058	H5619	036	1	0	0	0	0	0	42	42	42	0	0	0
H0108	059	H5619	036	1	0	0	0	0	0	308	308	308	0	0	0
H0108	801	H5619	802	1	0	0	0	0	0	35	35	35	0	0	0

run;

proc sort data=HAVE; by CID_15 PID_16 PID_15;run;


data ROLLUP (keep= CID_15 PID_15 CID_16 PID_16 CONSOLIDATED
 NEW_PLAN TERMED RENEWAL RENEWAL_SAR RENEWAL_SAE ENROLLMENT
BASE_ENRL RENEW_ENRL CONSOL_ENRL RENEW_CONSOL_ENRL TERMED_ENR);

set HAVE(
rename=(
/*these are the columns that need to be retained*/
PID_15=o_PID_15
BASE_ENRL=o_BASE_ENRL
RENEW_ENRL=o_RENEW_ENRL
RENEW_CONSOL=o_RENEW_CONSOL
CONSOL_ENRL=O_CONSOL_ENRL
TERMED_ENRL=O_TERMED_ENRL
));

by CID_15 PID_16;
length PID_15 $3.  
BASE_ENRL
RENEW_ENRL
CONSOL_ENRL
RENEW_CONSOL
TERMED_ENRL
TOTAL
8. ;
retain CID_15 PID_16 PID_15 '' /*<--retain string variables*/
CONSOL_ENRL BASE_ENRL RENEW_ENRL RENEW_CONSOL TERMED_ENRL total /*<-- retain numeric variables*/;

if first.PID_15 then do;
CONSOL_ENRL=0;/*this will summarize line by line below*/
/*retaining values from first row*/
pid_15=o_PID_15;
BASE_ENRL=o_BASE_ENRL;
RENEW_ENRL=o_RENEW_ENRL;
RENEW_CONSOL=o_RENEW_CONSOL;
TERMED_ENRL=o_TERMED_ENRL;
end;
CONSOL_ENRL=sum(CONSOL_ENRL,O_CONSOL_ENRL);
total=sum(CONSOL_ENRL,RENEW_ENRL,RENEW_CONSOL);
if last.PID_16 then do; output; end;
run;

I'm trying to find a way to account for this level of change in my data step without in a fluid step. 

 

 

3 REPLIES 3
ballardw
Super User

Here is a hint to ask a question more likely to get a clear answer. Reduce your example data to the variables needed in your question, which I think are C_id P_id, what ever indicates "consolidated" or "changed" and ONE variable to summarize. Then reduce the number of observations to maybe 10 or 15. These records should be able to demonstrate your cases. Then provide an example of the desired summary.

 

Since I do not see any variable that in anyway resembles a Year then you will need to very explicitly list out every single thing that changes but should be treated the same.

I for one do not see why "C_ID = 'H0108' and P_ID in ('021','022') consolidate to C_ID= 'H5619' and P_ID = '026'."

The way that statement reads makes me believe the C_ID is supposed to change values? is that the case? or P_id.

 

It is very common to leave out a lot of information that you know because you are intimately familiar (hopefully ) with your data. But skipping steps or assuming we make the same "leap" of concept for those of use not familiar will often lead to lots of questions.

 

I would say the variables smell of containing time information in the names which sometimes means the data structure (wide) is more complicated then it needs to be. Which is somewhat believable because it appears that when any of ENROLLMENT BASE_ENRL RENEW_ENRL have values they all have the exact same value. If this is not always the case then you should include some values that represent those cases.

 

RETAIN does not actually keep any value if the variable is in the data set on the set statement. Each time a new record is brought it that is the value of the variable. So your retain of C_id and Current_Pid does nothing. And the whole renaming old variables to reuse there names makes the code a tad harder to follow.

 

Since apparently you want some sums grouped by C_ID and Current_Pid then I would likely drop this whole data step and move to Proc Summary if I could follow the whole changing C_id P_id logic, Typically I might do that with custom formats but I'm not sure exactly what you need.

bknitch
Quartz | Level 8

@ballardw  I made some changes to the explanation and added the year variable to the CID and PID level to hopefully assist in some of the confusion. I also removed some of the data to only focus on the two examples I provided. Hopefully this makes it less messy. Thank you for the suggestions. 

 

 

smantha
Lapis Lazuli | Level 10

can you post the expected output like ballardw asked. There are few inconsistencies in the way you code

1. Length statement as you write it is ineffective length statement should always be before a set statement otherwise the length of the variables from the input dataset is considered as length and for other char variables it is set to a default of 8.

2. Are you trying to retain first pid15 and last pid16 ?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 939 views
  • 0 likes
  • 3 in conversation