I am trying to rollup ENRL counts in scenarios where cid's and pid's consolidate to new pid's in the following year. This is represented as CID_15 = 2015 and PID_15= 2015 and CID_16= 2016 and PID_16=2016. In this case my code works the way I want it to. An example is CID_15= 'H0028' and PID_15 in ('001','002','003') rolls up to CID_16= 'H0028' and PID_16 = '001'. The code currently summarizes the counts of varying ENRL fields and populates a total. This is how most CID's should rollup, they stay the same from '15 to '16 and only the PID changes.
My problem is the code does not work for scenarios where the cid_15 and pid_15 change at both the CID and PID level from year to year but are still deemed as change or consolidation. This does not happen frequently, but it is a scenario where a CID will change from one year to the next, as will the PID.
A perfect example in my data is CID_15 = 'H0108', this changes to CID_16= 'H5619' the following year and as you can see the PID's in 'H0108' consolidate. An example here is CID_15 = 'H0108' and PID_15 in ('021','022') consolidate to CID_16= 'H5619' and PID_15 = '026'.
data have;
infile datalines dlm="09"x;
input CID_15 $ PID_15 $ CID_16 $ PID_16 $ CONSOLIDATED
NEW_PLAN TERMED RENEWAL RENEWAL_SAR RENEWAL_SAE ENROLLMENT
BASE_ENRL RENEW_ENRL CONSOL_ENRL RENEW_CONSOL TERMED_ENRL ;
datalines;
H0028 001 H0028 001 1 0 0 0 0 0 3215 3215 3215 0 0 0
H0028 002 H0028 001 1 0 0 0 0 0 1452 1452 1452 1452 0 0
H0028 003 H0028 001 1 0 0 0 0 0 198 198 198 198 0 0
H0028 004 H0028 004 0 0 0 1 0 0 5783 5783 5783 0 0 0
H0028 801 H0028 801 0 0 0 0 1 0 . . . 0 0 0
H0028 802 H0028 802 0 0 0 0 1 0 . . . 0 0 0
H0108 004 H5619 012 0 0 0 0 0 1 6093 6093 6093 0 0 0
H0108 005 H5619 019 0 0 0 1 0 0 13640 13640 13640 0 0 0
H0108 006 H5619 020 0 0 0 1 0 0 1256 1256 1256 0 0 0
H0108 008 H5619 016 0 0 0 1 0 0 3411 3411 3411 0 0 0
H0108 009 H5619 017 0 0 0 1 0 0 634 634 634 0 0 0
H0108 010 H5619 014 0 0 0 0 0 1 540 540 540 0 0 0
H0108 011 H5619 021 0 0 0 0 1 0 10530 10530 10530 0 0 0
H0108 021 H5619 026 1 0 0 0 0 0 1563 1563 1563 0 0 0
H0108 022 H5619 026 1 0 0 0 0 0 772 772 772 0 0 0
H0108 023 H5619 029 1 0 0 0 0 0 3357 3357 3357 0 0 0
H0108 024 H5619 029 1 0 0 0 0 0 1971 1971 1971 0 0 0
H0108 029 H5619 018 0 0 0 0 1 0 233 233 233 0 0 0
H0108 034 H5619 025 1 0 0 0 0 0 423 423 423 0 0 0
H0108 035 H5619 032 0 0 0 1 0 0 277 277 277 0 0 0
H0108 037 H5619 025 1 0 0 0 0 0 349 349 349 0 0 0
H0108 038 H5619 023 0 0 0 1 0 0 2153 2153 2153 0 0 0
H0108 039 H5619 031 0 0 0 1 0 0 339 339 339 0 0 0
H0108 041 H5619 024 0 0 0 1 0 0 2654 2654 2654 0 0 0
H0108 045 H5619 030 0 0 0 1 0 0 1496 1496 1496 0 0 0
H0108 046 H5619 015 0 0 0 0 1 0 3745 3745 3745 0 0 0
H0108 047 H5619 022 0 0 0 0 1 0 300 300 300 0 0 0
H0108 048 H5619 038 0 0 0 1 0 0 1457 1457 1457 0 0 0
H0108 050 H5619 037 0 0 0 0 1 0 2306 2306 2306 0 0 0
H0108 054 H5619 035 1 0 0 0 0 0 55 55 55 0 0 0
H0108 055 H5619 035 1 0 0 0 0 0 466 466 466 0 0 0
H0108 056 H5619 034 0 0 0 1 0 0 238 238 238 0 0 0
H0108 057 H5619 033 0 0 0 1 0 0 1433 1433 1433 0 0 0
H0108 058 H5619 036 1 0 0 0 0 0 42 42 42 0 0 0
H0108 059 H5619 036 1 0 0 0 0 0 308 308 308 0 0 0
H0108 801 H5619 802 1 0 0 0 0 0 35 35 35 0 0 0
run;
proc sort data=HAVE; by CID_15 PID_16 PID_15;run;
data ROLLUP (keep= CID_15 PID_15 CID_16 PID_16 CONSOLIDATED
NEW_PLAN TERMED RENEWAL RENEWAL_SAR RENEWAL_SAE ENROLLMENT
BASE_ENRL RENEW_ENRL CONSOL_ENRL RENEW_CONSOL_ENRL TERMED_ENR);
set HAVE(
rename=(
/*these are the columns that need to be retained*/
PID_15=o_PID_15
BASE_ENRL=o_BASE_ENRL
RENEW_ENRL=o_RENEW_ENRL
RENEW_CONSOL=o_RENEW_CONSOL
CONSOL_ENRL=O_CONSOL_ENRL
TERMED_ENRL=O_TERMED_ENRL
));
by CID_15 PID_16;
length PID_15 $3.
BASE_ENRL
RENEW_ENRL
CONSOL_ENRL
RENEW_CONSOL
TERMED_ENRL
TOTAL
8. ;
retain CID_15 PID_16 PID_15 '' /*<--retain string variables*/
CONSOL_ENRL BASE_ENRL RENEW_ENRL RENEW_CONSOL TERMED_ENRL total /*<-- retain numeric variables*/;
if first.PID_15 then do;
CONSOL_ENRL=0;/*this will summarize line by line below*/
/*retaining values from first row*/
pid_15=o_PID_15;
BASE_ENRL=o_BASE_ENRL;
RENEW_ENRL=o_RENEW_ENRL;
RENEW_CONSOL=o_RENEW_CONSOL;
TERMED_ENRL=o_TERMED_ENRL;
end;
CONSOL_ENRL=sum(CONSOL_ENRL,O_CONSOL_ENRL);
total=sum(CONSOL_ENRL,RENEW_ENRL,RENEW_CONSOL);
if last.PID_16 then do; output; end;
run;
I'm trying to find a way to account for this level of change in my data step without in a fluid step.
Here is a hint to ask a question more likely to get a clear answer. Reduce your example data to the variables needed in your question, which I think are C_id P_id, what ever indicates "consolidated" or "changed" and ONE variable to summarize. Then reduce the number of observations to maybe 10 or 15. These records should be able to demonstrate your cases. Then provide an example of the desired summary.
Since I do not see any variable that in anyway resembles a Year then you will need to very explicitly list out every single thing that changes but should be treated the same.
I for one do not see why "C_ID = 'H0108' and P_ID in ('021','022') consolidate to C_ID= 'H5619' and P_ID = '026'."
The way that statement reads makes me believe the C_ID is supposed to change values? is that the case? or P_id.
It is very common to leave out a lot of information that you know because you are intimately familiar (hopefully ) with your data. But skipping steps or assuming we make the same "leap" of concept for those of use not familiar will often lead to lots of questions.
I would say the variables smell of containing time information in the names which sometimes means the data structure (wide) is more complicated then it needs to be. Which is somewhat believable because it appears that when any of ENROLLMENT BASE_ENRL RENEW_ENRL have values they all have the exact same value. If this is not always the case then you should include some values that represent those cases.
RETAIN does not actually keep any value if the variable is in the data set on the set statement. Each time a new record is brought it that is the value of the variable. So your retain of C_id and Current_Pid does nothing. And the whole renaming old variables to reuse there names makes the code a tad harder to follow.
Since apparently you want some sums grouped by C_ID and Current_Pid then I would likely drop this whole data step and move to Proc Summary if I could follow the whole changing C_id P_id logic, Typically I might do that with custom formats but I'm not sure exactly what you need.
@ballardw I made some changes to the explanation and added the year variable to the CID and PID level to hopefully assist in some of the confusion. I also removed some of the data to only focus on the two examples I provided. Hopefully this makes it less messy. Thank you for the suggestions.
can you post the expected output like ballardw asked. There are few inconsistencies in the way you code
1. Length statement as you write it is ineffective length statement should always be before a set statement otherwise the length of the variables from the input dataset is considered as length and for other char variables it is set to a default of 8.
2. Are you trying to retain first pid15 and last pid16 ?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.