This is the data I have:
MBR_ID | Prov_ID | DRG | Start_DT | End_DT |
00158409 | 10010 | 0774 | 06Dec2011 | 29Jul2012 |
00158409 | 10010 | 0774 | 06Dec2011 | 29Jul2012 |
00211424 | 10010 | 0765 | 16Jan2012 | 21Sep2012 |
00324582 | 10119 | 0775 | 24Oct2011 | 13Feb2012 |
00324582 | 10119 | 0775 | 24Oct2011 | 13Feb2012 |
00330871 | 10022 | 0766 | 05Jan2012 | 30Aug2012 |
00330871 | 10022 | 0766 | 05Jan2012 | 30Aug2012 |
00334334 | 10014 | 0775 | 13Feb2012 | 24Oct2012 |
00334334 | 10014 | 0775 | 13Feb2012 | 24Oct2012 |
00337381 | 10019 | 0775 | 25Aug2011 | 18Apr2012 |
00337381 | 10019 | 0775 | 25Aug2011 | 18Apr2012 |
00340418 | 10004 | 0766 | 20Mar2012 | 20Dec2012 |
00340418 | 10004 | 0766 | 20Mar2012 | 20Dec2012 |
00345727 | 10014 | 0775 | 22Jul2011 | 01Mar2012 |
00345727 | 10014 | 0775 | 22Jul2011 | 01Mar2012 |
00347643 | 10015 | 0766 | 11Jan2012 | 27Apr2012 |
00347643 | 10015 | 0766 | 11Jan2012 | 27Apr2012 |
00348641 | 10020 | 0775 | 23Feb2012 | 01Nov2012 |
00348641 | 10020 | 0775 | 23Feb2012 | 01Nov2012 |
I want to create a new var Admit_ID like this: if the values for the 5 vars in original data set are same for row1 to row2, we sign 1 to the Admit_ID then we move on to compare row3, row 3 is different (if any one of the five values is different, we conside row2 and row3 are different), then we sign 2 to row3. then we move on to compare row3 to row4....
MBR_ID | Prov_ID | DRG | Start_DT | End_DT | Admit_ID |
00158409 | 10010 | 0774 | 06Dec2011 | 29Jul2012 | 1 |
00158409 | 10010 | 0774 | 06Dec2011 | 29Jul2012 | 1 |
00211424 | 10010 | 0765 | 16Jan2012 | 21Sep2012 | 2 |
00324582 | 10119 | 0775 | 24Oct2011 | 13Feb2012 | 3 |
00324582 | 10119 | 0775 | 24Oct2011 | 13Feb2012 | 3 |
00330871 | 10022 | 0766 | 05Jan2012 | 30Aug2012 | 4 |
00330871 | 10022 | 0766 | 05Jan2012 | 30Aug2012 | 4 |
00334334 | 10014 | 0775 | 13Feb2012 | 24Oct2012 | 5 |
00334334 | 10014 | 0775 | 13Feb2012 | 24Oct2012 | 5 |
00337381 | 10019 | 0775 | 25Aug2011 | 18Apr2012 | 6 |
00337381 | 10019 | 0775 | 25Aug2011 | 18Apr2012 | 6 |
00340418 | 10004 | 0766 | 20Mar2012 | 20Dec2012 | 7 |
00340418 | 10004 | 0766 | 20Mar2012 | 20Dec2012 | 7 |
00345727 | 10014 | 0775 | 22Jul2011 | 01Mar2012 | 8 |
00345727 | 10014 | 0775 | 22Jul2011 | 01Mar2012 | 8 |
00347643 | 10015 | 0766 | 11Jan2012 | 27Apr2012 | 9 |
00347643 | 10015 | 0766 | 11Jan2012 | 27Apr2012 | 9 |
00348641 | 10020 | 0775 | 23Feb2012 | 01Nov2012 | 10 |
00348641 | 10020 | 0775 | 23Feb2012 | 01Nov2012 | 10 |
This is commonly known as creating an enumeration or counter variable.
Data want;
set have;
by var1 var2... var5;
retain counter 0;
If first.var5 = 1 then counter+1;
run;
This is commonly known as creating an enumeration or counter variable.
Data want;
set have;
by var1 var2... var5;
retain counter 0;
If first.var5 = 1 then counter+1;
run;
One of the things the BY statement does in a data step is keep track of the whether the the current record is the first or last value of the combination of the by variables. That makes the boolean values FIRST.Variable and LAST.Variable available. So FIRST.VAR5 is true when it is the first of each level of values within VAR5 for the combination of values Var1 through Var4.
If the other variables are not included then the order of the processing would be incorrect and actually likely generate an error about data not being sorted in order for var5.
It's worth skimming over the documentation, even if you don't read or understand it all at this point.
Especially the examples. Then the next time you see a problem that's similar you can connect the dots.
It's a long chapter:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.