I'm fairly new to the idea of macro variables and I think it is the best option I'm looking for right now. I'm working with a data set to track gains and losses in membership rolls. I have my data set sorted by MEMBER_ID and want to look at the beginning and ending dates for each member. We get this data snapshot monthly at the end of the month thus having a field called MONTHLY_DATE containing values like 2018-04-30, 2018-05-31, 2018-06-30, etc. The data starts at 2012-09-31 and currently ends at 2019-04-30. However, there is a possibility that we could get more data prior to 2012-09-31 and we will definitely be receiving data after 2019-04-30. Attached is a spreadsheet with an example data set containing the original data set as well as the two tables I wish to create.
I think I need a few DO statements. First it will look at the MEMBER_ID then look at the MONTHLY_DATE field. If the MONTHLY_DATE field of the first record of the MEMBER_ID (in the data set 'A123B78') is not equal to the MIN(MONTHLY_DATE) then it will write that record to the GAINS table. It will keep cycling through the first MEMBER_ID until it gets to the last record of MEMBER_ID, 'A123B78' at which point it will compare MONTHLY_DATE to the MAX(MONTHLY_DATE). If MONTHLY_DATE is not equal to MAX(MONTHLY_DATE) then it will be written to the LOSSES table. It then repeats this process for the next MEMBER_ID, 'J800K99' and so on and so forth.
What I'm not sure about is how to effectively do this loop and if macro variables are need in the code at all. I know this is vague and I'm not sure where to start with the code. Any and all help is greatly appreciated.
Starter Data Set
MEMBER_ID | MONTHLY_DATE | CONTRIBUTION | LOCATION |
A123B78 | 9/30/2012 | 40 | PA |
A123B78 | 10/31/2012 | 40 | PA |
A123B78 | 11/30/2012 | 40 | PA |
A123B78 | 12/31/2012 | 40 | PA |
A123B78 | 1/31/2012 | 40 | PA |
A123B78 | 2/29/2012 | 40 | PA |
J800K99 | 5/31/2015 | 25 | PA |
J800K99 | 6/30/2015 | 25 | PA |
J800K99 | 7/31/2015 | 25 | PA |
J800K99 | 8/31/2015 | 25 | PA |
R675T90 | 6/30/2013 | 15 | NY |
R675T90 | 7/31/2013 | 15 | NY |
R675T90 | 8/31/2013 | 15 | NY |
R675T90 | 9/30/2013 | 15 | NY |
W786L13 | 2/28/2019 | 25 | CA |
W786L13 | 3/31/2019 | 25 | CA |
W786L13 | 4/30/2019 | 25 | CA |
Y889F45 | 8/31/2018 | 40 | CA |
Y889F45 | 9/30/2018 | 40 | CA |
Y889F45 | 10/31/2018 | 40 | CA |
Y889F45 | 11/30/2018 | 40 | CA |
Y889F45 | 12/31/2018 | 40 | CA |
Y889F45 | 1/31/2019 | 40 | CA |
Y889F45 | 2/28/2019 | 40 | CA |
Y889F45 | 3/31/2019 | 40 | CA |
Y889F45 | 4/30/2019 | 40 | CA |
LOSSES
MEMBER_ID | MONTHLY_DATE | CONTRIBUTION | LOCATION |
A123B78 | 2/29/2012 | 40 | PA |
J800K99 | 8/31/2015 | 25 | PA |
R675T90 | 9/30/2013 | 15 | NY |
GAINS
MEMBER_ID | MONTHLY_DATE | CONTRIBUTION | LOCATION |
J800K99 | 5/31/2015 | 25 | PA |
R675T90 | 6/30/2013 | 15 | NY |
W786L13 | 2/28/2019 | 25 | CA |
Y889F45 | 8/31/2018 | 40 | CA |
You need to provide a cleaner definition of what MIN(MONTHLY_DATE) and MAX (MONTHLY_DATE) mean. Are they the single smalles/largest dates included in the entire data set? Within a specific period of interest? Specified somewhere? If so, where or how?
The definition needs clarification because each member_id will have their own min and max date. And it not clear which actual value you are comparing the "desired result" dates to.
Also please clarify that the date variable in question is actually as SAS date value, not a character value.
Your attachment didn't make it. But since this is SAS providing SAS data sets is much better as that answers all the questions about data types and such.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
MIN(MONTHLY_DATE) and MAX(MONTHLY_DATE) would be the minimum and maximum dates in the entire data set.
The date variable in question is an official SAS formatted date field.
The attachment just contained the tables I had typed out in my original question.
Here's a way to FLAG records for gain or loss that uses a SAS supplied data set you should have available since you haven't provided example data in a form useable for code.
proc summary data=sashelp.stocks; var date; output out=work.datesum max= min=/autoname; run; proc sql noprint; select min(date), max(date) into :mindate, :maxdate from sashelp.stocks ; quit; /* sort the data as not in stock, date order remove some records to have different date ranges as the same for the stocks in the base data */ proc sort data=sashelp.stocks out=work.stocks (where= ( (stock='IBM' and date<'01JUL2005'd) or (stock='Microsoft' and date>'01JUN1987'd) ) ); by stock date; run; data example; set work.stocks; by stock; if first.stock and date ne &mindate. then GainFlag=1; if last.stock and date ne &maxdate. then LossFlag =1; run;
If the proper values are flagged then it is a trivial exercise to OUTPUT the flagged records to a data set, separate data sets or write reports with the information of which records are gains and losses.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.