Re: Loops and Macros

ChaseA · Posted 06-06-2019 12:12 PM

I'm fairly new to the idea of macro variables and I think it is the best option I'm looking for right now. I'm working with a data set to track gains and losses in membership rolls. I have my data set sorted by MEMBER_ID and want to look at the beginning and ending dates for each member. We get this data snapshot monthly at the end of the month thus having a field called MONTHLY_DATE containing values like 2018-04-30, 2018-05-31, 2018-06-30, etc. The data starts at 2012-09-31 and currently ends at 2019-04-30. However, there is a possibility that we could get more data prior to 2012-09-31 and we will definitely be receiving data after 2019-04-30. Attached is a spreadsheet with an example data set containing the original data set as well as the two tables I wish to create.

I think I need a few DO statements. First it will look at the MEMBER_ID then look at the MONTHLY_DATE field. If the MONTHLY_DATE field of the first record of the MEMBER_ID (in the data set 'A123B78') is not equal to the MIN(MONTHLY_DATE) then it will write that record to the GAINS table. It will keep cycling through the first MEMBER_ID until it gets to the last record of MEMBER_ID, 'A123B78' at which point it will compare MONTHLY_DATE to the MAX(MONTHLY_DATE). If MONTHLY_DATE is not equal to MAX(MONTHLY_DATE) then it will be written to the LOSSES table. It then repeats this process for the next MEMBER_ID, 'J800K99' and so on and so forth.

What I'm not sure about is how to effectively do this loop and if macro variables are need in the code at all. I know this is vague and I'm not sure where to start with the code. Any and all help is greatly appreciated.

Starter Data Set

MEMBER_ID	MONTHLY_DATE	CONTRIBUTION	LOCATION
A123B78	9/30/2012	40	PA
A123B78	10/31/2012	40	PA
A123B78	11/30/2012	40	PA
A123B78	12/31/2012	40	PA
A123B78	1/31/2012	40	PA
A123B78	2/29/2012	40	PA
J800K99	5/31/2015	25	PA
J800K99	6/30/2015	25	PA
J800K99	7/31/2015	25	PA
J800K99	8/31/2015	25	PA
R675T90	6/30/2013	15	NY
R675T90	7/31/2013	15	NY
R675T90	8/31/2013	15	NY
R675T90	9/30/2013	15	NY
W786L13	2/28/2019	25	CA
W786L13	3/31/2019	25	CA
W786L13	4/30/2019	25	CA
Y889F45	8/31/2018	40	CA
Y889F45	9/30/2018	40	CA
Y889F45	10/31/2018	40	CA
Y889F45	11/30/2018	40	CA
Y889F45	12/31/2018	40	CA
Y889F45	1/31/2019	40	CA
Y889F45	2/28/2019	40	CA
Y889F45	3/31/2019	40	CA
Y889F45	4/30/2019	40	CA

LOSSES

MEMBER_ID	MONTHLY_DATE	CONTRIBUTION	LOCATION
A123B78	2/29/2012	40	PA
J800K99	8/31/2015	25	PA
R675T90	9/30/2013	15	NY

GAINS

MEMBER_ID	MONTHLY_DATE	CONTRIBUTION	LOCATION
J800K99	5/31/2015	25	PA
R675T90	6/30/2013	15	NY
W786L13	2/28/2019	25	CA
Y889F45	8/31/2018	40	CA

ballardw · Posted 06-06-2019 03:27 PM

You need to provide a cleaner definition of what MIN(MONTHLY_DATE) and MAX (MONTHLY_DATE) mean. Are they the single smalles/largest dates included in the entire data set? Within a specific period of interest? Specified somewhere? If so, where or how?

The definition needs clarification because each member_id will have their own min and max date. And it not clear which actual value you are comparing the "desired result" dates to.

Also please clarify that the date variable in question is actually as SAS date value, not a character value.

Your attachment didn't make it. But since this is SAS providing SAS data sets is much better as that answers all the questions about data types and such.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

ChaseA · Posted 06-06-2019 04:45 PM

MIN(MONTHLY_DATE) and MAX(MONTHLY_DATE) would be the minimum and maximum dates in the entire data set.

The date variable in question is an official SAS formatted date field.

The attachment just contained the tables I had typed out in my original question.

ballardw · Posted 06-07-2019 05:44 PM

Here's a way to FLAG records for gain or loss that uses a SAS supplied data set you should have available since you haven't provided example data in a form useable for code.

proc summary data=sashelp.stocks;
   var date;
   output out=work.datesum max= min=/autoname;
run;

proc sql noprint;
   select min(date), max(date) into :mindate, :maxdate
   from sashelp.stocks
   ;
quit;

/* sort the data as not in stock, date order
   remove some records to have different date
   ranges as the same for the stocks
   in the base data
*/
proc sort data=sashelp.stocks
    out=work.stocks (where= ( (stock='IBM' and date<'01JUL2005'd)
                            or (stock='Microsoft' and date>'01JUN1987'd)
                            )
                    );
   by stock date;
run;

data example;
  set work.stocks;
  by stock;
  if first.stock and date ne &mindate. then GainFlag=1;
  if last.stock and date ne &maxdate. then LossFlag =1;
run;

If the proper values are flagged then it is a trivial exercise to OUTPUT the flagged records to a data set, separate data sets or write reports with the information of which records are gains and losses.

Loops and Macros