Dear SAS community:
I have a dataset like the following:
Week | geogkey | volume |
3/10/2019 | unique_geogkey1 | 15000 |
"All dates in between" | .... | |
12/15/2019 | unique_geogkey2 | 16000 |
3/8/2020 | unique_geogkey1 | 17000 |
"All dates in between" | ... | |
12/13/2020 | unique_geogkey1 | 18000 |
I want to perform a computation between the current year date and the corresponding previous year date starting from 3/8/2020 until 12/13/2020. In my dataset, there are 62 unique geogkeys, each of them mapped to all the unique weeks. So total observations = 62*156 weeks. I want to group this by each unique geogkey and week combination. i.e. there is a unique volume value for each week and geogkey combination.
For example, (3/8/2020 - 3/10/2019)*2 = (17000-15633)*2
Do this for each subsequent date until 12/13/2020 and the corresponding previous year (12/15/2019)
Is there a method that performs a more complex calculation besides the simple ones in proc summary e.g. sum= means= ?
I am currently attempting with a macro code to no avail:
start_date = '08Mar2020'd;
end_date = '13Dec2020'd;
%macro catyrind(start_date,end_date);
%do cat_period = &start_date %to &end_date;
%let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));
proc means data=walmart_catyrind nway noprint;
class week geogkey;
var sc; *this variable contains numeric values in the table
output out=final_wm ((cat_period)/(&prev_date))*100-100; *This is the necessary calculation
run;
%end;
%mend;
%catyrind(start_date,end_date);
I know the calculation above is out of order but was testing to see if it can be performed within the proc summary step.
*The final output for the week variable should ONLY include the corresponding 2020 date and the 2019 week date will be dropped.
The output would look something like this:
Geogkey Week calculation
unique_id 3/8/2020 (volume of 3/8/2020) / (volume of 3/10/2019) *100-100
.....
unique_id unique_week_2020year All volume for weeks in between
......
unique_id 12/13/2020 (volume of 12/13/2020) / (volume of 12/15/2019) *100-100
Doesn't have to be a macro, any method would be helpful. Thanks in advance!
Should not GEOKEYs in 2020 be the same you had in 2019 ?
I suppose that if there are GEOKEY1-GEOKEYn in 2020, most of them if not all,
were in 2019 too.
In order to be able to compute and compare volume of 2020 to that of 2019 it should be on same line
and for that need to separate the year and the week number in a year:
data temp;
set have;
year = year(WEEK);
wk= week(WEEK);
run;
proc sort data=temp; by geokey wk; run;
data want;
merge temp (where=(year=2020))
temp (where=(year=2019) rename=(volume=v2019) drop=year)
by geokey wk;
if missing(v2019) then calculated = .; else
calculated = volume / v2019 *100 - 100;
run;
I suppose that answers what you are looking for.
Clarification needed:
1) In your example you use different day in computation: (8 vs 10)
For example, (3/8/2020 - 3/10/2019)*2 = (17000-15633)*2
but in your code you compute with the same day:
%let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));
Are you looking first/following occurrence in 2020 vs. 2019 or for the sameday ?
2) Is WEEK unique in the data set or are there more than one observation per week?
When unique you just need union data for same month-day observation and compute
the single pair and you don't need PROC MEANS.
1) I should have clarified - All dates end on SUNDAY and I want to find the corresponding date from the previous year that also ends on SUNDAY. Although they are written as single dates, 3/8 & 3/10 technically represent data for week ENDING on SUNDAY. Hence, they won't exactly align. I'm presuming I need to write a custom interval because I don't believe any default arguments are available.
2) Yes, week is UNIQUE within the data set. I am calculating data for each week-geogkey pair.
Could you show an example of your union data method here?
Thank you!
I understand that there is one observation per week per year, therefore I created two new variables - YEAR and WEEK from the date variable you called "week".
I have created a test data to contain all available week-dates from the presented data in your post.
then merged 2019 data with 2020 data by week number.
I did not developed the statistic computing step, as I am not sure if next code realy helps you.
Please run and check the intermediate as the last output dataset and express your opinion about it.
The code is:
/* Calculating custom formula -
* between current and previous date
* with variable to be analyzed
*************************************/
data test;
infile cards dlm='09'x truncover;
input week :mmddyy10. geokey1 geokey2 $ geokeyn $;
cards;
3/10/2019 15633 x1 y1
12/15/2019 16000 x2 y2
3/8/2020 17000 x3 y3
12/13/2020 18000 x4 y4
; run;
data have; /* creating a full test data */
set test;
retain prevwk;
if _N_=1 then do;
prevwk = week;
year = year(week);
wk = week(week);
output;
return;
end;
if week - prevwk > 7 then do; /* assumed one obs per week */
lastwk = week;
week = prevwk +7;
do until (week > lastwk);
year = year(week);
wk = week(week);
output;
prevwk = week;
week+7;
end;
end;
drop week prevwk lastwk;
run;
data union;
format wk;
merge have(where=(year=2019) rename=(geokey1=gk2019_1 geokey2=gk2019_2 geokeyn=gk2019_n))
have(where=(year=2020) rename=(geokey1=gk2020_1 geokey2=gk2020_2 geokeyn=gk2020_n));
by wk;
drop year;
run;
@hojpoj93 please post some real test data.
Thanks for the code! @Shmuel This is the resulting dataset.
:
This is good because each there's observations for each week by year. However, I realized I set up the original data inefficiently and would have to transpose this again to get the final output. Sorry for the mixup. To save myself an unnecessary step, I changed the structure of the same data above. The volume is already summed up by unique week and geogkey. All I would need to do is perform the calculation of the volume values for the matching 2020 date rows and 2019 rows.
For my final output, I ONLY want the 2020 week date and NOT the 2019 week date. Could you offer any suggestions. Hope this clarifies the question.
Should not GEOKEYs in 2020 be the same you had in 2019 ?
I suppose that if there are GEOKEY1-GEOKEYn in 2020, most of them if not all,
were in 2019 too.
In order to be able to compute and compare volume of 2020 to that of 2019 it should be on same line
and for that need to separate the year and the week number in a year:
data temp;
set have;
year = year(WEEK);
wk= week(WEEK);
run;
proc sort data=temp; by geokey wk; run;
data want;
merge temp (where=(year=2020))
temp (where=(year=2019) rename=(volume=v2019) drop=year)
by geokey wk;
if missing(v2019) then calculated = .; else
calculated = volume / v2019 *100 - 100;
run;
I suppose that answers what you are looking for.
I do not understand what you are trying to do. In particular, you have
%let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));
which tells me that the variable CAT_PERIOD is being treated as a date value.
So far so good. But later you have
((cat_period)/(&prev_date))*100-100
which suggests you are dividing one date by another. This make no sense to me. For example, what if the PREV_DATE was 01jan1960, whose sas internal value is a zero - which makes for a bad divisor.
I suggest you write a code for a SINGLE date, without the macro loop. Make sure it generates the expected results, and then show us the working code with a description of the apparent attempt to do it for weekly steps.
In general, it's best to write working code for a single instances prior to revising it to work in a parameterized macro. I don't think that happened here.
proc sort data=df1;
by geogkey week;
run;
start_date = '10Mar2019'd;
end_date = '08Mar2020'd;
proc summary data=df1(where=(week=start_date or week=end_date)) noprint;
by week geogkey;
var sc;
output out=df2 /*calculation goes here*/;
run;
It would look similar to this where I would first sort the data by geogkey and week then define the start and end dates in yearly intervals with week ending Sunday.
Then subset the data based on those start and end dates and do the custom calculation on the "sc" variable for those dates. So for example, I wouldn't be subtracting two dates but the values corresponding to those dates.
I did a simple calculation and it worked, but I eventually need to go through every date from march to dec for 2019 and 2020 and perform the custom calculation.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.