BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hojpoj93
Calcite | Level 5

Dear SAS community:

 

I have a dataset like the following:

 

Weekgeogkeyvolume
3/10/2019   unique_geogkey1                   15000
"All dates in between".... 
12/15/2019 unique_geogkey2     16000
3/8/2020 unique_geogkey1   17000
"All dates in between"... 
12/13/2020unique_geogkey118000

 

I want to perform a computation between the current year date and the corresponding previous year date starting from 3/8/2020 until 12/13/2020. In my dataset, there are 62 unique geogkeys, each of them mapped to all the unique weeks. So total observations = 62*156 weeks. I want to group this by each unique geogkey and week combination. i.e. there is a unique volume value for each week and geogkey combination. 

 

For example, (3/8/2020 - 3/10/2019)*2 = (17000-15633)*2

Do this for each subsequent date until 12/13/2020 and the corresponding previous year (12/15/2019)

 

Is there a method that performs a more complex calculation besides the simple ones in proc summary e.g. sum= means= ?

 

I am currently attempting with a macro code to no avail:

start_date = '08Mar2020'd;
end_date = '13Dec2020'd;

%macro catyrind(start_date,end_date);

    %do cat_period = &start_date %to &end_date;
         %let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));
         proc means data=walmart_catyrind nway noprint;
             class week geogkey;
             var sc; *this variable contains numeric values in the table
             output out=final_wm   ((cat_period)/(&prev_date))*100-100; *This is the necessary calculation
         run;
       
    %end;
%mend;

%catyrind(start_date,end_date);

I know the calculation above is out of order but was testing to see if it can be performed within the proc summary step.

 

 

*The final output for the week variable should ONLY include the corresponding 2020 date and the 2019 week date will be dropped. 

 

The output would look something like this:

Geogkey         Week              calculation

 

unique_id    3/8/2020    (volume of 3/8/2020) / (volume of 3/10/2019) *100-100

.....

unique_id    unique_week_2020year   All volume for weeks in between

......

unique_id    12/13/2020   (volume of 12/13/2020) / (volume of 12/15/2019) *100-100

 

Doesn't have to be a macro, any method would be helpful. Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Should not GEOKEYs in 2020 be the same you had in 2019 ? 

I suppose that if there are GEOKEY1-GEOKEYn in 2020, most of them if not all,

were in 2019 too.

 

In order to be able to compute and compare volume of 2020 to that of 2019 it should be on same line

and for that need to separate the year and the week number in a year:

data temp;
 set have;
       year = year(WEEK);
       wk= week(WEEK);
run;
proc sort data=temp; by geokey wk; run;
data want;
  merge temp (where=(year=2020))
        temp (where=(year=2019) rename=(volume=v2019) drop=year)
    by geokey wk; 
        if missing(v2019) then calculated = .; else
        calculated = volume / v2019 *100 - 100;
run;

I suppose that answers what you are looking for.

 

 

 

 

View solution in original post

10 REPLIES 10
Shmuel
Garnet | Level 18

Clarification  needed:

1) In your example you use different day in computation: (8 vs 10)

For example, (3/8/2020 - 3/10/2019)*2 = (17000-15633)*2

   but in your code you compute with the same day:

%let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));

Are you looking first/following occurrence in 2020 vs. 2019 or for the sameday ?

 

2) Is WEEK unique in the data set or are there more than one observation per week?

    When unique you just need union data for same month-day observation and compute

    the single pair and you don't need PROC MEANS.

hojpoj93
Calcite | Level 5

1) I should have clarified - All dates end on SUNDAY and I want to find the corresponding date from the previous year that also ends on SUNDAY. Although they are written as single dates, 3/8 & 3/10 technically represent data for week ENDING on SUNDAY. Hence, they won't exactly align. I'm presuming I need to write a custom interval because I don't believe any default arguments are available. 

 

2) Yes, week is UNIQUE within the data set. I am calculating data for each week-geogkey pair. 

Could you show an example of your union data method here?

 

Thank you! 

Shmuel
Garnet | Level 18

I understand that there is one observation per week per year, therefore I created two new variables - YEAR and WEEK from the date variable you called "week".

 

I have created a test data to contain all available week-dates from the presented data in your post.

then merged 2019 data with 2020 data by week number.

 

I did not developed the statistic computing step, as I am not sure if next code realy helps you.

Please run and check the intermediate as the last output dataset and express your opinion about it.

 

The code is:

/* Calculating custom formula - 
*  between current and previous date 
*  with variable to be analyzed
*************************************/
data test;
 infile cards dlm='09'x truncover;
 input week :mmddyy10. geokey1 geokey2 $ geokeyn $;
cards;
3/10/2019	15633	x1	y1
12/15/2019	16000	x2	y2
3/8/2020	17000	x3	y3
12/13/2020	18000	x4	y4
; run;
data have;  /* creating a full test data */
 set test;
  retain prevwk; 
  if _N_=1 then do;
     prevwk = week;
     year = year(week);
     wk = week(week);
	 output;
	 return;
  end; 
  if week - prevwk > 7 then do; /* assumed one obs per week */
     lastwk = week;
	 week = prevwk +7;
	 do until (week > lastwk);
        year = year(week);
        wk = week(week);
	    output;
		prevwk = week;
		week+7;
	 end;
  end;
  drop week prevwk lastwk;
run;	 
  
data union;
 format wk;
 merge have(where=(year=2019) rename=(geokey1=gk2019_1 geokey2=gk2019_2 geokeyn=gk2019_n))
       have(where=(year=2020) rename=(geokey1=gk2020_1 geokey2=gk2020_2 geokeyn=gk2020_n));
  by wk;
  drop year;
run;   
hojpoj93
Calcite | Level 5

Thanks for the code! @Shmuel  This is the resulting dataset.

:catyrind pic.png

 

This is good because each there's observations for each week by year. However, I realized I set up the original data inefficiently and would have to transpose this again to get the final output. Sorry for the mixup. To save myself an unnecessary step, I changed the structure of the same data above. The volume is already summed up by unique week and geogkey. All I would need to do is perform the calculation of the volume values for the matching 2020 date rows and 2019 rows. 

For my final output, I ONLY want the 2020 week date and NOT the 2019 week date. Could you offer any suggestions. Hope this clarifies the question. 

Shmuel
Garnet | Level 18

Should not GEOKEYs in 2020 be the same you had in 2019 ? 

I suppose that if there are GEOKEY1-GEOKEYn in 2020, most of them if not all,

were in 2019 too.

 

In order to be able to compute and compare volume of 2020 to that of 2019 it should be on same line

and for that need to separate the year and the week number in a year:

data temp;
 set have;
       year = year(WEEK);
       wk= week(WEEK);
run;
proc sort data=temp; by geokey wk; run;
data want;
  merge temp (where=(year=2020))
        temp (where=(year=2019) rename=(volume=v2019) drop=year)
    by geokey wk; 
        if missing(v2019) then calculated = .; else
        calculated = volume / v2019 *100 - 100;
run;

I suppose that answers what you are looking for.

 

 

 

 

hojpoj93
Calcite | Level 5
Yes, you're correct. I should have specified. There are 62 unique geogkeys
that are mapped to all the weeks. So you would be comparing the same
geogkeys for 2019 and 2020 when doing the calculation.

This solution worked. Thank you so much!
hojpoj93
Calcite | Level 5
Yes, you're correct. I should have specified. There are 62 unique geogkeys that are mapped to all the weeks. So you would be comparing the same geogkeys for 2019 and 2020 when doing the calculation.

This solution worked. Thank you so much!
mkeintz
PROC Star

I do not understand what you are trying to do.  In particular, you have

%let prev_date=%sysfunc(intnx(year,cat_period,-1,"sameday"));

which tells me that the variable CAT_PERIOD is being treated as a date value.

 

So far so good.  But later you have

((cat_period)/(&prev_date))*100-100

which suggests you are dividing one date by another.  This make no sense to me.  For example, what if the PREV_DATE was 01jan1960, whose sas internal value is a zero - which makes for a bad divisor.

 

I suggest you write a code for a SINGLE date, without the macro loop.  Make sure it generates the expected results, and then show us the working code with a description of the apparent attempt to do it for weekly steps.

 

In general, it's best to write working code for a single instances prior to revising it to work in a parameterized macro.  I don't think that happened here.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hojpoj93
Calcite | Level 5
proc sort data=df1;
by geogkey week;
run;

start_date = '10Mar2019'd;
end_date = '08Mar2020'd;

proc summary data=df1(where=(week=start_date or week=end_date)) noprint;
by week geogkey;
var sc;
output out=df2 /*calculation goes here*/;
run;

It would look similar to this where I would first sort the data by geogkey and week then define the start and end dates in yearly intervals with week ending Sunday. 

 

Then subset the data based on those start and end dates and do the custom calculation on the "sc" variable for those dates. So for example, I wouldn't be subtracting two dates but the values corresponding to those dates. 

 

I did a simple calculation and it worked, but I eventually need to go through every date from march to dec for 2019 and 2020 and perform the custom calculation. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 669 views
  • 0 likes
  • 3 in conversation