BookmarkSubscribeRSS Feed
jkf91
Calcite | Level 5

I have a large scanner panel data set, which tracks consumptions made by households.

What I'm interested in doing is the following:

I want to (eventually) count the number of purchase observations 3, 6, 12 months before and after specific dates.

I was told that the best way to go about doing this is by creating LET statement.

However, I have no clue hwo this can be done... Can someone help please?

Currently, the data looks as below:

Household_ID          Purchase_Date          Specific_Date          X          Y          Z

1                             11/22/07                       3/24/08

1                              3/25/08                        3/24/08

1                              4/28/08                        3/24/08

1                              1/30/09                        3/24/08

2

So, for household_id=1,

for 3 months interval, i want to have two observations: 3/25/08 4/28/08

for 6 months interval, i want to have three observations: 11/22/07 3/25/08 4/28/08

for 12 months interval, i want all observations

In the end, I need to create additional summary statistics for observations that fall belong to above time intervals. Such as avg(x), avg(y), avg(z)...

any comments would be greatly appreciated. please let me know if anything is not clear

7 REPLIES 7
art297
Opal | Level 21

If specific_date already exists in your file, and that is the date you are seeing if a particular purchase date falls within x months before or after, then I don't see how creating a macro variable would help.  If the field doesn't exist, then you could create the variable before running a datastep, and not take up any file space.

What you are looking for, I think, is simply the intnx function to create variables x, y and z that will be set to 1 if a purchase_date is within 3, 6 and 12 months of a Specific_Date.  You could do that with code like:

data want;

  set have;

  if intnx('month', Purchase_Date, -3, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 3, 'S' ) then x=1;

  if intnx('month', Purchase_Date, -6, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 6, 'S' ) then y=1;

  if intnx('month', Purchase_Date, -12, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 12, 'S' ) then z=1;

run;

If Specific_Date didn't exist in your file you could create a macro variable .. before your datastep .. something like:

%let Specific_Date=24mar2008;

data want;

  set have;

  if intnx('month', Purchase_Date, -3, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 3, 'S' ) then x=1;

  if intnx('month', Purchase_Date, -6, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 6, 'S' ) then y=1;

  if intnx('month', Purchase_Date, -12, 'S' ) < Specific_Date <

    intnx('month', Purchase_Date, 12, 'S' ) then z=1;

run;

robby_beum
Quartz | Level 8

I agree with Art - I don't see how a macro variable will help...I cam up with this code example:

If I'm understanding correctly, you want to populate x, y, x with a numeric 1 so you can summarize and rollup by household_id
and then calculate the averages.

My sas is currently busy so I couldn't test but you could do somthing along the lines of this psuedo code.

data dataset2;
   set dataset1;

   three_month_int=intchk('MONTH3', purchase_date, specific_date);
   six_month_int=intchk('MONTH6', purchase_date, specific_date);
   twelve_month_int=intchk('MONTH12', purchase_date, specific_date);

   if three_month_int=1 then x=1;
   if six_month_int=1 then
      do;
         x=1;
         y=1;
      end;
   if twelve_month_int=1 then
      do;
         x=1;
         y=1;
         z=1;
      end;
run;

proc sql exec;
   create table dataset3 as
   select *,
          sum(x) as sum_x,
          sum(y) as sum_y,
          sum(z) as sum_z
   from dataset2;
   group by household_id
quit;

data dataset4;
   set dataset3;
   by household_id;

   if first.household_id;
run;

art297
Opal | Level 21

Robby,

I presume you meant intck.  The reason I suggested the intnx function instead was that the former only counts boundaries .. not months from a particular date.  Thus your code would say that anything that happened in April would be 3 months from January, regardless of the actual dates.

DLing
Obsidian | Level 7

I think this boils down to finding records that fall within some time window and then computing meaningful metrics for them, by household.

This is something the macro %let statement won't even come close to solving on its own.

I'm hoping the [t1,t2] window is the same for all households, which makes it far easier, and usually would make sense.  If so, a reasonable approach might look like,

1) sort by household_id Purchase_date.

2) use proc sql or data step to: a) select records, and b) compute total purchase counts and dollars.

3) repeat for different [t1,t2] windows, e.g., 3 mth, 6 mth, annual, ...

4) join the separate summarized results together by household if necessary.

Alternatively,

1) sort by household_id Purchase_date.

2) use proc sql or data step to create house_hold monthly totals.  Essentially change the data from the "transaction" dataset to a monthly summary dataset.

3) Then you can read the monthly summary dataset (one record for each household, every occuring month), transpose it to a wide time series view (one record for each household, month appear as additional columns).

4) Compute 3,6,12 month statistics using arrays or plain statements from the wide time series view.

That's all I can think of for now.  Hope this helps and good luck!

jkf91
Calcite | Level 5

thank you all for sharing your expertise.

the specific dates are different for each individual (e.g.., date joined).

I have three specific dates for each individual. So... perhaps I need to create many variables.

Initially, by X, Y, Z, I didn't mean indicator variables but real values (e.g, dollar_spent)

DLing
Obsidian | Level 7

Ah, you're doing tenure analysis at the household level.  Your aggregation code should look something like this:

proc sort data=purchases;

     by houseid joindate purchdate;     /*  Each house should only have 1 joindate  */

run;

data summarized;

     set purchases;

     by houseid;

     days = purchdate - joindate;                     /*  presuming these are SAS date values     */

     days = intck( 'day', joindate, purchdate );     /*  same as above, but amenable to other ways of counting  */

     mths = intck( 'month', joindate, purchchdate ); /*  careful since it counts month boundaries  */

     /*  or any variant that works according to your sense of what is a month  */

     if first.houseid then do;

          cnt30 = 0;     val30 = 0;

          cnt60 = 0;     val60 = 0;

          ... etc

     end;

     ... cumulative counters & values ...

     if dur <= 30 then do;

          cnt30 + 1;

          val30 + amount;

     end;

     if dur <= 60 then do;

          cnt60 + 1;

          val60 + amount;

     end;

     if dur <= 90 then do;

          cnt90 + 1;

          val90 + amount;

     end;

     ... etc  Typical problem with this is that it ignores seasonality in the data.

     ... Sometimes you want tenure view (since joindate), sometimes calendar view (at December of the year).

     ... sometimes people want non-overlapping buckets rather than cumulatives, such as

     if dur <= 30 then do;

          cnt30 + 1;

          val30 + amount;

     end;

     else if dur <= 60 then do;

          cnt60 + 1;

          val60 + amount;

     end;

     else if dur <= 90 then do;

          cnt90 + 1;

          val90 + amount;

     end;

     if last.houseid then output;

run;

These are the sorts of processing of transaction oriented data stream into either tenure/vintage views or calendar date views.

Ksharp
Super User

You do not post the output you want. you said you have three special days for each individual(house_id?),but in your post ,there is only one special day.

Maybe you want to use %let month_interval=3 ,to select 3,6,12 month interval?

So it is hard to code something without seeing your whole data.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1108 views
  • 3 likes
  • 5 in conversation