BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DangIT
Fluorite | Level 6

Hi,

I have a table of 'qualified records' for January that looks like:

Table1:

Member_ID Transaction_ID Service_Date Prod_Group

1 123 2014-01-05 1

1 124 2014-01-08 3

2 126 2014-01-07 2

3 128 2014-01-26 5

...

I have another table of ALL the history of transactions for each member (HistoryTbl) with all the same variables (plus a bunch more which i've suppressed in this example as I don't believe it'll be relevant to the logic).

I am trying to pull in a number of variables from HistoryTbl to Table1 based on the following criteria:

Look back 365 days from Service_Date of the transaction in Table1 for the Member_ID

IF the Member has made a transaction in the SAME Prod_Group: grab following variables of the latest transaction in that Prod_Group from the HistoryTbl: Service_Date, Product_ID, etc.

IF no transaction in the same Prod_Group: Get same variables as above, for the most recent Prod_Group transaction.

I am unsure of how to approach the programming logic where essentially I am creating dynamic 'look back periods' for each transaction in Table1.

Any help greatly appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
gergely_batho
SAS Employee

Hi,

First sort datasets by Member_ID and Service_Date.

The read them with a set/by statement (interleaving).

When reading from the history dataset, store the needed variables in a hash object. Key will be the Prod_Group.  If key already exist, just replace, since the dataset is sorted by time, and you need only  the latest.

Also store the needed variables globally (not per group). For simplicity, my code stores it in the same hash object, with key=0.

When reading from qualified dataset, retrieve the needed values from the hash object. Of course, first try with key=Prod_Group. If no success, or entry is too old, try with key=0.

All this should be done for each Memeber_ID. I am simply clearing the hash object at the beginning of  a new group(Member_ID).

Please test it with more meaningful data.

data qualified;

input Member_ID Transaction_ID Service_Date:yymmdd10. Prod_Group;

format Service_Date yymmdd10. ;

cards;

1 123 2014-01-05 1

1 124 2014-01-08 3

2 126 2014-01-07 2

3 128 2014-01-26 5

;

run;

data history;

input Member_ID Transaction_ID Service_Date:yymmdd10. Prod_Group Product_ID;

format Service_Date yymmdd10. ;

cards;

1 123 2013-02-05 1 3

1 124 2013-05-08 3 4

2 126 2013-07-07 2 6

3 128 2014-01-26 5 6

1 123 2013-01-05 1 7

1 124 2013-01-08 3 8

2 126 2014-01-07 2 8

3 128 2014-01-26 5 8

;

run;

/*Sort by ID and date*/

proc sort data=qualified;

  by Member_ID Service_Date;

run;

proc sort data=history;

  by Member_ID Service_Date;

run;

data want;

if _n_=1 then do;

  dcl hash lastGroup();

  lastGroup.defineKey('H_Prod_Group');

  lastGroup.defineData('H_Transaction_ID');

    lastGroup.defineData('H_Service_Date');

  lastGroup.defineData('H_Prod_Group');

    lastGroup.defineData('H_Product_ID');

    lastGroup.defineDone();

  /*The following line is just a trick to define H_: variables in the PDV*/

  if 0 then set history(rename=(Transaction_ID=H_Transaction_ID Service_Date=H_Service_Date Prod_Group=H_Prod_Group Product_ID=H_Product_ID));

end;

set qualified(in=inQ) history(in=inH);

by Member_ID Service_Date;

if first.Member_ID then do;

     lastGroup.clear();

end;

if inH then do;/*Storing data about the last transaction per group, and also globally (key:0)*/

  rc=lastGroup.add    (key:Prod_Group,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.replace(key:Prod_Group,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.add    (key:0         ,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.replace(key:0         ,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

end;

if inQ then do;

  rc=lastGroup.find(key:Prod_Group);/*fetch group specific*/

  putlog rc=;

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-365,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  rc=lastGroup.find(key:0);/*fetch global*/

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-365,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  end;

  end;

  output;

end;

run;

View solution in original post

17 REPLIES 17
ballardw
Super User

Are your dates SAS dates or character variables? The comparison for a number of days is much easier if they are SAS date values.

DangIT
Fluorite | Level 6

Yes, all SAS Date variables.

DangIT
Fluorite | Level 6

I do not have the most advanced programming skills but here is the logic I am thinking of, and would love any feedback.

I'm not sure how to program the per row processing from Table1 but the idea is:

For each row (Row.) do:

1. From the HistoryTbl, Select * Where HistoryTbl.Member_ID = Row.Member_ID and HistoryTbl.Transaction_ID NE Row.Transaction_ID

          ** This will give me all transactions by that member not including the current transaction being evaluated.

2. Output LAST.Prod_Group from above to get most recent transaction across all Prod_Group for that Member_ID

      ** This will give me the latest transaction by Member by Prod_Group

3. Select Row.Transaction_ID merge by Prod_Group to #2 to get variables related to the SAME Prod_Group (rename the array of variables as InProd_[var_name])

      ** This will give me the latest transaction in the Prod_Group for that Member.

4. From 3 merge by Member_ID from #2 to get variables related to the LAST Prod_Group (rename the array of variables as LastProd_[var_name])

     ** This will give me the latest transaction in Any Prod_Group for that Member.

5. output record to a master table.

END loop.

Does this sound reasonable? Here's where i'm unsure of how I can create this macro inside the datastep to process each row.

Thanks for your help.

Astounding
PROC Star

As you might have guessed, this is a hard problem.  I hope to be able to write up something later tonight.  In the meantime, let's begin with a few questions.

Are MEMBER_ID and PROD_GROUP character or numeric?  (If numeric, what is the largest value?)

How large is the historical data set?  (You may need a lot of memory for this to work, and it may take some time to run.)

Are you allowed to sort the historical data set, and store a copy in the new sorted order?

Also note, there may be too many complications for me to explain them all.  You may want to study up on these topics:  creating a format from a SAS data set (that is, the CNTLIN= option on PROC FORMAT), and the POINT= option on the SET statement.  More later ....

DangIT
Fluorite | Level 6

Thank you for your help on this.

Member_ID is character, Prod_group is a numeric up to 14.

I am currently writing the script on the first month (january) and will be running it every month after. At January 2014 the Historical dataset is 6000 records, with approx 800 records added monthly.

Yes, allowed to sort historical dataset and store a copy.

Astounding
PROC Star

OK, even if the data sets grow this should run quickly enough.  If you were adding 1M records per month, this might not be best.

The idea is to create a format out of the historic file.  Each row of the format translates a combination of MEMBER_ID + PROD_GROUP + a range of dates into the matching observation number in the historic file.  So this is just the code to match based on a matching PROD_GROUP.  A separate, similar set of steps would be needed if the first pass produces no match.

First, create the format.

proc sort data=historic out=historic_sorted;

  by member_ID prod_group;

run;

data ranges;

  set historic_sorted end=done;

  by member_ID prod_group;

  fmtname = '$ranges';

  length label $ 8;

  label = put(_n_-1, 5.);

  retain start;

  if first.prod_group then start = member_id || ' ' || put(prod_group, z2.) || ' ' || put(service_date, yymmdd10.);

  else do;

     end = member_id || ' ' || put(prod_group, z2.) || ' ' || put(service_date-1, yymmdd10.);

     output;

     start = member_id || ' ' || put(prod_group, z2.) || ' ' || put(service_date, yymmdd10.);

  end;

  if last.prod_group then do;

     end = member_id || ' ' || put(prod_group, z2.) || ' ' || put(service_date+364, yymmdd10.);

     output;

  end;

  if done then do;

     hlo='O';

     label='No Match';

     output;

  end;

run;

proc format cntlin=ranges;

run;

That creates the format, although I probably need to look at this with fresh eyes in the morning.

To use the format:

data want;

  set transaction;

  do _i_=service_date to service_date-364 by -1 until (search_result ne 'No Match');

     search_result = put(member_id || ' ' || put(prod_group, z2.) || ' ' || put(_i_, yymmdd10.), $ranges.);

  end;

  if search_result ne 'No Match' then do;

     matching_record = input(search_result, 5.);

     set historic_sorted point=matching_record;

  end;

  else do;

     * Add code to set all variables from historic data to missing;

  end;

run;

So it's hard enough as is, and would need another batch of similar code to perform a follow-up search if no match was found the first time.

Ksharp
Super User

You'd better post these two tables and the output you need .Don't forget to write some words to explain your question clearly .

gergely_batho
SAS Employee

Hi,

First sort datasets by Member_ID and Service_Date.

The read them with a set/by statement (interleaving).

When reading from the history dataset, store the needed variables in a hash object. Key will be the Prod_Group.  If key already exist, just replace, since the dataset is sorted by time, and you need only  the latest.

Also store the needed variables globally (not per group). For simplicity, my code stores it in the same hash object, with key=0.

When reading from qualified dataset, retrieve the needed values from the hash object. Of course, first try with key=Prod_Group. If no success, or entry is too old, try with key=0.

All this should be done for each Memeber_ID. I am simply clearing the hash object at the beginning of  a new group(Member_ID).

Please test it with more meaningful data.

data qualified;

input Member_ID Transaction_ID Service_Date:yymmdd10. Prod_Group;

format Service_Date yymmdd10. ;

cards;

1 123 2014-01-05 1

1 124 2014-01-08 3

2 126 2014-01-07 2

3 128 2014-01-26 5

;

run;

data history;

input Member_ID Transaction_ID Service_Date:yymmdd10. Prod_Group Product_ID;

format Service_Date yymmdd10. ;

cards;

1 123 2013-02-05 1 3

1 124 2013-05-08 3 4

2 126 2013-07-07 2 6

3 128 2014-01-26 5 6

1 123 2013-01-05 1 7

1 124 2013-01-08 3 8

2 126 2014-01-07 2 8

3 128 2014-01-26 5 8

;

run;

/*Sort by ID and date*/

proc sort data=qualified;

  by Member_ID Service_Date;

run;

proc sort data=history;

  by Member_ID Service_Date;

run;

data want;

if _n_=1 then do;

  dcl hash lastGroup();

  lastGroup.defineKey('H_Prod_Group');

  lastGroup.defineData('H_Transaction_ID');

    lastGroup.defineData('H_Service_Date');

  lastGroup.defineData('H_Prod_Group');

    lastGroup.defineData('H_Product_ID');

    lastGroup.defineDone();

  /*The following line is just a trick to define H_: variables in the PDV*/

  if 0 then set history(rename=(Transaction_ID=H_Transaction_ID Service_Date=H_Service_Date Prod_Group=H_Prod_Group Product_ID=H_Product_ID));

end;

set qualified(in=inQ) history(in=inH);

by Member_ID Service_Date;

if first.Member_ID then do;

     lastGroup.clear();

end;

if inH then do;/*Storing data about the last transaction per group, and also globally (key:0)*/

  rc=lastGroup.add    (key:Prod_Group,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.replace(key:Prod_Group,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.add    (key:0         ,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

  rc=lastGroup.replace(key:0         ,data:Transaction_ID,data:Service_Date,data:Prod_Group,data:Product_ID);

end;

if inQ then do;

  rc=lastGroup.find(key:Prod_Group);/*fetch group specific*/

  putlog rc=;

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-365,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  rc=lastGroup.find(key:0);/*fetch global*/

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-365,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  end;

  end;

  output;

end;

run;

DangIT
Fluorite | Level 6

wow thank you.

I don't fully understand it yet, but it worked!

Do you have any links you can direct me to for a deeper understanding of hash objects?

gergely_batho
SAS Employee

I think the SAS doc about hash objects is a good starting point:

SAS(R) 9.3 Language Reference: Concepts, Second Edition

DangIT
Fluorite | Level 6

Hi Gergely,

your solution will give me the the match on Prod_Group, where would you suggest I adjust the code to take into account transactions where there is no previous transactions within the Same product group and pull the latest transaction from ANY product group?

Thanks.

gergely_batho
SAS Employee

The code is supposed to do it. To test it, change the last line of history dataset to:

3 128 2013-05-26 4 8

(If date in history is the same as date in qualified, then this data step ignores it! Because it is not "history" - it happened on the same day.)

DangIT
Fluorite | Level 6

Ah I see, logically, I agree with you, although the business rules says otherwise :smileysilly:

I imagine it is in this block the logic will change to include the same day transactions?

if inQ then do;

  rc=lastGroup.find(key:Prod_Group);/*fetch group specific*/

  putlog rc=;

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-&LB_Period_Days,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  rc=lastGroup.find(key:0);/*fetch global*/

  if rc~=0 or H_Service_Date<=intnx('day',Service_Date,-&LB_Period_Days,'s') then do;/*not found or old entry*/

  call missing(of H_:);

  end;

  end;

  output;

end;

gergely_batho
SAS Employee

Almost Smiley Happy

The logic that handles "group found" and "group not found" is there.

But in this case what happens to such contemporary pairs is determined by the order the data step reads them.

So in this case you have to make changes here:

set qualified(in=inQ) history(in=inH);

set history(in=inH) qualified(in=inQ);


Do you also need to take into account observation in the history data set, that are later then the current qualified?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1693 views
  • 0 likes
  • 6 in conversation