DATA Step, Macro, Functions and more

Recursive pass through data step to identify parts of a group

New Contributor
Posts: 4

Recursive pass through data step to identify parts of a group

I've been swirling on how to complete this, so thought I'd ask some better minds!


I'm trying to flag order lines that are part of a discounted bundle.  I have product data at the line level for each order, and the bundles are identified by if they contain the necessary SKUs.  I want to keep the data at the line level, and add a binary flag for lines that are members of a complete bundle.


I have 68 bundles to identify, and they can range from two to five SKUs making up a bundle.  I'm thinking I need some sort of data step, using first. logic, and then maybe using a macro to run it for each unique bundle.


Below is an example of what I'm looking for:


183567681996TGD25 1
183567682001TGD27 0
183567681997TGD29 1
183567682002XGD04 0
183567681998GJD47 0


I already have the data in the three left columns and just want to add the right column ("Bundle").  The SKUs TGD25 and TGD29 make up a valid bundle.  To complicate things, the SKU TGD27 is part of a different bundle, but NOT this one.  


Also, there may be cases where the order might contain two of one SKU needed, and one of the others (so TGD25, TGD25, TGD29), but I would only want to flag one of the TGD25 SKU's, since there is only one complete bundle.


Any ideas on how to accomplish this would be GREATLY appreciated!  Please let me know if I can add any more details.

Respected Advisor
Posts: 3,908

Re: Recursive pass through data step to identify parts of a group

If I understand this right then a bundle is a defined set of SKU's within the same order. Is it possible for you to create a reference table with 2 columns {SKU, Bundle_No} and post this data as well.


What would be helpful (and you've done part of it already):

- Post two data steps creating sample data (order data, sample data)

- Clearly define the bundeling rules (same order, all SKU's exist in order); is it possible and what should happen if 2 full bundles are in an order

- Post the desired result for your sample data.

New Contributor
Posts: 4

Re: Recursive pass through data step to identify parts of a group

Hi @Patrick,


Yes, a bundle is a defined set of SKUs within the same order.  To @mkeintz's question, a single SKU could be an element of more than one bundle (which makes this even more "fun").  So, in creating a reference table, I'm struggling with how to accomidate that fact.  My current thought is to just define a macro group for each bundle:


%let b1 = ('TGD25', 'TGD38', 'XGD02');

%let b2 = ('TGD25', 'TGD29');


%let b68 = ('PND56', 'PND59');


A very small sample dataset:

data mydata;

input order_id line_id SKU $;


12345 1234 TGD25

12345 1235 TGD38

12345 1236 CLX30

12345 1237 XGD02

12345 1238 BFD01

67891 2345 BLK89

67891 2346 TGD25

67891 2347 CLK89

67891 2348 TGD25

67891 2349 TGD29

56789 4512 TGD38

56789 4513 BLK78

56789 4514 TGD25

56789 4515 CLT98

56789 4516 BLH67

56789 4517 CLX89




In this sample,

---order 12345 has one bundle (as defined in &b1: 'TGD25', 'TGD38', 'XGD02')

---order 67891 has one bundle (as defined in &b2: 'TGD25', 'TGD29'), but also has an extra sku ('TGD25') that should not be counted

---order 56789 has two valid skus for &b1 ('TGD25', 'TGD38'), but does not contain the third and therefore has no bundle.


To both your points, I think the rules are:

1. All SKUs must be on a single order

2. There can be more than one bundle per order, but each SKU still just needs a 1/0 flag

3. Partial bundles (extra SKUs not completing a bundle) should not be flagged


I'd like to keep everything at the line level.  The very end result, after this gets merged in with lots of other orders, is that I could sum the demand of lines that are flagged as part of a full bundle.


For this sample, the desired final dataset would be:




I have been trying to do a datastep that counts the number of each SKU on the order, then takes the minimum for those counts (to get how many valid bundles there are), but struggle with how to keep the unique line_ids through all that so I can go back and flag the lines.


For example, with order 67891 above, if I summed by order_id, by SKU, I would have count_TGD25 = 2 and count_TGD29 = 1.  If I take the min(count_TGD25, TGD29), and get 1, then I know I have one full bundle...but then how to I take that back to flag the first instance of the 'TGD25' SKU, but not the second instance?

Valued Guide
Posts: 797

Re: Recursive pass through data step to identify parts of a group

What if there is a bundle defined as    TGD25 and TGD26

  and another bundle defined as TGD26 and TGD27


and you have a order with all three SKU's.  What is the rule for assigning a bundle indicator?


Valued Guide
Posts: 797

Re: Recursive pass through data step to identify parts of a group

And while you're compiling the information requested by @Patrick, a question:


Can a given SKU belong to more than one bundle?


If so, the question 2:  if a given order_id has two bundles that share an SKU (and that SKU occurs only once), how do you decide which bundle is to be assigned?

Ask a Question
Discussion stats
  • 4 replies
  • 3 in conversation