We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Finding all single linked chains: the allChains macro

by Respected Advisor on ‎01-25-2017 12:58 AM (1,001 Views)

This article presents a tool called allChains for dealing with data representing single linked chains. Single linked chains are often used to record successive changes in a system. For instance, you might create a new profile record each time a user changes his profile. This change could be recorded in a table as OldRecordId - NewRecordId - DateOfChange. Following the OldRecordId - NewRecordId links, you can recreate the evolution of a given profile. To help this task, allChains produces an augmented table containing OldRecordId - NewRecordId - DateOfChange - OriginalRecordId - CurrentRecordId - OrderOfLinkInChain.

 

The allChains macro identifies the first (original) and last (current) record in every chain and the order of every link in its chain. The processing is performed efficiently with a SAS hash object and therefore requires that your data fit into memory.

 

A word of caution. It is expected that single linked chains follow certain rules.

  1. Chains must be disjoint. Links cannot be duplicated and chains cannot split or merge.
  2. Chains cannot contain cycles.
  3. Chains are limited to 100 links.

Failure to comply with these rules will likely generate incorrect results.

The macro and a small example are attached.

Attachment
Comments
by Valued Guide
on ‎01-27-2017 01:40 PM

Hello PGStats, thank you so much for putting this out here.  Great solution.  I do have a two questions though.  If I run this on a small sample the _last_ field does give me what I'm expecting and suits my needs great (thanks again).  BUT I'm getting a repeated ERROR. Duplicate key. in the log.  The code does run but I'd still like to correct that error.  Any thoughts?

 

Also when I run the entire dataset of 33 million rows I get the following error after about 5-10 minutes, is this a issue with my available resources?

An error occured executing the workspace job "All Chains".
The server is disconnected.

by Community Manager
on ‎01-27-2017 03:18 PM

Hi @Steelers_In_DC - I can't answer the first question but as far as "server is disconnected" -- in EG that means the session crashed, became unresponsive, or the connection was interrupted.  There are ways to collect more diagnostics, but if you're running an intense job that might cause the session to go down, I guess that's where you look first.

by Valued Guide
on ‎01-27-2017 04:25 PM

Chris, thanks.  Can you give direction on how I would go about getting more diagnositcs?

by Respected Advisor
on ‎01-27-2017 11:33 PM

ERROR: Duplicate key. signals a split chain, i.e. the presence of many links with the same from value. Try, for example:

 

data have;
input DATE :date9. to from;
format date yymmdd10.;
datalines;
27-Sep-16	10	6  <-- Duplicate from value signals split chain
27-Sep-16	7	5
17-Jun-16	6	4
10-Sep-16	9	8
5-Mar-16	4	2
27-Sep-16	15	6  <-- Duplicate from value signals split chain
;

%allChains(have, want, id=date);

 

For a large dataset like yours, you could modify the macro statement

 

declare hash links();

to

declare hash links(hashexp : 12);

to increase the number of hash bins from the default 256 to 2^12 = 4096 and improve performance.

 

 

 

by Community Manager
on ‎01-29-2017 11:27 AM

@Steelers_In_DC - You would need to turn on SAS Workspace Logging (usually an admin task if you have a shared SAS environment) to capture the SAS session log.

by Valued Guide
on ‎02-03-2017 02:17 PM

I'm have two columns deduped and I still get the duplicate key error.  I tried hashexp : 12 but still get the same results.  I tried running the 16 digit fields as character and numeric but no change.

by Respected Advisor
on ‎02-03-2017 03:41 PM

The 'from' column should contain unique values. Remember: the chains are assumed to be disjoint. 

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.