Rain
Obsidian | Level 7

Hi.

I am using a compute window with data quality (QKB) functions, and it produces duplicate events that I need to discard in the next window.

Here's an example:

 

This compute window takes company names as input. Each event is a single row, and the company name acts as the unique key field:

Company AB Industrial

Company Mechanics Ltd

Company BA Industrial

etc

 

Then I apply this function:

            <field-expr><![CDATA[bf3.matchcode("Name", 95, CompanyName, result95) return result95]]></field-expr>

Output is company name and matchcode:

Company AB Industrial, 42&BF&7R7B4#7B8$$$$$$$$$$$$

Company Mechanics Ltd, 42&BF&7Y~Y&87BF$$$$$$$$$$$$

Company BA Industrial, 42&BF&7R7B4#7B8$$$$$$$$$$$$

 

Because I am using matchcode sensitivity 95, the first and third companies get the same matchcode value. My aim is to have a lookup window with unique matchcode values so that I can compare them to data coming from another source window.

Any suggestions on how I can get rid of duplicate matchcode values? The input would be company name and matchcode, but the output would contain only unique matchcodes (the matchcode would also become the unique key field). Ideally, my previous example would produce only two rows:

42&BF&7R7B4#7B8$$$$$$$$$$$$

42&BF&7Y~Y&87BF$$$$$$$$$$$$

I tried a union window (strict="false" output-insert-only="true"), as it should prevent duplicate outputs, but it seems to work only when the duplicates originate from different windows that are connected to the union window. We are using ESP 4.3.

2 REPLIES
AndyT_SAS
SAS Employee

Hi Rain,

This is a little tricky.  How many match codes could you potentially get?  An aggregate window could group events by the matchcode.  But you would need to worry about storage growth in the aggregate window because nothing will remove the old events.
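To illustrate what grouping by matchcode means for the sample data in the question, here is a minimal plain-Python sketch. It only emulates the grouping idea and is not ESP model code; the function name `aggregate_by_matchcode` and the `first_name`/`count` fields are hypothetical names chosen for this example.

```python
def aggregate_by_matchcode(events):
    """Group (company_name, matchcode) events by matchcode.

    Returns a dict keyed by matchcode. Each value keeps the first company
    name seen for that matchcode and the number of events that shared it.
    """
    groups = {}
    for company_name, matchcode in events:
        # Create the group on first sight of this matchcode, then count the event.
        group = groups.setdefault(
            matchcode, {"first_name": company_name, "count": 0}
        )
        group["count"] += 1
    return groups


# Sample events taken from the question above.
events = [
    ("Company AB Industrial", "42&BF&7R7B4#7B8$$$$$$$$$$$$"),
    ("Company Mechanics Ltd", "42&BF&7Y~Y&87BF$$$$$$$$$$$$"),
    ("Company BA Industrial", "42&BF&7R7B4#7B8$$$$$$$$$$$$"),
]

groups = aggregate_by_matchcode(events)
for matchcode, group in groups.items():
    print(matchcode, group["first_name"], group["count"])
```

The three input events collapse into two groups, one per distinct matchcode. The `groups` dictionary is never pruned in this sketch, which mirrors the storage growth concern mentioned above.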


You could write a Python or DS2 routine with a hash table that uses the matchcode as the hash key. Then you could keep a record of the matchcode values that have been processed. An event would only be output if it had a new matchcode. That wouldn't have any memory concerns, as you could make the window pi_EMPTY.
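The hash-table idea could look roughly like the following plain-Python sketch. It is standalone and not wired into ESP: how such a routine is attached to a window depends on the ESP version and is not shown here. The class name `MatchcodeDeduplicator` and its `process` method are hypothetical, and a Python `set` stands in for the hash table.

```python
class MatchcodeDeduplicator:
    """Emit a matchcode only the first time it is seen."""

    def __init__(self):
        # Set of matchcodes that have already been emitted (acts as the hash table).
        self._seen = set()

    def process(self, company_name, matchcode):
        """Return the matchcode if it is new, otherwise None (event dropped)."""
        if matchcode in self._seen:
            return None
        self._seen.add(matchcode)
        return matchcode


# Sample events taken from the question above.
events = [
    ("Company AB Industrial", "42&BF&7R7B4#7B8$$$$$$$$$$$$"),
    ("Company Mechanics Ltd", "42&BF&7Y~Y&87BF$$$$$$$$$$$$"),
    ("Company BA Industrial", "42&BF&7R7B4#7B8$$$$$$$$$$$$"),
]

dedup = MatchcodeDeduplicator()
emitted = []
for name, code in events:
    result = dedup.process(name, code)
    if result is not None:
        emitted.append(result)

print(emitted)
```

For the sample input, only the two distinct matchcodes are emitted, in order of first appearance. In this sketch the only retained state is the set of matchcodes already seen, so memory scales with the number of distinct matchcodes rather than with the number of events.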

Thanks,

Andy

Rain
Obsidian | Level 7
Thank you. I'll look into the proposed approaches.
