<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Dataflux Data Management Studio 2.4 in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145033#M2340</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Steven,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Data Management Studio has a few nodes in Data Jobs, when used in conjunction, can help you solve your problem.&amp;nbsp; The first node to be aware of is the Matchcode node which is under the Entity Resolution section.&amp;nbsp; This node generates fuzzy representations of input data which can be used for matching purposes.&amp;nbsp; In your case, you have provided a single field of organization names, so you&amp;nbsp; would generate a matchcode using the “Organization” definition (if your locale and definition list is blank, then you will need to download a Quality Knowledge Base from the SAS website to enable this functionality).&amp;nbsp; Since you have only provided a single field, you will need to push that sensitivity up to 95 to prevent over matching as much as possible.&amp;nbsp; One word of warning, you will get lots of overmatching when matching on a single field.&amp;nbsp; Matchcode definitions have been designed to use with combinations of fields such as Name and Address or Name and Phone.&amp;nbsp; If you have other attributes available for you to match on, go ahead and include those as well.&amp;nbsp; You will need to generate matchcodes for each of your data sources that you are joining together.&amp;nbsp; In your example, this would be both columns of your dataset.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now that you have a fuzzy representation of the data, you need to bring the data together for matching purposes.&amp;nbsp; This now comes down you your personal preference on how you want the data brought together.&amp;nbsp; The first approach is to Union (under Data Integration) the sources together.&amp;nbsp; This will bring the matchcode and the company names together in a single column.&amp;nbsp; You can then use a Clustering (under Entity Resolution) step on matchcode to group like items together and assign them a common cluster id and sorting the records by that cluster id.&amp;nbsp; Finally, you can use a Surviving Record Identification (under Entity Resolution).&amp;nbsp; This step allows you to choose a best record per the rules you have set. &lt;/P&gt;&lt;P&gt;Another option is to Join (under Data Integration) your two data sources together where the matchcodes from both sources match each other.&amp;nbsp; Then carry forward the Company name from data source A and the Org name from data source B.&amp;nbsp; By default, this is an inner join so this node will only output matching fields.&amp;nbsp; In your case, you wanted records that didn’t match as well, so you would need to perform a full outer join.&amp;nbsp; For non-matching records, the field name will be null for the other table.&amp;nbsp; Then you can pass these through an expression statement that will create a new string column for your final company name and issues an if statement to pull the right data:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string company&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if isnull(comp_name)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = org_name&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if isnull(org_name)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = comp_name&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = comp_name&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This approach will result in overmatching as well as records being repeated records.&amp;nbsp; For instance FX Alliance Inc in your sample is a Fuzzy match to FX Solutions and FX Alliance.&amp;nbsp; This is an overmatch because Solutions and Alliance are both noise words in the Organization definition, thus are stripped and look the same (see warning above).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Mike&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 26 Nov 2013 15:09:33 GMT</pubDate>
    <dc:creator>MikeFrost</dc:creator>
    <dc:date>2013-11-26T15:09:33Z</dc:date>
    <item>
      <title>SAS Dataflux Data Management Studio 2.4</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145030#M2337</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have 2 sets of data from two different sources.&lt;/P&gt;&lt;P&gt;I would like to merge these 2 sets of data.&lt;/P&gt;&lt;P&gt;However, the basis for merging the data is by company name.&lt;/P&gt;&lt;P&gt;As you may know, company names are, however, not standardized (for example, ABC Ltd, ABC Limited,etc).&lt;/P&gt;&lt;P&gt;Thus, I have used SAS Dataflux Data Management Studio 2.4 with the hope that it will assist me.&lt;/P&gt;&lt;P&gt;Since I am new to the software, I find it hard. In fact, I have already spent few days to understand this software.&lt;/P&gt;&lt;P&gt;Attached is a sample of my data (in excel).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The first column (Comp_name) refers to company name from database 1 while the second column (Org_name) refers to company name from database 2.&lt;/P&gt;&lt;P&gt;What I would like to have is:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;1) If a company exists in both databases, then company name as per column 2 (Org_name) appears in third column.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2) If a company exists only in first database, then company name as per column 1 (Comp_name) appears in third column&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;3)If a company exists only in second database, then company name as per column 1 (Comp_name) appears in third column.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope any of you with good knowledge in Dataflux could assist me. My sincerest advance thank you for your time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Steven&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 15 Nov 2013 13:31:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145030#M2337</guid>
      <dc:creator>stevennair</dc:creator>
      <dc:date>2013-11-15T13:31:46Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Dataflux Data Management Studio 2.4</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145031#M2338</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Steven,&lt;/P&gt;&lt;P&gt;Simply add an expression node with the following code:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/*pre-expression*/&lt;/P&gt;&lt;P&gt;string column_name_3&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/*expression*/&lt;/P&gt;&lt;P&gt;if not isNull(column_name_1) and not isNull(column_name_2)&lt;/P&gt;&lt;P&gt;then column_name_3 = column_name_2&lt;/P&gt;&lt;P&gt;else&lt;/P&gt;&lt;P&gt;if not isNull(column_name_1) and isNull(column_name_2)&lt;/P&gt;&lt;P&gt;then column_name_3 = column_name_1&lt;/P&gt;&lt;P&gt;else&lt;/P&gt;&lt;P&gt;if isNull(column_name_1) and not isNull(column_name_2)&lt;/P&gt;&lt;P&gt;then column_name_3 = column_name_2&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This checks for non-null values in company_name_1 and company_name_2 and populates the correct company name in the company_name_3 column.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;-shawn&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 15 Nov 2013 14:16:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145031#M2338</guid>
      <dc:creator>skillman</dc:creator>
      <dc:date>2013-11-15T14:16:16Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Dataflux Data Management Studio 2.4</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145032#M2339</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Skillman,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for replying and appreciate that.&lt;/P&gt;&lt;P&gt;Sorry for my earlier post that did not mention my intention clearly.&lt;/P&gt;&lt;P&gt;What I really wanted was sas dataflux to do the matching in column 1 and column 2 since the company names are not standardized (for eg: ABC Ltd with ABC Limited, etc).&lt;/P&gt;&lt;P&gt;After matching, only then copy the the matched companies to column 3.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;thanks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;regards,&lt;/P&gt;&lt;P&gt;steven&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Nov 2013 08:45:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145032#M2339</guid>
      <dc:creator>stevennair</dc:creator>
      <dc:date>2013-11-20T08:45:14Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Dataflux Data Management Studio 2.4</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145033#M2340</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Steven,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Data Management Studio has a few nodes in Data Jobs, when used in conjunction, can help you solve your problem.&amp;nbsp; The first node to be aware of is the Matchcode node which is under the Entity Resolution section.&amp;nbsp; This node generates fuzzy representations of input data which can be used for matching purposes.&amp;nbsp; In your case, you have provided a single field of organization names, so you&amp;nbsp; would generate a matchcode using the “Organization” definition (if your locale and definition list is blank, then you will need to download a Quality Knowledge Base from the SAS website to enable this functionality).&amp;nbsp; Since you have only provided a single field, you will need to push that sensitivity up to 95 to prevent over matching as much as possible.&amp;nbsp; One word of warning, you will get lots of overmatching when matching on a single field.&amp;nbsp; Matchcode definitions have been designed to use with combinations of fields such as Name and Address or Name and Phone.&amp;nbsp; If you have other attributes available for you to match on, go ahead and include those as well.&amp;nbsp; You will need to generate matchcodes for each of your data sources that you are joining together.&amp;nbsp; In your example, this would be both columns of your dataset.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now that you have a fuzzy representation of the data, you need to bring the data together for matching purposes.&amp;nbsp; This now comes down you your personal preference on how you want the data brought together.&amp;nbsp; The first approach is to Union (under Data Integration) the sources together.&amp;nbsp; This will bring the matchcode and the company names together in a single column.&amp;nbsp; You can then use a Clustering (under Entity Resolution) step on matchcode to group like items together and assign them a common cluster id and sorting the records by that cluster id.&amp;nbsp; Finally, you can use a Surviving Record Identification (under Entity Resolution).&amp;nbsp; This step allows you to choose a best record per the rules you have set. &lt;/P&gt;&lt;P&gt;Another option is to Join (under Data Integration) your two data sources together where the matchcodes from both sources match each other.&amp;nbsp; Then carry forward the Company name from data source A and the Org name from data source B.&amp;nbsp; By default, this is an inner join so this node will only output matching fields.&amp;nbsp; In your case, you wanted records that didn’t match as well, so you would need to perform a full outer join.&amp;nbsp; For non-matching records, the field name will be null for the other table.&amp;nbsp; Then you can pass these through an expression statement that will create a new string column for your final company name and issues an if statement to pull the right data:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; string company&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if isnull(comp_name)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = org_name&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if isnull(org_name)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = comp_name&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; company = comp_name&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This approach will result in overmatching as well as records being repeated records.&amp;nbsp; For instance FX Alliance Inc in your sample is a Fuzzy match to FX Solutions and FX Alliance.&amp;nbsp; This is an overmatch because Solutions and Alliance are both noise words in the Organization definition, thus are stripped and look the same (see warning above).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Mike&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 26 Nov 2013 15:09:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/SAS-Dataflux-Data-Management-Studio-2-4/m-p/145033#M2340</guid>
      <dc:creator>MikeFrost</dc:creator>
      <dc:date>2013-11-26T15:09:33Z</dc:date>
    </item>
  </channel>
</rss>

