<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967181#M376298</link>
    <description>Better post some real data to demonstrate these 6 scenarios you mentioned ,and post the desired output .&lt;BR /&gt;So other sas users could understand your requirement better and faster.</description>
    <pubDate>Thu, 22 May 2025 01:34:02 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2025-05-22T01:34:02Z</dc:date>
    <item>
      <title>Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967180#M376297</link>
      <description>&lt;P&gt;Dear SAS Experts,&lt;/P&gt;&lt;P&gt;I am currently working on merging two datasets, dataset1 and dataset1, using a common column x. However, I’m facing challenges due to inconsistencies in the format and length of the values in this column across the two datasets. Below are the different scenarios I’ve encountered:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;The x column in dataset1&amp;nbsp;contains missing values, whereas the corresponding values are present in x column in dataset2.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The x column in dataset1 contains 3-digit values (some starting with zero), but the same values appear as single or double digits in x column in dataset2.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;In some cases, x column in dataset1 has 3 or 5 digits, while in&amp;nbsp;dataset2 the corresponding value appears as a shorter form. For these, I need to match based on the last 2 digits.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;In other cases, x column in dataset1 has 3 digits, while in x column in dataset2 the corresponding value appears as a 5-digit code. For these, I need to match based on the &amp;nbsp;last 3 digits.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When x column has 5 digits in both datasets, the records should match exactly.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;If x column in dataset1 has 5 digits and in x column in dataset2 has 14 digits, I need to match based on the last 5 digits.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Could you please suggest an efficient and reliable approach to standardize or transform these values and perform a successful merge that accommodates these scenarios?&lt;/P&gt;&lt;P&gt;Thank you in advance for your support.&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 00:39:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967180#M376297</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-22T00:39:40Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967181#M376298</link>
      <description>Better post some real data to demonstrate these 6 scenarios you mentioned ,and post the desired output .&lt;BR /&gt;So other sas users could understand your requirement better and faster.</description>
      <pubDate>Thu, 22 May 2025 01:34:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967181#M376298</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-05-22T01:34:02Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967184#M376299</link>
      <description>&lt;P&gt;I have added more information to understand-&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="kiag_0-1747882601106.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/107229iD7FA77DE554659B0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="kiag_0-1747882601106.png" alt="kiag_0-1747882601106.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;sas code code 2 datasets-&amp;nbsp;&lt;/P&gt;&lt;P&gt;/* Create dataset1 */&lt;BR /&gt;data dataset1;&lt;BR /&gt;input Product $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;via1 .&lt;BR /&gt;via2 003&lt;BR /&gt;via3 014&lt;BR /&gt;via4 GA4&lt;BR /&gt;via5 GA015&lt;BR /&gt;via6 319&lt;BR /&gt;via7 23456&lt;BR /&gt;via8 10101010198765&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* Create dataset2 */&lt;BR /&gt;data dataset2;&lt;BR /&gt;input Name $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;a 2&lt;BR /&gt;b 3&lt;BR /&gt;c 14&lt;BR /&gt;d 4&lt;BR /&gt;e 15&lt;BR /&gt;f GF319&lt;BR /&gt;g 23456&lt;BR /&gt;h 98765&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 02:57:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967184#M376299</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-22T02:57:28Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967186#M376300</link>
      <description>&lt;P&gt;Sure,&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have added more information to understand-&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="kiag_0-1747882917739.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/107231i178868169A19447B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="kiag_0-1747882917739.png" alt="kiag_0-1747882917739.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;sas code code 2 datasets-&amp;nbsp;&lt;/P&gt;&lt;P&gt;/* Create dataset1 */&lt;BR /&gt;data dataset1;&lt;BR /&gt;input Product $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;via1 .&lt;BR /&gt;via2 003&lt;BR /&gt;via3 014&lt;BR /&gt;via4 GA4&lt;BR /&gt;via5 GA015&lt;BR /&gt;via6 319&lt;BR /&gt;via7 23456&lt;BR /&gt;via8 10101010198765&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* Create dataset2 */&lt;BR /&gt;data dataset2;&lt;BR /&gt;input Name $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;a 2&lt;BR /&gt;b 3&lt;BR /&gt;c 14&lt;BR /&gt;d 4&lt;BR /&gt;e 15&lt;BR /&gt;f GF319&lt;BR /&gt;g 23456&lt;BR /&gt;h 98765&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Thu, 22 May 2025 03:02:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967186#M376300</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-22T03:02:03Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967187#M376301</link>
      <description>Sure, I have added more information in the comments to understand-</description>
      <pubDate>Thu, 22 May 2025 02:59:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967187#M376301</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-22T02:59:46Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967209#M376308</link>
      <description>&lt;P&gt;I am not sure how realistic this example is. In order to be able to compare lengths of variables across data sets, you need to be able to combine them first. I have done that by creating an ID variable and used that on a BY statement in a Merge. However, this assumes that the data sets have the same number of observations and that the X values you are comparing are on the same observation. Is this the case in the actual data? Here is a starting point that combines the data, renaming the X variables and assigning the length. If this is the case with the real data, then you can add IF logic to compare lengths and then reassign values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create dataset1 */
data dataset1;
input Product $ x $15.;
datalines;
via1 .
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
;
run;

/* Create dataset2 */
data dataset2;
input Name $ x $15.;
datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
;
run;

data dataset1;
set dataset1;
id=_n_;
rename x=x1;
run;

data dataset2;
set dataset2;
id=_n_;
rename x=x2;
run;

data combine;
merge dataset1 dataset2;
by id;
len1=length(x1);
len2=length(x2);
run;

proc print data=combine;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 22 May 2025 12:00:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967209#M376308</guid>
      <dc:creator>Kathryn_SAS</dc:creator>
      <dc:date>2025-05-22T12:00:45Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967215#M376310</link>
      <description>No, the data sets do not have the same number of observations and that the X values I am comparing are not on the same observation. In fact there are multiple numbers(observations) repeated as well within the same x column of the.</description>
      <pubDate>Thu, 22 May 2025 13:25:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967215#M376310</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-22T13:25:07Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967280#M376320</link>
      <description>&lt;P&gt;Can X in dataset 2 ever have more than five digits?&amp;nbsp; If so, the scenario 6 will be ambiguous.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dataset 1 has&amp;nbsp;10101010198765&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your dataset 2 has&amp;nbsp;98765.&amp;nbsp; But it might also have 198765.&amp;nbsp; If so, then which obs from dataset 2 is chosen?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 03:38:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967280#M376320</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-05-23T03:38:29Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967328#M376340</link>
      <description>no, the x of the dataset contain the below formats -&lt;BR /&gt;missing data&lt;BR /&gt;3 digits&lt;BR /&gt;3 or 5 digits&lt;BR /&gt;5 digits&lt;BR /&gt;&lt;BR /&gt;while the dataset2 has -&lt;BR /&gt;single /double digit&lt;BR /&gt;5 digits&lt;BR /&gt;14 digits&lt;BR /&gt;</description>
      <pubDate>Fri, 23 May 2025 14:30:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967328#M376340</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-23T14:30:14Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967329#M376341</link>
      <description>&lt;P&gt;this is what I cam up with till now - but it still not giving accurate result&lt;/P&gt;&lt;P&gt;/* Initial Data Setup */&lt;BR /&gt;data dataset1;&lt;BR /&gt;input Product $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;via1 .&lt;BR /&gt;via2 003&lt;BR /&gt;via3 014&lt;BR /&gt;via4 GA4&lt;BR /&gt;via5 GA015&lt;BR /&gt;via6 319&lt;BR /&gt;via7 23456&lt;BR /&gt;via8 10101010198765&lt;BR /&gt;via9 22201&lt;BR /&gt;via10 6631&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data dataset2;&lt;BR /&gt;input Name $ x $15.;&lt;BR /&gt;datalines;&lt;BR /&gt;a 2&lt;BR /&gt;b 3&lt;BR /&gt;c 14&lt;BR /&gt;d 4&lt;BR /&gt;e 15&lt;BR /&gt;f GF319&lt;BR /&gt;g 23456&lt;BR /&gt;h 201&lt;BR /&gt;i 98765&lt;BR /&gt;j 31&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* Initial Cleaning Steps for dataset1 */&lt;BR /&gt;data d1clean;&lt;BR /&gt;set dataset1;&lt;BR /&gt;length x1 $15;&lt;BR /&gt;x1 = compress(x, '', 'kd');&lt;BR /&gt;if not missing(x1) then do;&lt;BR /&gt;if input(x1, best32.) = 0 then do;&lt;BR /&gt;x1 = '0';&lt;BR /&gt;end;&lt;BR /&gt;else do;&lt;BR /&gt;x1 = cats(input(x1, best32.));&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;else do;&lt;BR /&gt;x1 = ''; /* Ensure blank if no digits */&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* Initial Cleaning Steps for dataset2 */&lt;/P&gt;&lt;P&gt;data d2clean;&lt;BR /&gt;set dataset2;&lt;BR /&gt;length x1 $15;&lt;BR /&gt;x1 = compress(x, '', 'kd');&lt;BR /&gt;if not missing(x1) then do;&lt;BR /&gt;if input(x1, best32.) = 0 then do;&lt;BR /&gt;x1 = '0';&lt;BR /&gt;end;&lt;BR /&gt;else do;&lt;BR /&gt;x1 = cats(input(x1, best32.));&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;else do;&lt;BR /&gt;x1 = ''; /* Ensure blank if no digits */&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;PROC SQL;&lt;BR /&gt;/* Create the final transformed version of d1clean */&lt;BR /&gt;CREATE TABLE d1_final_cleaned AS&lt;BR /&gt;SELECT&lt;BR /&gt;d1.Product,&lt;BR /&gt;d1.x AS original_dataset1_x,&lt;BR /&gt;d1.x1 AS x1_after_initial_cleaning,&lt;BR /&gt;COALESCE(&lt;BR /&gt;( /* Subquery to find the shortest d2.x1 that is a suffix of d1.x1 (if d1.x1 is longer) */&lt;BR /&gt;SELECT MIN(d2.x1)&lt;BR /&gt;FROM d2clean d2&lt;BR /&gt;WHERE&lt;BR /&gt;LENGTH(TRIM(d2.x1)) &amp;gt; 0 AND&lt;BR /&gt;LENGTH(TRIM(d1.x1)) &amp;gt; LENGTH(TRIM(d2.x1)) AND&lt;BR /&gt;/* Check if d1.x1 ends with d2.x1 */&lt;BR /&gt;INDEX(TRIM(d1.x1), TRIM(d2.x1)) = (LENGTH(TRIM(d1.x1)) - LENGTH(TRIM(d2.x1)) + 1) AND&lt;BR /&gt;LENGTH(TRIM(d2.x1)) = (&lt;BR /&gt;SELECT MIN(LENGTH(TRIM(d2_inner.x1)))&lt;BR /&gt;FROM d2clean d2_inner&lt;BR /&gt;WHERE&lt;BR /&gt;LENGTH(TRIM(d2_inner.x1)) &amp;gt; 0 AND&lt;BR /&gt;LENGTH(TRIM(d1.x1)) &amp;gt; LENGTH(TRIM(d2_inner.x1)) AND&lt;BR /&gt;INDEX(TRIM(d1.x1), TRIM(d2_inner.x1)) = (LENGTH(TRIM(d1.x1)) - LENGTH(TRIM(d2_inner.x1)) + 1)&lt;BR /&gt;)&lt;BR /&gt;),&lt;BR /&gt;d1.x1&lt;BR /&gt;) AS x1&lt;BR /&gt;FROM d1clean d1;&lt;/P&gt;&lt;P&gt;/* Create the final transformed version of d2clean */&lt;BR /&gt;CREATE TABLE d2_final_cleaned AS&lt;BR /&gt;SELECT&lt;BR /&gt;d2.Name,&lt;BR /&gt;d2.x AS original_dataset2_x,&lt;BR /&gt;d2.x1 AS x1_after_initial_cleaning,&lt;BR /&gt;COALESCE(&lt;BR /&gt;( /* Subquery to find the shortest d1.x1 that is a suffix of d2.x1 (if d2.x1 is longer) */&lt;BR /&gt;SELECT MIN(d1.x1)&lt;BR /&gt;FROM d1clean d1&lt;BR /&gt;WHERE&lt;BR /&gt;LENGTH(TRIM(d1.x1)) &amp;gt; 0 AND&lt;BR /&gt;LENGTH(TRIM(d2.x1)) &amp;gt; LENGTH(TRIM(d1.x1)) AND&lt;BR /&gt;INDEX(TRIM(d2.x1), TRIM(d1.x1)) = (LENGTH(TRIM(d2.x1)) - LENGTH(TRIM(d1.x1)) + 1) AND&lt;BR /&gt;LENGTH(TRIM(d1.x1)) = (&lt;BR /&gt;SELECT MIN(LENGTH(TRIM(d1_inner.x1)))&lt;BR /&gt;FROM d1clean d1_inner&lt;BR /&gt;WHERE&lt;BR /&gt;LENGTH(TRIM(d1_inner.x1)) &amp;gt; 0 AND&lt;BR /&gt;LENGTH(TRIM(d2.x1)) &amp;gt; LENGTH(TRIM(d1_inner.x1)) AND&lt;BR /&gt;INDEX(TRIM(d2.x1), TRIM(d1_inner.x1)) = (LENGTH(TRIM(d2.x1)) - LENGTH(TRIM(d1_inner.x1)) + 1)&lt;BR /&gt;)&lt;BR /&gt;),&lt;BR /&gt;d2.x1&lt;BR /&gt;) AS x1&lt;BR /&gt;FROM d2clean d2;&lt;BR /&gt;QUIT;&lt;/P&gt;&lt;P&gt;/* Turn options back on if you need them for subsequent steps */&lt;BR /&gt;/* OPTIONS NOTES STIMER SOURCE SYNTAXCHECK; */&lt;/P&gt;&lt;P&gt;/* Display the results */&lt;BR /&gt;PROC PRINT DATA=d1_final_cleaned NOOBS;&lt;BR /&gt;TITLE "d1_final_cleaned: x1 transformed based on d2clean suffixes";&lt;BR /&gt;RUN;&lt;/P&gt;&lt;P&gt;PROC PRINT DATA=d2_final_cleaned NOOBS;&lt;BR /&gt;TITLE "d2_final_cleaned: x1 transformed based on d1clean suffixes";&lt;BR /&gt;RUN;&lt;/P&gt;</description>
      <pubDate>Fri, 23 May 2025 14:33:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967329#M376341</guid>
      <dc:creator>kiag</dc:creator>
      <dc:date>2025-05-23T14:33:45Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967385#M376346</link>
      <description>&lt;P&gt;You say that dataset1 can have:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;missing data&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;3 digits&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;3 or 5 digits&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;5 digits&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Yet the fourth observation has X=GA4.&amp;nbsp; one digit.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Please be complete in describing the data, and clarifying the matching rules.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Help us help you.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 24 May 2025 00:44:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967385#M376346</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-05-24T00:44:37Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967387#M376347</link>
      <description>&lt;P&gt;Do the values with large numbers of digits have meaningful leading zeros?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That is can we just assume for the second dataset (the one that appears to slightly more standardized) that if we convert the digits to a number we can divide those number into 4 categories:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;missing&lt;/P&gt;
&lt;P&gt;less 100&lt;/P&gt;
&lt;P&gt;less than 100000&lt;/P&gt;
&lt;P&gt;others&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or better still just convert the digits to a number (truncate to the last 5 digits using MOD()) and match on the number.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data one;
  set dataset1 ;
  digits1=compress(x,,'kd');
  ndigits1=lengthn(digits1);
  number=mod(input(digits1,??32.),1E5);
  rename x=x1;
run;

data two;
  set dataset2 ;
  digits2=compress(x,,'kd');
  ndigits2=lengthn(digits2);
  number=mod(input(digits2,??32.),1E5);
  rename x=x2;
run;

proc sql;
create table check as 
select * 
from one natural full join two
;
quit;
proc print data=check;
  id number;
  var x1 x2 product name digits1 digits2 ndigits1 ndigits2 ;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1748051955928.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/107320i664DE1AAD49A4E50/image-size/large?v=v2&amp;amp;px=999" role="button" title="Tom_0-1748051955928.png" alt="Tom_0-1748051955928.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 24 May 2025 01:59:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967387#M376347</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2025-05-24T01:59:29Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967390#M376348</link>
      <description>&lt;P&gt;So How do I know the missing value of dataset1 match 2 of dataset1 ,not 3 or 4 of dataset2 ?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset1;
input Product $ x $15.;
datalines;
via1 .
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
;
run;

/* Create dataset2 */
data dataset2;
input Name $ x $15.;
datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
;
run;

proc sql;
create table temp as
select a.Product,a.x as a_x,b.name,b.x as b_x
 from dataset1 as a,dataset2 as b
  where strip(reverse(a.x)) eqt strip(reverse(b.x)) 
   group by a.x
    having length(b.x)=max(length(b.x))  
     order by b.name;

create table want as
select * from temp
union corr
/*Match missing value of dataset1 to 2 of dataset1*/
select *
 from (select product as Product,x as a_x from dataset1 where x is missing),
      (select name as name,x as b_x from dataset2 where b_x not in (select b_x from temp))
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ksharp_0-1748052388802.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/107321iF93480481EA187E4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Ksharp_0-1748052388802.png" alt="Ksharp_0-1748052388802.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 24 May 2025 02:06:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967390#M376348</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-05-24T02:06:36Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Formats</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967395#M376349</link>
      <description>&lt;P&gt;This can be done in a single data step that first reads DATASET1, stores it in a hash object (lookup table) with a lookup key created by modifying X (remove leading non-digits, shorten it to the last 5 digits if necessary, and remove leading zeroes.&amp;nbsp; This is followed by reading DATASET2, where X is similarly modified, and a lookup is performed to see whether it is in the hash object, from which the PRODUCT value is retrieved:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset1 (label='x with 3 or 5 digits');
  infile datalines missover; 
  input product $4.  x :$20. ;
datalines;
via1 
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
run;

data dataset2;
  infile datalines ;
  input name $1.  x :$20. ;
datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
run;

data want (drop=rc);;
  set   dataset1 (in=in1) dataset2 (in=in2);
  where x^='';
  if _n_=1 then do;
    declare hash d1 ();
      d1.definekey('x');
      d1.definedata('product');
      d1.definedone();
  end;

  x=substr(x,anydigit(x)); /*Remove leading non-digits*/
  if length(x)&amp;gt;5 then x=substr(x,length(x)-4);  /*If too long, take last 5 digits*/
 
  do while (x=:'0'); /* Strip leading zeroes*/
    x=substr(x,2);
  end;
  if in1=1 then d1.add();
  if in2;
  rc=d1.find();
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 24 May 2025 03:51:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Guidance-Needed-Merging-Two-SAS-Datasets-with-Inconsistent-Key/m-p/967395#M376349</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-05-24T03:51:16Z</dc:date>
    </item>
  </channel>
</rss>

