About kiag

kiag · ‎05-23-2025

This is what I have come up with so far, but it is still not giving accurate results:

/* Initial Data Setup */
data dataset1;
  input Product $ x $15.;
  datalines;
via1 .
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
via9 22201
via10 6631
;
run;

data dataset2;
  input Name $ x $15.;
  datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 201
i 98765
j 31
;
run;

/* Initial Cleaning Steps for dataset1 */
data d1clean;
  set dataset1;
  length x1 $15;
  x1 = compress(x, '', 'kd');
  if not missing(x1) then do;
    if input(x1, best32.) = 0 then do;
      x1 = '0';
    end;
    else do;
      x1 = cats(input(x1, best32.));
    end;
  end;
  else do;
    x1 = ''; /* Ensure blank if no digits */
  end;
run;

/* Initial Cleaning Steps for dataset2 */
data d2clean;
  set dataset2;
  length x1 $15;
  x1 = compress(x, '', 'kd');
  if not missing(x1) then do;
    if input(x1, best32.) = 0 then do;
      x1 = '0';
    end;
    else do;
      x1 = cats(input(x1, best32.));
    end;
  end;
  else do;
    x1 = ''; /* Ensure blank if no digits */
  end;
run;

PROC SQL;
  /* Create the final transformed version of d1clean */
  CREATE TABLE d1_final_cleaned AS
  SELECT d1.Product,
         d1.x AS original_dataset1_x,
         d1.x1 AS x1_after_initial_cleaning,
         COALESCE(
           ( /* Subquery to find the shortest d2.x1 that is a suffix of d1.x1 (if d1.x1 is longer) */
             SELECT MIN(d2.x1)
             FROM d2clean d2
             WHERE LENGTH(TRIM(d2.x1)) > 0
               AND LENGTH(TRIM(d1.x1)) > LENGTH(TRIM(d2.x1))
               /* Check if d1.x1 ends with d2.x1 */
               AND INDEX(TRIM(d1.x1), TRIM(d2.x1)) = (LENGTH(TRIM(d1.x1)) - LENGTH(TRIM(d2.x1)) + 1)
               AND LENGTH(TRIM(d2.x1)) = (
                     SELECT MIN(LENGTH(TRIM(d2_inner.x1)))
                     FROM d2clean d2_inner
                     WHERE LENGTH(TRIM(d2_inner.x1)) > 0
                       AND LENGTH(TRIM(d1.x1)) > LENGTH(TRIM(d2_inner.x1))
                       AND INDEX(TRIM(d1.x1), TRIM(d2_inner.x1)) = (LENGTH(TRIM(d1.x1)) - LENGTH(TRIM(d2_inner.x1)) + 1)
                   )
           ),
           d1.x1
         ) AS x1
  FROM d1clean d1;

  /* Create the final transformed version of d2clean */
  CREATE TABLE d2_final_cleaned AS
  SELECT d2.Name,
         d2.x AS original_dataset2_x,
         d2.x1 AS x1_after_initial_cleaning,
         COALESCE(
           ( /* Subquery to find the shortest d1.x1 that is a suffix of d2.x1 (if d2.x1 is longer) */
             SELECT MIN(d1.x1)
             FROM d1clean d1
             WHERE LENGTH(TRIM(d1.x1)) > 0
               AND LENGTH(TRIM(d2.x1)) > LENGTH(TRIM(d1.x1))
               AND INDEX(TRIM(d2.x1), TRIM(d1.x1)) = (LENGTH(TRIM(d2.x1)) - LENGTH(TRIM(d1.x1)) + 1)
               AND LENGTH(TRIM(d1.x1)) = (
                     SELECT MIN(LENGTH(TRIM(d1_inner.x1)))
                     FROM d1clean d1_inner
                     WHERE LENGTH(TRIM(d1_inner.x1)) > 0
                       AND LENGTH(TRIM(d2.x1)) > LENGTH(TRIM(d1_inner.x1))
                       AND INDEX(TRIM(d2.x1), TRIM(d1_inner.x1)) = (LENGTH(TRIM(d2.x1)) - LENGTH(TRIM(d1_inner.x1)) + 1)
                   )
           ),
           d2.x1
         ) AS x1
  FROM d2clean d2;
QUIT;

/* Turn options back on if you need them for subsequent steps */
/* OPTIONS NOTES STIMER SOURCE SYNTAXCHECK; */

/* Display the results */
PROC PRINT DATA=d1_final_cleaned NOOBS;
  TITLE "d1_final_cleaned: x1 transformed based on d2clean suffixes";
RUN;

PROC PRINT DATA=d2_final_cleaned NOOBS;
  TITLE "d2_final_cleaned: x1 transformed based on d1clean suffixes";
RUN;

kiag · ‎05-23-2025

No, x in dataset1 comes in the following formats: missing data, 3 digits, 3 or 5 digits, and 5 digits. In dataset2 it comes as: single or double digits, 5 digits, and 14 digits.

kiag · ‎05-22-2025

No, the datasets do not have the same number of observations, and the x values I am comparing are not on the same observation. In fact, some values are repeated across multiple observations within the same x column.

kiag · ‎05-21-2025

Sure, I have added more information in the comments.

kiag · ‎05-21-2025

Sure, I have added more information. SAS code for the two datasets:

/* Create dataset1 */
data dataset1;
  input Product $ x $15.;
  datalines;
via1 .
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
;
run;

/* Create dataset2 */
data dataset2;
  input Name $ x $15.;
  datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
;
run;

kiag · ‎05-21-2025

Dear SAS Experts,

I am currently working on merging two datasets, dataset1 and dataset2, using a common column x. However, I’m facing challenges due to inconsistencies in the format and length of the values in this column across the two datasets. Below are the different scenarios I’ve encountered:

1. The x column in dataset1 contains missing values, whereas the corresponding values are present in the x column in dataset2.
2. The x column in dataset1 contains 3-digit values (some starting with zero), but the same values appear as single or double digits in the x column in dataset2.
3. In some cases, x in dataset1 has 3 or 5 digits, while in dataset2 the corresponding value appears in a shorter form. For these, I need to match on the last 2 digits.
4. In other cases, x in dataset1 has 3 digits, while in dataset2 the corresponding value appears as a 5-digit code. For these, I need to match on the last 3 digits.
5. When x has 5 digits in both datasets, the records should match exactly.
6. If x in dataset1 has 5 digits and x in dataset2 has 14 digits, I need to match on the last 5 digits.

Could you please suggest an efficient and reliable approach to standardize or transform these values and perform a successful merge that accommodates these scenarios?

Thank you in advance for your support.
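One way to read the scenarios above is as two steps: reduce each x to a digits-only key without leading zeros, then join on either an exact key match or a "longer key ends with shorter key" match. The following is only a minimal sketch under that reading, not a confirmed solution. The names d1key, d2key, key, matched and match_type are hypothetical, the sample rows are copied from the posts (with a colon modifier added to the input statement), and the rule that a suffix match needs at least 2 digits on the shorter side is an assumption taken from the "last 2 digits" scenario. Leading zeros are removed with a character function rather than a numeric round trip, so the 14-digit value is never reformatted as a number.

```sas
/* Sample data copied from the posts; ":" makes x plain list input */
data dataset1;
  input Product $ x :$15.;
  datalines;
via1 .
via2 003
via3 014
via4 GA4
via5 GA015
via6 319
via7 23456
via8 10101010198765
;
run;

data dataset2;
  input Name $ x :$15.;
  datalines;
a 2
b 3
c 14
d 4
e 15
f GF319
g 23456
h 98765
;
run;

/* Hypothetical key: digits only, leading zeros dropped, at least one digit kept */
data d1key;
  set dataset1;
  length key $15;
  key = compress(x, , 'kd');                        /* keep digits only */
  key = prxchange('s/^0+(?=\d)//', 1, strip(key));  /* strip leading zeros */
run;

data d2key;
  set dataset2;
  length key $15;
  key = compress(x, , 'kd');
  key = prxchange('s/^0+(?=\d)//', 1, strip(key));
run;

proc sql;
  /* Exact match, or the longer key ends with the shorter key.         */
  /* Assumption: a suffix match needs a shorter key of 2+ digits.      */
  /* SUBSTRN is used so an out-of-range start position is tolerated.   */
  create table matched as
  select a.Product,
         a.x as x_dataset1,
         b.Name,
         b.x as x_dataset2,
         case when a.key = b.key then 'exact ' else 'suffix' end as match_type
  from d1key as a, d2key as b
  where not missing(a.key)
    and not missing(b.key)
    and (   a.key = b.key
         or (    length(b.key) >= 2
             and length(a.key) > length(b.key)
             and substrn(a.key, length(a.key) - length(b.key) + 1) = b.key)
         or (    length(a.key) >= 2
             and length(b.key) > length(a.key)
             and substrn(b.key, length(b.key) - length(a.key) + 1) = a.key))
  order by a.Product, b.Name;
quit;
```

Rows with a missing x in dataset1 (scenario 1) cannot be matched on the key at all under this sketch, so they would need some other linking variable. Because values repeat within x, a join like this can return several candidate matches per row, and the match_type column is there to help review the suffix matches before trusting them.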


Guidance Needed: Merging Two SAS Datasets with Inconsistent Key Format...
