Hi All,
How do I join two tables using PROC SQL if one of my fields contains both character and numeric values?
Sample dataset
Table 1
Book_number | Identification_nbr | Checkout |
ABC | 00005 | Y |
ABC | 00004A | Y |
BU | 881123-01-1234 | Y |
BU | 1/123332 | Y |
C | 001 | Y |
Table 2
Book_number | Identification_nbr | Age |
ABC | 5 | 1 |
ABC | 00004A | 2 |
BU | 881123-01-1234 | 3 |
BU | 1/123332 | 4 |
C | 1 | 6 |
Desired result:
Book_number | Identification_nbr | Checkout | Age |
ABC | 00005 | Y | 1 |
ABC | 00004A | Y | 2 |
BU | 881123-01-1234 | Y | 3 |
BU | 1/123332 | Y | 4 |
C | 001 | Y | 6 |
Identification_nbr in Table 2 was originally extracted from Table 1, but after processing the leading zeros are missing. I was unable to join the two tables on it because Identification_nbr is defined as character in both Table 1 and Table 2, so values such as 00005 and 5 do not compare equal.
Compare the values without the leading zeros:
data t1;
input (Book_number Id Checkout) (:$20.);
datalines;
ABC 00005 Y
ABC 00004A Y
BU 881123-01-1234 Y
BU 1/123332 Y
C 001 Y
D 000 Y
;
data t2;
input (Book_number Id) (:$20.) age;
datalines;
ABC 5 1
ABC 00004A 2
BU 881123-01-1234 3
BU 1/123332 4
C 1 6
D 0 10
;
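proc sql;
/* Join on book_number plus the id with its leading zeros removed */
select t1.*, t2.age
from t1 inner join t2 on
  t1.book_number = t2.book_number and
  substr(t1.id, findc(t1.id, "0", "K")) = substr(t2.id, findc(t2.id, "0", "K"));
quit;
The K modifier in FINDC makes it search for the first character that is not in the list, here the first character that is not 0. SUBSTR then keeps everything from that position onward.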
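To see what the join key looks like for each kind of value, the expression can be run on its own in a DATA step. This is only a sketch: the dataset name demo and the variables pos and stripped are made up for illustration, and the sample values are taken from the tables above.

```sas
data demo;
  length id stripped $20;
  input id :$20.;
  /* Position of the first character that is not "0".          */
  /* Trailing blanks count as "not 0", so an id such as 000    */
  /* returns the position just past its last zero.             */
  pos = findc(id, "0", "K");
  /* Everything from that position onward: the id without its  */
  /* leading zeros (blank when the id is all zeros).           */
  stripped = substr(id, pos);
  datalines;
00005
00004A
001
000
1/123332
;

proc print data=demo;
run;
```

One caveat, stated as an assumption about your data: if an id fills the entire variable length with zeros, FINDC returns 0 and SUBSTR will complain, so that case would need separate handling.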
A field containing both digits and other characters, as shown, has to be a character variable. A numeric variable cannot store character data. One possible exception would be a numeric variable with a format that displays characters, but that is really unlikely with the kind of values you've shown.
My guess is that the data was put into Excel at some point, which made those conversions. If you can undo that step, that would be your best choice. If you can't, can you verify that the import happened correctly?
Otherwise, the fix via code is to detect values that have lost their leading zeroes. I would use LENGTHN to check the length and then add the leading zeroes back. But how many leading zeroes? The first value has 4 and the last has 2. How would that be known if you only have access to Table 2? Once that logic is clear, we can help you turn it into code.
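As a starting point for those checks, something like the following could show how the identifier is stored in each table and how long the all-digit values in Table 2 are. It is a sketch only: it assumes the tables are WORK.T1 and WORK.T2 with the variable named id, as in the example code earlier in the thread, so the library, table, and variable names would need to be changed to match the real data.

```sas
proc sql;
  /* Type ('char' or 'num'), length and format of the join variable */
  select memname, name, type, length, format
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('T1', 'T2')
    and upcase(name) = 'ID';

  /* Ids in T2 made up only of digits, with their length excluding */
  /* trailing blanks. These are the values that may have lost      */
  /* leading zeros.                                                */
  select book_number, id, lengthn(id) as id_len
  from t2
  where notdigit(strip(id)) = 0;
quit;
```

This does not decide how many zeroes to restore. It only makes the affected values visible so that rule can be worked out.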
When I tried to pull the data from Teradata, the following error message occurred:
Data Type "Identification_nb" does not match a Defined Type name
Any idea how to fix this?