Hi All,
How do I join two tables using PROC SQL if one of my fields contains both character and numeric values?
Sample dataset
Table 1
Book_number | Identification_nbr | Checkout |
ABC | 00005 | Y |
ABC | 00004A | Y |
BU | 881123-01-1234 | Y |
BU | 1/123332 | Y |
C | 001 | Y |
Table 2
Book_number | Identification_nbr | Age |
ABC | 5 | 1 |
ABC | 00004A | 2 |
BU | 881123-01-1234 | 3 |
BU | 1/123332 | 4 |
C | 1 | 6 |
Desired result:
Book_number | Identification_nbr | Checkout | Age |
ABC | 00005 | Y | 1 |
ABC | 00004A | Y | 2 |
BU | 881123-01-1234 | Y | 3 |
BU | 1/123332 | Y | 4 |
C | 001 | Y | 6 |
Table 2's Identification_nbr was originally extracted from Table 1, but after processing, the leading zeros are missing. I am unable to join the two tables, even though Identification_nbr is defined as character in both Table 1 and Table 2.
Do the comparison without the leading zeros:
data t1;
input (Book_number Id Checkout) (:$20.);
datalines;
ABC 00005 Y
ABC 00004A Y
BU 881123-01-1234 Y
BU 1/123332 Y
C 001 Y
D 000 Y
;
data t2;
input (Book_number Id) (:$20.) age;
datalines;
ABC 5 1
ABC 00004A 2
BU 881123-01-1234 3
BU 1/123332 4
C 1 6
D 0 10
;
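proc sql;
  /* Join on book number and on the ID with its leading zeros skipped */
  select t1.*, t2.age
  from t1 inner join t2 on
    t1.book_number = t2.book_number and
    /* findc(..., "0", "K") is the position of the first character that is not "0" */
    substr(t1.id, findc(t1.id,"0","K")) = substr(t2.id, findc(t2.id,"0","K"));
quit;
The K modifier in FINDC makes it search for the first character that is not in the list, which here means the first character not equal to 0.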
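To see what the join condition actually compares, here is a small sketch that prints the FINDC position and the resulting substring for a few of the sample IDs. The variable names pos and stripped are only for illustration, and the $20 length is assumed to match the input statements above.

```sas
data _null_;
  length id $20 stripped $20;
  do id = "00005", "00004A", "001", "000", "1/123332";
    /* K = search for characters NOT in the list, so this is the
       position of the first character that is not "0" */
    pos = findc(id, "0", "K");
    /* everything from that position onward, i.e. the ID without leading zeros */
    stripped = substr(id, pos);
    put id= pos= stripped=;
  end;
run;
```

Because id is padded with trailing blanks, an all-zero value such as "000" still gets a valid position (the first blank after the zeros), so both "000" and "0" reduce to a blank string and compare equal. A value that fills all 20 characters with zeros would make FINDC return 0, which SUBSTR does not accept, so that edge case would need separate handling.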
A field containing both digits and letters, as shown, has to be a character variable. A numeric variable cannot store character data. One possible exception would be a numeric variable with a format that displays characters, but that's really unlikely with the kind of values you've shown.
My guess is the data was put into Excel at some point to make those conversions. If you can undo that step, it would be your best choice. If you can't, can you verify the import happened correctly?
Otherwise the fix via code is to detect a number that doesn't have leading zeroes. I would use LENGTHN to check the length and then apply leading zeroes. But how many leading zeroes? The first has 4 and the last has 2? How would that be known if you only have access to table 2? Once that logic is clear we can help you turn this into code.
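As a rough sketch of the padding idea, assuming the original width were known: the step below left-pads all-digit IDs with zeros up to a fixed width. The width of 5, the macro variable name, and the dataset names t2_demo and t2_padded are all hypothetical. The sample data mixes widths (00005 and 001), so a single fixed width would not restore every value, which is exactly the open question above.

```sas
%let width = 5;  /* hypothetical target width, not taken from the real data */

data t2_demo;
  input (Book_number Id) (:$20.) age;
  datalines;
ABC 5 1
ABC 00004A 2
BU 1/123332 4
;

data t2_padded;
  set t2_demo;
  /* only touch non-empty IDs that are all digits and shorter than the target width */
  if lengthn(id) > 0 and lengthn(id) < &width and notdigit(strip(id)) = 0 then
    id = put(input(id, 20.), z&width..);  /* e.g. "5" becomes "00005" */
run;

proc print data=t2_padded;
run;
```

IDs containing letters, slashes, or hyphens are left unchanged because NOTDIGIT finds a non-digit character in them.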
When I tried to pull the data from Teradata, I got this error message:
Data Type "Identification_nb" does not match a Defined Type name
Any idea how to fix it?