I have two datasets: File01 and File06. File06 has a variable called 'Group TIN'n that I want to use to populate a variable called 'TIN' in File01 but ONLY if a variable called 'Client Defined Provider ID'n or 'Client Defined Address ID'n matches from both datasets.
How can I write this in SAS? Is a datastep or PROC SQL better in this case?
First, show us some sample data of two datasets.
I think you can do it either in data step or proc sql, but without knowing the data structure, I can't give you proper advice.
Sure, of course
FILE06 | ||
Client Defined Provider ID | Client Defined Address ID | GroupTIN |
123456 | 7890 | 54678 |
123457 | 7891 | 54679 |
123458 | 7892 | 54680 |
123459 | 7893 | 54681 |
123460 | 7894 | 54682 |
123461 | 7895 | 54683 |
123462 | 7896 | 54684 |
123463 | 7897 | 54685 |
123464 | 7898 | 54686 |
123465 | 7899 | 54687 |
123466 | 7900 | 54688 |
123467 | 7901 | 54689 |
123468 | 7902 | 54690 |
123469 | 7903 | 54691 |
123470 | 7904 | 54692 |
123471 | 7905 | 54693 |
123472 | 7906 | 54694 |
123473 | 7907 | 54695 |
123474 | 7908 | 54696 |
123475 | 7909 | 54697 |
123476 | 7910 | 54698 |
123477 | 7911 | 54699 |
123478 | 7912 | 54700 |
123479 | 7913 | 54701 |
123480 | 7914 | 54702 |
FILE01 | ||
Client Defined Provider ID | Client Defined Address ID | TIN |
123456 | 7890 | . |
123457 | 7891 | . |
123458 | 7892 | . |
123459 | 7893 | . |
123460 | 7894 | . |
123461 | 7895 | . |
123462 | 7896 | . |
123463 | 7897 | . |
123464 | 7898 | . |
123465 | 7899 | . |
123466 | 7900 | . |
123467 | 7901 | . |
123468 | 7902 | . |
123469 | 7903 | . |
123470 | 7904 | . |
123471 | 7905 | . |
123472 | 7906 | . |
123473 | 7907 | . |
123474 | 7908 | . |
123475 | 7909 | . |
123476 | 7910 | . |
123477 | 7911 | . |
123478 | 7912 | . |
123479 | 7913 | . |
123480 | 7914 | . |
how about this code.
data file06;
input ProviderID AddressID GroupTIN;
datalines;
123456 7890 54678
123457 7891 54679
123458 7892 54680
123459 7893 54681
123460 7894 54682
123461 7895 54683
123462 7896 54684
123463 7897 54685
123464 7898 54686
123465 7899 54687
123466 7900 54688
123467 7901 54689
123468 7902 54690
123469 7903 54691
123470 7904 54692
123471 7905 54693
123472 7906 54694
123473 7907 54695
123474 7908 54696
123475 7909 54697
123476 7910 54698
123477 7911 54699
123478 7912 54700
123479 7913 54701
123480 7914 54702
;
run;
data file01;
input ProviderID AddressID TIN;
datalines;
123456 7890 .
123457 7891 .
123458 7892 .
123459 7893 .
123460 7894 .
123461 7895 .
123462 7896 .
123463 7897 .
123464 7898 .
123465 7899 .
123466 7900 .
123467 7901 .
123468 7902 .
123469 7903 .
123470 7904 .
123471 7905 .
123472 7906 .
123473 7907 .
123474 7908 .
123475 7909 .
123476 7910 .
123477 7911 .
123478 7912 .
123479 7913 .
123480 7914 .
;
run;
proc sql;
update file01 set TIN=(select GroupTIN from file06
where file01.ProviderID = file06.ProviderID or
file01.AddressID = file06.AddressID )
;
quit;
Are 'Group TIN'n and 'TIN'n character variables?
Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?
Are 'Group TIN'n and 'TIN'n character variables?
- They were not but I did just change it so now yes, they are both character variables.
Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?
- There could be duplicates. If there are duplicates, they should have the same value for 'Group TIN'n meaning that it can either be overwritten with the duplicate value or ignored.
Sorry for the late reply - I didn't get a notification that you sent a reply.
First of all, if TIN and GROUP TIN are the same variable attribute, WARNING will not appear.
Also, if the keys of the duplicate records are the same, you can use distinct as follows to resolve the error.
If the key variables are ProviderID "and" AddressID, change the logical operator to AND instead of OR.
proc sql;
update file01 set TIN=(select GroupTIN from file06
where file01.ProviderID = file06.ProviderID OR
file01.AddressID = file06.AddressID )
;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.