I have two datasets: File01 and File06. File06 has a variable called 'Group TIN'n that I want to use to populate a variable called 'TIN' in File01 but ONLY if a variable called 'Client Defined Provider ID'n or 'Client Defined Address ID'n matches from both datasets.
How can I write this in SAS? Is a datastep or PROC SQL better in this case?
First, show us some sample data of two datasets.
I think you can do it either in data step or proc sql, but without knowing the data structure, I can't give you proper advice.
Sure, of course
| FILE06 | ||
| Client Defined Provider ID | Client Defined Address ID | GroupTIN | 
| 123456 | 7890 | 54678 | 
| 123457 | 7891 | 54679 | 
| 123458 | 7892 | 54680 | 
| 123459 | 7893 | 54681 | 
| 123460 | 7894 | 54682 | 
| 123461 | 7895 | 54683 | 
| 123462 | 7896 | 54684 | 
| 123463 | 7897 | 54685 | 
| 123464 | 7898 | 54686 | 
| 123465 | 7899 | 54687 | 
| 123466 | 7900 | 54688 | 
| 123467 | 7901 | 54689 | 
| 123468 | 7902 | 54690 | 
| 123469 | 7903 | 54691 | 
| 123470 | 7904 | 54692 | 
| 123471 | 7905 | 54693 | 
| 123472 | 7906 | 54694 | 
| 123473 | 7907 | 54695 | 
| 123474 | 7908 | 54696 | 
| 123475 | 7909 | 54697 | 
| 123476 | 7910 | 54698 | 
| 123477 | 7911 | 54699 | 
| 123478 | 7912 | 54700 | 
| 123479 | 7913 | 54701 | 
| 123480 | 7914 | 54702 | 
| FILE01 | ||
| Client Defined Provider ID | Client Defined Address ID | TIN | 
| 123456 | 7890 | . | 
| 123457 | 7891 | . | 
| 123458 | 7892 | . | 
| 123459 | 7893 | . | 
| 123460 | 7894 | . | 
| 123461 | 7895 | . | 
| 123462 | 7896 | . | 
| 123463 | 7897 | . | 
| 123464 | 7898 | . | 
| 123465 | 7899 | . | 
| 123466 | 7900 | . | 
| 123467 | 7901 | . | 
| 123468 | 7902 | . | 
| 123469 | 7903 | . | 
| 123470 | 7904 | . | 
| 123471 | 7905 | . | 
| 123472 | 7906 | . | 
| 123473 | 7907 | . | 
| 123474 | 7908 | . | 
| 123475 | 7909 | . | 
| 123476 | 7910 | . | 
| 123477 | 7911 | . | 
| 123478 | 7912 | . | 
| 123479 | 7913 | . | 
| 123480 | 7914 | . | 
how about this code.
data file06;
input ProviderID AddressID GroupTIN;
datalines;
123456	7890	54678
123457	7891	54679
123458	7892	54680
123459	7893	54681
123460	7894	54682
123461	7895	54683
123462	7896	54684
123463	7897	54685
123464	7898	54686
123465	7899	54687
123466	7900	54688
123467	7901	54689
123468	7902	54690
123469	7903	54691
123470	7904	54692
123471	7905	54693
123472	7906	54694
123473	7907	54695
123474	7908	54696
123475	7909	54697
123476	7910	54698
123477	7911	54699
123478	7912	54700
123479	7913	54701
123480	7914	54702
;
run;
data file01;
input ProviderID AddressID TIN;
datalines;
123456	7890	.
123457	7891	.
123458	7892	.
123459	7893	.
123460	7894	.
123461	7895	.
123462	7896	.
123463	7897	.
123464	7898	.
123465	7899	.
123466	7900	.
123467	7901	.
123468	7902	.
123469	7903	.
123470	7904	.
123471	7905	.
123472	7906	.
123473	7907	.
123474	7908	.
123475	7909	.
123476	7910	.
123477	7911	.
123478	7912	.
123479	7913	.
123480	7914	.
;
run;
proc sql;
    update file01 set TIN=(select GroupTIN from file06
    where file01.ProviderID = file06.ProviderID or
          file01.AddressID = file06.AddressID )
    ;
quit;Are 'Group TIN'n and 'TIN'n character variables?
Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?
Are 'Group TIN'n and 'TIN'n character variables?
- They were not but I did just change it so now yes, they are both character variables.
Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?
- There could be duplicates. If there are duplicates, they should have the same value for 'Group TIN'n meaning that it can either be overwritten with the duplicate value or ignored.
Sorry for the late reply - I didn't get a notification that you sent a reply.
First of all, if TIN and GROUP TIN are the same variable attribute, WARNING will not appear.
Also, if the keys of the duplicate records are the same, you can use distinct as follows to resolve the error.
If the key variables are ProviderID "and" AddressID, change the logical operator to AND instead of OR.
proc sql;
    update file01 set TIN=(select GroupTIN from file06
    where file01.ProviderID = file06.ProviderID OR
          file01.AddressID = file06.AddressID )
    ;
quit;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
