BookmarkSubscribeRSS Feed
InspectahDex
Obsidian | Level 7

I have two datasets: File01 and File06. File06 has a variable called 'Group TIN'n that I want to use to populate a variable called 'TIN' in File01 but ONLY if a variable called 'Client Defined Provider ID'n or 'Client Defined Address ID'n matches from both datasets.

 

How can I write this in SAS? Is a datastep or PROC SQL better in this case?

7 REPLIES 7
japelin
Rhodochrosite | Level 12

First, show us some sample data of two datasets.
I think you can do it either in data step or proc sql, but without knowing the data structure, I can't give you proper advice.

InspectahDex
Obsidian | Level 7

Sure, of course

FILE06  
Client Defined Provider IDClient Defined Address IDGroupTIN
123456789054678
123457789154679
123458789254680
123459789354681
123460789454682
123461789554683
123462789654684
123463789754685
123464789854686
123465789954687
123466790054688
123467790154689
123468790254690
123469790354691
123470790454692
123471790554693
123472790654694
123473790754695
123474790854696
123475790954697
123476791054698
123477791154699
123478791254700
123479791354701
123480791454702

 

FILE01  
Client Defined Provider IDClient Defined Address IDTIN
1234567890.
1234577891.
1234587892.
1234597893.
1234607894.
1234617895.
1234627896.
1234637897.
1234647898.
1234657899.
1234667900.
1234677901.
1234687902.
1234697903.
1234707904.
1234717905.
1234727906.
1234737907.
1234747908.
1234757909.
1234767910.
1234777911.
1234787912.
1234797913.
1234807914.
japelin
Rhodochrosite | Level 12

how about this code.

 

data file06;
input ProviderID AddressID GroupTIN;
datalines;
123456	7890	54678
123457	7891	54679
123458	7892	54680
123459	7893	54681
123460	7894	54682
123461	7895	54683
123462	7896	54684
123463	7897	54685
123464	7898	54686
123465	7899	54687
123466	7900	54688
123467	7901	54689
123468	7902	54690
123469	7903	54691
123470	7904	54692
123471	7905	54693
123472	7906	54694
123473	7907	54695
123474	7908	54696
123475	7909	54697
123476	7910	54698
123477	7911	54699
123478	7912	54700
123479	7913	54701
123480	7914	54702
;
run;


data file01;
input ProviderID AddressID TIN;
datalines;
123456	7890	.
123457	7891	.
123458	7892	.
123459	7893	.
123460	7894	.
123461	7895	.
123462	7896	.
123463	7897	.
123464	7898	.
123465	7899	.
123466	7900	.
123467	7901	.
123468	7902	.
123469	7903	.
123470	7904	.
123471	7905	.
123472	7906	.
123473	7907	.
123474	7908	.
123475	7909	.
123476	7910	.
123477	7911	.
123478	7912	.
123479	7913	.
123480	7914	.
;
run;

proc sql;
    update file01 set TIN=(select GroupTIN from file06
    where file01.ProviderID = file06.ProviderID or
          file01.AddressID = file06.AddressID )
    ;
quit;
InspectahDex
Obsidian | Level 7
I got an error:

30 proc sql;
31 update file01 set TIN=(select 'Group TIN'n from file06
32 where file01.'Client Defined Provider ID'n = file06.'Client Defined Provider ID'n or
33 file01.'Client Defined Address ID'n = file06.'Client Defined Address ID'n)
34 ;
WARNING: Character expression will be truncated when assigned to character column TIN.
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: 'Client Defined Provider ID'n=56290 'Client Defined Address ID'n=39489 .
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
35 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
japelin
Rhodochrosite | Level 12

Are 'Group TIN'n and 'TIN'n character variables?

Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?

InspectahDex
Obsidian | Level 7

Are 'Group TIN'n and 'TIN'n character variables?

- They were not but I did just change it so now yes, they are both character variables.

Is there a duplicate key obs that is not in the test data? If so, how do you want to handle it?

- There could be duplicates. If there are duplicates, they should have the same value for 'Group TIN'n meaning that it can either be overwritten with the duplicate value or ignored.

Sorry for the late reply - I didn't get a notification that you sent a reply.

japelin
Rhodochrosite | Level 12

First of all, if TIN and GROUP TIN are the same variable attribute, WARNING will not appear.
Also, if the keys of the duplicate records are the same, you can use distinct as follows to resolve the error.

If the key variables are ProviderID "and" AddressID, change the logical operator to AND instead of OR.

 

proc sql;
    update file01 set TIN=(select GroupTIN from file06
    where file01.ProviderID = file06.ProviderID OR
          file01.AddressID = file06.AddressID )
    ;
quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1020 views
  • 0 likes
  • 2 in conversation