Here's a simple approach, assuming all four data sets (A, B, C, and D) are already sorted by NPI:
data FinalDataset;
   merge a (in=from_a) b (in=from_b) c (in=from_c) d (in=from_d);
   by NPI;
   length NPIfound $ 4;  /* one character per data set, in the order A, B, C, D */
   NPIfound = cats(from_a, from_b, from_c, from_d);
run;
Because this is the simple route, you don't get exactly what you asked for. Instead, NPIfound holds a string of 0s and 1s, one position per data set in the order A, B, C, D. For example:
1010 represents found in A, not found in B, found in C, not found in D.
0001 represents found in D only.
Adjusting the program to produce a more descriptive result is also fairly easy, but this 0/1 version may be easier to work with.
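As one possible adjustment, here is a sketch that keeps NPIfound and also builds a readable list of the data sets each NPI was found in. FoundIn is a hypothetical variable name, and the tiny data sets A through D with their numeric NPI values are made up for illustration, so replace them with your own data. The IN= variables themselves are not written to the output data set, which is why their values are copied into regular variables.

```sas
/* Made-up sample data: numeric NPI values, each data set already sorted by NPI */
data a; input NPI; datalines;
1111111111
3333333333
;
data b; input NPI; datalines;
2222222222
;
data c; input NPI; datalines;
1111111111
;
data d; input NPI; datalines;
3333333333
4444444444
;

data FinalDataset;
   merge a (in=from_a) b (in=from_b) c (in=from_c) d (in=from_d);
   by NPI;
   length NPIfound $ 4 FoundIn $ 7;
   /* 0/1 flag string, one position per data set in the order A, B, C, D */
   NPIfound = cats(from_a, from_b, from_c, from_d);
   /* IFC returns the letter when the IN= flag is 1 and a blank otherwise; */
   /* CATX skips blank arguments, so only the letters of the found sets remain */
   FoundIn = catx(' ', ifc(from_a, 'A', ' '), ifc(from_b, 'B', ' '),
                       ifc(from_c, 'C', ' '), ifc(from_d, 'D', ' '));
run;

proc print data=FinalDataset noobs;
run;
```

With this sample data, NPI 1111111111 gets NPIfound 1010 and FoundIn "A C", while NPI 4444444444 gets 0001 and "D". Because NPIfound is a fixed pattern, selecting a group stays simple, for example `where NPIfound = '0001';` to keep the NPIs found in D only.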
MERGE with IN= variables is a basic tool you will use often, so it is worth the time to learn.