BookmarkSubscribeRSS Feed
InspectahDex
Obsidian | Level 7

I have four original datasets:

Dataset A, B, C, D

 

I also have a final dataset that combines various elements from those four that I'll call FinalDataset.

 

There is a variable in all four original datasets and my FinalDataset that is called NPI.

 

I want to write to a new variable in my FinalDataset called "NPIfound" if the NPI value in the FinalDataset is also found in one of the original datasets.

In other words, If datasetA.NPI = finaldataset.NPI, then output the value "datasetA" to NPIfound.

If that same NPI value is also found in datasetB then I want it to add to NPIfound so that it says "datasetA, datasetB" in the FinalDataset.

3 REPLIES 3
Reeza
Super User
Did you merge the data sets together using a data step merge or a different method? If you used a merge it's trivial to add that in using the IN option.
InspectahDex
Obsidian | Level 7

I merged the datasets together using a data step merge. Why is it trivial to add that variable in using a IN option? Sorry, I'm new.

Astounding
PROC Star

Here's a simple way.  Assuming all your data sets are already sorted by NPI:

data FinalDataset;
   merge a (in=from_a) b (in=from_b) c (in=from_c) d (in=from_d);
   by NPI;
   NPIfound = cats(from_a, from_b, from_c, from_d);
run;

By choosing the simple route, you don't get exactly what you asked for.  Instead,  you get a series of 0's and 1's.  For example:

 

1010 represents found in A, not found in B, found in C, not found in D.

0001 represents found in D only.

 

Adding to the complexity by adjusting the program is again relatively easy, but this version might be easier to work with.

 

Learning about MERGE and in= variables is a basic tool you will use often.  It is worth the time to learn.