BookmarkSubscribeRSS Feed
Calcite | Level 5

I am trying to merge the following two datasets by the NPI number on SAS v9.4 (a unique number identifying individual physicians):

(This datasets lists procedures completed by a physician)


(This dataset lists drugs prescribed by a physician)


I sorted each dataset by the NPI number (after renaming the NPI variables so they are the same) and then merged by the NPI number. This is the problem I am running into:


Say we have Dr. Smith, and Dr. Smith prescribed 5 different drugs during 2019 and completed 10 different procedures in 2019. When I go to merge the dataset, Dr. Smith's procedures 1-5 are merged but procedures 5-10 are subsequently deleted. 


How can I solve this issue?

Super User Tom
Super User

How to solve it depends on what you are trying to do.  How do you want to combine 5 drugs with 10 procedures?  What type of output do you need?

Calcite | Level 5

This is the way the input is currently structured:


1Drug 1
1Drug 2
1Drug 3
1Drug 4
1Drug 5





1Procedure 1
1Procedure 2
1Procedure 3
1Procedure 4
1Procedure 5
1Procedure 6
1Procedure 7
1Procedure 8
1Procedure 9
1Procedure 10


I would like the following output:


1Drug 1 Drug 2Drug 3Drug 4Drug 5Procedure 1Procedure 2Procedure 3Procedure 4Procedure 5Procedure 6Procedure 7Procedure 8Procedure 9Procedure 10Procedure 11
Calcite | Level 5

How would I transpose to the wide format?

Super User

@gabriellep38 wrote:

How would I transpose to the wide format?

See this, using your example data:

data drugs;
infile datalines dlm="|";
input NPI $ drugtype :$20.;
1|Drug 1
1|Drug 2
1|Drug 3
1|Drug 4
1|Drug 5

data procedures;
infile datalines dlm="|";
input NPI $ proctype :$20.;
1|Procedure 1
1|Procedure 2
1|Procedure 3
1|Procedure 4
1|Procedure 5
1|Procedure 6
1|Procedure 7
1|Procedure 8
1|Procedure 9
1|Procedure 10

data all;
  drugs (rename=(drugtype=value))
  procedures (rename=(proctype=value))

proc transpose
  out=want (drop=_name_)
by npi;
var value;
Fluorite | Level 6

It may be that you need to compare the lists by more than a Doctor's last name. For example, in the services dataset, there is a Nirali Patel (Rndrng_NPI = '1265846356'), an Anesthesiologist Assistant in Washington, DC, while in the Drugs dataset there is a different Nirali Patel (Prscrbr_NPI = '1104322072'), a student in an organized health care program in Washington, DC. 

Calcite | Level 5

Thanks this is a good point.

Super User

If you used a data step merge you likely have problems. Data step merge only handles one data set with multiples of the By variable. If both sets have multiples then one of them gets truncated to match the smaller and likely not where you would like.


Without some sample data, made up is fine as long as it behaves like your real data, and show what you expect for a result using the sample data. Best is to provide the example and final data in the form of data step code so we can run tests with it.


Links like that require us to do a lot of work, much less anything like creating accounts just to access it.


From your limited description I doubt that merging ALL drugs with ALL procedures for a given doctor is appropriate anyway.


Meteorite | Level 14

The issue you are facing is typical of many to many merge using SAS data step.
Better to use Proc SQL.
In case it is the intention to use data step, then have a look at this paper. 

Calcite | Level 5

Can Proc SQL be used to merge multiple datasets. I am hoping to merge both the services and drug dataset from all 6 years, for a total of 12 datasets combined into 1.

Meteorite | Level 14


Calcite | Level 5

I tried to use proc sql but it was requiring 100s of GBs in order to do the data processing - so the program would not run. Is there another way to do this that does not require so much storage space?

Super User Tom
Super User

To just go from tall to wide the most efficient way is to use a data step.

You will need to set an upper limit on the number of variables you will create so you can define the arrays.  Define the arrays to match the definition of the variables in the original datasets.  So if PROCEDURE is defined as length $10 then add the $10 to the ARRAY for P so that the new variables are also defined as $10.

data want;
   set procedures(in=in1) drugs(in=in2);
   by npi;
   array p procedure1-procedure20;
   array d drug1-drug20;
   if first.npi then call missing(of np p[*] nd d[*]);
   if in1 then do;
     p[np] = procedure ;
   if in2 then do;
     d[nd] = drug ;
   if last.npi;
   drop procedure drug;

Meteorite | Level 14

Hello @gabriellep38 
The provider and service dataset has approximately 1 million rows and the the part D prescribers has approximately 2.5 million rows and both together may have a size of approximately 7 GB (as csv after unzipping). (I have considered the  data for 2019 year only.)
I strongly recommended that the data must be loaded on to an RDMS database. If there are difficulties , consider using either opensource (MySQL /PostgreSQL) or may be Oracle Express (The 18C express has 12 GB database size restriction) depending upon your preference.

Your process will be simplified. The Oracle listagg function Or the MySQL GROUP_CONCAT functions will  greatly simplify the "tall to wide " transpose as you have shown in your second post. (NPI, drugs and Procedures in a row).
Once you get the final dataset in a form you need, you can do further processing in SAS.


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 6 in conversation