BookmarkSubscribeRSS Feed
gabriellep38
Calcite | Level 5

I am trying to merge the following two datasets by the NPI number on SAS v9.4 (a unique number identifying individual physicians): 

 

https://data.cms.gov/provider-summary-by-type-of-service/medicare-physician-other-practitioners/medi...

(This datasets lists procedures completed by a physician)

and 

 

https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers/medicare-part-d...

(This dataset lists drugs prescribed by a physician)

 

I sorted each dataset by the NPI number (after renaming the NPI variables so they are the same) and then merged by the NPI number. This is the problem I am running into:

 

Say we have Dr. Smith, and Dr. Smith prescribed 5 different drugs during 2019 and completed 10 different procedures in 2019. When I go to merge the dataset, Dr. Smith's procedures 1-5 are merged but procedures 5-10 are subsequently deleted. 

 

How can I solve this issue?

14 REPLIES 14
Tom
Super User Tom
Super User

How to solve it depends on what you are trying to do.  How do you want to combine 5 drugs with 10 procedures?  What type of output do you need?

gabriellep38
Calcite | Level 5

This is the way the input is currently structured:

 

NPI 
1Drug 1
1Drug 2
1Drug 3
1Drug 4
1Drug 5

 

 

and 

 

NPI 
1Procedure 1
1Procedure 2
1Procedure 3
1Procedure 4
1Procedure 5
1Procedure 6
1Procedure 7
1Procedure 8
1Procedure 9
1Procedure 10

 

I would like the following output:

 

NPI                
1Drug 1 Drug 2Drug 3Drug 4Drug 5Procedure 1Procedure 2Procedure 3Procedure 4Procedure 5Procedure 6Procedure 7Procedure 8Procedure 9Procedure 10Procedure 11
gabriellep38
Calcite | Level 5

How would I transpose to the wide format?

Kurt_Bremser
Super User

@gabriellep38 wrote:

How would I transpose to the wide format?


See this, using your example data:

data drugs;
infile datalines dlm="|";
input NPI $ drugtype :$20.;
datalines;
1|Drug 1
1|Drug 2
1|Drug 3
1|Drug 4
1|Drug 5
;

data procedures;
infile datalines dlm="|";
input NPI $ proctype :$20.;
datalines;
1|Procedure 1
1|Procedure 2
1|Procedure 3
1|Procedure 4
1|Procedure 5
1|Procedure 6
1|Procedure 7
1|Procedure 8
1|Procedure 9
1|Procedure 10
;

data all;
set
  drugs (rename=(drugtype=value))
  procedures (rename=(proctype=value))
;
run;

proc transpose
  data=all
  out=want (drop=_name_)
;
by npi;
var value;
run;
benjamin_2018
Fluorite | Level 6

It may be that you need to compare the lists by more than a Doctor's last name. For example, in the services dataset, there is a Nirali Patel (Rndrng_NPI = '1265846356'), an Anesthesiologist Assistant in Washington, DC, while in the Drugs dataset there is a different Nirali Patel (Prscrbr_NPI = '1104322072'), a student in an organized health care program in Washington, DC. 

gabriellep38
Calcite | Level 5

Thanks this is a good point.

ballardw
Super User

If you used a data step merge you likely have problems. Data step merge only handles one data set with multiples of the By variable. If both sets have multiples then one of them gets truncated to match the smaller and likely not where you would like.

 

Without some sample data, made up is fine as long as it behaves like your real data, and show what you expect for a result using the sample data. Best is to provide the example and final data in the form of data step code so we can run tests with it.

 

Links like that require us to do a lot of work, much less anything like creating accounts just to access it.

 

From your limited description I doubt that merging ALL drugs with ALL procedures for a given doctor is appropriate anyway.

 

Sajid01
Meteorite | Level 14

The issue you are facing is typical of many to many merge using SAS data step.
Better to use Proc SQL.
In case it is the intention to use data step, then have a look at this paper.
https://www.pharmasug.org/proceedings/2011/TU/PharmaSUG-2011-TU05.pdf 

gabriellep38
Calcite | Level 5

Can Proc SQL be used to merge multiple datasets. I am hoping to merge both the services and drug dataset from all 6 years, for a total of 12 datasets combined into 1.

Sajid01
Meteorite | Level 14

Yes

gabriellep38
Calcite | Level 5

I tried to use proc sql but it was requiring 100s of GBs in order to do the data processing - so the program would not run. Is there another way to do this that does not require so much storage space?

Tom
Super User Tom
Super User

To just go from tall to wide the most efficient way is to use a data step.

You will need to set an upper limit on the number of variables you will create so you can define the arrays.  Define the arrays to match the definition of the variables in the original datasets.  So if PROCEDURE is defined as length $10 then add the $10 to the ARRAY for P so that the new variables are also defined as $10.

data want;
   set procedures(in=in1) drugs(in=in2);
   by npi;
   array p procedure1-procedure20;
   array d drug1-drug20;
   if first.npi then call missing(of np p[*] nd d[*]);
   if in1 then do;
     np+1;
     p[np] = procedure ;
   end;
   if in2 then do;
     nd+1;
     d[nd] = drug ;
   end;
   if last.npi;
   drop procedure drug;
run;

Sajid01
Meteorite | Level 14

Hello @gabriellep38 
The provider and service dataset has approximately 1 million rows and the the part D prescribers has approximately 2.5 million rows and both together may have a size of approximately 7 GB (as csv after unzipping). (I have considered the  data for 2019 year only.)
I strongly recommended that the data must be loaded on to an RDMS database. If there are difficulties , consider using either opensource (MySQL /PostgreSQL) or may be Oracle Express (The 18C express has 12 GB database size restriction) depending upon your preference.

Your process will be simplified. The Oracle listagg function Or the MySQL GROUP_CONCAT functions will  greatly simplify the "tall to wide " transpose as you have shown in your second post. (NPI, drugs and Procedures in a row).
Once you get the final dataset in a form you need, you can do further processing in SAS.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 1797 views
  • 0 likes
  • 6 in conversation