SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Combining 2 data files using multiple variables

Reply
Regular Learner
Posts: 1

Combining 2 data files using multiple variables

I have 2 files of different types of data that I am trying to combine. One file includes information on patient cancer status and the other file includes information on patient demographic and assessment information. The two files both have patient last name and date of service in common (along with a few other variables-but these two are the ones I am trying to combine on), I cannot combine on just patient last name because some patients appear multiple times in the both datasets with different dates of service, so both last name and date of service are needed to identify the specific cases and match the information. My end goal is to have one file that has both the information on patient cancer status and on patient demographic & assessment information in a single file, with each row representing 1 case (and eliminating any patients who don't have a complete set of data from both portions). Is there a way to do this in the data merge steps (and to delete any cases that do not have data available from both data sets)?

Frequent Contributor
Posts: 102

Re: Combining 2 data files using multiple variables

 

Your code would be something like this:

 

/* I have called the patient date set pat and the service data serv.

   merge moves info from the right dataset (serv) into the left dataset (pat)  */;

 

proc sort data= pat;  by lastname dateserv;

 

proc sort data=serv;  by lastname dateserv;

 

data want;   merge pat (in=inpat)  serv (in=inserv) ;  by lastname dateserv;

 

if inpat and inserv then output;

 

Jim

Super User
Posts: 13,292

Re: Combining 2 data files using multiple variables

I am a bit concerned that you are only going to use a LAST name and service date.

It is extremely likely that common last names, Smith or Rodriguez for a pair of examples, may have multiple patients associated on any given date. John Smith, Mary Smith, etc. And if you have new born children then multiple births will cause issues.

I suspect that you want at least one other variable. I know that my hospital asks me for date of birth every time they start to do anything: draw blood, take a weight, blood pressure, administer medication ...

Or see if your organization has a patient id or account variable that is supposed to be unique for each patient.

 

Names are often one of the last things you want to work with if possible. Also some folks change last names, data entry errors.

Ask a Question
Discussion stats
  • 2 replies
  • 154 views
  • 1 like
  • 3 in conversation