BookmarkSubscribeRSS Feed
alotaibifm
Calcite | Level 5

Hi everyone, 

 

I am struggling with merging two datasets by date. I have the following datasets: 

 

Dataset DEMO:

PT_ID Visit_Date 

 

Dataset Diss:

PT_ID Diagnosis_date ICD-Code 

 

 

DATA DEMOA;
MERGE DEMO (in=a)
Diss (in=b);
BY PT_ID;
if a and b;
run;

 

The resulting dataset looks like this:

 

PT_ID Visit_Date     Diagnosis_date   ICD-Code

1        31OCT2016    30OCT2016        599

1        31OCT2016    31OCT2016        599

2        29NOV2012   29NOV2012        466

3        07SEP2014    24JUN2013        486

 

 

I have two problems here: 

1) The date in both files are not matched SOMETIMES, Is there any way that I can round the date if the (Diagnosis date) one or two days after the (ER Visit)?? 

 

2) How can I merge the two files without repeating the observations more than once if the patient has matched date? 

 

Thanks, 

 

 

 

40 REPLIES 40
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

First you need to change Diagnosis_date or visit_date to the same names and add that date column to your merge.

 

 

 

 

Dataset DEMO:
PT_ID Visit_Date 
 
Dataset Diss:
PT_ID visit_date /*dummy name to merge on */ ICD-Code 
 
 
DATA DEMOA;
MERGE DEMO (in=a)
Diss (in=b);
BY PT_ID visit_date;
if a and b;
run;
alotaibifm
Calcite | Level 5

It didn't work. I have tried that option before, but gave me only 4 observations in the merged dataset, which is impossible that over 40,000 observations, only 4 observations were matched by date. I have doubled checked manually and there are a lot of observations have the same dates between the two files. 

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

@alotaibifmPlease post sample data in the form of a SAS dataset with datalines that represent your data in the 2 datasets that you are referencing?  You need only represent 7-10 records in each dataset to express your concerns.

 

alotaibifm
Calcite | Level 5

Attached a sample of my datasets. 

ballardw
Super User

@alotaibifm wrote:

Attached a sample of my datasets. 


Your data shows signs of coming from Excel with date time values.

When I change the format from DATE9 to best12. I see values like  20602.772222 for visit date.

You need to truncate the values so that the values are integers. The DATE9. only displays the formatted integer portion of the data but comparisons are using the decimal portion as well.

Something like this in data step modifying your data is needed.

Visit_date = floor(Visit_date);

 

Reeza
Super User
If he imported from CSV files that shouldn't have happened....
ballardw
Super User

@Reeza wrote:
If he imported from CSV files that shouldn't have happened....

I agree that the prior steps are somewhat suspect with such values. Unless he didn't set the entire column to the correct display before doing a SAVE AS.

Or perhaps another step such as taking a MEAN or similar statistic of the date field.

Reeza
Super User
Is there any errors in your log? Are the variable types and formats the same?
alotaibifm
Calcite | Level 5

The codes are correct but merge something I don't want. Thus, no errors on the log. 

Reeza
Super User
So when I set the formats attached to the variables, everything works correctly. Otherwise it doesn't.

You can set the formats using:

format _numeric_ 8.;

libname demo 'C:\Users\fareeza.khurshed\Downloads';
options nofmterr;

proc sort data=demo.erDiss; by pt_id visit_date;
proc sort data=demo.erdemo; by pt_id visit_date;
data demo;
set demo.erdiss;
format _numeric_ 8.;
data diss;
set demo.erdiss;
format _numeric_ 8.;
run;




DATA DEMOA;
MERGE demo (in=a)
Diss (in=b);
BY PT_ID visit_date;
if a and b;
run;
alotaibifm
Calcite | Level 5

I merged the same code without changing the format and gave me the same results. only 4 observations were matched??!! I am sure 100% that there are more than 4 observations should be matched. 

 

Thanks @Reeza

Reeza
Super User
It's something with your data, but not sure what. Where did you originally get the data from?
alotaibifm
Calcite | Level 5

I got them from the Bioinformatics department in our institution. I have received the dataset as an excel format and each file is separate than the other. 

Reeza
Super User
Are they actually xlsx files or csv? Did you import them using PROC IMPORT? If so, try converting the files to CSV, re-importing and I suspect this issue will go away.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 40 replies
  • 1241 views
  • 6 likes
  • 4 in conversation