Hi everyone,
I am struggling with merging two datasets by date. I have the following datasets:
Dataset DEMO:
PT_ID Visit_Date
Dataset Diss:
PT_ID Diagnosis_date ICD-Code
DATA DEMOA;
MERGE DEMO (in=a)
Diss (in=b);
BY PT_ID;
if a and b;
run;
The resulting dataset looks like this:
PT_ID Visit_Date Diagnosis_date ICD-Code
1 31OCT2016 30OCT2016 599
1 31OCT2016 31OCT2016 599
2 29NOV2012 29NOV2012 466
3 07SEP2014 24JUN2013 486
I have two problems here:
1) The date in both files are not matched SOMETIMES, Is there any way that I can round the date if the (Diagnosis date) one or two days after the (ER Visit)??
2) How can I merge the two files without repeating the observations more than once if the patient has matched date?
Thanks,
First you need to change Diagnosis_date or visit_date to the same names and add that date column to your merge.
Dataset DEMO:
PT_ID Visit_Date
Dataset Diss:
PT_ID visit_date /*dummy name to merge on */ ICD-Code
DATA DEMOA;
MERGE DEMO (in=a)
Diss (in=b);
BY PT_ID visit_date;
if a and b;
run;
It didn't work. I have tried that option before, but gave me only 4 observations in the merged dataset, which is impossible that over 40,000 observations, only 4 observations were matched by date. I have doubled checked manually and there are a lot of observations have the same dates between the two files.
@alotaibifmPlease post sample data in the form of a SAS dataset with datalines that represent your data in the 2 datasets that you are referencing? You need only represent 7-10 records in each dataset to express your concerns.
Attached a sample of my datasets.
@alotaibifm wrote:
Attached a sample of my datasets.
Your data shows signs of coming from Excel with date time values.
When I change the format from DATE9 to best12. I see values like 20602.772222 for visit date.
You need to truncate the values so that the values are integers. The DATE9. only displays the formatted integer portion of the data but comparisons are using the decimal portion as well.
Something like this in data step modifying your data is needed.
Visit_date = floor(Visit_date);
@Reeza wrote:
If he imported from CSV files that shouldn't have happened....
I agree that the prior steps are somewhat suspect with such values. Unless he didn't set the entire column to the correct display before doing a SAVE AS.
Or perhaps another step such as taking a MEAN or similar statistic of the date field.
The codes are correct but merge something I don't want. Thus, no errors on the log.
I merged the same code without changing the format and gave me the same results. only 4 observations were matched??!! I am sure 100% that there are more than 4 observations should be matched.
Thanks @Reeza
I got them from the Bioinformatics department in our institution. I have received the dataset as an excel format and each file is separate than the other.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.