Hi All,
I have 2 datasets
In the below dataset for a subject the visit start date and end date has been provided.
patient visit_start visit_end start_date end_date
1001 SCR WEEK0 01JAN2024 06JAN2024
In another dataset we have a set of dates recorded for the same subject.
patient date
1001 31dec2023
1001 01jan2024
1001 01jan2024
1001 04jan2024
1001 05jan2024
1001 05jan2024
The dates in the second table provided should be checked with the date range (start_date end_date) provided in first table and the dates which are not in range should be flagged and the missing calendar dates for the given range should be displayed.
Output: date range 01 jan2024 06jan2024
patient visit_start visit_end start_date end_date date out_of_range Missing_calendar days
1001 SCR WEEK0 01JAN2024 06JAN2024 31dec2023 Y 02JAN2024, 03JAN2024
1001 SCR WEEK0 01JAN2024 06JAN2024 01jan2024
1001 SCR WEEK0 01JAN2024 06JAN2024 01jan2024
1001 SCR WEEK0 01JAN2024 06JAN2024 04jan2024
1001 SCR WEEK0 01JAN2024 06JAN2024 07jan2024 Y
1001 SCR WEEK0 01JAN2024 06JAN2024 08jan2024 Y
So the above output should be applied for all the subjects in the dataset and also for all the other visit_start and visit_end date ranges as well.
Your help will be very much appreciated.
Can you show the code you have tried? I think for the first part, it would be helpful to post your example data as DATA steps with the CARDS statement. It looks like it should be straight forward to merge the two datasets, and check whether in date in inside our outside of the range.
The second part is trickier. One way to approach it is to make a dataset with all dates in the range. Then you can merge that against your data, to find those dates that are missing.
I do not have any code written but have only this data available.
You should, at least, post the data in usable form.
Computing "missing_calendar_days" has one difficulty: creating "missing_calendar_days" with a sufficient length. The variable is just used to display the missing dates and won't be used in any further calculation, right?
Let me provide the required sample data here which has data different than the one i posted earlier. But the concept is same. Please use the Qs dataset provided below so that it has missing dates
Data QS_ ;
do qsdat1 = '26aug2021'd to '08oct2021'd,"13apr2022"d to "15apr2022"d ;
patient = 1001015;
output ;
end ;
format qsdat1 is8601da.;
run ;
Data qs ;
set QS_ ;
if _n_ in (7,10,15,19,25,29) then delete;
RUN ;
Data SV_ ;
input patient 1-7 visit_start $9-15 visit_end $17-23 start_date1 $25-33 end_date1 $ 35-43 ;
datalines ;
1001015 scr1 week0 02sep2021 05oct2021
1001015 week22 week26 07mar2022 12apr2022
1001015 week48 week52 05sep2022 29sep2022
1001015 week74 week78 13mar2023 12apr2023
1001015 week100 week104 05sep2023 09oct2023
;
run;
Yes the missing dates are used only to display not for any calculations purposes.
So this is the same data as presented here
Are you two attending the same class, or in fact the same person?
Let's call your first dataset "patients", and the second "have".
data reference;
set patients;
do date = stat_date to end_date;
output;
end;
keep patient date;
run;
data want;
merge
have (in=h)
reference (in=r)
;
by patient date;
oor = (h and not r);
miss = (r and not h);
run;
You can now use the two boolean variables for further analysis.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.