Some files are automatically loaded during every single day's morning DDS load around 5AM.
VALID_FROM_DTTM and VALID_TO_DTTM are set to the DDS Business Date, which is the previous day's date. Eg: When loading on January 2, 2019, the Business Date shall be set to January 1, 2019.
At roughly 3PM each business day, These files are copied by a computer operator to ftp location with the names MAJOR_FILE_<yyyy-MM-dd-hh-mm-ss>.csv and MINOR_FILE_<yyyy-MM-dd-hh-mm-ss>.csv.
The file is copied with the last business day's data. So, for example, if January 2, 2024 was a Tuesday, we would receive data from December 29, 2023 (Friday - this is because January 1 is a holiday and weekends are not business days). If January 3, 2024 was a Wednesday, we would receive data for January 2, 2024.
In some circumstances, we may go several days without having files downloaded, in which case several files will be dropped at the same time for different days.
These files are then manually renamed to X_EXF_MAJOR_FILE.csv and X_EXF_MINOR_FILE.csv and copied from the ftp location to a different location, where they overwrite the existing files
Implications and Issue
We load these data seven days a week but only receive files for 4-5 days per week. This means that we regularly load certain files more than once.
We use the previous day's date when loading these data, rather than the actual as-of date of the file. This means that we regularly load files with the incorrect date specified. See the chart below for an example
ProcessingDate
Weekday
Holiday?
Actual Date of Data
VALID_FROM_DTTM
12/29/2023
Friday
No
12/27/2023
12/28/2023
12/30/2023
Saturday
No
12/28/2023
12/29/2023
12/31/2023
Sunday
No
12/28/2023
12/30/2023
01/01/2024
Monday
Yes
12/28/2023
12/31/2023
01/02/2024
Tuesday
No
12/28/2023
01/01/2024
01/03/2024
Wednesday
No
12/29/2023
01/02/2024
01/04/2024
Thursday
No
01/02/2024
01/03/2024
This demonstrates that we load these data 4 times for December 28th, and each time it is loaded it is assigned a different invalid date.
QUESTION:
Is there a way we can use a conditional start to ensure the date we are loading is the right one and not just loading any file?
why not get the modified datetime for the file and compare it to what you already have loaded (store the mod dttm in your dataset) to make sure you don't load the same file twice?
why not get the modified datetime for the file and compare it to what you already have loaded (store the mod dttm in your dataset) to make sure you don't load the same file twice?
Thanks. This is the solution we eventually settled for.
Ready to join fellow brilliant minds for the SAS Hackathon?
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.