Hi, Im trying to create an incremental load process from a source table which contains over 40m rows of data. At the moment i am utilizing the created date field to extract and load the latest rows to my final SAS mart but the problem is its not getting the correct rows in, I'll explain. so the source table fields (Answers_Data table) ID Qst_Id Yes/No Ver_no User_id Created_dt 1 112 1 1 345 01Jan19:00:00:00 1 113 2 1 345 01Jan19:00:00:00 1 113 1 1 456 01Jan19:00:00:00 1 114 1 1 345 01Jan19:00:00:00 1 115 2 1 456 01Jan19:00:00:00 1 112 1 2 345 02Jan19:00:00:00 1 113 2 2 345 02Jan19:00:00:00 1 113 1 2 678 02Jan19:00:00:00 1 114 1 2 345 02Jan19:00:00:00 1 115 2 2 345 02Jan19:00:00:00 My goal is to load the latest answers to the final data mart. by using the ver_no field so i first create a subquery to select the latest version answers then to load the data a data step update transformation, with the by variables selected as ID, Qst_Id, User_id running this query the first time loads the first version answers data (i.e. ver_no 1) and then when i run it the following day to load the latest answers (ie. ver_no 2) it keeps the old record from ver no 1 (highlighted in bold below) the final table ends up looking like this. ID Qst_Id Yes/No Ver_no User_id Created_dt 1 112 1 2 345 02Jan19:00:00:00 1 113 2 2 345 02Jan19:00:00:00 1 113 1 1 456 01Jan19:00:00:00 1 113 1 2 678 02Jan19:00:00:00 1 114 1 2 345 02Jan19:00:00:00 1 115 2 2 345 02Jan19:00:00:00 what is happening is that we have two user_id;s answering the same question which is fine, but in the second version of answers a different user_id (678) is answering the same question and we want to overwrite the old answer by user_id(456) with the new answer by user_id (678) so that the final table looks like this ID Qst_Id Yes/No Ver_no User_id Created_dt 1 112 1 2 345 02Jan19:00:00:00 1 113 2 2 345 02Jan19:00:00:00 1 113 1 2 678 02Jan19:00:00:00 1 114 1 2 345 02Jan19:00:00:00 1 115 2 2 345 02Jan19:00:00:00 i understand its because it cant find the matching user_id value when using the data step update with the by variables i selected but how can i load this so that it overwrites the existing values with the new ones. At the moment the process is doing a full data load which is time consuming and resource intensive hence why im trying to load it incrementally. Any help would be appreciated. Thanks
... View more