Hi,
Im trying to create an incremental load process from a source table which contains over 40m rows of data.
At the moment i am utilizing the created date field to extract and load the latest rows to my final SAS mart but the problem is its not getting the correct rows in, I'll explain.
so the source table fields (Answers_Data table)
ID Qst_Id Yes/No Ver_no User_id Created_dt
1 112 1 1 345 01Jan19:00:00:00
1 113 2 1 345 01Jan19:00:00:00
1 113 1 1 456 01Jan19:00:00:00
1 114 1 1 345 01Jan19:00:00:00
1 115 2 1 456 01Jan19:00:00:00
1 112 1 2 345 02Jan19:00:00:00
1 113 2 2 345 02Jan19:00:00:00
1 113 1 2 678 02Jan19:00:00:00
1 114 1 2 345 02Jan19:00:00:00
1 115 2 2 345 02Jan19:00:00:00
My goal is to load the latest answers to the final data mart. by using the ver_no field
so i first create a subquery to select the latest version answers then to load the data a data step update transformation, with the by variables selected as ID, Qst_Id, User_id
running this query the first time loads the first version answers data (i.e. ver_no 1) and then when i run it the following day to load the latest answers (ie. ver_no 2) it keeps the old record from ver no 1 (highlighted in bold below)
the final table ends up looking like this.
ID Qst_Id Yes/No Ver_no User_id Created_dt
1 112 1 2 345 02Jan19:00:00:00
1 113 2 2 345 02Jan19:00:00:00
1 113 1 1 456 01Jan19:00:00:00
1 113 1 2 678 02Jan19:00:00:00
1 114 1 2 345 02Jan19:00:00:00
1 115 2 2 345 02Jan19:00:00:00
what is happening is that we have two user_id;s answering the same question which is fine, but in the second version of answers a different user_id (678) is answering the same question and we want to overwrite the old answer by user_id(456) with the new answer by user_id (678) so that the final table looks like this
ID Qst_Id Yes/No Ver_no User_id Created_dt
1 112 1 2 345 02Jan19:00:00:00
1 113 2 2 345 02Jan19:00:00:00
1 113 1 2 678 02Jan19:00:00:00
1 114 1 2 345 02Jan19:00:00:00
1 115 2 2 345 02Jan19:00:00:00
i understand its because it cant find the matching user_id value when using the data step update with the by variables i selected but how can i load this so that it overwrites the existing values with the new ones.
At the moment the process is doing a full data load which is time consuming and resource intensive hence why im trying to load it incrementally.
Any help would be appreciated.
Thanks
Sample
Old is verno=1
increment verno=2
if my understanding is correct, it's no biggie
data old increment;
input (ID Qst_Id Yes_No Ver_no User_id ) ($) Created_dt :datetime20.;
format Created_dt datetime20.;
if Ver_no='1' then output old;else output increment;
cards;
1 112 1 1 345 01Jan19:00:00:00
1 113 2 1 345 01Jan19:00:00:00
1 113 1 1 456 01Jan19:00:00:00
1 114 1 1 345 01Jan19:00:00:00
1 115 2 1 456 01Jan19:00:00:00
1 112 1 2 345 02Jan19:00:00:00
1 113 2 2 345 02Jan19:00:00:00
1 113 1 2 678 02Jan19:00:00:00
1 114 1 2 345 02Jan19:00:00:00
1 115 2 2 345 02Jan19:00:00:00
;
data want;
if _n_=1 then do;
dcl hash H (dataset:'increment',multidata:'y') ;
h.definekey ("User_id",'id','Qst_Id') ;
h.definedata (all:'y') ;
h.definedone () ;
dcl hiter hi('h');
end;
set old end=lr;
rc=h.find();
if rc=0 then do;
output;
rc1=h.remove();
end;
if lr ;
do while(hi.next()=0);
output;
end;
drop rc:;
run;
And then a sort would get the id's in order
proc sort data=want;
by id qst_id;
run;
what is happening is that we have two user_id;s answering the same question which is fine, but in the second version of answers a different user_id (678) is answering the same question and we want to overwrite the old answer by user_id(456) with the new answer by user_id (678) so that the final table looks like this
How do you know this? What fields tell you this record overwrites the user record? Same date/question only? Location of record?
I'm not clear on what you want. It appears you want, for each QST_ID, all records that fall on the latest date. Is that correct? (Or maybe you want, for each QST_ID, all records that have the latest version).
And is it also true that your data are sorted by date?
And does each date possess at least one instance of each QST_ID?
Finally, in what order to you want the resulting data presented?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.