12-20-2016 10:03 AM
I'm looking to get some input on how to best approach this project I am working on. I have a SQL web application where the data is inputted in to 42 different data tables. The primary key to this dataset is the ID variable. In every table there is a variable "RecUpdatedOn" that is a date/time stamp whenever an item is updated. I'd like to create a report that will tell me when the last time each case was updated. It could be in any of the data tables.
Any suggestions as to how to start this, it would be greatly appreciated
12-20-2016 10:14 AM
Well, would be good to get further details, but an overall concept would be either set all your data together, by group id, sorted and output only if last.id:
data want; set mydatasets: (keep=id recupdatedon); /* Assumes all datasets named with prefix mydatasets and contains the given variables all the same type */ run; proc sort data=want; by id recupdatedon; run; data want; set want; by id; if last.id then output; run;
Hard to say exactly as not sure on what data you have. Posting examples as datasteps in the post is good. If they are sorted already then it might be possible to just updatee the first dataset to take last.id from each file.
12-20-2016 10:28 AM
You may need to expand on what you mean by "each case". HOW do you identify a "case".
If you can produce a "case identifier" which I call ItemId in the code below this may do some of what you want.
data temp; set dataset1 (keep= RecUpdatedOn ItemId) dataset2 (keep= RecUpdatedOn ItemId) dataset3 (keep= RecUpdatedOn ItemId) ... dataset42 (keep= RecUpdatedOn ItemId) indsname=dsn ; length datasource $ 41; datasource=dsn; run; proc sort data=temp; by itemid RecUpdatedOn; run; data want; set temp; by itemid RecUpdatedOn; if last.itemid; run;
You may have "ItemId" be multiple variables as long as they are in each dataset and of the same type. Dataset options like Rename may help. Or you may have to do more preprocessing out of each of the data sets to standardize the "case" identification
12-20-2016 03:45 PM
This is the perfect first step! Thank you, it gave me exactly what I wanted but I realized I need two more variables to make this report a more rich. The two variables are COUNSELOR and STATUS. Now these variables are only avalable in one of the tables. So when I run your code and include those two variables, they only appear in the first set of IDs that are imported. Is there anyway to match and join so those items are not missing?
data temp; set dataset1 (keep= RecUpdatedOn ItemId COUNSELOR STATUS) dataset2 (keep= RecUpdatedOn ItemId) dataset3 (keep= RecUpdatedOn ItemId) ... dataset42 (keep= RecUpdatedOn ItemId) indsname=dsn ; length datasource $ 41; datasource=dsn; run;
12-20-2016 04:49 PM
If you have a table that has the Counselor and Status linked to "ItemId" somewhere then you can use a look up to bring that into the combined data.
Something like this:
proc sql; create table counselor as select distinct counselor, status, itemid from datasetwithcounselor; quit; /* then merge the values onto another set*/ proc sql; create table combined as select a.*, b.counselor, b.status from bigset as a left join counselor as b on a.itemid = b.itemid; quit;
If ItemId is actually mutiple variables then use each in the ON clause:
on a.variable1 = b.variable1 AND a.variable2=b.variable2
make sure that all of the variables are in the Counselor data set! The names need not be exactly the same but make sure the names do match the one in the data sets. The A and B are aliases so you do not have to keep writing the longname of the dataset everytime you need to reference the table.
Note that you may need to check that spelling is consistent to avoid making duplicats. If you have a counselor J Smith and John Smith that would be two counselors as far as the program is concerned but your usage may intend them to be the same counselor. So if both get associated with the ItemId then there are two different counselors involved and the look up will create two records for each Itemid that involves those two counselors.