DATA Step, Macro, Functions and more

Creating a status report from multiple datasets

Reply
Contributor
Posts: 55

Creating a status report from multiple datasets

Hi everyone, 

 

I'm looking to get some input on how to best approach this project I am working on. I have a SQL web application where the data is inputted in to 42 different data tables. The primary key to this dataset is the ID variable. In every table there is a variable "RecUpdatedOn" that is a date/time stamp whenever an item is updated. I'd like to create a report that will tell me when the last time each case was updated. It could be in any of the data tables. 

 

 

 

Any suggestions as to how to start this, it would be greatly appreciated

 

Thanks!

Super User
Super User
Posts: 7,942

Re: Creating a status report from multiple datasets

Well, would be good to get further details, but an overall concept would be either set all your data together, by group id, sorted and output only if last.id:

data want;
  set mydatasets: (keep=id recupdatedon);   /* Assumes all datasets named with prefix mydatasets and contains the given variables all the same type */
run;
proc sort data=want;
  by id recupdatedon;
run;
data want;
  set want;
  by id;
  if last.id then output;
run;

Hard to say exactly as not sure on what data you have.  Posting examples as datasteps in the post is good.  If they are sorted already then it might be possible to just updatee the first dataset to take last.id from each file.  

Super User
Posts: 11,343

Re: Creating a status report from multiple datasets

You may need to expand on what you mean by "each case". HOW do you identify a "case".

If you can produce a "case identifier" which I call ItemId in the code below this may do some of what you want.

data temp;
   set 
      dataset1 (keep= RecUpdatedOn ItemId)
      dataset2 (keep= RecUpdatedOn ItemId)
      dataset3 (keep= RecUpdatedOn ItemId)
      ...
      dataset42 (keep= RecUpdatedOn ItemId)
      indsname=dsn
  ;
  length datasource $ 41;
  datasource=dsn;
run;

proc sort data=temp;
   by itemid RecUpdatedOn;
run;

data want;
   set temp;
   by itemid RecUpdatedOn;
   if last.itemid;
run;



You may have "ItemId" be multiple variables as long as they are in each dataset and of the same type. Dataset options like Rename may help. Or you may have to do more preprocessing out of each of the data sets to standardize the "case" identification

 

Contributor
Posts: 55

Re: Creating a status report from multiple datasets

Hi Ballard,

 

This is the perfect first step! Thank you, it gave me exactly what I wanted but I realized I need two more variables to make this report a more rich. The two variables are COUNSELOR and STATUS. Now these variables are only avalable in one of the tables. So when I run your code and include those two variables, they only appear in the first set of IDs that are imported. Is there anyway to match and join so those items are not missing?

 

data temp;
   set 
      dataset1 (keep= RecUpdatedOn ItemId COUNSELOR STATUS)
      dataset2 (keep= RecUpdatedOn ItemId)
      dataset3 (keep= RecUpdatedOn ItemId)
      ...
      dataset42 (keep= RecUpdatedOn ItemId)
      indsname=dsn
  ;
  length datasource $ 41;
  datasource=dsn;
run;
Super User
Posts: 11,343

Re: Creating a status report from multiple datasets

If you have a table that has the Counselor and Status linked to "ItemId" somewhere then you can use a look up to bring that into the combined data.

Something like this:

proc sql;
   create table counselor as
   select distinct counselor, status, itemid
   from datasetwithcounselor;
quit;

/* then merge the values onto another set*/

proc sql;
   create table combined as
   select a.*, b.counselor, b.status
   from bigset as a left join counselor as b
      on a.itemid = b.itemid;
quit;

If ItemId is actually mutiple variables then use each in the ON clause:

 

    on a.variable1 = b.variable1 AND a.variable2=b.variable2

make sure that all of the variables are in the Counselor data set! The names need not be exactly the same but make sure the names do match  the one in the data sets. The A and B are aliases so you do not have to keep writing the longname of the dataset everytime you need to reference the table.

 

Note that you may need to check that spelling is consistent to avoid making duplicats. If you have a counselor J Smith and John Smith that would be two counselors as far as the program is concerned but your usage may intend them to be the same counselor. So if both get associated with the ItemId then there are two different counselors involved and the look up will create two records for each Itemid that involves those two counselors.

Ask a Question
Discussion stats
  • 4 replies
  • 195 views
  • 0 likes
  • 3 in conversation