SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

SAS DI using MAX function with timestamp

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

SAS DI using MAX function with timestamp

[ Edited ]

Community,

 

I am using a subquery in SAS DI to run a MAX function against timestamps of different categorical names from a single table. This is a near real-time table for multiple power meters. However, not all meters have the same latest timestamp at any given time. These meters do not report at the same frequency or same time. See the example below. Since my last times are not synchronous the MAX function will select the singular or combination of meters that have the latest timestamp and drops the other(s). I am using a WHERE->HAVING in the subquery to select the meters of interest (WHERE) and a HAVING function to run the MAX function.  My goal is to select all meters of interest regardless if the times are synchronous. I want to query the selected meters and the time of the last report. Thx.

 

MeterName, UTCDateTime, kW

meter#1, 27APR2018:14:45:00.000, 5

meter#2, 27APR2018:14:45:00.000, 3

meter#3, 27APR2018:14:45:00.000, 7

meter#4, 27APR2018:13:45:00.000, 2

meter#5, 27APR2018:14:00:00.000, 1

 

diagram.pngtable_view.pngtable_view_having.png


Accepted Solutions
Solution
‎04-30-2018 10:23 AM
Respected Advisor
Posts: 4,695

Re: SAS DI using MAX function with timestamp

@DWhite

I believe you're missing a GROUP BY. 

If you just want to select the most recent measure per meter then you need to set-up the DIS transformation in a way that it generates a SQL as below.

data have;
  infile datalines dsd truncover;
  input MeterName $ UTCDateTime:datetime23.3 kW;
  format UTCDateTime datetime23.3;
  datalines;
meter#1, 27APR2018:14:45:00.000, 5
meter#2, 27APR2018:14:45:00.000, 3
meter#3, 27APR2018:14:45:00.000, 7
meter#4, 27APR2018:13:45:00.000, 2
meter#5, 27APR2018:14:00:00.000, 1
meter#1, 27APR2018:13:45:00.000, 15
meter#2, 27APR2018:13:45:00.000, 13
meter#3, 27APR2018:13:45:00.000, 17
meter#4, 27APR2018:12:45:00.000, 12
meter#5, 27APR2018:13:00:00.000, 11
;
run;

proc sql;
  create table want as
    select 
      MeterName,
      UTCDateTime,
      kw
    from have
    group by MeterName
    having max(UTCDateTime)=UTCDateTime
  ;
quit;

 

For your WHERE clause: I'd select an IN operator so you can list all the values on a single line.

 

Should performance become a problem then you could create an intermediary table with the latest records to feed into your SQL. 

Besides of using a SQL for this another efficient way is to use a combination of two SORT nodes generating SAS syntax as below.

proc sort data=have out=inter(sortedby=MeterName);
  by MeterName Descending UTCDateTime;
  where MeterName in ('meter#1','meter#2','meter#3','meter#4','meter#5',);
run;

proc sort data=have out=want2 nodupkey;
  by MeterName;
run;

....and last but not least: I personally find sub-selects in the SQL Join node hard to set-up, "read" and hard to maintain. For this reason I prefer to implement sub-select logic via a separate node where I just implement the logic of the "sub-select" table as a VIEW and then use the view in my main join.

View solution in original post


All Replies
Valued Guide
Posts: 560

Re: SAS DI using MAX function with timestamp

[ Edited ]

You need to use DATEPART() function.

something like: HAVING DATEPART(DATE_TIME) = MAX(DATEPART(DATE_TIME))

Thanks,
Suryakiran
Occasional Contributor
Posts: 12

Re: SAS DI using MAX function with timestamp

Posted in reply to SuryaKiran

Hi Suryakiran,

 

I gave it a go "DATEPART(DATE_TIME) = MAX(DATEPART(DATE_TIME))" and it broke the MAX function. My WHERE statement worked but the HAVING with the MAX function and DATEPART function statement appears to not have worked as I had dozens of the selected meters of multiple timestamps in the final table.

 

Without the DATEPART function I get three of five that have the latest reported time while leaving the other two meters out of the selection since the last reported times are prior to the first three. 

 

meter#1, 27APR2018:14:45:00.000, 5

meter#2, 27APR2018:14:45:00.000, 3

meter#3, 27APR2018:14:45:00.000, 7

 

 

Solution
‎04-30-2018 10:23 AM
Respected Advisor
Posts: 4,695

Re: SAS DI using MAX function with timestamp

@DWhite

I believe you're missing a GROUP BY. 

If you just want to select the most recent measure per meter then you need to set-up the DIS transformation in a way that it generates a SQL as below.

data have;
  infile datalines dsd truncover;
  input MeterName $ UTCDateTime:datetime23.3 kW;
  format UTCDateTime datetime23.3;
  datalines;
meter#1, 27APR2018:14:45:00.000, 5
meter#2, 27APR2018:14:45:00.000, 3
meter#3, 27APR2018:14:45:00.000, 7
meter#4, 27APR2018:13:45:00.000, 2
meter#5, 27APR2018:14:00:00.000, 1
meter#1, 27APR2018:13:45:00.000, 15
meter#2, 27APR2018:13:45:00.000, 13
meter#3, 27APR2018:13:45:00.000, 17
meter#4, 27APR2018:12:45:00.000, 12
meter#5, 27APR2018:13:00:00.000, 11
;
run;

proc sql;
  create table want as
    select 
      MeterName,
      UTCDateTime,
      kw
    from have
    group by MeterName
    having max(UTCDateTime)=UTCDateTime
  ;
quit;

 

For your WHERE clause: I'd select an IN operator so you can list all the values on a single line.

 

Should performance become a problem then you could create an intermediary table with the latest records to feed into your SQL. 

Besides of using a SQL for this another efficient way is to use a combination of two SORT nodes generating SAS syntax as below.

proc sort data=have out=inter(sortedby=MeterName);
  by MeterName Descending UTCDateTime;
  where MeterName in ('meter#1','meter#2','meter#3','meter#4','meter#5',);
run;

proc sort data=have out=want2 nodupkey;
  by MeterName;
run;

....and last but not least: I personally find sub-selects in the SQL Join node hard to set-up, "read" and hard to maintain. For this reason I prefer to implement sub-select logic via a separate node where I just implement the logic of the "sub-select" table as a VIEW and then use the view in my main join.

Occasional Contributor
Posts: 12

Re: SAS DI using MAX function with timestamp

Patrick,

 

Thanks for your comments and examples. I implemented the Group By and that looks good so far. I am finding that sometimes these meters are synchronous and other times they are not. 

 

Dave

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 150 views
  • 0 likes
  • 3 in conversation