- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When I run the code below, I get the log Note "The execution of this query involves performing one or more Cartesian product joins that can not be optimized." on the LEFT JOIN code highlighted in Bold. Is there a better way to Join the dstDates and Sales tables?
My goal is to use the Daylight Savings Time when applicable. The salesDate is in UTC.
data dstDates;
format date1 datetime18.0 date2 datetime18.0;
input date1 datetime18.0
date2 datetime18.0
;
cards;
10MAR2019:02:00:00 03NOV2019:02:00:00
;
data sales;
format localStandard datetime18.0 localDaylight datetime18.0;
input salesDate datetime18.0
;
localStandard = tzoneu2s(salesDate,'EST');
localDaylight = tzoneu2s(salesDate, 'EDT');
cards;
08MAR2019:02:00:00
12MAR2019:02:00:00
;
proc print data=sales;
title "Sales";
format salesDate datetime18.0;
PROC SQL;
CREATE TABLE work.final1 as
SELECT sales.salesDate,
localStandard,
localDaylight,
CASE WHEN date1 IS NOT NULL then localDaylight else localStandard END AS newDate format datetime18.0
FROM sales sales
LEFT JOIN dstDates dst ON (sales.localStandard >= dst.date1 AND
sales.localDaylight <= dst.date2);
proc print data=final1;
format salesDate datetime18.0;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure if you wanted the output as attached in the image below.
Output as I see
If the above is what you wanted, then I could get it done because you were missing many "RUN" and "QUIT" statements after DATA/PROC steps.
The corrected code is as below.
data dstDates; format date1 datetime18.0 date2 datetime18.0; input date1 datetime18.0 date2 datetime18.0; cards; 10MAR2019:02:00:00 03NOV2019:02:00:00 ; run; data sales; format localStandard datetime18.0 localDaylight datetime18.0; input salesDate datetime18.0; localStandard=tzoneu2s(salesDate, 'EST'); localDaylight=tzoneu2s(salesDate, 'EDT'); cards; 08MAR2019:02:00:00 12MAR2019:02:00:00 ; run; proc print data=sales; title "Sales"; format salesDate datetime18.0; run; PROC SQL; CREATE TABLE work.final1 as SELECT sales.salesDate, localStandard, localDaylight, CASE WHEN date1 IS NOT NULL then localDaylight else localStandard END AS newDate format datetime18.0 FROM sales sales LEFT JOIN dstDates dst ON (sales.localStandard >=dst.date1 AND sales.localDaylight <=dst.date2); quit; proc print data=final1; format salesDate datetime18.0; run;
Please let me know if this helps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Apologies .. I did not solve your issue, in the above post!! Your issue is the log message (which I am getting as well) and not the fact that the program is not running.
I think the problem is in the statement following LEFT JOIN. You want to do LEFT JOIN of sales with dstDates but in the ON clause you are selecting both the common columns from sales. Also you are saying FROM sales sales.
ON (sales.localStandard >=dst.date1 AND sales.localDaylight <=dst.date2)
I am sorry, I am not a SAS expert and thus can not understand what you want to achieve there but doing an alteration helped me get rid of the log. Below is the code and the output remains same. However this may not be what you want. There is some discussion on this error here.
data dstDates; format date1 datetime18.0 date2 datetime18.0; input date1 datetime18.0 date2 datetime18.0; cards; 10MAR2019:02:00:00 03NOV2019:02:00:00 ; run; data sales; format localStandard datetime18.0 localDaylight datetime18.0; input salesDate datetime18.0; localStandard=tzoneu2s(salesDate, 'EST'); localDaylight=tzoneu2s(salesDate, 'EDT'); cards; 08MAR2019:02:00:00 12MAR2019:02:00:00 ; run; proc print data=sales; title "Sales"; format salesDate datetime18.0; run; PROC SQL; CREATE TABLE work.final1 as SELECT sales.salesDate, localStandard, localDaylight, CASE WHEN date1 IS NOT NULL then localDaylight else localStandard END AS newDate format datetime18.0 FROM sales LEFT JOIN dstDates dst ON (sales.localStandard =dst.date1 AND sales.localDaylight =dst.date2); quit; proc print data=final1; format salesDate datetime18.0; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why did you insert extra RUN statements after the first two DATA steps? That can confuse novice users as there were two recent questions where users tried to put data step statements after the in-line data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom I inserted RUN statements at the end of each DATA step because I used to think that it is a markup of a block that is run at once. I used to think that it is a good practice rather (as a beginner/novice) to close your beginning DATA and PROC step with its corresponding RUN statement but your post has made me think otherwise. Will you please point me to the recent discussion (that you mentioned in your post and the one that created confusion)? I will get to learn something new.
In the meanwhile you are right. While the output does not change, the LOG says differently. The Memory consumption and CPU time remains "almost" identical but other details are varying.
Below is the pertinent log if you omit the first two run statements. It will be followed by another log, if you keep the first two run statements.
NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 674.25k OS Memory 28072.00k Timestamp 24/06/2019 12:06:43 AM Step Count 49 Switch Count 2 Page Faults 0 Page Reclaims 90 Page Swaps 0 Voluntary Context Switches 11 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 264 NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 2015.18k OS Memory 29096.00k Timestamp 24/06/2019 12:06:43 AM Step Count 50 Switch Count 1 Page Faults 0 Page Reclaims 226 Page Swaps 0 Voluntary Context Switches 7 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 8 NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 5867.87k OS Memory 34480.00k Timestamp 24/06/2019 12:06:43 AM Step Count 51 Switch Count 2 Page Faults 0 Page Reclaims 345 Page Swaps 0 Voluntary Context Switches 15 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 280 NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 717.09k OS Memory 29608.00k Timestamp 24/06/2019 12:06:43 AM Step Count 52 Switch Count 0 Page Faults 0 Page Reclaims 79 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0
and this is the log if you keep the first two run statements.
NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 669.25k OS Memory 28328.00k Timestamp 24/06/2019 12:10:35 AM Step Count 58 Switch Count 2 Page Faults 0 Page Reclaims 151 Page Swaps 0 Voluntary Context Switches 13 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 264 NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.01 seconds memory 675.25k OS Memory 28328.00k Timestamp 24/06/2019 12:10:35 AM Step Count 59 Switch Count 2 Page Faults 0 Page Reclaims 90 Page Swaps 0 Voluntary Context Switches 10 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 264 NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 2015.65k OS Memory 29096.00k Timestamp 24/06/2019 12:10:35 AM Step Count 60 Switch Count 1 Page Faults 0 Page Reclaims 160 Page Swaps 0 Voluntary Context Switches 6 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 16 NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 5867.71k OS Memory 34480.00k Timestamp 24/06/2019 12:10:35 AM Step Count 61 Switch Count 2 Page Faults 0 Page Reclaims 340 Page Swaps 0 Voluntary Context Switches 14 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 280 NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 696.18k OS Memory 29608.00k Timestamp 24/06/2019 12:10:35 AM Step Count 62 Switch Count 0 Page Faults 0 Page Reclaims 74 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0
Any thoughts on what is going on? Please let me know.
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Timing has nothing to do with it. It takes the compiler very little time to ignore the non-step generated by extra RUN statements.
Here is a link to a question by a users confused by the presence of the superfluous RUN statement after a DATA step.
The idea behind always ending your steps is to make it easier for the humans reading your code and not the compiler. Only in an interactive session will leaving out the termination of the steps matter because if you leave off the last termination the compiler will not run the step since it is waiting for you to finish sending it the code for the step.
The idea is to always explicitly end the step rather than just letting SAS auto detect that it has finished being specified because a new step is being requested by the presence of the next DATA or PROC statement.
But what code it is that ends a step depends on what step you are running. If you are running a data step that uses in-line data then the line with the semi-colon (or four semi-colons) that marks the end of the in-line data marks the end of the data step. For other DATA steps the RUN statement marks the end. For some procs a RUN statement marks the end, but for others a QUIT is required because it can "run" multiple groups of statements.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content