- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone,
I merge target file with tic | date | clock_time to ALL with tic| date| clock_time| VOL on tic date clock_time. (of course for this illustration purpose)
It is so weird that my simple SQL merge below create missing value for VOL for some records while VOL has value in the ALL file.
I am not sure if it is because of data issue or anything else, so I upload both source files and the result file as well as my code so you can see.
Many thanks,
HHC
proc import datafile='E:\target.xlsx' replace out=target;run;
proc sql;
create table output as select a.*, vol from target as a left join ALL b
on a.tic=b.tic and a.date=b.date and a.clock_time=b.clock_time
order by tic, date, clock_time;
quit;
data missing; set output;
if vol=.;run;
Print out of missing file. These rows have VOL data in the ALL dataset
Obs | tic | clock_time | date | vol |
---|---|---|---|---|
1 | A | 14:33:00 | 14NOV2023 | . |
2 | A | 14:34:00 | 14NOV2023 | . |
3 | A | 14:35:00 | 14NOV2023 | . |
4 | A | 14:36:00 | 14NOV2023 | . |
5 | A | 14:37:00 | 14NOV2023 | . |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
However, if you open the target file, you can see it has 14:33:00.
There is no millisecond shown. But when I use the round function to round everything to minute, the SQL works as expected. Still very weird.
There is no milliseconds shown because of the format used to print the values. They can of course still exist as part of the internal value (stored in a numeric variable as count of seconds since midnight).
Also your Excel source uses potentially a format that doesn't surface fractional seconds. If you export your Excel to a .csv then you should see the values there.
Below sample code to illustrate what I'm talking about.
data work.all;
tic='A';
date='14NOV2023'd;
clock_time='14:33:00't;
format date date9. clock_time time.;
vol=123;
run;
data work.target;
tic='A';
date='14NOV2023'd;
clock_time='14:33:00't;
format date date9. clock_time time.;
output;
clock_time='14:33:00.0000001't;
output;
clock_time='14:33:00.999't;
output;
run;
proc print data=work.all;run;
proc sql;
select
tic
,date
,clock_time format=time.
,clock_time format=time16.7
,clock_time format=best32.
from target
;
quit;
proc sql;
select
a.tic
,a.date
,round(a.clock_time) as clock_time format=time8.
,b.vol
from work.target a left join work.all b
on a.tic=b.tic and a.date=b.date and round(a.clock_time)=round(b.clock_time)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It would be easier to help you, if you could post the data used. No excel-files, of course, but data steps using datalines. You have checked the log for suspicious messages?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You left join table ALL that contains your variable Vol to table Target. It appears that for the keys with a missing Vol there is no matching row in your table ALL.
To test the theory run below code. It should create table test with zero rows.
data test;
set all;
if tic='A' and date='14NOV2023'd and clock_time='14:33:00't;
run;
And to ensure that the row exists with this exact key (like no decimals for date and time) also cross-check for table Target. Here you should get a row selected.
data test_2;
set target;
if tic='A' and date='14NOV2023'd and clock_time='14:33:00't;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I see what you meant.
However, if you open the target file, you can see it has 14:33:00.
There is no millisecond shown. But when I use the round function to round everything to minute, the SQL works as expected. Still very weird.
data target; set target;
clock_time = round(clock_time,60);
data all; set all;
clock_time = round(clock_time,60);
data test_2;
set target;
if tic='A' and date='14NOV2023'd and '14:32:00't<clock_time<'14:34:00't;
run;
Obs | tic | clock_time | date |
---|---|---|---|
1 | A | 14:33:00 | 14NOV2023 |
2 | A | 14:34:00 | 14NOV2023 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The fact that fractions of seconds are not displayed by the assigned format does not mean they're not there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Repeat until nauseated: The formatted display value is not the entire value of the underlying variable.
Numeric formats as supplied by SAS typically "round" displayed values to fit in a specified or default width.
If the value is 0.0000000001 and the format is Best12. (default for unassigned numeric variables) you will typically see a displayed value of 1E-10.
Date formats typically ignore decimal portions of the value as at best that would represent a fraction of a day and is not in the rules for display.
Time or datetime will display fractions of a second in as many decimal points as you specify with the format. If there is no such in the format then the fractional seconds are not displayed. And may well be rounded to match specified width.
Custom formats can do some even more amazing things to the values but the match for Join or By values in data step still use the value of the variable, not the formatted value unless you go to some extra steps to force that behavior.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
However, if you open the target file, you can see it has 14:33:00.
There is no millisecond shown. But when I use the round function to round everything to minute, the SQL works as expected. Still very weird.
There is no milliseconds shown because of the format used to print the values. They can of course still exist as part of the internal value (stored in a numeric variable as count of seconds since midnight).
Also your Excel source uses potentially a format that doesn't surface fractional seconds. If you export your Excel to a .csv then you should see the values there.
Below sample code to illustrate what I'm talking about.
data work.all;
tic='A';
date='14NOV2023'd;
clock_time='14:33:00't;
format date date9. clock_time time.;
vol=123;
run;
data work.target;
tic='A';
date='14NOV2023'd;
clock_time='14:33:00't;
format date date9. clock_time time.;
output;
clock_time='14:33:00.0000001't;
output;
clock_time='14:33:00.999't;
output;
run;
proc print data=work.all;run;
proc sql;
select
tic
,date
,clock_time format=time.
,clock_time format=time16.7
,clock_time format=best32.
from target
;
quit;
proc sql;
select
a.tic
,a.date
,round(a.clock_time) as clock_time format=time8.
,b.vol
from work.target a left join work.all b
on a.tic=b.tic and a.date=b.date and round(a.clock_time)=round(b.clock_time)
;
quit;