got a little problem here and would be glad to get help.
Using a User Written Code in DI Studio I am trying to UNION 2 datasets to filter out duplicate rows.
Dataset a has 1 column with informat 12.7/outformat 12.7, so has dataset b.
dataset a 12.1234594999
dataset b 12.1234595
The value in dataset a is displayed in DI Studio exactly as value b (due to informat/outformat), but once I do not use the usual table view of DI studio, it has all the decimals as written above. So once I do this...
SELECT * from a UNION SELECT * from b
...the values are identified as NOT duplicate.
So there are 2 questions:
- Why does the UNION neglect the in/outformat (do I need to specify the format in the UNION again?)
- How do I round this (round, roundz?) to fit value b so I can round it before the UNION
Suggest you reply to your post with SAS log output showing code and resulting values of concern. Also, for your own investigation, a SAS CONTENTS listing may reveal differences in your two files - when SAS files are concatenated / merged, there are specific rules about how like-named SAS variables are treated when it comes to LENGTH, FORMAT, INFORMAT attributes.
Forget about formats when it comes to comparisons. Formats are used only for WRITING data - but calculations/comparisons are always done with the full precision.
As this is user written code you can't expect DI Studio to do anything for you. And even if it wasn't user written: DI Studio does nothing else than generate SAS code - so look at the generated SAS code if you want to know what's going on.
Seems as rounding the values before the UNION would be a good idea.
Why not using a round() function in the select part of the SQL statement?
May be that could even be implemented in a standard SQL transform (I would have to try it but I expect it to be possible).
Just to show that round() works:
do i=1 to 5;
P.S: I think it's round() you need. Have a look at the SASOnlinedoc to get the exact syntax for the precision you need (it will be something like:
P.P.S: The question is if the values with lower precision are rounded or truncated. You would want to do the same to the values with higher precision.
If it's truncation: Use either a put/input combination or multiply by 10 power the number of digits behind the comma, use int() on the result, divide by 10 power the number of digits behind the comma.