Hello,
without example data that replicates your issue, it is difficult to answer. Can you identify the variables and observations where your coalesce problem occur and create a minimal example in the form of a datastep ?
I wouldn't call what is done "nested joins" but consecutive joins meaning you sequentially append new columns at the right of your dataset . Those are left joins meaning that the contents of data1 define which observations will be in the output dataset.
The query creates in memory a "big" dataset
[D1 | matching obs of D2 | matching Obs of D3 | matching Obs of D4 ]
which is then filtered by the where clauses.
"dataset foo as a" defines an alias "a" for the foo dataset. When there exist some columns with common names in several datasets
the ambiguity is removed by prefixing the variable name by the source dataset alias :a.my_var, b.my_var, ...
If you want a.my_var and b.my_var to be in the output dataset, one of them has to be renamed :
SELECT a.my_var, b.my_var AS my_var_of_b, ...
data3(where=(some filter))
Here the filter is applied before joining the resulting dataset
where some conditions for a and/or b
This filter is applied after the joins. I don't think there is a need to have several such statements. A unique where statement at the end of the query with all conditions should be equivalent.
coalesce(b.var,c.var,d.var,e.var) as newvar
creates a new variable newvar as the first non-missing value of variable var in the datasets data2, data3, data4.
Note : there is no dataset alias e in your query.
... View more