05-03-2016 08:10 AM
I am trying to merge 2 SAS datasets on a column, and also specifying some condition on a common column. But, when I specify the column name, it is being applied in first dataset's column but I want to apply it on second dataset's column. In PROC SQL, this is easily being accomplished using a.column_name or b.column_name. But, how can I do the same in Data step?
Thanks in advance!
05-03-2016 08:25 AM
If you merge data sets and they have variables with the same name, the one on the right overwrites the other so you only end up with one variable in the final dataset. You'll need to explicitly rename the columns ahead of time or in the merge statement to avoid errors.
05-03-2016 08:40 AM
Post your code, and some test data (in the form of a datastep), its hard to tell what your trying to do from the text, is it a where clause? You can provide options and refer to datasets using alias:
data want; merge dataseta (in=a) datasetb (in=b); by <idvars>; if a and ...; run;
05-03-2016 08:49 AM
There are many ways to approach this, depending on what you want to do. First, Reeza's suggestion:
merge a b (rename=(common_column=common_column_from_b));
You can also apply WHERE= as a data set option without renaming:
merge a b (where=(common_column > 5));
When you specify the column name, it will always be applied to the second data set's value (at least for the first observation per ID). The rules are more complex after that first observation. Since you are posting that the condition is applied based on the first data set's value, this indicates that you have a many to one merge (with many observations in the first data set, to one observation in the second data set). So you could also code:
merge a b;
if first.id then first_value = common_column;
if /* some condition based on first_value */;
Give us a little more context ... something will definitely be possible.