Hello, below is an example of my two datasets:
dataset A:
| id | env | water | transport | production |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 | 0 |
dataset B:
| id | env | water | transport | production |
| 1 | 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 1 | 1 |
Just a background, why two datasets have the same column names is because I split my original data into two subsamples and let them run through the same codes to achieve this.
I want that as long as there is '1' in one of the datasets, the combined table should reflect '1'.
Below is my desired outcome:
| id | env | water | transport | production |
| 1 | 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 1 | 1 |
I know there are probably time-consuming ways to achieve the desired outcome by renaming one of the table's column names, merging by id, summing the columns, restricting to greater than 1, and so on. However, I want to know if there is a fast and non-tedious way for operation as the reason why I initially split is that the original dataset was too huge. Hopefully, the expert can provide a more efficient method. Thank you.
Stack the two datasets, then use PROC MEANS:
data stacked,
set a b;
by id;
run;
proc means data=stacked;
by id;
var env water transport production;
output
out=want (drop=_type _freq_)
max()=
;
run;
Stack the two datasets, then use PROC MEANS:
data stacked,
set a b;
by id;
run;
proc means data=stacked;
by id;
var env water transport production;
output
out=want (drop=_type _freq_)
max()=
;
run;
A merge could be used, but requires renaming the variables of one dataset:
data want;
merge have_a have_b(rename=(env= b_env water=b_water transport = b_transport production = b_production));
by id;
env = env or b_env;
water = water or b_water;
transport = transport or b_transport;
production = production or b_production;
drop b_:;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.