Hello, below is an example of my two datasets:
dataset A:
id | env | water | transport | production |
1 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 1 | 0 |
3 | 0 | 0 | 0 | 1 |
4 | 0 | 0 | 1 | 0 |
dataset B:
id | env | water | transport | production |
1 | 1 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 1 | 1 |
Just a background, why two datasets have the same column names is because I split my original data into two subsamples and let them run through the same codes to achieve this.
I want that as long as there is '1' in one of the datasets, the combined table should reflect '1'.
Below is my desired outcome:
id | env | water | transport | production |
1 | 1 | 0 | 0 | 0 |
2 | 1 | 0 | 1 | 0 |
3 | 0 | 0 | 0 | 1 |
4 | 1 | 0 | 1 | 1 |
I know there are probably time-consuming ways to achieve the desired outcome by renaming one of the table's column names, merging by id, summing the columns, restricting to greater than 1, and so on. However, I want to know if there is a fast and non-tedious way for operation as the reason why I initially split is that the original dataset was too huge. Hopefully, the expert can provide a more efficient method. Thank you.
Stack the two datasets, then use PROC MEANS:
data stacked,
set a b;
by id;
run;
proc means data=stacked;
by id;
var env water transport production;
output
out=want (drop=_type _freq_)
max()=
;
run;
Stack the two datasets, then use PROC MEANS:
data stacked,
set a b;
by id;
run;
proc means data=stacked;
by id;
var env water transport production;
output
out=want (drop=_type _freq_)
max()=
;
run;
A merge could be used, but requires renaming the variables of one dataset:
data want;
merge have_a have_b(rename=(env= b_env water=b_water transport = b_transport production = b_production));
by id;
env = env or b_env;
water = water or b_water;
transport = transport or b_transport;
production = production or b_production;
drop b_:;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.