I have a dataset where some IDs have multiple observations. I want to create an indicator variable that tells me if certain variable values for all observations for an ID are the same (=1), or if any of the variable values for any of the ID's observations are different (=0). For an ID with a singular observation, I would want that to be same=1.
In the example below, I only care about a,b,c being the same. So, in the want dataset the "same" variable should be 1 for any ID where a,b,c were the same (regardless of what the other variables were), or if there was only one observation for the ID (id 2 and 4). For any IDs where any of a,b,c, were different in any observations I would want the "same" variable to be a 0 for each observation.
How could I best do this? Thank you.
data have;
input id date :yymmdd10. a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0
1 2018-07-06 0 1 1 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3
4 2020-01-30 0 2 2 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2
;
RUN;
data want;
input id date :yymmdd10. a b c d e f same;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0 1
1 2018-07-06 0 1 1 0 0 1 1
2 2015-02-15 1 2 0 1 2 3 1
3 2020-02-01 1 1 1 3 3 3 0
3 2021-07-05 1 1 0 3 3 3 0
3 2021-08-09 1 1 1 3 3 3 0
4 2020-01-30 0 2 2 1 0 0 1
5 2018-10-10 1 2 0 0 1 2 1
5 2019-10-11 1 2 0 1 2 2 1
;
RUN;
... View more