BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Pyrite | Level 9

I have a dataset where some IDs have multiple observations. I want to create an indicator variable that tells me if certain variable values for all observations for an ID are the same (=1), or if any of the variable values for any of the ID's observations are different (=0). For an ID with a singular observation, I would want that to be same=1. 

 

In the example below, I only care about a,b,c being the same. So, in the want dataset the "same" variable should be 1 for any ID where a,b,c were the same (regardless of what the other variables were), or if there was only one observation for the ID (id 2 and 4). For any IDs where any of a,b,c, were different in any observations I would want the "same" variable to be a 0 for each observation. 

 

How could I best do this? Thank you. 

 

data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0
1 2018-07-06 0 1 1 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 2 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;

data want;
input id date :yymmdd10.  a b c d e f same;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0 1
1 2018-07-06 0 1 1 0 0 1 1
2 2015-02-15 1 2 0 1 2 3 1
3 2020-02-01 1 1 1 3 3 3 0
3 2021-07-05 1 1 0 3 3 3 0
3 2021-08-09 1 1 1 3 3 3 0  
4 2020-01-30 0 2 2 1 0 0 1
5 2018-10-10 1 2 0 0 1 2 1
5 2019-10-11 1 2 0 1 2 2 1
;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0
1 2018-07-06 0 1 1 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 2 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;

proc sql;
create table want as
select *,(range(a)+range(b)+range(c))=0 as same
 from have
  group by id;
quit;

View solution in original post

7 REPLIES 7
Ksharp
Super User
data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0
1 2018-07-06 0 1 1 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 2 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;

proc sql;
create table want as
select *,(range(a)+range(b)+range(c))=0 as same
 from have
  group by id;
quit;
sasgorilla
Pyrite | Level 9

Thanks, ksharp. This worked perfectly!

sasgorilla
Pyrite | Level 9

Ah, so I found one problem in this dataset. If an id has any missing values for any of the variables of interest it appears this results in same=0 as well.

 

How can I modify the proc sql code so even same missing values are same=1? In other words, as long as the same missingness exists across multiple observations for that id, and all other variables are the same, I would want same=1. 

 

data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 . 0 0 0
1 2018-07-06 0 1 . 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 . 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;

data want;
input id date :yymmdd10.  a b c d e f same;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 . 0 0 0 1
1 2018-07-06 0 1 . 0 0 1 1
2 2015-02-15 1 2 0 1 2 3 1
3 2020-02-01 1 1 1 3 3 3 0
3 2021-07-05 1 1 0 3 3 3 0
3 2021-08-09 1 1 1 3 3 3 0  
4 2020-01-30 0 2 . 1 0 0 1
5 2018-10-10 1 2 0 0 1 2 1
5 2019-10-11 1 2 0 1 2 2 1
;
RUN;

 

 

Tom
Super User Tom
Super User

In that case it might be better to use the NLEVEL option of PROC FREQ instead.  The only drawback would be if the variable could have so many distinct levels that PROC FREQ runs out of memory.  But that type of variable should probably not be tested in this way anyway.

ods select none;
ods output nlevels=nlevels;
proc freq nlevels data=have ;
  by id ;
  tables a b c / noprint;
run;
ods select all;

proc transpose data=nlevels out=multilevel;
  by id;
  where nlevels ne 1;
  id tablevar;
  var nlevels;
run;

data want;
  merge have multilevel(in=in1 keep=id);
  by id;
  same = not in1;
run;
Ksharp
Super User

[EDIT]

OK. Using SUM() function to replace with + operator, SUM() would take care of missing value.

 

data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
1 2020-01-01 0 1 . 0 0 0
1 2018-07-06 0 1 . 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 . 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;
proc sql;
create table want as
select *,sum(range(a),range(b),range(c))=0 as same
 from have
  group by id;
quit;

 

And if you also need to consider about missing and zero or other non-mssing values (id=0), check this:

data have;
input id date :yymmdd10.  a b c d e f;
format date yymmdd10.;
datalines;
0 2020-01-01 0 1 . 0 0 0
0 2018-07-06 0 1 0 0 0 1
0 2018-07-06 0 1 0 0 0 1
1 2020-01-01 0 1 . 0 0 0
1 2018-07-06 0 1 . 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 . 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;
RUN;
proc sql;
create table want as
select *,
sum(
count(distinct a)+(nmiss(a) ne 0),
count(distinct b)+(nmiss(b) ne 0),
count(distinct c)+(nmiss(c) ne 0)
)=3 as same
 from have
  group by id;
quit;

 

 

Tom
Super User Tom
Super User

Use PROC SUMMARY to find the RANGE of the variables per ID group.

Then you can test it all of the ranges are zero.

If you want you an remerge the results back by ID.

data have;
  input id date :yymmdd. a b c d e f;
  format date yymmdd10.;
datalines;
1 2020-01-01 0 1 1 0 0 0
1 2018-07-06 0 1 1 0 0 1
2 2015-02-15 1 2 0 1 2 3
3 2020-02-01 1 1 1 3 3 3
3 2021-07-05 1 1 0 3 3 3
3 2021-08-09 1 1 1 3 3 3 
4 2020-01-30 0 2 2 1 0 0
5 2018-10-10 1 2 0 0 1 2
5 2019-10-11 1 2 0 1 2 2 
;

proc summary data=have;
  by id;
  var a b c ;
  output out=range(drop=_freq_ _type_) range=;
run;

data range;
  set range;
  same = 0=min(of a b c)=max(of a b c);
run;

data want;
  merge have range(keep=id same);
  by id;
run;
sasgorilla
Pyrite | Level 9

Thanks, Tom. I'm realizing proc summary can be used to do a lot of things!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 997 views
  • 4 likes
  • 3 in conversation