Solved: compare values based on a unique identifier

nanmz · Posted 03-03-2021 08:54 AM

Hi everyone,

I am still pretty much new to SAS and would appreciate any sort of guidance/pointers from the experts here.

I have a dataset that looks like this:

ID	A1	B1	C1
a	0.12	30	promo1
b	0.12	30	promo1
b	0.15	30	promo1
b	0.12	30	promo1
b	0.18	40	promo2
c	0.12	30	promo1
c	0.18	40	promo2
c	0.165	40	promo2
c	0.18	40	promo2

My main aim is to flag (0 or 1) for rows where if their IDs are the same and having the same values in column C1, same values in column B1 BUT having different values in column A1, to be flagged as 1. Thus, the output that I'm looking for would look something like this:

ID	A1	B1	C1	Flag
a	0.12	30	promo1	0
b	0.12	30	promo1	1
b	0.15	30	promo1	1
b	0.12	30	promo1	1
b	0.18	40	promo2	0
c	0.12	30	promo1	0
c	0.18	40	promo2	1
c	0.165	40	promo2	1
c	0.18	40	promo2	1

The "incorrect" values/rows to be flagged are of those highlighted in red. I initially wanted to transpose the column from long to wide and have the table to be unique based on their ID. However the duplicate IDs have up to 60+ rows hence I feel that it is not as efficient having hundreds of columns that I might not even use. Is there any other alternative in approaching this? Would appreciate any form of input. Thanks in advance.

FreelanceReinh · Posted 03-03-2021 10:37 AM

Hi @nanmz,

Try this to obtain your desired result:

/* Create sample data */

data have;
input ID $ A1 B1 C1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;

/* Create a unique sort key */

data _tmp / view=_tmp;
set have;
_seqno=_n_;
run;

/* Flag groups of observations with inconsistent A1 values */

proc sql;
create table want(drop=_seqno) as
select *, count(distinct A1)>1 as Flag
from _tmp
group by ID, B1, C1
order by _seqno;
drop view _tmp;
quit;

View solution in original post

AMSAS · Posted 03-03-2021 09:22 AM

Hi,

This can be done with FIRST. and LAST. DATA Step Variables

Although I'm not clear on what you are attempting to achieve. I don't understand why flag=1 in rows 2 & 7

ID	A1	B1	C1	Flag
a	0.12	30	promo1	0
b	0.12	30	promo1	1
b	0.15	30	promo1	1
b	0.12	30	promo1	1
b	0.18	40	promo2	0
c	0.12	30	promo1	0
c	0.18	40	promo2	1
c	0.165	40	promo2	1
c	0.18	40	promo2	1

nanmz · Posted 03-05-2021 05:47 AM

Hi,

Thanks for your reply. I wanted to flag rows 2 and 7 as well because they belong in the same group (i.e. rows 2,3, and 4 are in the same group, rows 7, 8 and 9 are in the same group as well). My goal is to find inaccuracies in column A1 in terms of their value as I don't have an external lookup table that would be able to map column A1, B1 and C1 respectively.

Shmuel · Posted 03-03-2021 10:26 AM

You input test tables are each 9 observations.

By comparing each possible pair of the two tables observations the output contains more observations then you posted as wanted:

data table1;
  infile cards dlm='0920'x truncover;
  input ID $ A1 B1 c1 $;
cards;
a	0.12	30	promo1
b	0.12	30	promo1
b	0.15	30	promo1
b	0.12	30	promo1
b	0.18	40	promo2
c	0.12	30	promo1
c	0.18	40	promo2
c	0.165	40	promo2
c	0.18	40	promo2
;run;
data table2;
  infile cards dlm='0920'x truncover;
  input ID $ A1 B1 c1 $;
cards;
a	0.12	30	promo1	
b	0.12	30	promo1	
b	0.15	30	promo1	
b	0.12	30	promo1	
b	0.18	40	promo2	
c	0.12	30	promo1	
c	0.18	40	promo2	
c	0.165	40	promo2	
c	0.18	40	promo2  
;run;
proc sql;
  create table want as
  select a.*,
  case when (a.a1 ne b.a1) 
      then 1 else 0 end as flag 
  from table1 as a
  left join table2 as b
  on (a.ID=b.ID and a.c1=b.c1 and a.b1=b.b1);
quit;

AS for the first compare method mentioned:

FreelanceReinh · Posted 03-03-2021 10:37 AM

Hi @nanmz,

Try this to obtain your desired result:

/* Create sample data */

data have;
input ID $ A1 B1 C1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;

/* Create a unique sort key */

data _tmp / view=_tmp;
set have;
_seqno=_n_;
run;

/* Flag groups of observations with inconsistent A1 values */

proc sql;
create table want(drop=_seqno) as
select *, count(distinct A1)>1 as Flag
from _tmp
group by ID, B1, C1
order by _seqno;
drop view _tmp;
quit;

nanmz · Posted 03-05-2021 05:42 AM

Thanks for your reply @FreelanceReinh ! this worked out great for me. Just a follow up question that I'm still trying to wrap my head around:

1) how does creating a unique sort key (_seqno) prior to the proc sql help me achieve the desired output?

Thanks again!

FreelanceReinh · Posted 03-05-2021 06:05 AM

@nanmz wrote:

Thanks for your reply @FreelanceReinh ! this worked out great for me. Just a follow up question that I'm still trying to wrap my head around:

1) how does creating a unique sort key (_seqno) prior to the proc sql help me achieve the desired output?

You're welcome. I introduced the sort key just to ensure that the sort order of WANT is the same as that of HAVE. If there's another suitable sort key in your real data, you can use it instead of _seqno (and omit the intermediate DATA step). Also, if you don't care about the sort order of WANT, you can omit everything related to _seqno as well and thus simplify the code and reduce run time.

compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Re: compare values based on a unique identifier

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!