Hi everyone,
I am still pretty much new to SAS and would appreciate any sort of guidance/pointers from the experts here.
I have a dataset that looks like this:
ID | A1 | B1 | C1 |
a | 0.12 | 30 | promo1 |
b | 0.12 | 30 | promo1 |
b | 0.15 | 30 | promo1 |
b | 0.12 | 30 | promo1 |
b | 0.18 | 40 | promo2 |
c | 0.12 | 30 | promo1 |
c | 0.18 | 40 | promo2 |
c | 0.165 | 40 | promo2 |
c | 0.18 | 40 | promo2 |
My main aim is to flag (0 or 1) for rows where if their IDs are the same and having the same values in column C1, same values in column B1 BUT having different values in column A1, to be flagged as 1. Thus, the output that I'm looking for would look something like this:
ID | A1 | B1 | C1 | Flag |
a | 0.12 | 30 | promo1 | 0 |
b | 0.12 | 30 | promo1 | 1 |
b | 0.15 | 30 | promo1 | 1 |
b | 0.12 | 30 | promo1 | 1 |
b | 0.18 | 40 | promo2 | 0 |
c | 0.12 | 30 | promo1 | 0 |
c | 0.18 | 40 | promo2 | 1 |
c | 0.165 | 40 | promo2 | 1 |
c | 0.18 | 40 | promo2 | 1 |
The "incorrect" values/rows to be flagged are of those highlighted in red. I initially wanted to transpose the column from long to wide and have the table to be unique based on their ID. However the duplicate IDs have up to 60+ rows hence I feel that it is not as efficient having hundreds of columns that I might not even use. Is there any other alternative in approaching this? Would appreciate any form of input. Thanks in advance.
Hi @nanmz,
Try this to obtain your desired result:
/* Create sample data */
data have;
input ID $ A1 B1 C1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;
/* Create a unique sort key */
data _tmp / view=_tmp;
set have;
_seqno=_n_;
run;
/* Flag groups of observations with inconsistent A1 values */
proc sql;
create table want(drop=_seqno) as
select *, count(distinct A1)>1 as Flag
from _tmp
group by ID, B1, C1
order by _seqno;
drop view _tmp;
quit;
Hi,
This can be done with FIRST. and LAST. DATA Step Variables
Although I'm not clear on what you are attempting to achieve. I don't understand why flag=1 in rows 2 & 7
ID | A1 | B1 | C1 | Flag |
a | 0.12 | 30 | promo1 | 0 |
b | 0.12 | 30 | promo1 | 1 |
b | 0.15 | 30 | promo1 | 1 |
b | 0.12 | 30 | promo1 | 1 |
b | 0.18 | 40 | promo2 | 0 |
c | 0.12 | 30 | promo1 | 0 |
c | 0.18 | 40 | promo2 | 1 |
c | 0.165 | 40 | promo2 | 1 |
c | 0.18 | 40 |
promo2 |
1 |
Hi,
Thanks for your reply. I wanted to flag rows 2 and 7 as well because they belong in the same group (i.e. rows 2,3, and 4 are in the same group, rows 7, 8 and 9 are in the same group as well). My goal is to find inaccuracies in column A1 in terms of their value as I don't have an external lookup table that would be able to map column A1, B1 and C1 respectively.
You input test tables are each 9 observations.
By comparing each possible pair of the two tables observations the output contains more observations then you posted as wanted:
data table1;
infile cards dlm='0920'x truncover;
input ID $ A1 B1 c1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;run;
data table2;
infile cards dlm='0920'x truncover;
input ID $ A1 B1 c1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;run;
proc sql;
create table want as
select a.*,
case when (a.a1 ne b.a1)
then 1 else 0 end as flag
from table1 as a
left join table2 as b
on (a.ID=b.ID and a.c1=b.c1 and a.b1=b.b1);
quit;
AS for the first compare method mentioned:
Hi @nanmz,
Try this to obtain your desired result:
/* Create sample data */
data have;
input ID $ A1 B1 C1 $;
cards;
a 0.12 30 promo1
b 0.12 30 promo1
b 0.15 30 promo1
b 0.12 30 promo1
b 0.18 40 promo2
c 0.12 30 promo1
c 0.18 40 promo2
c 0.165 40 promo2
c 0.18 40 promo2
;
/* Create a unique sort key */
data _tmp / view=_tmp;
set have;
_seqno=_n_;
run;
/* Flag groups of observations with inconsistent A1 values */
proc sql;
create table want(drop=_seqno) as
select *, count(distinct A1)>1 as Flag
from _tmp
group by ID, B1, C1
order by _seqno;
drop view _tmp;
quit;
Thanks for your reply @FreelanceReinh ! this worked out great for me. Just a follow up question that I'm still trying to wrap my head around:
1) how does creating a unique sort key (_seqno) prior to the proc sql help me achieve the desired output?
Thanks again!
@nanmz wrote:
Thanks for your reply @FreelanceReinh ! this worked out great for me. Just a follow up question that I'm still trying to wrap my head around:
1) how does creating a unique sort key (_seqno) prior to the proc sql help me achieve the desired output?
You're welcome. I introduced the sort key just to ensure that the sort order of WANT is the same as that of HAVE. If there's another suitable sort key in your real data, you can use it instead of _seqno (and omit the intermediate DATA step). Also, if you don't care about the sort order of WANT, you can omit everything related to _seqno as well and thus simplify the code and reduce run time.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.