Contributor
Posts: 25

# find the unique value in each pair

Dear community members;

I have a dataset in which i have a some duplicate obs on ID variable

I want to find the obs and var combination for which there is a unque value

like A3XXX (ie OBS2 and VAR2) and B6XXXX

 ID VAR2 VAR3 VAR4 VAR5 OBS1 A1 A2 A3 A4 A5 OBS2 A1 A3XXXX A3 A4 A5 OBS3 B1 B2 B3 B4 B5 OBS4 B1 B2 B3 B4 B6XXX OBS5 C1 C2 C3 C4 C5 OBS6 C1 C2 C3 C4 C5 OBS7 D1 D2 D3 D4 D5 OBS8 D1 D2 D3 D4 D5 OBS9 E1 E2 E3 E4 E5 OBS10 E1 E2 E3 E4 E5

how should i approach this problem any suggestion?

thanks a ton in advance :-)

Posts: 5,541

## Re: find the unique value in each pair

Here is a way to spot non uniform replicates:

``````data have;
length i ID VAR2 VAR3 VAR4 VAR5 \$8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as
select *
from have
group by ID
having count(distinct catx("@", VAR2, VAR3, VAR4, VAR5)) > 1
order by i;
quit;

proc print data=dups; by id; id id; run;``````
PG
Super Contributor
Posts: 271

## Re: find the unique value in each pair

Please elaborate and share what do you want in output dataset.

Contributor
Posts: 25

## Re: find the unique value in each pair

Hi Rahul the output I want is like mentioned here

 VAR1 VAR2 VAR3 VAR4 VAR5 OBS1 A2 OBS2 A3XXXX OBS3 B5 OBS4 B6XXX

i

Contributor
Posts: 25

## Re: find the unique value in each pair

or even as PGStat replied grouping is also good
Posts: 5,541

## Re: find the unique value in each pair

The same idea as before, but nulling out identical entries:

``````data have;
length i ID VAR2 VAR3 VAR4 VAR5 \$8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as
select *,
count(distinct VAR2) as nVar2,
count(distinct VAR3) as nVar3,
count(distinct VAR4) as nVar4,
count(distinct VAR5) as nVar5
from have
group by ID
having max(nVar2, nVar3, nVar4, nVar5) > 1
order by i;
quit;

data nulDups;
set dups;
array _v var2-var5;
array _n nVar2-nVar5;
do j = 1 to dim(_v);
if _n{j} = 1 then call missing(_v{j});
end;
drop j nVar:;
run;

proc print data=nulDups; by id; id id; run;
``````
PG
PROC Star
Posts: 2,370

## Re: find the unique value in each pair

Like this?

proc sql;

select ID

,ifc(count(unique VAR2) =1,' ', VAR2) as VAR2

,ifc(count(unique VAR3) =1,' ', VAR3) as VAR3

,ifc(count(unique VAR4) =1,' ', VAR4) as VAR4

,ifc(count(unique VAR5) =1,' ', VAR5) as VAR5

from HAVE

group by ID

having lengthn(cats(VAR2,VAR3,VAR4,VAR5));

quit;

ID VAR2 VAR3 VAR4 VAR5
A1 A3XXXX
A1 A2
B1       B5
B1       B6XXX
Discussion stats
• 6 replies
• 496 views
• 5 likes
• 4 in conversation