Re: find the unique value in each pair

makarand · Posted 08-03-2016 12:59 PM

Dear community members;

I have a dataset in which i have a some duplicate obs on ID variable

I want to find the obs and var combination for which there is a unque value

like A3XXX (ie OBS2 and VAR2) and B6XXXX

	ID	VAR2	VAR3	VAR4	VAR5
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5

how should i approach this problem any suggestion?

thanks a ton in advance 🙂

PGStats · Posted 08-03-2016 01:16 PM

Here is a way to spot non uniform replicates:

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select * 
from have
group by ID
having count(distinct catx("@", VAR2, VAR3, VAR4, VAR5)) > 1
order by i;
quit;

proc print data=dups; by id; id id; run;

PG

RahulG · Posted 08-03-2016 01:18 PM

Please elaborate and share what do you want in output dataset.

makarand · Posted 08-04-2016 01:36 AM

Hi Rahul the output I want is like mentioned here

	VAR1	VAR2	VAR3	VAR4	VAR5
OBS1		A2
OBS2		A3XXXX
OBS3					B5
OBS4					B6XXX

i

makarand · Posted 08-04-2016 01:37 AM

or even as PGStat replied grouping is also good

PGStats · Posted 08-04-2016 10:22 PM

The same idea as before, but nulling out identical entries:

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select *,
    count(distinct VAR2) as nVar2,
    count(distinct VAR3) as nVar3,
    count(distinct VAR4) as nVar4,
    count(distinct VAR5) as nVar5 
from have
group by ID
having max(nVar2, nVar3, nVar4, nVar5) > 1
order by i;
quit;

data nulDups;
set dups;
array _v var2-var5;
array _n nVar2-nVar5;
do j = 1 to dim(_v);
    if _n{j} = 1 then call missing(_v{j});
    end;
drop j nVar:;
run;

proc print data=nulDups; by id; id id; run;

PG

ChrisNZ · Posted 08-04-2016 11:12 PM

Like this?

proc sql;

select ID

,ifc(count(unique VAR2) =1,' ', VAR2) as VAR2

,ifc(count(unique VAR3) =1,' ', VAR3) as VAR3

,ifc(count(unique VAR4) =1,' ', VAR4) as VAR4

,ifc(count(unique VAR5) =1,' ', VAR5) as VAR5

from HAVE

group by ID

having lengthn(cats(VAR2,VAR3,VAR4,VAR5));

quit;

ID	VAR2	VAR5
A1	A3XXXX
A1	A2
B1		B5
B1		B6XXX

High-Performance SAS Coding - Third Edition

find the unique value in each pair