DATA Step, Macro, Functions and more

find the unique value in each pair

Reply
Contributor
Posts: 25

find the unique value in each pair

Dear community members;

I have a dataset in which i have a some duplicate obs on ID variable 

I want to find the obs and var combination for which there is a unque value

like A3XXX (ie OBS2 and VAR2) and B6XXXX

 

 IDVAR2VAR3VAR4VAR5
OBS1A1A2A3A4A5
OBS2A1A3XXXXA3A4A5
OBS3B1B2B3B4B5
OBS4B1B2B3B4B6XXX
OBS5C1C2C3C4C5
OBS6C1C2C3C4C5
OBS7D1D2D3D4D5
OBS8D1D2D3D4D5
OBS9E1E2E3E4E5
OBS10E1E2E3E4E5

 

 

how should i approach this problem any suggestion?

thanks a ton in advance :-)

Respected Advisor
Posts: 4,644

Re: find the unique value in each pair

Here is a way to spot non uniform replicates:

 

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select * 
from have
group by ID
having count(distinct catx("@", VAR2, VAR3, VAR4, VAR5)) > 1
order by i;
quit;

proc print data=dups; by id; id id; run;
PG
Regular Contributor
Posts: 241

Re: find the unique value in each pair

Please elaborate and share what do you want in output dataset.

Contributor
Posts: 25

Re: find the unique value in each pair

Hi Rahul the output I want is like mentioned here

 

 VAR1VAR2VAR3VAR4VAR5
OBS1 A2   
OBS2 A3XXXX   
OBS3    B5
OBS4    B6XXX

 

i

Contributor
Posts: 25

Re: find the unique value in each pair

or even as PGStat replied grouping is also good
Respected Advisor
Posts: 4,644

Re: find the unique value in each pair

The same idea as before, but nulling out identical entries:

 

data have;
length i ID VAR2 VAR3 VAR4 VAR5 $8;
input i ID VAR2 VAR3 VAR4 VAR5;
datalines;
OBS1	A1	A2	A3	A4	A5
OBS2	A1	A3XXXX	A3	A4	A5
OBS3	B1	B2	B3	B4	B5
OBS4	B1	B2	B3	B4	B6XXX
OBS5	C1	C2	C3	C4	C5
OBS6	C1	C2	C3	C4	C5
OBS7	D1	D2	D3	D4	D5
OBS8	D1	D2	D3	D4	D5
OBS9	E1	E2	E3	E4	E5
OBS10	E1	E2	E3	E4	E5
;

proc sql;
create table dups as 
select *,
    count(distinct VAR2) as nVar2,
    count(distinct VAR3) as nVar3,
    count(distinct VAR4) as nVar4,
    count(distinct VAR5) as nVar5 
from have
group by ID
having max(nVar2, nVar3, nVar4, nVar5) > 1
order by i;
quit;

data nulDups;
set dups;
array _v var2-var5;
array _n nVar2-nVar5;
do j = 1 to dim(_v);
    if _n{j} = 1 then call missing(_v{j});
    end;
drop j nVar:;
run;

proc print data=nulDups; by id; id id; run;
PG
PROC Star
Posts: 1,561

Re: find the unique value in each pair

Like this?

 

proc sql;

select ID

,ifc(count(unique VAR2) =1,' ', VAR2) as VAR2

,ifc(count(unique VAR3) =1,' ', VAR3) as VAR3

,ifc(count(unique VAR4) =1,' ', VAR4) as VAR4

,ifc(count(unique VAR5) =1,' ', VAR5) as VAR5

from HAVE

group by ID

having lengthn(cats(VAR2,VAR3,VAR4,VAR5));

quit;

 

ID VAR2 VAR3 VAR4 VAR5
A1 A3XXXX      
A1 A2      
B1       B5
B1       B6XXX
Ask a Question
Discussion stats
  • 6 replies
  • 312 views
  • 5 likes
  • 4 in conversation