Solved: remove duplicated pairs of variable values

somebody · Posted 12-07-2017 05:15 AM

I would like to remove observations where the pair of values in two columns has already appeared in an earlier observation, in either order. For example, the pair A and B already exists, so I would like to remove the fourth observation. Similarly, I would like to remove the last observation, as the pair B and C already exists.

student1	student2	treatment
A	B	keep
A	C	keep
A	D	keep
B	A	remove
B	C	keep
B	D	keep
C	A	remove
C	B	remove

s_lassen · Posted 12-07-2017 05:28 AM

If you don't mind the two students within a row being swapped, you can use CALL SORTC to put the students in the same (alphabetical) order in every row. Then it is just a matter of removing the duplicates with PROC SORT and the NODUPKEY option:

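data sorted;
  set have;
  /* put each pair in alphabetical order, so B A becomes A B */
  call sortc(student1, student2);
run;

/* keep only the first observation for each student1/student2 pair */
proc sort data=sorted nodupkey;
  by student1 student2;
run;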
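If you need to keep the original student1/student2 order in each row, one option is to sort copies of the two values into helper variables and deduplicate on those instead. This is a sketch that goes beyond the original replies: the helper names key1 and key2 are hypothetical, and the LENGTH statement assumes the student IDs are at most 8 characters (the default length for list input with $).

```sas
/* Example data from the question */
data have;
  input student1 $ student2 $;
  datalines;
A B
A C
A D
B A
B C
B D
C A
C B
;

/* Sort COPIES of the pair so key1 <= key2; student1 and   */
/* student2 keep their original order.                      */
/* key1/key2 are hypothetical names; length 8 is assumed.   */
data keyed;
  set have;
  length key1 key2 $ 8;
  key1 = student1;
  key2 = student2;
  call sortc(key1, key2);
run;

/* NODUPKEY keeps the first observation of each unordered   */
/* pair; PROC SORT's default EQUALS option preserves input   */
/* order among observations with identical keys.             */
proc sort data=keyed out=want nodupkey;
  by key1 key2;
run;

/* Drop the helper variables */
data want;
  set want(drop=key1 key2);
run;
```

For the question's data this keeps A B, A C, A D, B C and B D.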

RW9 · Posted 12-07-2017 05:28 AM

Please post test data in the form of a DATA step, using the code window (the {i} icon above the post), for example:

data have;
  input student1 $ student2 $;
datalines;
A B
A C
A D
B A
B C
B D
C A
C B
;
run;

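data want;
  set have;
  /* student1 and student2 already exist, so the array refers to them */
  array student{2};
  /* sort the pair alphabetically within each row */
  call sortc(of student{*});
run;

/* remove the now-identical duplicate pairs */
proc sort data=want nodupkey;
  by student:;
run;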
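To reproduce the question's keep/remove column without re-sorting the data, a single DATA step can remember each unordered pair it has already seen in a hash object. This is a sketch beyond the original replies: it assumes a HAVE data set like the one above, and the names seen, key1, key2 and status are hypothetical; the LENGTH again assumes IDs of at most 8 characters.

```sas
data flagged;
  set have;
  /* hypothetical helper variables; length 8 is an assumption */
  length key1 key2 $ 8 status $ 6;

  /* create the lookup table once, on the first iteration */
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('key1', 'key2');
    seen.defineDone();
  end;

  /* normalize the pair so that B A and A B share one key */
  key1 = student1;
  key2 = student2;
  call sortc(key1, key2);

  /* CHECK() returns 0 when the key is already in the table */
  if seen.check() = 0 then status = 'remove';
  else do;
    status = 'keep';
    seen.add();   /* remember this pair for later rows */
  end;

  drop key1 key2;
run;
```

Rows keep their original order, so a WHERE status = 'keep' on FLAGGED returns the wanted observations. With this rule C A is flagged remove as well, because A C appears earlier.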