I would like to remove observations where the pair of values in two columns has already appeared in an earlier row, in either order. For example, the pair A and B already exists, so I would like to remove the fourth observation. Similarly, I would like to remove the last observation, as the pair B and C already exists.
student1 | student2 | treatment |
A | B | keep |
A | C | keep |
A | D | keep |
B | A | remove |
B | C | keep |
B | D | keep |
C | A | keep |
C | B | remove |
If you can live with the two students being reordered within each row, you can use CALL SORTC to put them in the same (alphabetical) order everywhere. Then it is just a matter of removing the duplicates (PROC SORT with NODUPKEY):
data sorted;
  set have;
  call sortc(student1, student2);
run;

proc sort nodupkey;
  by student1 student2;
run;
Please post test data in the form of a data step, using the code window (the {i} icon above the post).
data have;
  input student1 $ student2 $;
datalines;
A B
A C
A D
B A
B C
B D
;
run;

data want;
  set have;
  array student{2};
  call sortc(of student{*});
run;

proc sort data=want nodupkey;
  by student:;
run;
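If the original order of the students within each row (and the original row order) needs to be kept, one possible alternative is to build the sorted pair in temporary variables and track which pairs have been seen with a hash object. This is only a sketch: the names k1, k2, seen and want_keep_order are made up for illustration, and the $8 length assumes the default character length from the INPUT statement in the data step above.

```sas
data want_keep_order;
  set have;
  length k1 k2 $8;              /* assumed length; match student1/student2 */
  if _n_ = 1 then do;
    declare hash seen();        /* holds every pair encountered so far */
    seen.defineKey('k1', 'k2');
    seen.defineDone();
  end;
  k1 = student1;
  k2 = student2;
  call sortc(k1, k2);           /* order-independent key: (A,B) and (B,A) match */
  if seen.check() = 0 then delete;  /* pair already seen: drop this row */
  seen.add();                   /* first occurrence: remember it and keep the row */
  drop k1 k2;
run;
```

Because student1 and student2 are never modified, the kept rows look exactly as they did in the input. Note that with this logic a row such as C | A is also dropped whenever A | C appears earlier.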