I would like to remove observations where the pair of values in two columns already exists in an earlier row, in either order. For example, the pair A and B already exists, so I would like to remove the fourth observation. Similarly, I would like to remove the last observation because the pair B and C already exists.
student1 | student2 | treatment |
A | B | keep |
A | C | keep |
A | D | keep |
B | A | remove |
B | C | keep |
B | D | keep |
C | A | keep |
C | B | remove |
If you can live with an arbitrary order of the two students within each row, you can use CALL SORTC to put each pair in the same (alphabetical) order everywhere. Then it is just a question of removing the duplicates (PROC SORT with NODUPKEY):
data sorted;
  set have;
  call sortc(student1, student2);
run;

proc sort nodupkey;
  by student1 student2;
run;
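If the original order of the students within each row matters, one possible variation is to sort copies of the two values instead of the values themselves. The sketch below is not from the thread: the names keyed, k1, k2 and rownum are made up for illustration, and it assumes student1 and student2 are character variables of the same length in a data set called have. It relies on PROC SORT's default EQUALS behavior, under which NODUPKEY keeps the first observation of each BY group in input order.

```sas
/* Build an order-independent key without touching the original columns. */
data keyed;
  set have;
  k1 = student1;        /* k1 and k2 inherit their lengths from the source variables */
  k2 = student2;
  call sortc(k1, k2);   /* after this, k1 <= k2 for every row */
  rownum = _n_;         /* remember the original row position */
run;

/* Keep the first occurrence of each unordered pair. */
proc sort data=keyed out=want nodupkey;
  by k1 k2;
run;

/* Restore the original row order and drop the helper variables. */
proc sort data=want out=want(drop=k1 k2 rownum);
  by rownum;
run;
```

With the sample data in the question, B A would be dropped because A B appears earlier, while the surviving rows keep their original student1/student2 values and order.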
Please post test data in the form of a data step, using the code window (the {i} icon above the post).
data have;
  input student1 $ student2 $;
datalines;
A B
A C
A D
B A
B C
B D
;
run;

data want;
  set have;
  array student{2};
  call sortc(of student{*});
run;

proc sort data=want nodupkey;
  by student:;
run;