I have a dataset of about 100k observations with a variable NAME. It looks as follows:
OBS | NAME |
1 | John |
2 | Mary |
3 | Joe |
4 | John |
5 | Steve |
6 | Joan |
7 | John |
8 | Mary |
9 | Steve |
10 | shawn |
What I would like to do is create a list that has each unique value of the variable NAME. From the sample above the output list would be...
Obs | Name |
1 | John |
2 | Mary |
3 | Joe |
4 | Steve |
5 | Joan |
6 | Shawn |
I only want to capture each unique value of the variable NAME. I've thought of doing this with a stack, as I would in other languages, and I've read up on the ARRAY statement in SAS. I need to read my input dataset, capture each value of NAME, and check it against a stack or array to see whether I have already stored it.
Is there a better way of doing this? Is there a PROCEDURE that will do this? I'm thinking the DATA Step is the best approach but wanted to share my problem with this forum.
Best regards
Please try PROC SORT with the NODUPKEY option to remove the duplicates.
/* Keep the first observation for each distinct value of NAME */
proc sort data=have out=want nodupkey;
  by name;
run;
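One caveat: PROC SORT writes WANT ordered by NAME, not in the order the names first appear, which is how the sample output in the question is laid out. If that order matters, the DATA step idea from the question can be sketched with a hash object in place of a stack or array. This is only a sketch; it assumes the input dataset is called HAVE, as in the PROC SORT code, and that NAME is a character variable.

```sas
/* Sketch only: HAVE and WANT are assumed dataset names */
data want;
  set have;
  /* Create the lookup table once, on the first iteration */
  if _n_ = 1 then do;
    declare hash seen();
    seen.defineKey('name');
    seen.defineDone();
  end;
  /* check() returns a nonzero code when NAME is not stored yet */
  if seen.check() ne 0 then do;
    seen.add();   /* remember this NAME */
    output;       /* write it the first time it is seen */
  end;
  keep name;
run;
```

Because each observation is written the first time its NAME is seen, WANT keeps the input order and no sort is needed.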
The suggested SORT will work, as will PROC SQL using SELECT DISTINCT.
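For reference, the SELECT DISTINCT route could look roughly like this, again assuming the input dataset is named HAVE and the result should go to WANT:

```sas
/* Sketch only: HAVE and WANT are assumed dataset names */
proc sql;
  create table want as
  select distinct name
  from have;
quit;
```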
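Before that, however, you have to decide what makes one name different from another. You have "shawn" in your sample data. Is "shawn" different from "Shawn"? As written, both PROC SORT and SELECT DISTINCT compare values exactly, so they would treat the two as separate names. You might want to account for capitalization before applying a procedure.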
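One possible way to handle capitalization, shown only as a sketch: normalize NAME with PROPCASE (and STRIP, in case of leading blanks) before taking the distinct values. The dataset names HAVE and WANT are assumptions, and PROPCASE is just one choice of normalization.

```sas
/* Sketch only: HAVE and WANT are assumed dataset names */
proc sql;
  create table want as
  /* PROPCASE turns "shawn" into "Shawn", so both spellings collapse into one */
  select distinct propcase(strip(name)) as name
  from have;
quit;
```

Note that PROPCASE also rewrites names with internal capitals ("McDonald" becomes "Mcdonald"). If the original spelling must be preserved, comparing on UPCASE(name) while keeping the stored value is an alternative.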