I am trying to delete the observations in my data set that are the same across multiple variables.
For example:

PIN  Start Date     End Date
1    Jan 1 2014     Jan 3 2014
1    Jan 1 2014     Jan 3 2015
3    March 2 2014   March 5 2014
4    July 1 2014    July 8 2014
5    July 1 2014    July 8 2014
6    August 9 2014  August 24 2014
I would want to remove those with the same PIN and Start Date.
When I run

proc sort data=final out=nodups nodupkey;
by PIN start date;
run;

it does not find the dups.
But when I sort by PIN alone, it finds two dups.
What is the actual name of your start date variable? If there is actually a space in the name then you need to use a name literal that would look like 'start date'n to be valid code.
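As a minimal sketch of the name-literal syntax, assuming the variable really is named "start date" with an embedded space (the data set names final and nodups are taken from the question):

```sas
/* Assumption: the variable name contains a space, so it must be
   written as a name literal. VALIDVARNAME=ANY allows such names. */
options validvarname=any;

proc sort data=final out=nodups nodupkey;
  by PIN 'start date'n;   /* quoted name followed by the letter n */
run;
```

If the variable is actually named something like start_date, use that name in the BY statement instead and no name literal is needed.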
If the date values are character, some values may have leading blanks, or the number of spaces between the parts of a value may differ. That is hard to tell from what you have posted.
I suggest that you run PROC CONTENTS on your data set and show us the result.
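If blanks do turn out to be the problem, one possible approach is to build a cleaned copy of the key before sorting. This is only a sketch: it assumes the variable is character and named start_date, and the names final_clean and start_key and the length $40 are made up for illustration.

```sas
data final_clean;
  set final;
  length start_key $40;   /* assumed to be long enough for the values */
  /* STRIP removes leading and trailing blanks;
     COMPBL collapses runs of blanks into a single blank */
  start_key = compbl(strip(start_date));
run;

proc sort data=final_clean out=nodups nodupkey;
  by PIN start_key;
run;
```

Sorting on the cleaned key means values such as " Jan 1 2014" and "Jan  1 2014" are treated as the same key.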
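For reference, a minimal call looks like this, assuming the data set is the WORK data set final from the question:

```sas
/* Lists each variable's name, type (Num or Char), length, and format */
proc contents data=final;
run;
```

The Type column in the output shows whether the start date is stored as a numeric SAS date or as a character string.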
There could be a few reasons why the proc sort statement is not finding duplicate observations with the same PIN and Start Date.
One reason could be that the Start Date variable is not formatted as a date. In order to properly compare dates, the Start Date variable should be formatted as a date in SAS using the date9. format. This can be done with the following code:
data final;
set final;
format start_date date9.;
run;
Another reason could be that the Start Date variable has different date formats across observations. For example, some observations may have the date formatted as "Jan 1 2014" while others may have it formatted as "01JAN2014". In order to properly compare dates, all observations should have the same date format. This can be done with the following code:
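data final;
set final;
/* A character variable cannot change type in place, so the numeric date goes into a new variable. */
/* start_date_num is an illustrative name. ANYDTDTE. is assumed to recognize every form present; check for missing results. */
start_date_num = input(start_date, anydtdte32.);
format start_date_num date9.;
run;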
Once the Start Date variable has been properly formatted as a date, you can try running the proc sort statement again to see if it finds duplicate observations with the same PIN and Start Date.
The format attached to a variable does not change the values stored in the variable.
So just adding
format start_date date9.;
to change the format used to display the values of START_DATE will make no difference in how the values are sorted.
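A small illustration of this point, using a made-up data set demo with two variables that hold the same date under different formats:

```sas
data demo;
  d1 = '01JAN2014'd;
  d2 = d1;
  format d1 date9. d2 mmddyy10.;   /* changes only how the values are displayed */
  same = (d1 = d2);                /* comparison uses the stored values, so same is 1 */
  put d1= d2= same=;               /* writes the formatted values to the log */
run;
```

The two variables print differently in the log but compare as equal, which is why changing the format alone does not change what PROC SORT treats as a duplicate key.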