xxartpopxx
Fluorite | Level 6

I am trying to delete the observations in my data set that are the same across multiple variables.

For example

PIN   Start Date      End Date
1     Jan 1 2014      Jan 3 2014
1     Jan 1 2014      Jan 3 2015
3     March 2 2014    March 5 2014
4     July 1 2014     July 8 2014
5     July 1 2014     July 8 2014
6     August 9 2014   August 24 2014

I want to remove the observations that have the same PIN and Start Date.

When I run

proc sort data=final out=nodups nodupkey;
by PIN start date;
run;

it does not find the dups.

But when I sort by PIN alone, it finds two dups.

5 REPLIES
ballardw
Super User

What is the actual name of your start date variable? If there is actually a space in the name then you need to use a name literal that would look like 'start date'n to be valid code.
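
As a minimal sketch of the two cases, assuming the data set is named final as in the question: which step applies depends on what the variable is really called, and start_date in the second step is only an assumed name.

```sas
/* Case 1: the variable name really contains a space.           */
/* VALIDVARNAME=ANY allows name literals with embedded blanks.  */
options validvarname=any;

proc sort data=final out=nodups nodupkey;
  by PIN 'start date'n;
run;

/* Case 2: the variable is actually named start_date (assumed). */
proc sort data=final out=nodups nodupkey;
  by PIN start_date;
run;
```

In either case, NODUPKEY keeps the first observation of each BY group and drops the rest.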

 

If the date values are character, some values may have leading blanks, or the number of spaces between the parts of a value may differ. Differences like that are hard to see by eye.
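
One way to check for this is sketched below. It assumes the variable is character and is named start_date (both are assumptions), and it compares each value with a normalized copy.

```sas
data check;
  set final;
  /* remove leading/trailing blanks, then collapse runs of blanks to one */
  start_clean = compbl(strip(start_date));
  /* 1 when the stored text differs from the normalized text, else 0 */
  differs = (start_date ne start_clean);
run;

/* Count how many values would change after normalizing */
proc freq data=check;
  tables differs;
run;
```

If any rows show differs = 1, sorting with NODUPKEY on start_clean instead of start_date would treat those values as equal.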

 

I suggest that you run PROC CONTENTS on your data set and show us the result.
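
For reference, a minimal call, assuming the data set is final in the WORK library as in the question:

```sas
/* Lists each variable's name, type (Num or Char), length, and format */
proc contents data=final;
run;
```

The Type column shows whether the start date variable is numeric or character, and the variable list shows its exact name.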

webart999ARM
Quartz | Level 8

There could be a few reasons why the proc sort statement is not finding duplicate observations with the same PIN and Start Date.
One reason could be that the Start Date variable is not formatted as a date. In order to properly compare dates, the Start Date variable should be formatted as a date in SAS using the date9. format. This can be done with the following code:

data final;
set final;
format start_date date9.;
run;

Another reason could be that the Start Date variable has different date formats across observations. For example, some observations may have the date formatted as "Jan 1 2014" while others may have it formatted as "01JAN2014". In order to properly compare dates, all observations should have the same date format. This can be done with the following code:

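data final;
set final;
/* Assumes start_date is character. input() returns a numeric SAS date, */
/* so store it in a new variable. The date9. informat reads only values */
/* such as 01JAN2014; other layouts need a different informat.          */
start_date_num = input(start_date, date9.);
format start_date_num date9.;
run;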

Once the Start Date variable has been properly formatted as a date, you can try running the proc sort statement again to see if it finds duplicate observations with the same PIN and Start Date.

Tom
Super User

The format attached to a variable does not change the values stored in the variable. 

So just adding 

format start_date date9.;

to change the format used to display the values of START_DATE will make no difference in how the values are sorted.
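
A small sketch that illustrates the point, using a hypothetical data set demo with made-up variable names:

```sas
data demo;
  x = '01JAN2014'd;   /* date literal, stored as a number of days since 01JAN1960 */
  y = x;              /* same stored value as x */
  format y date9.;    /* changes only how y is displayed */
  same = (x = y);     /* 1 when the stored values are equal */
  put x= y= same=;    /* writes both variables and the comparison to the log */
run;
```

The log should show x as a plain number and y as a date, while the comparison reports them as equal, since PROC SORT compares stored values and not their formatted display.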

 

gema
Calcite | Level 5
It looks to me like start date is an invalid SAS name.
It needs to be start_date.
gema
Calcite | Level 5
start date is seen as two SAS variables. start_date is a valid variable name.
