DATA Step, Macro, Functions and more

sorting data by dates in character form

Reply
Super Contributor
Posts: 323

sorting data by dates in character form

Dear,

 

I have a quick question about sorting data by dates which are in character form. (eg. sorting data in ex dataset by usubjid and exstdtc ( character date) and finding first.obs or last.obs.)

 

Is it ok to sort data with out converting the exstdtc to numeric form and then find first.obs and last.obs (or) it is required to convert to numeric form and do the data step.

  Please suggest.

 

 

Super User
Posts: 23,740

Re: sorting data by dates in character form

Posted in reply to knveraraju91
It depends. What format are the dates in? If it's YYMMDD it'll naturally sort out. If its Date9 it won't because 01Aug is alphabetically before 01Jan.
Super User
Posts: 10,258

Re: sorting data by dates in character form

Posted in reply to knveraraju91

There's no reason for storing dates as character. Numeric dates can be stored in 4 bytes (less than any character date), and they can be used directly in date functions and with date formats. On top of that, they always sort in correct temporal order.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Valued Guide
Posts: 596

Re: sorting data by dates in character form

Posted in reply to knveraraju91

If you want to use first. and last. and not to create a numeric variable, you can try something like this: 

PROC SQL;
select *,case when INPUT(Date,DATE9.)=min(INPUT(Date,DATE9.)) then "First"
			  When INPUT(Date,DATE9.)=max(INPUT(Date,DATE9.)) then "Last"
			  else " " end as F_L
	from test
	Group by ID
order by ID,INPUT(Date,DATE9.);
quit;
Thanks,
Suryakiran
Super User
Posts: 13,554

Re: sorting data by dates in character form

Posted in reply to knveraraju91

What appearance do your current character-valued dates have?

 

About the only one that sorts correctly with regularity is a yyyymmdd or one of its variants where mm(month) and dd(day) have leading zeroes. 2017-2-18 would come after 2017-11-15, or 2018-2-9 2018-2-12  because character comparisons for sorting are done left to right

(the -2 is after -1 in the first case and -9 after -1 in the second). So to get those to sort properly you need 2017-02-18 2018-02-09 and 2018-02-12.

Two digits years are also a potential issue if the data crosses the century boundary: 990105  (5Jan1999) would come after 051212 (12Dec2005) as the 99 comes after 05.

 

If the dates are mmddyy then order would basically ignore year as 01012018 would come before 01022017.

if the dates are ddmmyy then you would have the first of all months then the second 01012018 01022018 01032018 then 0201208 (back to January).

Date9 or Date7 (05Jan2017) would have April and August before January, February and March...

Ask a Question
Discussion stats
  • 4 replies
  • 101 views
  • 4 likes
  • 5 in conversation