BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

my dates in this cirque du soleil data come in rather jumbled up from the internet.

 

I'd like to figure out longest standing shows, which means I need start date & end date.

I can always extract the last word & if "present" use today() as enddate.

 

dates with a single value are one year events so those also get ruled out.

but how do I extract start & end dates for the remaining values where the date are stored in all kinds of text strings.

 

scratched my head with the scan function, & not much time left to complete my ppt so any tips greatly appreciated.

 

thank you

Date
30 June 2018–present
13 July 2013 – 30 July 2017
24 June 2009 – 1 September 2013
38549
13–28 December 2014
38179
17–19 October 2008
15 July 2015–present
34851
42195
May–September 2010
6 February 2016 – 23 December 2020
12 June 1997 – 28 May 2000
13 May–30 August 2015
16 June–9 September 2017
27 November 2019 – 18 December 2022
14 August 2018 – 19 August 2019
2–10 January 2016
2 May 2022–present
23 September 2018—present
June–September 2008
20 March–29 November 1992
39117
40965
37339
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Hi Charu,

 

I typically try to stay away from using ANYDTDTE, but for a mess like this, it may be helpful.

 

I took a brute force approach, I'm sure there are much better ways. I ignore any runs that don't have a valid start date and end date.  I treat 'June–September 2008' as an invalid start date, because since there is no start year, it must be less than one year, so we don't need to worry about it.  

 

But this is an ugly hack...

 

data have ;
  input date $50. ;
  startc=scan(date,1,'–—') ;
  endc=scan(date,2,'–—') ;
  if endc='present' then endc=put(today(),date9.) ;
  if length(startc)>5 then startn=input(startc,?? anydtdte60.) ;
  if length(endc)>5 then endn=input(endc,?? anydtdte60.) ;
  if n(startn,endn)=2 then duration=endn-startn ;

  format startn endn date9. ;

  cards ;
30 June 2018–present
13 July 2013 – 30 July 2017
24 June 2009 – 1 September 2013
38549
13–28 December 2014
38179
17–19 October 2008
15 July 2015–present
34851
42195
May–September 2010
6 February 2016 – 23 December 2020
12 June 1997 – 28 May 2000
13 May–30 August 2015
16 June–9 September 2017
27 November 2019 – 18 December 2022
14 August 2018 – 19 August 2019
2–10 January 2016
2 May 2022–present
23 September 2018—present
June–September 2008
20 March–29 November 1992
39117
40965
37339
;
run ;

Looking forward to seeing you at SAS Explore!

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

View solution in original post

5 REPLIES 5
ballardw
Super User

Define the cases.

Provide a solution for each case.

If no -, (note: check your data for what character is actually the separator. The 'dash' I get copy and pasting to my editor is not a dash but a high order ASCII character that depends on font to display. That might be a result of pasting into the forum or what ever was used to create that gridlike appearance.) is present it should be a single date and a single INPUT function call should work on the value.

Attempt to convert the part before /after the - for the start/end date  with appropriate informat. If the date fails it needs intervention.

Cases with a numeric value only for the first should be the same month and year of the enddate so you get   mdy(month(enddate),firstday,year(enddate))

 

When you identify a part that has no day of the month you get to impute some value for the day of the month to set a date.

Here is a couple of cases. Similar for missing value and parsing the bits of the string to but into different cases.

data have;
   infile datalines truncover;
   input string $50.;
   startdate= input(scan(string,1,'–'),?? anydtdte32. );
   enddate= input(scan(string,2,'–'),?? anydtdte32. );
   if missing(startdate) then do;
      /* check if the first bit is numeric*/
      firstday =input(scan(string,1,'–'),?? f4. );
      if firstday>0 then startdate= mdy(month(enddate),firstday,year(enddate));
   end;
   format startdate enddate date9.;

datalines;
6 February 2016 – 23 December 2020
13–28 December 2014
;

Note use of the ANYDTDTE function can be locale dependent when you get things like 12/10/12 as which gets "guessed" to be month, day and year but that does not appear to be the case with your example data.

The ?? in the INPUT function calls is to suppress the known invalid data messages so the log is somewhat manageable.

 

A couple of case it may be easier to create the word date value and use the INPUT function.

 

The stupid almost certainly spreadsheet numeric values of 34851 and such I suggest a step before "fixing". Use spreadsheet to set the column to date type, even if most of the values aren't, set a know display layout for the dates. Export the data as CSV and read the file with a datastep. That should get the single dates to appear as such and not need the post hoc intervention of adjusting from 1Jan1900 that is a moderately well know problem

sqlGoddess
SAS Employee
the excel spreadsheet fix is a brilliant idea, thank you. I ran the code & I got missing values for the newly minted date columns, maybe its just me?
Quentin
Super User

Hi Charu,

 

I typically try to stay away from using ANYDTDTE, but for a mess like this, it may be helpful.

 

I took a brute force approach, I'm sure there are much better ways. I ignore any runs that don't have a valid start date and end date.  I treat 'June–September 2008' as an invalid start date, because since there is no start year, it must be less than one year, so we don't need to worry about it.  

 

But this is an ugly hack...

 

data have ;
  input date $50. ;
  startc=scan(date,1,'–—') ;
  endc=scan(date,2,'–—') ;
  if endc='present' then endc=put(today(),date9.) ;
  if length(startc)>5 then startn=input(startc,?? anydtdte60.) ;
  if length(endc)>5 then endn=input(endc,?? anydtdte60.) ;
  if n(startn,endn)=2 then duration=endn-startn ;

  format startn endn date9. ;

  cards ;
30 June 2018–present
13 July 2013 – 30 July 2017
24 June 2009 – 1 September 2013
38549
13–28 December 2014
38179
17–19 October 2008
15 July 2015–present
34851
42195
May–September 2010
6 February 2016 – 23 December 2020
12 June 1997 – 28 May 2000
13 May–30 August 2015
16 June–9 September 2017
27 November 2019 – 18 December 2022
14 August 2018 – 19 August 2019
2–10 January 2016
2 May 2022–present
23 September 2018—present
June–September 2008
20 March–29 November 1992
39117
40965
37339
;
run ;

Looking forward to seeing you at SAS Explore!

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

Numbers like 34851 are possibly "Excel dates", which can be converted to SAS dates by subtracting 21916. Or maybe they mean something else?

--
Paige Miller
sqlGoddess
SAS Employee
this worked, thanks a lot Quentin, I didn't realize the expertise in communities would help me with solutions so quickly. See you at SAS Explore!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1456 views
  • 5 likes
  • 4 in conversation