<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890303#M351786</link>
    <description>&lt;P&gt;Define the cases.&lt;/P&gt;
&lt;P&gt;Provide a solution for each case.&lt;/P&gt;
&lt;P&gt;If no -, (&lt;STRONG&gt;note:&amp;nbsp;&lt;/STRONG&gt;check your data for what character is actually the separator. The 'dash' I get copy and pasting to my editor is not a dash but a high order ASCII character that depends on font to display. That might be a result of pasting into the forum or what ever was used to create that gridlike appearance.) is present it should be a single date and a single INPUT function call should work on the value.&lt;/P&gt;
&lt;P&gt;Attempt to convert the part before /after&amp;nbsp;the - for the start/end date&amp;nbsp; with appropriate informat. If the date fails it needs intervention.&lt;/P&gt;
&lt;P&gt;Cases with a numeric value only for the first should be the same month and year of the enddate so you get&amp;nbsp;&amp;nbsp; mdy(month(enddate),firstday,year(enddate))&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When you identify a part that has no day of the month you get to impute some value for the day of the month to set a date.&lt;/P&gt;
&lt;P&gt;Here is a couple of cases. Similar for missing value and parsing the bits of the string to but into different cases.&lt;/P&gt;
&lt;PRE&gt;data have;
   infile datalines truncover;
   input string $50.;
   startdate= input(scan(string,1,'–'),?? anydtdte32. );
   enddate= input(scan(string,2,'–'),?? anydtdte32. );
   if missing(startdate) then do;
      /* check if the first bit is numeric*/
      firstday =input(scan(string,1,'–'),?? f4. );
      if firstday&amp;gt;0 then startdate= mdy(month(enddate),firstday,year(enddate));
   end;
   format startdate enddate date9.;

datalines;
6 February 2016 – 23 December 2020
13–28 December 2014
;

&lt;/PRE&gt;
&lt;P&gt;Note use of the ANYDTDTE function can be locale dependent when you get things like 12/10/12 as which gets "guessed" to be month, day and year but that does not appear to be the case with your example data.&lt;/P&gt;
&lt;P&gt;The ?? in the INPUT function calls is to suppress the known invalid data messages so the log is somewhat manageable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of case it may be easier to create the word date value and use the INPUT function.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The stupid almost certainly spreadsheet numeric values of 34851 and such I suggest a step before "fixing". Use spreadsheet to set the column to date type, even if most of the values aren't, set a know display layout for the dates. Export the data as CSV and read the file with a datastep. That should get the single dates to appear as such and not need the post hoc intervention of adjusting from 1Jan1900 that is a moderately well know problem&lt;/P&gt;</description>
    <pubDate>Mon, 21 Aug 2023 21:08:45 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2023-08-21T21:08:45Z</dc:date>
    <item>
      <title>Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890293#M351780</link>
      <description>&lt;P&gt;my dates in this cirque du soleil data come in rather jumbled up from the internet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd like to figure out longest standing shows, which means I need start date &amp;amp; end date.&lt;/P&gt;
&lt;P&gt;I can always extract the last word &amp;amp; if "present" use today() as enddate.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;dates with a single value are one year events so those also get ruled out.&lt;/P&gt;
&lt;P&gt;but how do I extract start &amp;amp; end dates for the remaining values where the date are stored in all kinds of text strings.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;scratched my head with the scan function, &amp;amp; not much time left to complete my ppt so any tips greatly appreciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thank you&lt;/P&gt;
&lt;TABLE width="263"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="263"&gt;Date&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;30 June 2018–present&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;13 July 2013 – 30 July 2017&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;24 June 2009 – 1 September 2013&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;38549&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;13–28 December 2014&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;38179&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;17–19 October 2008&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;15 July 2015–present&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;34851&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;42195&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;May–September 2010&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;6 February 2016 – 23 December 2020&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;12 June 1997 – 28 May 2000&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;13 May–30 August 2015&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;16 June–9 September 2017&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;27 November 2019 – 18 December 2022&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;14 August 2018 – 19 August 2019&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2–10 January 2016&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2 May 2022–present&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;23 September 2018—present&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;June–September 2008&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;20 March–29 November 1992&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;39117&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;40965&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;37339&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Mon, 21 Aug 2023 19:58:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890293#M351780</guid>
      <dc:creator>sqlGoddess</dc:creator>
      <dc:date>2023-08-21T19:58:01Z</dc:date>
    </item>
    <item>
      <title>Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890303#M351786</link>
      <description>&lt;P&gt;Define the cases.&lt;/P&gt;
&lt;P&gt;Provide a solution for each case.&lt;/P&gt;
&lt;P&gt;If no -, (&lt;STRONG&gt;note:&amp;nbsp;&lt;/STRONG&gt;check your data for what character is actually the separator. The 'dash' I get copy and pasting to my editor is not a dash but a high order ASCII character that depends on font to display. That might be a result of pasting into the forum or what ever was used to create that gridlike appearance.) is present it should be a single date and a single INPUT function call should work on the value.&lt;/P&gt;
&lt;P&gt;Attempt to convert the part before /after&amp;nbsp;the - for the start/end date&amp;nbsp; with appropriate informat. If the date fails it needs intervention.&lt;/P&gt;
&lt;P&gt;Cases with a numeric value only for the first should be the same month and year of the enddate so you get&amp;nbsp;&amp;nbsp; mdy(month(enddate),firstday,year(enddate))&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When you identify a part that has no day of the month you get to impute some value for the day of the month to set a date.&lt;/P&gt;
&lt;P&gt;Here is a couple of cases. Similar for missing value and parsing the bits of the string to but into different cases.&lt;/P&gt;
&lt;PRE&gt;data have;
   infile datalines truncover;
   input string $50.;
   startdate= input(scan(string,1,'–'),?? anydtdte32. );
   enddate= input(scan(string,2,'–'),?? anydtdte32. );
   if missing(startdate) then do;
      /* check if the first bit is numeric*/
      firstday =input(scan(string,1,'–'),?? f4. );
      if firstday&amp;gt;0 then startdate= mdy(month(enddate),firstday,year(enddate));
   end;
   format startdate enddate date9.;

datalines;
6 February 2016 – 23 December 2020
13–28 December 2014
;

&lt;/PRE&gt;
&lt;P&gt;Note use of the ANYDTDTE function can be locale dependent when you get things like 12/10/12 as which gets "guessed" to be month, day and year but that does not appear to be the case with your example data.&lt;/P&gt;
&lt;P&gt;The ?? in the INPUT function calls is to suppress the known invalid data messages so the log is somewhat manageable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of case it may be easier to create the word date value and use the INPUT function.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The stupid almost certainly spreadsheet numeric values of 34851 and such I suggest a step before "fixing". Use spreadsheet to set the column to date type, even if most of the values aren't, set a know display layout for the dates. Export the data as CSV and read the file with a datastep. That should get the single dates to appear as such and not need the post hoc intervention of adjusting from 1Jan1900 that is a moderately well know problem&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2023 21:08:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890303#M351786</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-08-21T21:08:45Z</dc:date>
    </item>
    <item>
      <title>Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890305#M351787</link>
      <description>&lt;P&gt;Hi Charu,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I typically try to stay away from using ANYDTDTE, but for a mess like this, it may be helpful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I took a brute force approach, I'm sure there are much better ways. I ignore any runs that don't have a valid start date and end date.&amp;nbsp; I treat 'June–September 2008' as an invalid start date, because since there is no start year, it must be less than one year, so we don't need to worry about it.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But this is an ugly hack...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;
  input date $50. ;
  startc=scan(date,1,'–—') ;
  endc=scan(date,2,'–—') ;
  if endc='present' then endc=put(today(),date9.) ;
  if length(startc)&amp;gt;5 then startn=input(startc,?? anydtdte60.) ;
  if length(endc)&amp;gt;5 then endn=input(endc,?? anydtdte60.) ;
  if n(startn,endn)=2 then duration=endn-startn ;

  format startn endn date9. ;

  cards ;
30 June 2018–present
13 July 2013 – 30 July 2017
24 June 2009 – 1 September 2013
38549
13–28 December 2014
38179
17–19 October 2008
15 July 2015–present
34851
42195
May–September 2010
6 February 2016 – 23 December 2020
12 June 1997 – 28 May 2000
13 May–30 August 2015
16 June–9 September 2017
27 November 2019 – 18 December 2022
14 August 2018 – 19 August 2019
2–10 January 2016
2 May 2022–present
23 September 2018—present
June–September 2008
20 March–29 November 1992
39117
40965
37339
;
run ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Looking forward to seeing you at SAS Explore!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2023 21:14:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890305#M351787</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2023-08-21T21:14:17Z</dc:date>
    </item>
    <item>
      <title>Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890306#M351788</link>
      <description>&lt;P&gt;Numbers like 34851 are possibly "Excel dates", which can be converted to SAS dates by subtracting 21916. Or maybe they mean something else?&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2023 21:47:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890306#M351788</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-08-21T21:47:27Z</dc:date>
    </item>
    <item>
      <title>Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890387#M351820</link>
      <description>the excel spreadsheet fix is a brilliant idea, thank you. I ran the code &amp;amp; I got missing values for the newly minted date columns, maybe its just me?</description>
      <pubDate>Tue, 22 Aug 2023 13:58:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890387#M351820</guid>
      <dc:creator>sqlGoddess</dc:creator>
      <dc:date>2023-08-22T13:58:24Z</dc:date>
    </item>
    <item>
      <title>Re: Dizzy Dates - another ETL question for sas explore-extracting start and end date from string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890397#M351837</link>
      <description>this worked, thanks a lot Quentin, I didn't realize the expertise in communities would help me with solutions so quickly. See you at SAS Explore!</description>
      <pubDate>Tue, 22 Aug 2023 15:00:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dizzy-Dates-another-ETL-question-for-sas-explore-extracting/m-p/890397#M351837</guid>
      <dc:creator>sqlGoddess</dc:creator>
      <dc:date>2023-08-22T15:00:04Z</dc:date>
    </item>
  </channel>
</rss>

