<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Perl-regular expressions in SAS to add imputations to date and highlight data errors in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841450#M332731</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm trying to use Perl regular expressions to impute the missing parts of character datetime values. I eventually want to put this in a macro, because an imputed variable (e.g. adtmc) is quite common in datasets. My approach works to a degree, but is it possible to extend it so that it also highlights uncommon data problems? Also, depending on the data, raw datetime values can contain either a space or a T between the date and the time. My current code only accounts for the space. How can I handle both while making sure the T is replaced by a space? I've provided sample data, the desired output, and the code I'm currently using.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length ecdtc $16;
infile datalines truncover;
input @1 ecdtc $16.;
datalines;
2020-01-01 01:02
2020-01-01T01:02
2020-01-01T01:
2020-01-01T01:89
2020-01-01T
2020-01-01' '/*Space indicates a possible datetime*/
2020-01-02
2020-01
2020
junk
;;;;
run;


data want;
length ecdtc $16;
infile datalines truncover;
input @1 ecdtc $16.;
datalines;
2020-01-01 01:02
2020-01-01 01:02
2020-01-01 01:XX
2020-01-01 01:XX
2020-01-01 XX:XX
2020-01-01 XX:XX 
2020-01-02 
2020-01-XX
2020-XX-XX
junk
;;;;
run;


data want;
  set have;
  length adtmc $16;
  array vals[3] $;
  vals[1]='XXXX';
  vals[2]='-XX';
  vals[3]='-XX';
  _rx = prxparse('/(\d{4})(-\d{2})?(-\d{2})?( \d{2}:\d{2})?/ios');
  _rc = prxmatch(_rx,ecdtc);   *this does the matching.  Probably should check for value of _rc to make sure it matched before continuing.;
  do _i = 1 to 4;   *now iterate through the four capture buffers;
    _rt = prxposn(_rx,_i,ecdtc);
    if _i le 3 then vals[_i] = coalescec(_rt,vals[_i]);
    else timepart = _rt;  *we do the timepart outside the array since it needs to be catted with a space while the others do not, easier this way;
  end;
  
  adtmc = cats(of vals[*]);  *cat them together now - if you do not capture the hyphen then use catx ('-',of vals[*]) instead;
  if timepart ne ' ' then adtmc = catx(' ',adtmc,timepart);  *and append the timepart after.;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Sat, 29 Oct 2022 21:49:48 GMT</pubDate>
    <dc:creator>smackerz1988</dc:creator>
    <dc:date>2022-10-29T21:49:48Z</dc:date>
    <item>
      <title>Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841450#M332731</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm trying to use Perl regular expressions to impute the missing parts of character datetime values. I eventually want to put this in a macro, because an imputed variable (e.g. adtmc) is quite common in datasets. My approach works to a degree, but is it possible to extend it so that it also highlights uncommon data problems? Also, depending on the data, raw datetime values can contain either a space or a T between the date and the time. My current code only accounts for the space. How can I handle both while making sure the T is replaced by a space? I've provided sample data, the desired output, and the code I'm currently using.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length ecdtc $16;
infile datalines truncover;
input @1 ecdtc $16.;
datalines;
2020-01-01 01:02
2020-01-01T01:02
2020-01-01T01:
2020-01-01T01:89
2020-01-01T
2020-01-01' '/*Space indicates a possible datetime*/
2020-01-02
2020-01
2020
junk
;;;;
run;


data want;
length ecdtc $16;
infile datalines truncover;
input @1 ecdtc $16.;
datalines;
2020-01-01 01:02
2020-01-01 01:02
2020-01-01 01:XX
2020-01-01 01:XX
2020-01-01 XX:XX
2020-01-01 XX:XX 
2020-01-02 
2020-01-XX
2020-XX-XX
junk
;;;;
run;


data want;
  set have;
  length adtmc $16;
  array vals[3] $;
  vals[1]='XXXX';
  vals[2]='-XX';
  vals[3]='-XX';
  _rx = prxparse('/(\d{4})(-\d{2})?(-\d{2})?( \d{2}:\d{2})?/ios');
  _rc = prxmatch(_rx,ecdtc);   *this does the matching.  Probably should check for value of _rc to make sure it matched before continuing.;
  do _i = 1 to 4;   *now iterate through the four capture buffers;
    _rt = prxposn(_rx,_i,ecdtc);
    if _i le 3 then vals[_i] = coalescec(_rt,vals[_i]);
    else timepart = _rt;  *we do the timepart outside the array since it needs to be catted with a space while the others do not, easier this way;
  end;
  
  adtmc = cats(of vals[*]);  *cat them together now - if you do not capture the hyphen then use catx ('-',of vals[*]) instead;
  if timepart ne ' ' then adtmc = catx(' ',adtmc,timepart);  *and append the timepart after.;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 29 Oct 2022 21:49:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841450#M332731</guid>
      <dc:creator>smackerz1988</dc:creator>
      <dc:date>2022-10-29T21:49:48Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841488#M332740</link>
      <description>&lt;P&gt;It's always a tradeoff between validation strength and complexity. This should filter out many anomalies:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
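&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
/* Compile the pattern once: id starts at 0 and is retained by the sum statement. */
/* Group 1 = full or partial date, then a space or T, group 2 = optional hh:mm or hh: */
if not id then id + prxparse(
    "/([12][90]\d\d-[01]\d-[0123]\d|[12][90]\d\d-[01]\d|[12][90]\d\d)[ T]([012]\d:[012345]\d|[012]\d:)?/io");
set have;
length adtmc $16;
if prxmatch(id, ecdtc) then do;
    d = prxposn(id, 1, ecdtc);   /* date part */
    t = prxposn(id, 2, ecdtc);   /* time part (hh:mm or hh:), blank if none matched */
    /* Rebuild the value, substituting XX for every missing component. */
    adtmc = catx(" ", 
        catx("-", scan(d,1), coalescec(scan(d,2), "XX"), coalescec(scan(d,3), "XX")),
        catx(":", coalescec(scan(t,1,":"), "XX"), coalescec(scan(t,2,":"), "XX")) );
    end;
/* Values that do not match the pattern (e.g. junk) are copied unchanged. */
else adtmc = ecdtc;
drop id d t;
run;&lt;/CODE&gt;&lt;/PRE&gt;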
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="PGStats_0-1667076982354.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/76762i56BAE36F25F1A4B1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="PGStats_0-1667076982354.png" alt="PGStats_0-1667076982354.png" /&gt;&lt;/span&gt;&lt;/P&gt;
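&lt;P&gt;To also highlight the problem rows, one option is to compare how much of the value the pattern actually consumed. The following is only a minimal sketch under stated assumptions, not a tested macro: it reuses the pattern above, and the names have_demo, flagged, rx, pos, len and problem are made up for illustration. It relies on CALL PRXSUBSTR returning the start position and length of the match.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: have_demo, flagged, rx, pos, len and problem are hypothetical names. */
data have_demo;
  length ecdtc $16;
  infile datalines truncover;
  input @1 ecdtc $16.;
datalines;
2020-01-01T01:02
2020-01-01T01:89
2020-01
junk
;
run;

data flagged;
  set have_demo;
  retain rx;
  /* Same pattern as above, compiled once on the first row. */
  if _n_ = 1 then rx = prxparse(
    "/([12][90]\d\d-[01]\d-[0123]\d|[12][90]\d\d-[01]\d|[12][90]\d\d)[ T]([012]\d:[012345]\d|[012]\d:)?/");
  /* pos and len describe the matched portion; pos is 0 when nothing matches. */
  call prxsubstr(rx, ecdtc, pos, len);
  /* Flag values that do not match from column 1, or that have characters
     left over after the match (for example minutes of 89). */
  problem = (pos ne 1) or (len lt length(ecdtc));
  drop rx pos len;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With these four rows, problem should come out as 1 for the 01:89 row and for junk, and 0 for the other two. Partial dates are not flagged because the [ T] in the pattern matches a trailing blank of the $16 variable, so the match is never shorter than the trimmed value.&lt;/P&gt;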
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 20:58:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841488#M332740</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2022-10-29T20:58:47Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841489#M332741</link>
      <description>&lt;P&gt;Very nice solution, thanks! Quick question: does the id + prxparse statement have to come before the SET statement? Also, does this detect when no time is present, so that only the date is imputed rather than the full datetime?&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 21:29:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841489#M332741</guid>
      <dc:creator>smackerz1988</dc:creator>
      <dc:date>2022-10-29T21:29:14Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841490#M332742</link>
      <description>&lt;P&gt;No, it doesn't, as long as it runs before PRXMATCH. In any case, make sure that the variables created in the data step (here id, d and t) do not share names with variables in the input dataset.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 21:34:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841490#M332742</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2022-10-29T21:34:09Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841491#M332743</link>
      <description>&lt;P&gt;Is there a way to automatically check the length of the variable (in this case ecdtc) to see what imputation is required? Basically, I'm trying to put this into a macro, and some date variables will have different imputation requirements depending on whether they hold a date or a datetime.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 21:37:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841491#M332743</guid>
      <dc:creator>smackerz1988</dc:creator>
      <dc:date>2022-10-29T21:37:12Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841492#M332744</link>
      <description>&lt;P&gt;Apologies, I've updated my have/want datasets to illustrate my approach better.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Oct 2022 21:47:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841492#M332744</guid>
      <dc:creator>smackerz1988</dc:creator>
      <dc:date>2022-10-29T21:47:17Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841496#M332745</link>
      <description>&lt;P&gt;You could use the LENGTH function on ecdtc and code the logic accordingly.&lt;/P&gt;</description>
      <pubDate>Sun, 30 Oct 2022 00:27:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841496#M332745</guid>
      <dc:creator>tarheel13</dc:creator>
      <dc:date>2022-10-30T00:27:19Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841508#M332748</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length ecdtc $16;
infile datalines truncover;
input @1 ecdtc $16.;
datalines;
2020-01-01 01:02
2020-01-01T01:02
2020-01-01T01:
2020-01-01T01:89
2020-01-01T
2020-01-01
2020-01-02
2020-01
2020
junk
;;;;
run;

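data want;
 set have;
 /* Start from the fully imputed template; this first assignment makes want $16. */
 want='XXXX-XX-XX XX:XX';
 if anydigit(ecdtc)=1 then do;
   /* Overwrite the template with the characters actually present, turning T into a space. */
   do i=1 to length(ecdtc);
     substr(want,i,1)=substr(translate(ecdtc,' ','T'),i,1);
   end;
 end;
 /* Values that do not start with a digit (e.g. junk) are copied unchanged. */
 else want=ecdtc;
 drop i;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>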
      <pubDate>Sun, 30 Oct 2022 09:57:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841508#M332748</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2022-10-30T09:57:56Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841529#M332749</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;2020-01-01' '/*Space indicates a possible datetime*/&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
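&lt;P&gt;The quoted line can be checked directly. A minimal sketch, assuming two $16 variables; the names a, b, same, len_a and len_b are made up for illustration. It stores the same date with and without trailing blanks and then compares the two variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: a, b, same, len_a and len_b are hypothetical names. */
data _null_;
  length a b $16;
  a = '2020-01-01';       /* no trailing blank in the literal */
  b = '2020-01-01   ';    /* three trailing blanks in the literal */
  same  = (a = b);        /* 1 when the stored values compare equal */
  len_a = length(a);      /* LENGTH ignores trailing blanks */
  len_b = length(b);
  put same= len_a= len_b=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The log should show same=1 len_a=10 len_b=10, because both variables are padded with blanks to 16 characters.&lt;/P&gt;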
&lt;P&gt;Remember that SAS stores character strings as fixed length.&amp;nbsp; There is no way to tell the difference between '2020-01-01' and '2020-01-01&amp;nbsp; &amp;nbsp; &amp;nbsp;' once you have the value in a variable.&lt;/P&gt;</description>
      <pubDate>Sun, 30 Oct 2022 13:47:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841529#M332749</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-10-30T13:47:02Z</dc:date>
    </item>
    <item>
      <title>Re: Perl-regular expressions in SAS to add imputations to date and highlight data errors</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841536#M332751</link>
      <description>&lt;P&gt;Thanks everyone. Some really useful approaches here!&lt;/P&gt;</description>
      <pubDate>Sun, 30 Oct 2022 14:36:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Perl-regular-expressions-in-SAS-to-add-imputations-to-date-and/m-p/841536#M332751</guid>
      <dc:creator>smackerz1988</dc:creator>
      <dc:date>2022-10-30T14:36:29Z</dc:date>
    </item>
  </channel>
</rss>

