<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read huge and kinda messy text data into SAS in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473187#M121391</link>
    <description>&lt;P&gt;Sometimes that's a problem with the ANYDTDTM informat rather than the actual data. I would first confirm that those data points are really incorrect by checking them against the raw data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/132289"&gt;@Cruise&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;BR /&gt;Good point. proc freq showed that 0.01% of the years were '3000'. Bad dates. Thanks Paul.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
    <pubDate>Mon, 25 Jun 2018 22:07:41 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-06-25T22:07:41Z</dc:date>
    <item>
      <title>Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473148#M121379</link>
      <description>&lt;P&gt;Hello SAS experts,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a huge text file that I am trying to read into SAS. The first few observations look as below when I open the file in Excel. The fields are separated by }. However, Excel truncates the file to its first million rows, which is only a fraction of the complete data.&amp;nbsp; I attached a csv file of these first few observations to this post, just in case. As you will note, the ndc_code variable is populated in the majority of rows (99%), but not in all of them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would greatly appreciate any help. This data is very important to me, and I am worried about losing or corrupting it before or during my attempts to read it into SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Copy of first few observations. Attached as csv as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE width="512"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD colspan="4" width="256"&gt;ID}"trans_id"}"ndc_code"}"start_date"&lt;/TD&gt;
&lt;TD width="64"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD width="64"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD width="64"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD width="64"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Y96033K}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;3&lt;/TD&gt;
&lt;TD colspan="3"&gt;641}}2005-08-02 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;C89950C}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;7&lt;/TD&gt;
&lt;TD colspan="3"&gt;191}}2005-07-15 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;C89950C}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;7&lt;/TD&gt;
&lt;TD colspan="3"&gt;194}}2005-07-19 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;G57388K}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;7&lt;/TD&gt;
&lt;TD colspan="3"&gt;444}}2005-07-29 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;P33846V}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;11&lt;/TD&gt;
&lt;TD colspan="3"&gt;998}}2005-07-29 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;X87617C}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;24&lt;/TD&gt;
&lt;TD colspan="3"&gt;406}}2005-08-09 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;S88141M}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;38&lt;/TD&gt;
&lt;TD colspan="3"&gt;001}}2005-06-17 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;P01353V}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;41&lt;/TD&gt;
&lt;TD colspan="3"&gt;517}}2005-07-23 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;Y58550R}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;55&lt;/TD&gt;
&lt;TD colspan="4"&gt;262}"00185011701"}2005-08-24 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;N87928G}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;63&lt;/TD&gt;
&lt;TD colspan="4"&gt;646}"00186504054"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;E11620G}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;68&lt;/TD&gt;
&lt;TD colspan="4"&gt;200}"00024542131"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;K96751N}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;69&lt;/TD&gt;
&lt;TD colspan="3"&gt;379}}2005-07-14 00:00:00&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;W65400Y}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;74&lt;/TD&gt;
&lt;TD colspan="4"&gt;975}"00456321060"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;W65400Y}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;75&lt;/TD&gt;
&lt;TD colspan="4"&gt;085}"00536375610"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;S47546N}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;75&lt;/TD&gt;
&lt;TD colspan="4"&gt;269}"00006011731"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;W65400Y}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;75&lt;/TD&gt;
&lt;TD colspan="4"&gt;358}"00013830304"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;X45852C}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;78&lt;/TD&gt;
&lt;TD colspan="4"&gt;350}"00088110747"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;W41407V}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;83&lt;/TD&gt;
&lt;TD colspan="4"&gt;114}"65726023510"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;E39839F}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;83&lt;/TD&gt;
&lt;TD colspan="4"&gt;373}"00182432906"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;X94973M}200&lt;/TD&gt;
&lt;TD&gt;524&lt;/TD&gt;
&lt;TD&gt;406&lt;/TD&gt;
&lt;TD&gt;84&lt;/TD&gt;
&lt;TD colspan="4"&gt;142}"00023917715"}2005-08-25 00:00:00&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jun 2018 19:58:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473148#M121379</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2018-06-25T19:58:14Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473161#M121383</link>
      <description>&lt;P&gt;First, I wouldn't modify the csv itself. I'm not sure whether you were contemplating that; in any case, I would make any changes to the data within SAS, e.g. x=index(variable,'}'); y=substr(variable,x+1); etc. I don't see where ndc is missing, but it wouldn't matter if it were. Are you having problems reading the file, or are you just concerned about the format you received it in, i.e. with "}" as the separator?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jun 2018 20:36:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473161#M121383</guid>
      <dc:creator>pau13rown</dc:creator>
      <dc:date>2018-06-25T20:36:49Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473164#M121384</link>
      <description>&lt;P&gt;Thanks for asking.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It just worked out as below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, the resulting 'start_date' looks like 01JUN05:00:00:00. I don't know yet whether this is the best format for reading in the date. Please let me know if you have a better suggestion.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data drug;
infile 'data.csv' delimiter='}' missover
dsd lrecl=32767 firstobs=2;
informat ID $7.;
informat second_id $20. ;
informat ndc $11.;
informat start_date anydtdtm40. ;
format ID $7. ;
format second_id $20. ;
format ndc $11. ;
format start_date datetime. ;
input 
ID $
second_id $
ndc $
start_date
;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Jun 2018 20:48:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473164#M121384</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2018-06-25T20:48:14Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473171#M121386</link>
      <description>&lt;P&gt;Regarding the date format: since you have so many observations, you will want to check whether there are partial or spurious dates, e.g. year only. You may want to read the field as text and then derive the date within SAS. If the start date includes a time, you might want to split out the time component. It appears that you don't have a time, in which case you may want to use the date9. format, because the time part is superfluous.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jun 2018 21:11:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473171#M121386</guid>
      <dc:creator>pau13rown</dc:creator>
      <dc:date>2018-06-25T21:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473175#M121387</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/183379"&gt;@pau13rown&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Good point. proc freq showed that 0.01% of the years were '3000'. Bad dates. Thanks Paul.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1(compress=yes); set have;
start_date1 = datepart(start_date);
format start_date1 date9.;
start_year=year(start_date1);
run;
proc freq data=have1(compress=yes);
tables start_year;
run; &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Jun 2018 21:23:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473175#M121387</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2018-06-25T21:23:42Z</dc:date>
    </item>
    <item>
      <title>Re: Read huge and kinda messy text data into SAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473187#M121391</link>
      <description>&lt;P&gt;Sometimes that's a problem with the ANYDTDTM informat rather than the actual data. I would first confirm that those data points are really incorrect by checking them against the raw data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
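&lt;P&gt;A minimal sketch of that check, assuming the file name, delimiter, and variable names from the import step posted earlier in this thread; start_raw, start_dt, and check_dates are hypothetical names used only for illustration. It keeps the original text of the date field next to the converted value so the two can be compared:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* re-read the raw file, keeping start_date as text (start_raw is a hypothetical name) */
data check_dates;
infile 'data.csv' delimiter='}' dsd truncover lrecl=32767 firstobs=2;
length ID $7 second_id $20 ndc $11 start_raw $40;
input ID second_id ndc start_raw;
/* ?? suppresses log notes and _ERROR_ when a value cannot be converted */
start_dt = input(start_raw, ?? anydtdtm40.);
format start_dt datetime19.;
/* keep only rows that failed to convert or that landed in year 3000 */
if missing(start_dt) then output;
else if year(datepart(start_dt)) = 3000 then output;
run;

/* list a few suspect rows with their raw text for inspection */
proc print data=check_dates(obs=20);
var ID start_raw start_dt;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If start_raw really contains a year of 3000, the source data is at fault; if it shows a plausible date, the informat is the more likely culprit.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;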
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/132289"&gt;@Cruise&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;BR /&gt;Good point. proc freq showed that 0.01% of the years were '3000'. Bad dates. Thanks Paul.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Mon, 25 Jun 2018 22:07:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-huge-and-kinda-messy-text-data-into-SAS/m-p/473187#M121391</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-25T22:07:41Z</dc:date>
    </item>
  </channel>
</rss>

