<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Import not cleaned text string column into a dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298729#M62866</link>
    <description>&lt;P&gt;This is not reducing my run time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can we write a similar program using PRX fncitons. that would help in running time?&lt;/P&gt;</description>
    <pubDate>Thu, 15 Sep 2016 17:29:05 GMT</pubDate>
    <dc:creator>kumarK</dc:creator>
    <dc:date>2016-09-15T17:29:05Z</dc:date>
    <item>
      <title>Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298177#M62676</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How to import special characters (HIdden quotes,double quotes, next line) string around 2000 length in one of the column in csv file.(attached)&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 288pt;" border="0" width="384" cellspacing="0" cellpadding="0"&gt;&lt;COLGROUP&gt;&lt;COL style="width: 48pt;" span="6" width="64" /&gt; &lt;/COLGROUP&gt;
&lt;TBODY&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="64" height="20" style="height: 15.0pt; width: 48pt;"&gt;Cust_id&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;trans_type&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;gender&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;trans_date&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;trans_amt&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;Comments&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;11201&lt;/TD&gt;
&lt;TD&gt;Entertainment&lt;/TD&gt;
&lt;TD&gt;F&lt;/TD&gt;
&lt;TD align="center" class="xl63"&gt;########&lt;/TD&gt;
&lt;TD&gt;$50.65&lt;/TD&gt;
&lt;TD&gt;I Talked to customer, but he is not interested&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;11205&lt;/TD&gt;
&lt;TD&gt;Grocery&lt;/TD&gt;
&lt;TD&gt;m&lt;/TD&gt;
&lt;TD align="center" class="xl63"&gt;########&lt;/TD&gt;
&lt;TD&gt;$90.00&lt;/TD&gt;
&lt;TD width="64" class="xl64" style="width: 48pt;"&gt;1. he is not wiling&lt;BR /&gt; but he is will continue in future&lt;BR /&gt; 2. he is good&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;11206&lt;/TD&gt;
&lt;TD&gt;Travel&lt;/TD&gt;
&lt;TD&gt;m&lt;/TD&gt;
&lt;TD align="center" class="xl63"&gt;########&lt;/TD&gt;
&lt;TD&gt;$23.00&lt;/TD&gt;
&lt;TD&gt;"not wiiling to continue" regards Diip&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;11206&lt;/TD&gt;
&lt;TD&gt;Travel&lt;/TD&gt;
&lt;TD&gt;m&lt;/TD&gt;
&lt;TD align="center" class="xl63"&gt;########&lt;/TD&gt;
&lt;TD&gt;$23.00&lt;/TD&gt;
&lt;TD width="64" class="xl64" style="width: 48pt;"&gt;"not talked &lt;BR /&gt; &lt;BR /&gt; &lt;BR /&gt; @@ regards hiip&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Tue, 13 Sep 2016 23:58:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298177#M62676</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-13T23:58:03Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298181#M62678</link>
      <description>&lt;P&gt;Yuck.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is one of the few cases where I say keep the data in Excel since it will keep,relevant content in a single cell.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note this assumes you started with an Excel file since going to Excel from a CSV will have the same issues.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Sep 2016 00:10:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298181#M62678</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-09-14T00:10:54Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298185#M62680</link>
      <description>&lt;P&gt;unfortunately we dont have access to import excel files. we need to convert either csv or text file.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;this is just sample data i have unstructured data with so much cooments wecan't controal that column..i need to load this data into a dataset.do you have any recommendations how can i import?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Sep 2016 00:36:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298185#M62680</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-14T00:36:52Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298192#M62683</link>
      <description>&lt;PRE&gt;
You are luck. The termstr is CRLF . Others are CR . after I checked by NotePad++ .




filename x '/folders/myfolders/comments.csv'  termstr=crlf;
proc import datafile=x out=want dbms=csv replace;run;

&lt;/PRE&gt;</description>
      <pubDate>Wed, 14 Sep 2016 02:03:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298192#M62683</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-09-14T02:03:53Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298193#M62684</link>
      <description>&lt;PRE&gt;
You are luck. The termstr is CRLF . Others are CR . after I checked by NotePad++ .




filename x '/folders/myfolders/comments.csv'  termstr=crlf;
proc import datafile=x out=want dbms=csv replace;
 guessingrows=32767;
run;

&lt;/PRE&gt;</description>
      <pubDate>Wed, 14 Sep 2016 02:04:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298193#M62684</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-09-14T02:04:41Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298194#M62685</link>
      <description>&lt;P&gt;I don't see any attached file.&lt;/P&gt;
&lt;P&gt;Why do you say that you cannot read the Excel file? &amp;nbsp;SAS can directly read both XLS and XLSX files.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname mydata xlsx 'myfile.xlsx';
proc copy inlib=mydata outlib=work;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you have a CSV file generated by Excel then you can probably read it into SAS by making sure to tell SAS to use CRLF as the end of line markers. &amp;nbsp;That way it will read the embedded CR that Excel normally uses for line breaks inside of cells.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want ;
  infile 'myfile.csv' dsd truncover lrecl=100000 truncover firstobs=2 termstr=crlf;
  length col1-col6 $32767;
  input col1-col6;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 14 Sep 2016 02:07:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298194#M62685</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2016-09-14T02:07:57Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298198#M62688</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I created the attached txt file, simulating an export from Excel to CSV.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I imported it with this code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
   infile "\\UNC path to my file\test.txt" delimiter="09"x termstr="|" dsd;
   length col1 col2 $1000;
   input col1 col2;
run;
proc print;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If your data&amp;nbsp;has embedded CRLF, then use termstr to change the character that SAS thinks marks the end-of-record. &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: &amp;nbsp;usually Excel doesn't use &lt;STRONG&gt;CRLF&lt;/STRONG&gt; to embed line feeds, but only &lt;STRONG&gt;LF&lt;/STRONG&gt; (I think it's "0A"x). &amp;nbsp;So, you may not have to use termstr= or the pipe symbol to mark your end-of-record.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a Powershell script which runs a query against an Excel file and returns it as a CSV stream. &amp;nbsp;You can use an infile pipe to dynamically read the stream in a data step. &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You could modify it to write out your custom termstr character at the end of each line (something like SELECT [*,"|"] FROM [Sheet1$]). &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Or, use another technique, such as adding the end-of-record character in your Excel file itself. &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you want to try the Powershell script, I don't have time to support further questions, but Google and the SAS documentation are your friend.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's the link: &amp;nbsp;&lt;A href="https://github.com/scottbass/Powershell/blob/master/Scripts/Query-Excel.ps1" target="_blank"&gt;https://github.com/scottbass/Powershell/blob/master/Scripts/Query-Excel.ps1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps...&lt;/P&gt;</description>
      <pubDate>Wed, 14 Sep 2016 02:19:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298198#M62688</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2016-09-14T02:19:20Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298473#M62778</link>
      <description>&lt;P&gt;i have successfully able to import the file. now&amp;nbsp;i want to strip the words in text. below code works fine for small datset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;however i have large datset it's taking too much time.. anyone have any suggestions to improve the code. Thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data have1;&lt;BR /&gt; set have(keep=comments);&lt;BR /&gt; count =countw(comments," ");&lt;BR /&gt; run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data have2(drop=count i);&lt;BR /&gt; set have1;&lt;BR /&gt; do i=1 to count;&lt;BR /&gt; words = scan(comments,i ," ");&lt;BR /&gt; output;&lt;BR /&gt; end;&lt;BR /&gt; run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2016 00:51:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298473#M62778</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-15T00:51:21Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298488#M62781</link>
      <description>&lt;PRE&gt;
You can combine these two data step into one :


 
data have2(drop=count i);
set have;
do i=1 to countw(comments," ");
words = scan(comments,i ," ");
output;
end;
run;

&lt;/PRE&gt;</description>
      <pubDate>Thu, 15 Sep 2016 02:24:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298488#M62781</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-09-15T02:24:52Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298729#M62866</link>
      <description>&lt;P&gt;This is not reducing my run time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can we write a similar program using PRX fncitons. that would help in running time?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2016 17:29:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298729#M62866</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-15T17:29:05Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298840#M62899</link>
      <description>I don't think PRX functions could reduce your time. The fast efficient way I could think is using IML code. But that need lots of memory if you have big table.</description>
      <pubDate>Fri, 16 Sep 2016 01:50:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298840#M62899</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-09-16T01:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298844#M62902</link>
      <description>&lt;P&gt;Define takes too much time?&lt;/P&gt;
&lt;P&gt;Is this a process that needs to be repeated every week? Month?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2016 02:57:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298844#M62902</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-09-16T02:57:29Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298847#M62904</link>
      <description>&lt;PRE&gt;
Define a LENGTH statement might save you some time.

data have2(drop=count i);
set have;
LENGTH WORDS $ 40;
do i=1 to countw(comments," ");
words = scan(comments,i ," ");
output;
end;
run;


&lt;/PRE&gt;</description>
      <pubDate>Fri, 16 Sep 2016 03:28:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298847#M62904</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-09-16T03:28:29Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298853#M62906</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15653"&gt;@kumarK&lt;/a&gt; wrote:&lt;BR /&gt;&lt;P&gt;i have successfully able to import the file. now&amp;nbsp;i want to strip the words in text. below code works fine for small datset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;however i have large datset it's taking too much time.. anyone have any suggestions to improve the code. Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have1;&lt;BR /&gt;set have(keep=comments);&lt;BR /&gt;count =countw(comments," ");&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have2(drop=count i);&lt;BR /&gt;set have1;&lt;BR /&gt;do i=1 to count;&lt;BR /&gt;words = scan(comments,i ," ");&lt;BR /&gt;output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Well, sure, there's plenty of ways to make this run faster:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Get a faster server&lt;/LI&gt;&lt;LI&gt;Use MP_Connect to process your data in multiple SAS processes&lt;/LI&gt;&lt;LI&gt;If you're using EG, use EG's parallel processing to process your data in multiple SAS processes&lt;/LI&gt;&lt;LI&gt;Use PROC DS2's threading capability to process your data in multiple threads&lt;/LI&gt;&lt;LI&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;said, don't do a double pass over your data. &amp;nbsp;Your first&amp;nbsp;step reads comments into the PDV - do your processing on it then.&lt;/LI&gt;&lt;LI&gt;You might get a slight performance gain by using the SPDE engine for your source and target datasets.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;However, it would be a huge help to know WHY you're doing what your doing? &amp;nbsp;Clearly, you're transforming your data, writing each word of the comments field as a separate record in your final output table, where "word" is a token delimited&amp;nbsp;by a space.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What do you then want to do with the data? &amp;nbsp;If we knew that, perhaps there would be other approaches that wouldn't involve parsing your comments field into one record per word. &amp;nbsp;For example, if you need to get a count of each word, you could use a hash object to sum for each word, and your final dataset (and therefore I/O) would be much smaller.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If your source data has 1M records, and your comments field averages 100 words, you've just converted your 1M records into 100M.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For small datasets, inefficient approaches are ok - quick and dirty rules the day. &amp;nbsp;But, if your data is "large", and you're experiencing performance issues, you may need to apply more creative approaches.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, forget the technicalities...what do you want to DO?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2016 04:49:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/298853#M62906</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2016-09-16T04:49:07Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299106#M62994</link>
      <description>&lt;P&gt;the dataset i have around 500k comments...i want to strip the comments into separate words..then i need to count how many times each word apperead. the code used above is taking too much time and generating around 100gb data..so i want &amp;nbsp;to reduce the time and space.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;after some research i came to know that i have intially assigend length for coments variable $5000 so&amp;nbsp;that all comments is loaded proerly into my dataset. i think this is taking my dataset around 100gb&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;below is the the output i am getting after running the code, lf you see for every word its taking comments ..is there anyway we can improve the code. or how can i know&amp;nbsp;max length of the variable before loading to dataset so that i can assign that length to the variable? or can i get the similar results another way? I hope you understand the issue. Thanks&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE width="223"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="159"&gt;comments&lt;/TD&gt;
&lt;TD width="64"&gt;words&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I Talked to customer&lt;/TD&gt;
&lt;TD&gt;I&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I Talked to customer&lt;/TD&gt;
&lt;TD&gt;Talked&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I Talked to customer&lt;/TD&gt;
&lt;TD&gt;to&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;I Talked to customer&lt;/TD&gt;
&lt;TD&gt;customer&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Sat, 17 Sep 2016 16:09:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299106#M62994</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-17T16:09:25Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299107#M62995</link>
      <description>&lt;P&gt;"What do you then want to do with the data? &amp;nbsp;If we knew that, perhaps there would be other approaches that wouldn't involve parsing your comments field into one record per word. &amp;nbsp;For example, if you need to get a count of each word, you could use a hash object to sum for each word, and your final dataset (and therefore I/O) would be much smaller.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your source data has 1M records, and your comments field averages 100 words, you've just converted your 1M records into 100M."&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Exactly you got my problem. the program i written unnessarly creating 100M dataset like the above example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My objective is to &amp;nbsp;&lt;SPAN&gt;to get a count of each word in comments variable.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"you could use a hash object to sum for each word, and your final dataset (and therefore I/O) would be much smaller." &amp;nbsp;-- can you please help me with code&amp;nbsp;how we can get this using hash object.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 17 Sep 2016 16:15:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299107#M62995</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-17T16:15:28Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299115#M62997</link>
      <description>&lt;P&gt;Try using a data step view to split the words. Then at least you do not need to save the table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data words / view=words ;
  set have (keep=comment);
  do i=1 to countw(comment,' ');
     length word $100 ;
     word = scan(comment,i,' ');
     output;
  end;
  keep word ;
run;
proc summary nway data=words;
   class word ;
   output out=word_freqs ;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 17 Sep 2016 19:05:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299115#M62997</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2016-09-17T19:05:01Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299124#M63005</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15653"&gt;@kumarK&lt;/a&gt; wrote:&lt;BR /&gt;&lt;P&gt;&lt;SPAN&gt;"you could use a hash object to sum for each word, and your final dataset (and therefore I/O) would be much smaller." &amp;nbsp;-- can you please help me with code&amp;nbsp;how we can get this using hash object.&lt;/SPAN&gt;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have time right now to write example code, but the logic would go like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You would have a data _null_ step&lt;/P&gt;&lt;P&gt;Your set statement would have "end=eof"&lt;/P&gt;&lt;P&gt;Create a hash object when _n_=1&lt;/P&gt;&lt;P&gt;The hash object key is "word", the hash object var is "count"&lt;/P&gt;&lt;P&gt;Parse your words as you are doing now&lt;/P&gt;&lt;P&gt;I honestly can't remember if the keys in a hash object are case sensitive, and I'm not in front of SAS right now to test. &amp;nbsp;If so, convert the case of your words.&lt;/P&gt;&lt;P&gt;For each word (i.e. within your parsing loop):&lt;/P&gt;&lt;P&gt;Explicitly set count to missing&lt;/P&gt;&lt;P&gt;Find the existing word in your hash object&lt;/P&gt;&lt;P&gt;If the find is successful, count will be set to the current count from the hash object&lt;/P&gt;&lt;P&gt;If the find is unsuccessful, count will remain missing&lt;/P&gt;&lt;P&gt;Set count=sum(count,1). &amp;nbsp;This will set count to either 1, or count+1&lt;/P&gt;&lt;P&gt;Update the hash object. &amp;nbsp;This will either update the existing word, or add the new word&lt;/P&gt;&lt;P&gt;When end=eof, write the hash object to an output dataset&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Read the SAS doc on the hash object: &lt;A href="http://support.sas.com/documentation/cdl/en/lecompobjref/67221/HTML/default/viewer.htm#p0zjo5imt7rb9rn1npxm1rsl58b3.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lecompobjref/67221/HTML/default/viewer.htm#p0zjo5imt7rb9rn1npxm1rsl58b3.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;See the Add, Declare, Find, Output, Replace, and Sum methods. &amp;nbsp;See the examples. &amp;nbsp;If you don't understand the examples, run them until you do. &amp;nbsp;They can be confusing when you're first learning hash objects, but once you understand them, they are a powerful tool in your SAS kit bag.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I know the above approach would work. &amp;nbsp;However, the Sum method may be a an alternative approach, which may also work for you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;HTH...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit&lt;/STRONG&gt;: &amp;nbsp;See&amp;nbsp;&lt;A href="http://support.sas.com/documentation/cdl/en/lecompobjref/67221/HTML/default/viewer.htm#p00ilfw5pzcjvtn1nfya9863fozd.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lecompobjref/67221/HTML/default/viewer.htm#p00ilfw5pzcjvtn1nfya9863fozd.htm&lt;/A&gt; (Example 5). &amp;nbsp;This should solve your your issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit #2&lt;/STRONG&gt;: &amp;nbsp;But you should also try Tom's approach and compare the performance. &amp;nbsp;If his approach performs well for you, it would be a simpler approach requiring less code. &amp;nbsp;Regardless of your final approach, use this as an opportunity to learn more about hash objects.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Sep 2016 09:13:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299124#M63005</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2016-09-18T09:13:25Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299125#M63006</link>
      <description>&lt;P&gt;Have you considered SAS text analytics rather than using Base SAS?&lt;/P&gt;</description>
      <pubDate>Sat, 17 Sep 2016 22:55:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299125#M63006</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-09-17T22:55:43Z</dc:date>
    </item>
    <item>
      <title>Re: Import not cleaned text string column into a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299135#M63014</link>
      <description>&lt;P&gt;dont have license to SAS text analytics &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Sep 2016 03:32:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Import-not-cleaned-text-string-column-into-a-dataset/m-p/299135#M63014</guid>
      <dc:creator>kumarK</dc:creator>
      <dc:date>2016-09-18T03:32:36Z</dc:date>
    </item>
  </channel>
</rss>

