09-13-2016 07:58 PM
How do I import strings around 2000 characters long containing special characters (hidden quotes, double quotes, embedded line breaks) in one of the columns of a CSV file? (sample attached)
|11201||Entertainment||F||########||$50.65||I Talked to customer, but he is not interested|
|11205||Grocery||m||########||$90.00||1. he is not wiling
but he is will continue in future
2. he is good
|11206||Travel||m||########||$23.00||"not wiiling to continue" regards Diip|
@@ regards hiip
09-13-2016 08:10 PM
This is one of the few cases where I say keep the data in Excel, since it will keep relevant content in a single cell.
Note this assumes you started with an Excel file, since going to Excel from a CSV will have the same issues.
09-13-2016 08:36 PM
Unfortunately we don't have access to import Excel files; we need to convert to either a CSV or a text file.
This is just sample data. I have unstructured data with many comments and we can't control that column. I need to load this data into a dataset. Do you have any recommendations on how to import it?
09-13-2016 10:19 PM
I created the attached txt file, simulating an export from Excel to CSV.
I imported it with this code:
data test;
  infile "\\UNC path to my file\test.txt" delimiter="09"x termstr="|" dsd;
  length col1 col2 $1000;
  input col1 col2;
run;
proc print; run;
If your data has embedded CRLF, then use termstr to change the character that SAS thinks marks the end-of-record.
Note: usually Excel doesn't use CRLF for embedded line breaks, only LF (I think it's "0A"x). So you may not have to use termstr= or the pipe symbol to mark your end-of-record.
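If you want to verify which line-ending bytes the file actually contains before choosing termstr=, here is a quick sketch (substitute your own file path) that scans the raw byte stream:

```sas
/* Read the file as a raw byte stream and report CR / LF bytes */
data _null_;
  infile 'myfile.csv' recfm=n;    /* recfm=n: no record boundaries */
  input c $char1.;
  if c = '0D'x then put 'CR at byte ' _n_;
  else if c = '0A'x then put 'LF at byte ' _n_;
  if _n_ >= 2000 then stop;       /* only inspect the first 2000 bytes */
run;
```

A CR immediately followed by LF marks a CRLF record end; a lone LF (or lone CR) is typically an embedded line break inside a cell.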
I have a Powershell script which runs a query against an Excel file and returns it as a CSV stream. You can use an infile pipe to dynamically read the stream in a data step.
You could modify it to write out your custom termstr character at the end of each line (something like SELECT [*,"|"] FROM [Sheet1$]).
Or, use another technique, such as adding the end-of-record character in your Excel file itself.
If you want to try the Powershell script, I don't have time to support further questions, but Google and the SAS documentation are your friends.
Hope this helps...
09-13-2016 10:03 PM
You are in luck: the record terminator is CRLF, while the embedded line breaks are CR (I checked with Notepad++).
filename x '/folders/myfolders/comments.csv' termstr=crlf;
proc import datafile=x out=want dbms=csv replace;
  guessingrows=32767;
run;
09-13-2016 10:07 PM
I don't see any attached file.
Why do you say that you cannot read the Excel file? SAS can directly read both XLS and XLSX files.
libname mydata xlsx 'myfile.xlsx'; proc copy inlib=mydata outlib=work; run;
If you have a CSV file generated by Excel then you can probably read it into SAS by making sure to tell SAS to use CRLF as the end of line markers. That way it will read the embedded CR that Excel normally uses for line breaks inside of cells.
data want;
  infile 'myfile.csv' dsd truncover lrecl=100000 firstobs=2 termstr=crlf;
  length col1-col6 $32767;
  input col1-col6;
run;
09-14-2016 08:51 PM
I was able to successfully import the file. Now I want to split the text into individual words. The code below works fine for a small dataset.
However, I have a large dataset and it's taking too much time. Does anyone have suggestions to improve the code? Thanks
data have1;
  set have;
  count = countw(comments, " ");
run;

data have2(drop=count i);
  set have1;
  do i = 1 to count;
    words = scan(comments, i, " ");
    output;
  end;
run;
09-14-2016 10:24 PM
You can combine these two data steps into one:
data have2(drop=i);
  set have;
  do i = 1 to countw(comments, " ");
    words = scan(comments, i, " ");
    output;
  end;
run;
09-15-2016 09:50 PM
09-17-2016 12:09 PM
My dataset has around 500k comments. I want to split the comments into separate words and then count how many times each word appeared. The code above is taking too much time and generating around 100 GB of data, so I want to reduce both the time and the space.
After some research I realized I had initially assigned a length of $5000 to the comments variable so that all comments load properly into my dataset; I think this is what pushes the dataset to around 100 GB.
Below is the output I am getting after running the code. As you can see, every word carries the full comment along with it. Is there any way to improve the code? Or how can I know the maximum length of the variable before loading the dataset, so I can assign exactly that length? Or can I get similar results another way? I hope you understand the issue. Thanks
|I Talked to customer||I|
|I Talked to customer||Talked|
|I Talked to customer||to|
|I Talked to customer||customer|
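Here is a sketch of one way to address both questions, assuming the input dataset is called have with a character variable comments, as above: first find the longest actual comment so you can size the variable exactly, then write out only the individual words (dropping comments) and let PROC FREQ do the counting.

```sas
/* Find the longest actual comment so the variable can be sized exactly */
proc sql noprint;
  select max(length(comments)) into :maxlen trimmed
  from have;
quit;
%put Longest comment is &maxlen characters;

/* Keep only the word itself, not the full comment, to shrink the output */
data words(keep=word);
  set have;
  length word $50;              /* assumption: no single word exceeds 50 chars */
  do i = 1 to countw(comments, " ");
    word = scan(comments, i, " ");
    output;
  end;
run;

/* Count how many times each word appeared, most frequent first */
proc freq data=words order=freq;
  tables word / noprint out=word_counts(drop=percent);
run;
```

Dropping comments from the exploded dataset is what saves the space; the 100 GB comes from repeating a $5000 variable on every output row.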