Hi there - I am not 100% sure - if there is a better community I'm happy to move. Thanks..
rajeshitboys, I took your sample dataset and ran it with your code using SAS University Edition (UE). That one runs SAS in utf8 mode.
No errors were generated and all records were read. Hindi characters and smileys are shown when viewing that dataset.
To achieve this on Windows you could run the u8 version, provided you have installed it. There will be a dedicated directory for starting the u8 or en version somewhere (just review the installation directory).
As the source is social media (all of which use utf8), this approach looks valid to me. I know it is new for most SAS people.
You can run those SAS versions, latin1 and utf8, side by side on the same machine.
There is however another issue with the data. We should not use CR/LF as the record separator while reading the data. Several of the free-text postings also contain those characters (yes, they are just characters). As SAS input processing is not aware of that, it breaks the record at those lines. Notepad++ does the same. I do not know which program or method you used to retrieve that data.
Is it possible to use the string cr-lf-"1"; as the record separator? If you are really using Unix it would be lf-"1"; (see Newline - Wikipedia, the free encyclopedia).
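A quick way to confirm which encoding a given session runs in is sketched below. It only queries the ENCODING system option; what is printed depends on how your session was started.

```sas
/* Write the session encoding to the log */
%put NOTE: Session encoding is %sysfunc(getoption(encoding));

/* The same information through PROC OPTIONS */
proc options option=encoding;
run;
```

If the log does not report a UTF-8 encoding, the session was started from the non-u8 configuration.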
Hi Jaap,
Our SAS system also runs in utf8 mode and we have SAS on a Windows server. Could you please share the code? It would be really helpful for me.
I did not understand this part: "There will be a dedicated directory for starting that somewhere." I would be grateful if you could elaborate.
Thanks
Rajesh Sundarraj
This is what I have run:
The recoding of the format/informat statements is just a matter of taste. The id is changed to a length of 8 (it may need more); it looks to be a counter. In that case it should be equal to the record number.
The last field is the text field that possibly gets split up. I joined the pieces back together, stopping at the next good record. The count is 10000 for that.
With lengths predefined, values get truncated when the real value is longer (no warning).
The infile statement has the truncover and end= options (to avoid reading past the last record).
I could not figure out how that 0d0a could get hidden. The adjusted values have quotes around them. In the others those quotes are removed.
SAS(R) 9.4 Statements: Reference, Third Edition
filename test "/folders/myfolders/test";
filename macro "/folders/myfolders/macro";

filename work "%sysfunc(pathname(work))";
filename tmp "/tmp" ;

/* https://communities.sas.com/thread/62151 */

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
data WORK.Test ;
  /* semicolon-delimited file, skip the header line, flag the last record */
  infile test("SrBachchan.txt") delimiter = ';' truncover dsd lrecl=32767 firstobs=2 end=endfil;
  informat level $3. id $8.
    parent_id $3. object_id $20. object_type $6.
    query_status $15. query_time $28. query_type $23.
    created_at $32. user_screen_name $17. favorite_count $3. retweet_count $4.
    entities_hashtags___text $12. entities_user_mentions___name $43. entities_urls___display_url $2.
    in_reply_to_user_id $11. in_reply_to_screen_name $12. in_reply_to_status_id $2.
    text $142.
  ;
  format level $3. id $8.
    parent_id $3. object_id $20. object_type $6.
    query_status $15. query_time $28. query_type $23.
    created_at $32. user_screen_name $17. favorite_count $3. retweet_count $4.
    entities_hashtags___text $12. entities_user_mentions___name $43. entities_urls___display_url $2.
    in_reply_to_user_id $11. in_reply_to_screen_name $12. in_reply_to_status_id $2.
    text $142.
  ;
  input
    level id
    parent_id object_id object_type
    query_status query_time query_type
    created_at user_screen_name favorite_count retweet_count
    entities_hashtags___text entities_user_mentions___name entities_urls___display_url
    in_reply_to_user_id in_reply_to_screen_name in_reply_to_status_id
    text
  ;
  if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
  if (not endfil) then do;
    /* load the next line and hold it so it can be inspected */
    input @@;
    /* a good record starts with "1"; anything else continues the text field */
    Do while ( _infile_ not =: '"1";' ) ;
      text=trim(text) || '0D0A'x || trim(_infile_); /* re-join with the embedded CR/LF */
      input / @@; /* move to the following line and hold it as well */
    end;
  end;

run;
NOTE: The infile library TEST is:
Directory=/folders/myfolders/test,
Owner Name=root,Group Name=root,
Access Permission=drwxrwxrwx,
Last Modified=20 oktober 2014 14:58:17 uur
NOTE: The infile TEST("SrBachchan.txt") is:
Filename=/folders/myfolders/test/SrBachchan.txt,
Owner Name=root,Group Name=root,
Access Permission=-rwxrwxrwx,
Last Modified=20 oktober 2014 09:13:10 uur,
File Size (bytes)=3390532
NOTE: A total of 11795 records were read from the infile library TEST.
The minimum record length was 0.
The maximum record length was 590.
NOTE: 11795 records were read from the infile TEST("SrBachchan.txt").
The minimum record length was 0.
The maximum record length was 590.
NOTE: The data set WORK.TEST has 10000 observations and 19 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.08 seconds
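The remark about predefined lengths can be illustrated with a minimal sketch (the variable name short is made up for this example). A value longer than the declared length is cut off without any warning in the log:

```sas
data _null_;
  length short $5;          /* room for 5 bytes only */
  short = 'abcdefghij';     /* 10 characters are assigned */
  put short=;               /* the log shows short=abcde */
run;
```

In a utf8 session lengths are counted in bytes, and characters such as Hindi letters or smileys take more than one byte each, so the text field may need a larger length than the number of visible characters suggests.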
Hi Jaap,
If I convert my file to Unicode or to a txt file, then I have no issues importing it. But the problem is that I cannot do this file by file, because I am creating an automated process for the data flow. So it is really difficult to convert the files one by one. Is there any other way to approach this?
Convert? The source, Twitter, delivers it in Unicode. The BOM indicator is not always there, but it is not really needed. As the default for SAS infile is ???
You can code the encoding option on the infile statement (SAS(R) 9.4 Statements: Reference, Third Edition). It should automatically be utf-8 when running SAS in that mode (SAS(R) 9.4 Statements: Reference, Third Edition, example 11). Naming the extension csv or txt should not make any difference. Only my default Windows programs react to it.
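As a rough sketch of that encoding option, the step below reads the first few raw lines of a file while stating its encoding explicitly. The path and the dataset name work.peek are placeholders, not taken from the thread; adjust them to your own setup.

```sas
data work.peek;
  /* ENCODING= describes how the external file is encoded,
     independent of the csv or txt extension */
  infile "C:\temp\tweets.csv" encoding="utf-8" lrecl=32767 truncover obs=5;
  input line $char200.;   /* first 200 bytes of each line, for inspection only */
run;
```

Viewing work.peek is a cheap check that the special characters survive before the full import is run.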
Hi Jaap,
Actually, if I specify .txt I get an error saying the file does not exist.
ERROR: Physical file does not exist, D:\Rajesh_sun\POCData\SrBachchan.txt.
But if we save the file as txt or Unicode, I expect my problem will be resolved. I need to do that through SAS, though.
Ah, ... I changed the infile statement in the code to run it on UE. That is a Unix type system, which uses other conventions and is case-sensitive.
To avoid the machine type dependency I have set up a fileref named test that points at the physical location.
In the infile statement you see test("SrBachchan.txt"); that could be macrotized. For this single file it works. As it is Unix, I kept the capitalization equal to what is stored on the system.
If the current directory on your system is D:\Rajesh_sun\POCData\ and your data is located somewhere else, this is just a confusing error message that really means there was a typo or misspelling in the filename.
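A minimal sketch of what macrotizing could look like follows. The macro name read_raw and its parameters are made up for illustration, the folder is the one from the error message above, and the single input statement is only a placeholder for the full informat/format/input logic of the posted step.

```sas
/* Fileref pointing at the folder; only this line is machine dependent */
filename test "D:\Rajesh_sun\POCData";

%macro read_raw(file=, out=);
  data &out;
    infile test("&file") lrecl=32767 truncover firstobs=2;
    input line $char1000.;   /* placeholder: substitute the real input logic */
  run;
%mend read_raw;

/* The extension is just part of the name that is passed in */
%read_raw(file=SrBachchan.csv, out=work.raw_csv);
```

With the file name as a parameter, the same step can be called once per delivered file without editing the code.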
Hi Jaap,
The error is not about the macrotizing. The error occurs because you specified txt instead of .csv. If I convert that csv file to txt (Unicode), all the symbols in the file are handled automatically. That meets our requirement. But how can we do that through SAS?
The only reason for using txt instead of csv is that I didn't want Excel to open it by default. Running my program with a txt or a csv file does not make any difference for me (UE).
I cannot find anything special for Windows in SAS for that (SAS(R) 9.4 Companion for Windows, Third Edition).
The remark about macrotizing is about using different or variable names for your input file. I cannot follow your notion that SAS would process that file differently based on its name. That would be a new surprise for me around utf-8.
Hi Jaap,
Is there any feasible way to resolve my issue? If yes, please help me.
I was thinking the issue was solved. Running a utf8 SAS session to read the data helped.
You mentioned the csv/txt difference of the input file. That is something I do not understand and am not able to reproduce.
If that is really a problem, you could use the rename command through the xcmd option. Do you know that trick?
Hi Jaap,
I am not aware of xcmd. Is it possible to convert all the file names that reside in a folder on Windows? If yes, could you please share the command?
Thanks
The xcmd option is the trick for using OS commands, like the rename one under Windows.
For Base SAS on Windows it is enabled again (9.3). For BI/DI workspace servers and others it is up to the administrator.
By default SAS Institute assumes there are badly set up SAS installations with untrustworthy users working on them, so they decided to disable it.
/* run the Windows rename command through a pipe; adjust command as needed */
filename wincmd pipe "rename D:\test\*.csv *.txt " ;
data _null_ ;
  /* echo whatever the command writes back into the SAS log */
  infile wincmd ; input ; put _infile_ ;
run;
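Whether OS commands are allowed in a given session can be checked before relying on the pipe. This is a small sketch that only queries the XCMD system option:

```sas
/* Writes XCMD when OS commands are allowed, NOXCMD when they are disabled */
%put NOTE: XCMD setting is %sysfunc(getoption(xcmd));

/* The same information through PROC OPTIONS */
proc options option=xcmd;
run;
```

If the log reports NOXCMD, the setting has to be changed when SAS is started, which on a server is typically something for the administrator.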
Hi Jaap,
Thanks, it's really helpful.