Community_Help
SAS Employee

Hi there - I am not 100% sure - if there is a better community I'm happy to move. Thanks.

jakarman
Barite | Level 11

rajeshitboys, I took your sample dataset and ran it with your code using the UE (SAS University Edition). That one runs SAS in utf8 mode.

No errors were generated and all records were read. Hindi characters and smileys are shown when viewing that dataset.

To achieve this you could run the u8 version on Windows, if you have installed it. There will be a dedicated directory for starting the u8 or en version somewhere (just review the installation dir).

As the source is social media (they all use utf8), this approach looks valid to me. I know it is a new one for most SAS people.

You can run the latin1 and utf8 SAS versions side by side on the same machine.
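To confirm which of the two versions a session is actually running, a minimal check like the one below may help. It only reports the session encoding and changes nothing. On many Windows installs the utf8 start-up configuration sits in an nls\u8 subdirectory of the SAS installation, but that location is an assumption here, so review your own installation directory.

```sas
/* Print the ENCODING system option of the running session to the log */
proc options option=encoding;
run;

/* The same information is held in an automatic macro variable */
%put NOTE: session encoding is &sysencoding;
```

A utf8 session should report a UTF-8 encoding here, and a latin1 session should report a latin1 or wlatin1 variant.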

There is, however, another issue with the data. We should not use cr/lf as the record separator while reading the data. Several of those free-text postings also contain those characters (yes, they are just characters). As SAS input processing is not aware of that, it breaks the record at those lines. Notepad++ does the same. I do not know which program or method you used to retrieve that data.

Is it possible to use the string cr-lf-"1"; as the record separator? If you are really using Unix it would be lf-"1"; (see Newline - Wikipedia, the free encyclopedia).
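As far as I know the infile statement has no option for a multi-character separator like that, but it does have TERMSTR= to choose which line ending closes a record. The sketch below assumes the embedded breaks inside the free text are bare LF while real records end with CR+LF. If the postings contain full CR+LF pairs it will not help, and the pieces have to be joined back in the data step. The fileref name and path are only illustrative.

```sas
/* Hypothetical fileref and path; adjust to your own file location */
filename tweets "/folders/myfolders/test/SrBachchan.txt";

data _null_;
  /* TERMSTR=CRLF: only a CR+LF pair ends a record, a lone LF stays in the data */
  infile tweets termstr=crlf lrecl=32767 obs=5;
  input;
  /* Echo the first few raw records to the log for inspection */
  put _infile_;
run;
```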

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

Our SAS system also runs in utf8 mode and we have SAS on a Windows server. Could you please share the code? It would be really helpful for me.

I do not understand this part: "There will be a dedicated directory for starting that somewhere." I would be grateful if you could elaborate.

Thanks

Rajesh Sundarraj

jakarman
Barite | Level 11

This is what I have run:

The recoding of format/informat is just a matter of taste. The id was changed to a length of 8 (it may need more); it looks to be a counter. In that case it should be equal to the record number.

The last field is the text field, which may get split up. I joined those pieces back together, stopping at the next good record. That gives count=10000.

With lengths predefined, values get truncated when the real value is longer (no warning).

The infile statement has the truncover and end= options (so nothing is read after the last record).

I could not figure out how that 0d0a could get hidden. The adjusted (re-joined) values have quotes around them. In the other values those quotes are removed.

SAS(R) 9.4 Statements: Reference, Third Edition

filename test "/folders/myfolders/test";
filename macro "/folders/myfolders/macro";

filename work "%sysfunc(pathname(work))";
filename tmp  "/tmp" ;

/* https://communities.sas.com/thread/62151 */

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
data WORK.Test    ;
infile test("SrBachchan.txt") delimiter = ';' truncOVER DSD lrecl=32767 firstobs=2 end=endfil;
/* predefined lengths: longer values are truncated without a warning */
informat level $3.        id $8.
       parent_id $3.      object_id $20.        object_type $6.
       query_status $15.  query_time $28.       query_type $23.
       created_at $32.    user_screen_name $17. favorite_count $3.       retweet_count $4.
       entities_hashtags___text $12.      entities_user_mentions___name $43.  entities_urls___display_url $2.
       in_reply_to_user_id $11.           in_reply_to_screen_name $12.        in_reply_to_status_id $2.
       text $142.
 ;
 format level $3.         id $8.
       parent_id $3.      object_id $20.        object_type $6.
       query_status $15.  query_time $28.       query_type $23.
       created_at $32.    user_screen_name $17. favorite_count $3.       retweet_count $4.
       entities_hashtags___text $12.      entities_user_mentions___name $43.  entities_urls___display_url $2.
       in_reply_to_user_id $11.           in_reply_to_screen_name $12.        in_reply_to_status_id $2.
       text $142.
 ;
 input
       level              id
       parent_id          object_id              object_type
       query_status       query_time             query_type
       created_at         user_screen_name       favorite_count          retweet_count
       entities_hashtags___text           entities_user_mentions___name        entities_urls___display_url
       in_reply_to_user_id                in_reply_to_screen_name              in_reply_to_status_id
       text
    ;
   if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
   /* look ahead: lines not starting with "1"; are continuations of the text field */
   if (not endfil) then do;
     input @@;
     Do while ( _infile_ not =: '"1";' ) ;
       /* join the split piece back, restoring the cr/lf that broke the record */
       text=trim(text) || '0D0A'x ||  trim(_infile_);
     input / @@;
     end;
   end;

 run;

NOTE: The infile library TEST is:
       Directory=/folders/myfolders/test,
       Owner Name=root,Group Name=root,
       Access Permission=drwxrwxrwx,
       Last Modified=20 oktober 2014 14:58:17 uur

NOTE: The infile TEST("SrBachchan.txt") is:
       Filename=/folders/myfolders/test/SrBachchan.txt,
       Owner Name=root,Group Name=root,
       Access Permission=-rwxrwxrwx,
       Last Modified=20 oktober 2014 09:13:10 uur,
       File Size (bytes)=3390532

NOTE: A total of 11795 records were read from the infile library TEST.
       The minimum record length was 0.
       The maximum record length was 590.
NOTE: 11795 records were read from the infile TEST("SrBachchan.txt").
       The minimum record length was 0.
       The maximum record length was 590.
NOTE: The data set WORK.TEST has 10000 observations and 19 variables.
NOTE: DATA statement used (Total process time):
       real time           0.07 seconds
       cpu time            0.08 seconds

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

If I convert my file to a Unicode or txt file, I have no issues importing it. But the problem is that I cannot do this one file at a time, because I am creating an automated process for the data flow. So it is really difficult to convert the files one by one. Is there any other way to approach this?

jakarman
Barite | Level 11

Convert? The source, Twitter, delivers it in Unicode. The BOM indicator is not always there, but it is not really needed. As the default for SAS infile is ???

You can code the encoding option on the infile statement (SAS(R) 9.4 Statements: Reference, Third Edition). It should automatically be utf-8 when running that (utf8) version (SAS(R) 9.4 Statements: Reference, Third Edition, example 11). Naming the extension csv or txt should not make any difference. Only my default Windows programs react to it.
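A minimal sketch of coding the encoding explicitly, in case the session itself is not running utf8. The directory is the one from my UE run and stands in for your own location; the data step only echoes a few records to the log so you can see whether they are read intact.

```sas
/* Directory from the UE example; replace with your own location */
filename test "/folders/myfolders/test";

data _null_;
  /* ENCODING= tells SAS how the external file is encoded */
  infile test("SrBachchan.txt") encoding="utf-8" lrecl=32767 obs=5;
  input;
  put _infile_;
run;
```

If the session encoding cannot represent a character (for example Hindi text in a latin1 session), expect transcoding problems, which is why running the utf8 version remains the simpler route.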

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

Actually, if I specify .txt I get an error saying the file does not exist:

ERROR: Physical file does not exist, D:\Rajesh_sun\POCData\SrBachchan.txt.

But if we save the file as txt or Unicode, I expect my problem will be resolved. I need to do that through SAS, though.

jakarman
Barite | Level 11

Ah, ... I changed the infile statement in the code to run it on UE. That is a Unix type of system, using other conventions, and it is case-sensitive.

To avoid the machine-type dependency I have set up a fileref named test that points at the physical location.

In the infile statement you see test("SrBachchan.txt"), which could be macrotized. For this single file it works. As it is Unix, I kept the capitalization equal to what is stored on the system.
If the current dir on your system is D:\Rajesh_sun\POCData\ and your data is located somewhere else, this is just a confusing error message that really means there was a typo or misspelling in the filename.
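What I mean by macrotizing, as a rough sketch: keep the directory in the fileref and the member name in a macro variable. The macro variable name infile_name is made up for this illustration, and the data step only echoes a few records to the log.

```sas
/* Fileref points at the directory; adjust the path for Windows */
filename test "/folders/myfolders/test";

/* Hypothetical macro variable holding the member name, extension included */
%let infile_name = SrBachchan.txt;

data _null_;
  /* Double quotes are needed so that the macro variable resolves */
  infile test("&infile_name") lrecl=32767 obs=3;
  input;
  put _infile_;
run;
```

Switching between a csv and a txt input is then a change to one %let statement, and the name has to match the file on disk exactly, including the extension.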

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

The error is not about the macrotizing. The error occurs because you specified txt instead of .csv. If I convert that csv file to txt (Unicode), all symbols in the file are automatically ignored. That meets our requirement. But how can we do that through SAS?

jakarman
Barite | Level 11

The only reason for using txt instead of csv is that I didn't want Excel to open it by default. Running my program with a txt or a csv makes no difference for me (UE).

I cannot find anything special for Windows in SAS for that: SAS(R) 9.4 Companion for Windows, Third Edition.

The remark about macrotizing is for using different/variable names for your input file. I cannot follow your notion that SAS would process the file differently based on its name. That would be a new surprise for me around utf-8.

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

Is there any feasible way to resolve my issue? If yes, please help me.

jakarman
Barite | Level 11

I was thinking the issue was solved. Running a utf8 SAS session to read the data helped.

You mentioned the csv/txt difference of the input file. That is something I do not understand and am not able to reproduce.

If that is really a problem you could use the rename command using the x-cmd option. Do you know that trick?

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

I am not aware of x-cmd. Is it possible to convert all the filenames that reside in a folder on Windows? If yes, could you please share the command?

Thanks

jakarman
Barite | Level 11

The xcmd option is the trick for using OS commands, like the rename one under Windows.

For Base SAS on Windows it is open again (9.3). For BI/DI workspace servers and others, enabling it is an admin task.

By default SAS Institute assumes there are bad SAS installations with just jerks as users working on them, so they decided to disable it.

Filename wincmd pipe "rename D:\test\*.csv *.txt " ;   /* adjust command as needed */
data _null_ ;
infile wincmd ; input ; put _infile_ ;   /* echo any command output to the log */
run;
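Before relying on the pipe it may be worth checking whether xcmd is enabled in your session at all. A small check, which changes nothing:

```sas
/* Shows XCMD or NOXCMD in the log */
proc options option=xcmd;
run;

/* The same setting, retrieved with the GETOPTION function */
%put NOTE: xcmd setting is %sysfunc(getoption(xcmd));
```

If it reports NOXCMD, the pipe will fail and the option has to be enabled at SAS start-up, which for server sessions is something for your SAS administrator.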

---->-- ja karman --<-----
rajeshitboys
Calcite | Level 5

Hi Jaap,

Thanks, it's really helpful.
