
Writing fixed-width text files with Chinese characters

Occasional Contributor
Posts: 11


We are using PC SAS 9.3 to create extracts that contain Chinese characters using UTF-8 encoding.  The expected output is a fixed-width text file.  No issues reading or manipulating the data.  We are able to produce the txt file and read it.  All of the columns align until you get to a column that contains Chinese characters (i.e., name field) ... anything after that is out of alignment.

 

My question ... Is there a way to add trailing spaces to the field so that each value is exactly 80 characters?  Any advice would be greatly appreciated.  We have never worked with Chinese data before so this was a surprise.  The output file will be going to the Chinese government and they must have a fixed-width text file with these exact field lengths.  -- Thank you!

 

put

@1 record_type $1. -L

@2 account_number $40. -L

@42 fafc_code $20. -L

@62 account_type $1. -L

@63 org_credit_cd $18. -L

@81 org_code_12_x $10. -L

@91 reg_code_type $2. -L

@93 reg_code $20. -L

@113 nat_tax_reg_11_x $20. -L

@133 local_tax_reg_09_x $20. -L

@153 pboc_acct_appr $20. -L

@173 loan_card_nb_8_x $16. -L

@189 extract_date YYMMDDN8. -L

@197 reserved_field_1 $40. -L

@237 record_type_2 $1. -L

@238 ENT_NATL_LANG_NM $80. -L

@318 dealer_eng_nm $80. -L

Trusted Advisor
Posts: 1,574

Re: Writing fixed-width text files with Chinese characters

What is the full DATA step code, including the FILENAME statement, that surrounds the PUT statement you posted?

As far as I know, Chinese characters are 2 bytes long in double-byte encodings such as GBK, and typically 3 bytes long in UTF-8 (which you are using), so a string of x Chinese characters needs 2x to 3x bytes.

Please post your full code and the log.

Super User
Posts: 10,041

Re: Writing fixed-width text files with Chinese characters

I don't understand your question.

A Chinese character occupies two bytes in a double-byte encoding (typically three in UTF-8), which means that if you specify $80. you can fit at most 40 Chinese characters, and fewer in UTF-8. Couldn't you double it to $160.?
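To see how many bytes a value really occupies, something like the following might help. It is only a sketch: it assumes the SAS session itself runs with UTF-8 encoding and that the program file is saved as UTF-8, and the sample name is made up. LENGTH counts bytes, while KLENGTH counts characters.

```sas
/* Sketch only: assumes a UTF-8 SAS session and a UTF-8 program file. */
/* The sample value is invented for illustration.                     */
data _null_;
  length name $80;
  name = '中国银行';
  n_bytes = length(name);    /* length in bytes, trailing blanks ignored      */
  n_chars = klength(name);   /* length in characters (multi-byte aware)       */
  put n_bytes= n_chars=;     /* writes both counts to the log                 */
run;
```

In a UTF-8 session I would expect n_bytes to be about three times n_chars for a value made up only of Chinese characters, which is why a field written with $80. holds 80 bytes but far fewer than 80 characters.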

Occasional Contributor
Posts: 11

Re: Writing fixed-width text files with Chinese characters

The government has strict specifications and the field must be 80 characters long. Currently, our group in China is using another system (that is being decommissioned) to produce a report with the same data, so we are confident that 80 is sufficient to contain the values. We have never worked with Chinese characters before in the U.S., so we are a little inexperienced with the formatting.

Trusted Advisor
Posts: 1,574

Re: Writing fixed-width text files with Chinese characters


@acairns you haven't answered my previous post.

How did you code the FILENAME statement and its options? That is where you can define the RECFM and LRECL options.

 

To pad the field with spaces you can do:

 

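length temp_var blanks $80;

format blanks $char80.;

retain blanks ' ';   /* a blank value is padded to the declared 80 bytes */

len = length(trim(ENT_NATL_LANG_NM));   /* length in bytes, not characters */

/* substr() needs a positive length, so only pad when the value is shorter than 80 bytes */

if len < 80 then temp_var = trim(ENT_NATL_LANG_NM) || substr(blanks,1,80-len);

else temp_var = ENT_NATL_LANG_NM;

put @238 temp_var $char80. -L;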

 

 

 

 

Occasional Contributor
Posts: 11

Re: Writing fixed-width text files with Chinese characters

Here is that section of code:

 

data count (KEEP=REC_CNT);

set twenty_five END=EOF;

retain REC_CNT 0;

file "\\fmc9020101\proj\KM\Library\SAS Code\Master Code\Enterprise Guide\Master Code\China_Wholesale\UC03_OBI\Output_Files\&filename" mod lrecl=8000;

put

@1 record_type $1. -L

@2 account_number $40. -L

@42 fafc_code $20. -L

@62 account_type $1. -L

@63 org_credit_cd $18. -L

@81 org_code_12_x $10. -L

@91 reg_code_type $2. -L

@93 reg_code $20. -L

@113 nat_tax_reg_11_x $20. -L

@133 local_tax_reg_09_x $20. -L

@153 pboc_acct_appr $20. -L

@173 loan_card_nb_8_x $16. -L

@189 extract_date YYMMDDN8. -L

@197 reserved_field_1 $40. -L

@237 record_type_2 $1. -L

@238 ENT_NATL_LANG_NM $80. -L

@318 dealer_eng_nm $80. -L

@398 address $80. -L

@478 CNTRY_CD $3. -L

@481 PBOC_RGN_NM $6. -L

@487 ENT_ORGN_DT YYMMDDN8. -L

@495 BUS_LCNS_EXPIR_DT YYMMDDN8. -L

@503 business_scope $400. -L

@903 currency $3. -L

@906 reg_capital $10. -L

@916 org_type $1. -L

@917 org_class_type $2. -L

@919 class_nat_econ $5. -L

@924 econ_type $2. -L

@926 refresh_date_1 YYMMDDN8. -L

@934 reserved_field_2 $40. -L

@974 record_type_3 $1. -L

@975 dealer_status $1. -L

@976 ent_size_type_cd_x $1. -L

@977 org_status $1. -L

@978 refresh_date_2 YYMMDDN8. -L

@986 reserved_field_3 $40. -L

@1026 record_type_4 $1. -L

@1027 business_address $80. -L

@1107 contact_number $35. -L

@1142 PHN_NB_TX $35. -L

@1177 refresh_date_3 YYMMDDN8. -L

@1185 reserved_field_4 $40. -L

@1225 record_type_5 $1. -L

@1226 ke_bus_cd1 $1. -L

@1227 ke_last_nm1 $80. -L

@1307 ke_ntnl_id1_x $2. -L

@1309 ke_gov_iss1_x $20. -L

@1329 ke_bus_cd2 $1. -L

@1330 ke_last_nm2 $80. -L

@1410 ke_ntnl_id2_x $2. -L

@1412 ke_gov_iss2_x $20. -L

@1432 ke_bus_cd3 $1. -L

@1433 ke_last_nm3 $80. -L

@1513 ke_ntnl_id3_x $2. -L

@1515 ke_gov_iss3_x $20. -L

@1535 ke_bus_cd4 $1. -L

@1536 ke_last_nm4 $80. -L

@1616 ke_ntnl_id4_x $2. -L

@1618 ke_gov_iss4_x $20. -L

@1638 refresh_date_4 YYMMDDN8. -L

@1646 reserved_field_5 $40. -L

@1686 record_type_6 $1. -L

@1687 sh_typ_cd1_x $2. -L

@1689 sh_nm1 $30. -L

@1719 sh_ntnl_id1_x $2. -L

@1721 sh_id_reg1_x $20. -L

@1741 sh_gov_iss1_x $10. -L

@1751 org_credit_cd1_2 $18. -L

@1769 sh_prop1_x $10. -L

@1779 sh_typ_cd2_x $2. -L

@1781 sh_nm2 $30. -L

@1811 sh_ntnl_id2_x $2. -L

@1813 sh_id_reg2_x $20. -L

@1833 sh_gov_iss2_x $10. -L

@1843 org_credit_cd2_2 $18. -L

@1861 sh_prop2_x $10. -L

@1871 sh_typ_cd3_x $2. -L

@1873 sh_nm3 $30. -L

@1903 sh_ntnl_id3_x $2. -L

@1905 sh_id_reg3_x $20. -L

@1925 sh_gov_iss3_x $10. -L

@1935 org_credit_cd3_2 $18. -L

@1953 sh_prop3_x $10. -L

@1963 refresh_date_5 YYMMDDN8. -L

@1971 reserved_field_6 $40. -L

@2011 record_type_7 $1. -L

@2012 af_typ_cd1_x $2. -L

@2014 af_nm1 $30. -L

@2044 type_of_reg_nb1 $2. -L

@2046 af_reg_nb1_x $20. -L

@2066 org_code1 $10. -L

@2076 credit_inst_code1 $18. -L

@2094 share_portion1 $10. -L

@2104 af_typ_cd2_x $2. -L

@2106 af_nm2 $30. -L

@2136 type_of_reg_nb2 $2. -L

@2138 af_reg_nb2_x $20. -L

@2158 org_code2 $10. -L

@2168 credit_inst_code2 $18. -L

@2186 share_portion2 $10. -L

@2196 refresh_date_6 YYMMDDN8. -L

@2204 reserved_field_7 $40. -L;

 

REC_CNT + 1;

IF EOF THEN OUTPUT COUNT;

RUN;

Trusted Advisor
Posts: 1,574

Re: Writing fixed-width text files with Chinese characters

I see that you define the output through the FILE statement instead of FILENAME. That works too.

 

Your code:

file "\\fmc9020101\proj\KM\Library\SAS Code\Master Code\Enterprise Guide\Master Code\China_Wholesale\UC03_OBI\Output_Files\&filename" mod lrecl=8000;

 

Maybe add recfm=fb to the FILE options, in addition to the code that pads the text with trailing spaces.

 

 

 

 

Super User
Posts: 10,041

Re: Writing fixed-width text files with Chinese characters

"The government has specific specifications and the field must be 80 characters long."

Does that mean 80 Chinese characters? Then you need $160. to print them (or $240. if each character takes three bytes, as is typical in UTF-8).

Or try $varying.; I am not sure whether it will work.

P.S. I still don't understand the misalignment. Can you post a picture?

 

len=80;

put

........

@238 ENT_NATL_LANG_NM $varying200. len 
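Building on the $varying. idea, here is a rough sketch of padding the name to 80 characters rather than 80 bytes. It rests on several assumptions: the session runs in UTF-8, the "80" in the specification counts characters, the value never holds more than 80 characters, and no character needs more than 3 bytes. The output path and the names name_pad, n_extra and out_len are hypothetical, and only three of the fields are shown.

```sas
/* Sketch only, under the assumptions stated above.                      */
/* The output path and name_pad / n_extra / out_len are made-up names.   */
data _null_;
  set twenty_five;
  file "fixed_width_out.txt" lrecl=8000;

  length name_pad $240;                 /* room for 80 characters at 3 bytes each */
  name_pad = ENT_NATL_LANG_NM;

  /* extra bytes taken up by multi-byte characters (0 for blank or ASCII-only values) */
  if missing(ENT_NATL_LANG_NM) then n_extra = 0;
  else n_extra = length(ENT_NATL_LANG_NM) - klength(ENT_NATL_LANG_NM);

  out_len = 80 + n_extra;               /* number of bytes that amounts to 80 characters */

  /* no @n pointers after the variable-width field: each item starts where the previous one ended */
  put record_type_2 $1.
      name_pad $varying240. out_len
      dealer_eng_nm $80.;
run;
```

As I understand it, the @n column pointers count bytes, so once one field has a variable byte width the later fields are easier to place relative to the previous item than at absolute positions such as @318. Every field that can contain Chinese text would need the same treatment, and it is worth confirming with the recipient whether their layout counts bytes or characters before committing to either.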
