BookmarkSubscribeRSS Feed
Alexxxxxxx
Pyrite | Level 9

Hello, 

 

I am trying to import excel. file into SAS and set the informat of the variable. by using the following codes,

 

 

/*the result of manual match*/
proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls"
		OUT=result_of_manual_match DBMS=XLSX replace;
		Sheet="Sheet1";
		RANGE="A1:I484";
		GETNAMES=YES;
RUN;

proc sql;
		create table result_of_manual_match1 as
		select distinct
		*
		from result_of_manual_match
		;
	quit;

data result_of_manual_match1 ;
	set  result_of_manual_match1 ;
	Note=tranwrd(Note, ';', ','); 
run;

	PROC EXPORT DATA=result_of_manual_match1
		OUTFILE= "C:\Users\70660\Desktop\manual_check_201910.csv"
		DBMS=csv REPLACE;
		delimiter=";";
	RUN;

	data result_of_manual_match2;
		infile "C:\Users\70660\Desktop\manual_check_201910.csv"
		DLM = ";"
		DSD
		missover
		lrecl=32767
		firstobs = 2;
		input
		HRM_L2 :$1000.
		Type :$50.
		STATUS :$50.
		Parant_Company :$1000.
		Type_of_parant_company :$50.
		Status_of_parant_company :$50.
		person_ctry_code :$1000.
		Ref_of_Note :$1000.
		Note :$2000.
		;
	run;

data result_of_manual_match3;
	set result_of_manual_match2;
	if STATUS='RIGHT' then output;
	else if STATUS='NO' and person_ctry_code ^= ' ' then output;
run;

  However, only 39 observations been read. The original data is added in the attachment. Could you please give me some suggestions about this?

7 REPLIES 7
Reeza
Super User

No attachment. Excel will garble your data,have you ensured that your imported data from Excel has the fields and formats you need? For example, if it interpreted a field as numeric, any character variables in that field would have been set to missing.

Alexxxxxxx
Pyrite | Level 9

Hello @Reeza 

the attachment has been added. all variable is character variable. I expect to import variables in excel. file with a specific format.is there any simple code to do it?

Reeza
Super User

No, Excel doesn't enforce types on it's data so you can never rely on it, so all programs that read excel files have to guess at types. 

Even R and Python do this and then you can change after the fact but you have similar issues. 

 

Instead, convert your XLSX to CSV and then you'll have full control. There are VBS scripts here or on my Githuub page that show how to do that. That's the easiest way to simplify your process. 

 

Excel is a bad data entry tool, but also one of the most common...

 


@Alexxxxxxx wrote:

Hello @Reeza 

the attachment has been added. all variable is character variable. I expect to import variables in excel. file with a specific format.is there any simple code to do it?


 

ballardw
Super User

@Alexxxxxxx wrote:

Hello, 

 

I am trying to import excel. file into SAS and set the informat of the variable. by using the following codes,

 

 

  However, only 39 observations been read. The original data is added in the attachment. Could you please give me some suggestions about this?


How many records do you expect to have?

This step in code is very likely to reduce the number records from the imported data:

proc sql;
	create table result_of_manual_match1 as
	select distinct
	*
	from result_of_manual_match
	;
quit;

If the intent of the distinct was not remove duplicate records you will have to explain what it is doing there.

 

Note that habitual reuse of the same data set name when changing values means that it can be extremely difficult to identify which specific step is causing an issue.

 

Have you tried using the Excel File Save As to CSV and reading that instead of this import/export/reread approach?

 

Also I am a tad concerned with possible mismatch of file type and extension:

proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls"
		OUT=result_of_manual_match DBMS=XLSX replace;
		Sheet="Sheet1";
		RANGE="A1:I484";
		GETNAMES=YES;
RUN;
Alexxxxxxx
Pyrite | Level 9

hello @ballardw 

 

by using the code the result is 

12266  /*the result of manual match*/
12267  proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls"
12268          OUT=result_of_manual_match DBMS=XLSX replace;
12269          Sheet="Sheet1";
12270          RANGE="A1:I484";
12271          GETNAMES=YES;
12272  RUN;

WARNING: Some character data was lost during transcoding in column: Note at obs 38.
WARNING: Some character data was lost during transcoding in column: Note at obs 39.
WARNING: Some character data was lost during transcoding in column: Note at obs 40.
WARNING: Some character data was lost during transcoding in column: Note at obs 41.
WARNING: Some character data was lost during transcoding in column: Note at obs 42.
WARNING: Some character data was lost during transcoding in column: Note at obs 43.
WARNING: Some character data was lost during transcoding in column: Note at obs 97.
WARNING: Some character data was lost during transcoding in column: Note at obs 114.
WARNING: Some character data was lost during transcoding in column: Note at obs 154.
WARNING: Some character data was lost during transcoding in column: Note at obs 324.
WARNING: Some character data was lost during transcoding in column: Note at obs 368.
WARNING: Some character data was lost during transcoding in column: Note at obs 426.
WARNING: Some character data was lost during transcoding in column: Note at obs 451.
NOTE: The import data set has 483 observations and 9 variables.
NOTE: WORK.RESULT_OF_MANUAL_MATCH data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds


12273
12274  data result_of_manual_match1 ;
12275      set  result_of_manual_match ;
12276      Note=tranwrd(Note, ';', ',');
12277  run;

NOTE: There were 483 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH1 has 483 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


12278
12279      PROC EXPORT DATA=result_of_manual_match1
12280          OUTFILE= "C:\Users\70660\Desktop\manual_check_201910.csv"
12281          DBMS=csv REPLACE;
12282          delimiter=";";
12283      RUN;

12284   /**********************************************************************
12285   *   PRODUCT:   SAS
12286   *   VERSION:   9.4
12287   *   CREATOR:   External File Interface
12288   *   DATE:      22OCT19
12289   *   DESC:      Generated SAS Datastep Code
12290   *   TEMPLATE SOURCE:  (None Specified.)
12291   ***********************************************************************/
12292      data _null_;
12293      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
12294      %let _EFIREC_ = 0;     /* clear export record count macro variable */
12295      file 'C:\Users\70660\Desktop\manual_check_201910.csv' delimiter=';' DSD DROPOVER
12295! lrecl=32767;
12296      if _n_ = 1 then        /* write column names or labels */
12297       do;
12298         put
12299            "HRM_L2"
12300         ';'
12301            "Type"
12302         ';'
12303            "Status"
12304         ';'
12305            "Parant_Company"
12306         ';'
12307            "Type_of_parant_company"
12308         ';'
12309            "Status_of_parant_company"
12310         ';'
12311            "person_ctry_code"
12312         ';'
12313            "Ref_of_Note"
12314         ';'
12315            "Note"
12316         ;
12317       end;
12318     set  RESULT_OF_MANUAL_MATCH1   end=EFIEOD;
12319         format HRM_L2 $99. ;
12320         format Type $6. ;
12321         format Status $16. ;
12322         format Parant_Company $31. ;
12323         format Type_of_parant_company $6. ;
12324         format Status_of_parant_company $12. ;
12325         format person_ctry_code $2. ;
12326         format Ref_of_Note $9. ;
12327         format Note $1345. ;
12328       do;
12329         EFIOUT + 1;
12330         put HRM_L2 $ @;
12331         put Type $ @;
12332         put Status $ @;
12333         put Parant_Company $ @;
12334         put Type_of_parant_company $ @;
12335         put Status_of_parant_company $ @;
12336         put person_ctry_code $ @;
12337         put Ref_of_Note $ @;
12338         put Note $ ;
12339         ;
12340       end;
12341      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
12342      if EFIEOD then call symputx('_EFIREC_',EFIOUT);
12343      run;

NOTE: The file 'C:\Users\70660\Desktop\manual_check_201910.csv' is:
      Filename=C:\Users\70660\Desktop\manual_check_201910.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=23 October 2019 23:38:51 o'clo,
      Create Time=23 October 2019 23:38:51 o'clo

NOTE: 484 records were written to the file 'C:\Users\70660\Desktop\manual_check_201910.csv'.
      The minimum record length was 11.
      The maximum record length was 1393.
NOTE: There were 483 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH1.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


483 records created in C:\Users\70660\Desktop\manual_check_201910.csv from RESULT_OF_MANUAL_MATCH1.


NOTE: "C:\Users\70660\Desktop\manual_check_201910.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
      real time           0.06 seconds
      cpu time            0.07 seconds


12344
12345      data result_of_manual_match2;
12346          infile "C:\Users\70660\Desktop\manual_check_201910.csv"
12347          DLM = ";"
12348          DSD
12349          missover
12350          lrecl=32767
12351          firstobs = 2;
12352          input
12353          HRM_L2 :$1000.
12354          Type :$50.
12355          STATUS :$50.
12356          Parant_Company :$1000.
12357          Type_of_parant_company :$50.
12358          Status_of_parant_company :$50.
12359          person_ctry_code :$1000.
12360          Ref_of_Note :$1000.
12361          Note :$2000.
12362          ;
12363      run;

NOTE: The infile "C:\Users\70660\Desktop\manual_check_201910.csv" is:
      Filename=C:\Users\70660\Desktop\manual_check_201910.csv,
      RECFM=V,LRECL=32767,File Size (bytes)=71945,
      Last Modified=23 October 2019 23:38:52 o'clo,
      Create Time=23 October 2019 23:38:51 o'clo

NOTE: 40 records were read from the infile "C:\Users\70660\Desktop\manual_check_201910.csv".
      The minimum record length was 25.
      The maximum record length was 348.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH2 has 40 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds


12364
12365  data result_of_manual_match3;
12366      set result_of_manual_match2;
12367      if STATUS='RIGHT' then output;
12368      else if STATUS='NO' and person_ctry_code ^= ' ' then output;
12369  run;

NOTE: There were 40 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH2.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH3 has 21 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds



The main problem happens when I read csv. file into SAS. I do not know why.

ballardw
Super User

First

Transcoding is the process of converting data from one encoding to another. Transcoding is necessary when the SAS session encoding and the encoding of the data are different. Transcoding is often necessary when you move data between operating environments that use different locales and encoding.

 

Typically that means that you have data that is some format other than expected and possibly a mix of encoding types Unicode, WLatin, and such. So a likely issue is the actual content of your data.

 

I am still not sure what your actual question may be.

 

Did you ever try starting in Excel, do the file  save to CSV and attempt to read that CSV file? Ever?

 

Multiple character sets. Obs 38 through 43 have

Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, Washington, that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence.

Some of your data contains Chinese characters and is very likely to be part of your issue related to transcoding.

This would be in OBS 114

China National Offshore Oil Corporation, or CNOOC Group (Chinese: 中国海洋石油总公司 Pinyin: Zhōngguó Háiyáng Shíyóu Zǒnggōngsī), is one of the largest national oil companies in China.

 

With multiple character sets you need spend some time deciding what you expect for your output and may get to spend a lot time correcting things. And simply setting an Informat is not going to fix transcoding.

 

 

Tom
Super User Tom
Super User

Your file works fine for me. Make sure that your SAS sessions is using unicode and not some single byte encoding.

4     proc import datafile="&path.&fname" dbms=xlsx
5       out=test replace
6     ;
7     run;

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more
      details, run with options MSGLEVEL=I.
NOTE: The import data set has 483 observations and 9 variables.
NOTE: WORK.TEST data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.25 seconds
      cpu time            0.06 seconds


8
9     data test2(encoding='latin1');
10      set test;
11    run;

NOTE: Data file WORK.TEST2.DATA is in a format that is native to another host, or the file encoding does not match
      the session encoding. Cross Environment Data Access will be used, which might require additional CPU
      resources and might reduce performance.
ERROR: Some character data was lost during transcoding in the dataset WORK.TEST2. Either the data contains
       characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 38 observations read from the data set WORK.TEST.
WARNING: The data set WORK.TEST2 may be incomplete.  When this step was stopped there were 37 observations and 9
         variables.
NOTE: DATA statement used (Total process time):
      real time           0.17 seconds
      cpu time            0.03 seconds

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3324 views
  • 0 likes
  • 4 in conversation