Hello,
I am trying to import excel. file into SAS and set the informat of the variable. by using the following codes,
/*the result of manual match*/
proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls"
OUT=result_of_manual_match DBMS=XLSX replace;
Sheet="Sheet1";
RANGE="A1:I484";
GETNAMES=YES;
RUN;
proc sql;
create table result_of_manual_match1 as
select distinct
*
from result_of_manual_match
;
quit;
data result_of_manual_match1 ;
set result_of_manual_match1 ;
Note=tranwrd(Note, ';', ',');
run;
PROC EXPORT DATA=result_of_manual_match1
OUTFILE= "C:\Users\70660\Desktop\manual_check_201910.csv"
DBMS=csv REPLACE;
delimiter=";";
RUN;
data result_of_manual_match2;
infile "C:\Users\70660\Desktop\manual_check_201910.csv"
DLM = ";"
DSD
missover
lrecl=32767
firstobs = 2;
input
HRM_L2 :$1000.
Type :$50.
STATUS :$50.
Parant_Company :$1000.
Type_of_parant_company :$50.
Status_of_parant_company :$50.
person_ctry_code :$1000.
Ref_of_Note :$1000.
Note :$2000.
;
run;
data result_of_manual_match3;
set result_of_manual_match2;
if STATUS='RIGHT' then output;
else if STATUS='NO' and person_ctry_code ^= ' ' then output;
run;
However, only 39 observations been read. The original data is added in the attachment. Could you please give me some suggestions about this?
No attachment. Excel will garble your data,have you ensured that your imported data from Excel has the fields and formats you need? For example, if it interpreted a field as numeric, any character variables in that field would have been set to missing.
Hello @Reeza
the attachment has been added. all variable is character variable. I expect to import variables in excel. file with a specific format.is there any simple code to do it?
No, Excel doesn't enforce types on it's data so you can never rely on it, so all programs that read excel files have to guess at types.
Even R and Python do this and then you can change after the fact but you have similar issues.
Instead, convert your XLSX to CSV and then you'll have full control. There are VBS scripts here or on my Githuub page that show how to do that. That's the easiest way to simplify your process.
Excel is a bad data entry tool, but also one of the most common...
@Alexxxxxxx wrote:
Hello @Reeza
the attachment has been added. all variable is character variable. I expect to import variables in excel. file with a specific format.is there any simple code to do it?
@Alexxxxxxx wrote:
Hello,
I am trying to import excel. file into SAS and set the informat of the variable. by using the following codes,
However, only 39 observations been read. The original data is added in the attachment. Could you please give me some suggestions about this?
How many records do you expect to have?
This step in code is very likely to reduce the number records from the imported data:
proc sql; create table result_of_manual_match1 as select distinct * from result_of_manual_match ; quit;
If the intent of the distinct was not remove duplicate records you will have to explain what it is doing there.
Note that habitual reuse of the same data set name when changing values means that it can be extremely difficult to identify which specific step is causing an issue.
Have you tried using the Excel File Save As to CSV and reading that instead of this import/export/reread approach?
Also I am a tad concerned with possible mismatch of file type and extension:
proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls" OUT=result_of_manual_match DBMS=XLSX replace; Sheet="Sheet1"; RANGE="A1:I484"; GETNAMES=YES; RUN;
hello @ballardw
by using the code the result is
12266 /*the result of manual match*/
12267 proc import datafile = "C:\Users\70660\Desktop\manual_check_201910.xls"
12268 OUT=result_of_manual_match DBMS=XLSX replace;
12269 Sheet="Sheet1";
12270 RANGE="A1:I484";
12271 GETNAMES=YES;
12272 RUN;
WARNING: Some character data was lost during transcoding in column: Note at obs 38.
WARNING: Some character data was lost during transcoding in column: Note at obs 39.
WARNING: Some character data was lost during transcoding in column: Note at obs 40.
WARNING: Some character data was lost during transcoding in column: Note at obs 41.
WARNING: Some character data was lost during transcoding in column: Note at obs 42.
WARNING: Some character data was lost during transcoding in column: Note at obs 43.
WARNING: Some character data was lost during transcoding in column: Note at obs 97.
WARNING: Some character data was lost during transcoding in column: Note at obs 114.
WARNING: Some character data was lost during transcoding in column: Note at obs 154.
WARNING: Some character data was lost during transcoding in column: Note at obs 324.
WARNING: Some character data was lost during transcoding in column: Note at obs 368.
WARNING: Some character data was lost during transcoding in column: Note at obs 426.
WARNING: Some character data was lost during transcoding in column: Note at obs 451.
NOTE: The import data set has 483 observations and 9 variables.
NOTE: WORK.RESULT_OF_MANUAL_MATCH data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
12273
12274 data result_of_manual_match1 ;
12275 set result_of_manual_match ;
12276 Note=tranwrd(Note, ';', ',');
12277 run;
NOTE: There were 483 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH1 has 483 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
12278
12279 PROC EXPORT DATA=result_of_manual_match1
12280 OUTFILE= "C:\Users\70660\Desktop\manual_check_201910.csv"
12281 DBMS=csv REPLACE;
12282 delimiter=";";
12283 RUN;
12284 /**********************************************************************
12285 * PRODUCT: SAS
12286 * VERSION: 9.4
12287 * CREATOR: External File Interface
12288 * DATE: 22OCT19
12289 * DESC: Generated SAS Datastep Code
12290 * TEMPLATE SOURCE: (None Specified.)
12291 ***********************************************************************/
12292 data _null_;
12293 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
12294 %let _EFIREC_ = 0; /* clear export record count macro variable */
12295 file 'C:\Users\70660\Desktop\manual_check_201910.csv' delimiter=';' DSD DROPOVER
12295! lrecl=32767;
12296 if _n_ = 1 then /* write column names or labels */
12297 do;
12298 put
12299 "HRM_L2"
12300 ';'
12301 "Type"
12302 ';'
12303 "Status"
12304 ';'
12305 "Parant_Company"
12306 ';'
12307 "Type_of_parant_company"
12308 ';'
12309 "Status_of_parant_company"
12310 ';'
12311 "person_ctry_code"
12312 ';'
12313 "Ref_of_Note"
12314 ';'
12315 "Note"
12316 ;
12317 end;
12318 set RESULT_OF_MANUAL_MATCH1 end=EFIEOD;
12319 format HRM_L2 $99. ;
12320 format Type $6. ;
12321 format Status $16. ;
12322 format Parant_Company $31. ;
12323 format Type_of_parant_company $6. ;
12324 format Status_of_parant_company $12. ;
12325 format person_ctry_code $2. ;
12326 format Ref_of_Note $9. ;
12327 format Note $1345. ;
12328 do;
12329 EFIOUT + 1;
12330 put HRM_L2 $ @;
12331 put Type $ @;
12332 put Status $ @;
12333 put Parant_Company $ @;
12334 put Type_of_parant_company $ @;
12335 put Status_of_parant_company $ @;
12336 put person_ctry_code $ @;
12337 put Ref_of_Note $ @;
12338 put Note $ ;
12339 ;
12340 end;
12341 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
12342 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
12343 run;
NOTE: The file 'C:\Users\70660\Desktop\manual_check_201910.csv' is:
Filename=C:\Users\70660\Desktop\manual_check_201910.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=23 October 2019 23:38:51 o'clo,
Create Time=23 October 2019 23:38:51 o'clo
NOTE: 484 records were written to the file 'C:\Users\70660\Desktop\manual_check_201910.csv'.
The minimum record length was 11.
The maximum record length was 1393.
NOTE: There were 483 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH1.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
483 records created in C:\Users\70660\Desktop\manual_check_201910.csv from RESULT_OF_MANUAL_MATCH1.
NOTE: "C:\Users\70660\Desktop\manual_check_201910.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.06 seconds
cpu time 0.07 seconds
12344
12345 data result_of_manual_match2;
12346 infile "C:\Users\70660\Desktop\manual_check_201910.csv"
12347 DLM = ";"
12348 DSD
12349 missover
12350 lrecl=32767
12351 firstobs = 2;
12352 input
12353 HRM_L2 :$1000.
12354 Type :$50.
12355 STATUS :$50.
12356 Parant_Company :$1000.
12357 Type_of_parant_company :$50.
12358 Status_of_parant_company :$50.
12359 person_ctry_code :$1000.
12360 Ref_of_Note :$1000.
12361 Note :$2000.
12362 ;
12363 run;
NOTE: The infile "C:\Users\70660\Desktop\manual_check_201910.csv" is:
Filename=C:\Users\70660\Desktop\manual_check_201910.csv,
RECFM=V,LRECL=32767,File Size (bytes)=71945,
Last Modified=23 October 2019 23:38:52 o'clo,
Create Time=23 October 2019 23:38:51 o'clo
NOTE: 40 records were read from the infile "C:\Users\70660\Desktop\manual_check_201910.csv".
The minimum record length was 25.
The maximum record length was 348.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH2 has 40 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
12364
12365 data result_of_manual_match3;
12366 set result_of_manual_match2;
12367 if STATUS='RIGHT' then output;
12368 else if STATUS='NO' and person_ctry_code ^= ' ' then output;
12369 run;
NOTE: There were 40 observations read from the data set WORK.RESULT_OF_MANUAL_MATCH2.
NOTE: The data set WORK.RESULT_OF_MANUAL_MATCH3 has 21 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
The main problem happens when I read csv. file into SAS. I do not know why.
First
Transcoding is the process of converting data from one encoding to another. Transcoding is necessary when the SAS session encoding and the encoding of the data are different. Transcoding is often necessary when you move data between operating environments that use different locales and encoding.
Typically that means that you have data that is some format other than expected and possibly a mix of encoding types Unicode, WLatin, and such. So a likely issue is the actual content of your data.
I am still not sure what your actual question may be.
Did you ever try starting in Excel, do the file save to CSV and attempt to read that CSV file? Ever?
Multiple character sets. Obs 38 through 43 have
Amazon.com, Inc.[6] (/ˈæməzɒn/), is an American multinational technology company based in Seattle, Washington, that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. |
Some of your data contains Chinese characters and is very likely to be part of your issue related to transcoding.
This would be in OBS 114
China National Offshore Oil Corporation, or CNOOC Group (Chinese: 中国海洋石油总公司 Pinyin: Zhōngguó Háiyáng Shíyóu Zǒnggōngsī), is one of the largest national oil companies in China. |
With multiple character sets you need spend some time deciding what you expect for your output and may get to spend a lot time correcting things. And simply setting an Informat is not going to fix transcoding.
Your file works fine for me. Make sure that your SAS sessions is using unicode and not some single byte encoding.
4 proc import datafile="&path.&fname" dbms=xlsx 5 out=test replace 6 ; 7 run; NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I. NOTE: The import data set has 483 observations and 9 variables. NOTE: WORK.TEST data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.25 seconds cpu time 0.06 seconds 8 9 data test2(encoding='latin1'); 10 set test; 11 run; NOTE: Data file WORK.TEST2.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance. ERROR: Some character data was lost during transcoding in the dataset WORK.TEST2. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding. NOTE: The DATA step has been abnormally terminated. NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 38 observations read from the data set WORK.TEST. WARNING: The data set WORK.TEST2 may be incomplete. When this step was stopped there were 37 observations and 9 variables. NOTE: DATA statement used (Total process time): real time 0.17 seconds cpu time 0.03 seconds
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.