All -
I've seen here in Communities that this error happens when using PROC EXPORT and when reading from an Excel file using LIBNAME, but I'm trying to write to Excel.
Note that the Excel file was created and that I can open it. The problem is that the shl script I use to run the program will send a "Job ended in error" email to the user.
Any ideas? Thanks!
LOG Snippet:
NOTE: Libref XLOUT was successfully assigned as follows:
Engine: XLSX
Physical Name: /fmacdata/utility/npa/dev/Profiling/Monthly/Output/Profiling_1910_1936.xlsm
3217 /*****************************************************************************/
3218 /* Send the profile data to XL */
3219 /*****************************************************************************/
3220
3221 data xlout.NUM_ALL;
3222 length id_bus_execn $9.;
3223 set fat.NUM_ALL_&ID_PRIOR_EXECN
3224 fat.NUM_ALL_&id_bus_execn
3225 fat.rolling_&benchmark.mos;
3226 run;
ERROR: string is duplicated |><t/>|
NOTE: There were 15876 observations read from the data set FAT.NUM_ALL_1910.
NOTE: There were 16200 observations read from the data set FAT.NUM_ALL_1936.
NOTE: There were 16848 observations read from the data set FAT.ROLLING_12MOS.
NOTE: The data set XLOUT.NUM_ALL has . observations and 17 variables.
NOTE: The export data set has 48924 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 3.46 seconds
cpu time 3.39 seconds
I stumbled onto what I think is the solution and thought I'd share it - very interesting and worth thinking about if you get the same error.
Log snippet (from above):
3221 data xlout.NUM_ALL;
3222 length id_bus_execn $9.;
3223 set fat.NUM_ALL_&ID_PRIOR_EXECN
3224 fat.NUM_ALL_&id_bus_execn
3225 fat.rolling_&benchmark.mos;
3226 run;
All three datasets have a column, disclosed_element, that is created by putting a column named variable into a format I created $LLD. put(variable,$LLD.). For the green datasets, $LLD. translates variable into a Yes/No flag; for the red dataset (from an earlier iteration of development), $LLD. translates variable into verbiage, for example: NET SALES PROCEEDS.
When I went back and updated the red dataset so that it also had disclosed_element as a Yes/No flag, the problem went away.
Not sure what's going on here under the covers and I found the solution by accident. 😕
Is the FAT libref using the XLSX libname engine also?
Can you tell which of the three datasets is generating the error? Try reading them individually instead of all at once. Try writing them individually to the separate sheets. Try to isolate which record is causing the issue. Or which variable? Or perhaps the individual value?
What encoding is the data using?
What encoding is your SAS session using?
Can you create a minimal example dataset that replicates the error?
Hi, Tom!
FAT is a SAS library and the LIBNAME points to a Unix directory - it's not using the XLSX engine.
I will try reading the datasets one at a time and see if that tells me anything.
Not sure how to see what encoding either the data or SAS session are using?
Thanks!
You can open a ticket with SAS support and they might be able to help.
You can try using binary search to see if you can isolate the data point that is causing the issue.
Try each dataset individually. If one produces the issue then concentrate on that.
Divide the data in half and try each half. Concentrate on the that produces the error. Repeat until you have the point.
You can divide by observations using FIRSTOBS= and OBS= dataset options.
You can divide by variables using the KEEP= or DROP= dataset options.
This error is reported as a bug in SAS 9.4M3: 62655 - "ERROR: string is duplicated" occurs when you use the EXPORT procedure with the XLSX engine ...
What SAS version are you using?
If you are using SAS 9.4M3 or M4 then I'd say its the same problem. You can always check this with SAS Tech Support.
Try PROC EXPORT as well and see if you get the same error. If so it will be the same bug.
I stumbled onto what I think is the solution and thought I'd share it - very interesting and worth thinking about if you get the same error.
Log snippet (from above):
3221 data xlout.NUM_ALL;
3222 length id_bus_execn $9.;
3223 set fat.NUM_ALL_&ID_PRIOR_EXECN
3224 fat.NUM_ALL_&id_bus_execn
3225 fat.rolling_&benchmark.mos;
3226 run;
All three datasets have a column, disclosed_element, that is created by putting a column named variable into a format I created $LLD. put(variable,$LLD.). For the green datasets, $LLD. translates variable into a Yes/No flag; for the red dataset (from an earlier iteration of development), $LLD. translates variable into verbiage, for example: NET SALES PROCEEDS.
When I went back and updated the red dataset so that it also had disclosed_element as a Yes/No flag, the problem went away.
Not sure what's going on here under the covers and I found the solution by accident. 😕
So was the fix to the content of the variable? What content did it have before instead of Yes/No? Did you try just writing another file with that content to XLSX engine? Did it cause the same issue?
Or was the fix to something about the structure of the dataset? The length of the variable? The display format attached to it?
One thing to look at is to look at the lengths of that variable in the three input datasets and whether that variable has formats attached to it also. So when you combine multiple datasets like that SAS will set the type and storage length based on how it first appears. It will attach the first non-empty format that it sees to the variable.
Here are some of the problems that can cause.
If the first instance has a shorter length than the others then the values could get truncated.
If the first format attached is shorter than the first variables length then the printing of the values could be truncated, even if the stored values are not. I am not sure if the format attached to a variable impacts how the variable is converted into XML to store into the XLSX file when using the XLSX libname engine, but if if causes the engine to generate invalid XML that might be the source of that error message.
The answers to your questions are, for the most part, in my post.
@PatMalarkey wrote:
The answers to your questions are, for the most part, in my post.
Not really. All you said is that you change the format used in the PUT() statement you used to populate the value of the variable. Not much about how the actual variable are defined. We can assume they are character, but we don't know the length or if any formats where attached to them. We don't really know that values changes.
Glad you got it to work, but I am not sure if we really know what it is we need to avoid to prevent it from happening to others.
Like I said I am glad your problem is gone. But I have not learned anything about how to avoid the issue. I cannot recreate the issue from your description. So I was hoping you might have the time to research it more so that others could benefit from what you discover.
828 libname out clear; NOTE: Libref OUT has been deassigned. 829 data one; 830 x='NET SALES PROCEEDS'; 831 run; NOTE: The data set WORK.ONE has 1 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 832 data two; 833 x='yes'; 834 run; NOTE: The data set WORK.TWO has 1 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 835 data three; 836 x='no'; 837 run; NOTE: The data set WORK.THREE has 1 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 838 839 libname out xlsx 'c:\downloads\test_strings.xlsx'; NOTE: Libref OUT was successfully assigned as follows: Engine: XLSX Physical Name: c:\downloads\test_strings.xlsx 840 841 data out.sheet1; 842 set one two three; 843 run; NOTE: There were 1 observations read from the data set WORK.ONE. NOTE: There were 1 observations read from the data set WORK.TWO. NOTE: There were 1 observations read from the data set WORK.THREE. NOTE: The data set OUT.sheet1 has 3 observations and 1 variables. NOTE: The export data set has 3 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.06 seconds cpu time 0.03 seconds 844 845 data out.sheet2; 846 set three two one; 847 run; WARNING: Multiple lengths were specified for the variable x by input data set(s). This can cause truncation of data. NOTE: There were 1 observations read from the data set WORK.THREE. NOTE: There were 1 observations read from the data set WORK.TWO. NOTE: There were 1 observations read from the data set WORK.ONE. NOTE: The data set OUT.sheet2 has 3 observations and 1 variables. NOTE: The export data set has 3 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.05 seconds cpu time 0.03 seconds 848 849 libname out clear; NOTE: Libref OUT has been deassigned.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.