BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PavelD
Obsidian | Level 7

Hello!

Since few days ago I have a problem with a SQL procedure.

Basically I have two identical SQL procedures, which differ only in one dimension - period.

 

The SQL procedures below create two tables: test_A (period 201601-201712) and test_B (period 201712).

There is an error produced only for the big table (test_A):

ERROR: Invalid value for width specified - width out of range

 

When I remove the NOSIG dimension, the error disappears. Thus I suspect it comes from the NOSIG column, which is a concatenation of two other columns (e.g. it creates B002 out of "B" and 2 or B182 out of "B" and 182). But I do not understand why it only concerns the bigger table. The ENTITE is always one letter (e.g. "B", "W") and t1.NOSIG is always a number from 1 to 400. Why would the error only show up in the bigger table? Could it be a problem of format which I did not specify for the created NOSIG column?

 

I tried to Google a bit on this error but found very little info, hence it is hard to guess where it really comes from. Anyway, after filtering on period=201712 from both tables, the data are not the same. Only the test_B (the smaller table) contains correct and full data.

 

Many thanks in advance for any ideas!

 

PROC SQL /* long period */ ;
	CREATE TABLE ISP_lib.test_A (compress=yes) AS
		SELECT 	
				CATX("", t1.ENTITE, put(t1.NOSIG, z3.)) AS NOSIG, 
				CATX(".", CATX("", t1.SERVICE, t1.NOTAB), t1.SSTAB) AS TAB, 
				t1.PERIODE, 
				t1.COLONNE,
				 
				case /* VALEUR en EUR needs conversion but not VOLUME */
					when t1.COLONNE in ("VOLU", "VOTI", "VOCL") then t1.valeur 		/* volume=valeur */
					when t1.COLONNE in ("VALE", "FLOA") then (t1.valeur/t1.taux) 	/* valeur eur=  valeur / taux */
				end as val,
				t1.RUB 									as Dim1,
				t1.PAYS 								as Dim2,
				t1.DEVISE 							as Dim3,
				t1.SECTEUR 							as Dim4,
				scan(t1.LIGNE, 5, "-") 	as Dim5,
				scan(t1.LIGNE, 6, "-") 	as Dim6,
				scan(t1.LIGNE, 7, "-") 	as Dim7
				
	FROM OLMBSPRD.V_DATA_ENTITE t1 /* name of the database */
		WHERE 
					t1.LIVRE = 'O' 
			AND t1.SERVICE = 'V' 
			AND t1.ETAT = 'En production'
			AND t1.NOTAB = "1.5" 
			AND t1.PERIODE between 201601 and 201712
;
QUIT;

PROC SQL /* short period */;
	CREATE TABLE ISP_lib.test_B (compress=yes) AS
		SELECT 	
				CATX("", t1.ENTITE, put(t1.NOSIG, z3.)) AS NOSIG, 
				CATX(".", CATX("", t1.SERVICE, t1.NOTAB), t1.SSTAB) AS TAB, 
				t1.PERIODE, 
				t1.COLONNE,
				 
				case /* VALEUR en EUR needs conversion but not VOLUME */
					when t1.COLONNE in ("VOLU", "VOTI", "VOCL") then t1.valeur 		/* volume=valeur */
					when t1.COLONNE in ("VALE", "FLOA") then (t1.valeur/t1.taux) 	/* valeur eur=  valeur / taux */
				end as val,
				t1.RUB 									as Dim1,
				t1.PAYS 								as Dim2,
				t1.DEVISE 							as Dim3,
				t1.SECTEUR 							as Dim4,
				scan(t1.LIGNE, 5, "-") 	as Dim5,
				scan(t1.LIGNE, 6, "-") 	as Dim6,
				scan(t1.LIGNE, 7, "-") 	as Dim7
				
	FROM OLMBSPRD.V_DATA_ENTITE t1 /* name of the database */
		WHERE 
					t1.LIVRE = 'O' 
			AND t1.SERVICE = 'V' 
			AND t1.ETAT = 'En production'
			AND t1.NOTAB = "1.5" 
			AND t1.PERIODE = 201712
;
QUIT;
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would check a couple of things. 

First, start a new session, then run your code.  Errors have a tendancy to follow over into the next run, so you could see an error from previous submit impacts your current run.

Second, is it possible that in t1 or t2 the number isn't 3 decimals maximum?  Numbers can appear to be one thing - by the use of formats for instance - or could have a very tiny decimal part (0.000000000013 for instance), this is down to how the computer stores numbers and can happen after divisions and such like.  Maybe add:

cats(t1.entite,put(int(t1.nosig),z3.)) as nosig,

Or round() function.  Its pretty difficult for us to debug without seeing example data which causes the issue, SAS is a data based programming language. 

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

First off, you are using the wrong concatenation function:
catx() expects a character in the first paramter, then other strings after that.  Use the appropriate function:

cats(t1.entite,put(t1.nosig,z3.)) as nosig,

Note how I don't code ALL IN UPPERCASE as that would be shouting and make your code far harder to read!

You can try that, but without test data in the form of a datastep, for data which causes your error, we can't debug it.  See:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

only need a row or two just to simulate the error you get.

PavelD
Obsidian | Level 7
Hello RW9. Many thanks, all noted. Run with cats instead of catx and the thing returned no error!

I would be happy to post sample data but, as explained, the error only occured in the big dataset, for unclear reasons. Sharing the complete data (even if technically possible) is out of question due to confidentiality.

So I guess the main observation, at least for now, is that CATX, if used wrongly without the concat paramater, still works as expected for small tables but not for large tables. Given how much time I spent trying to figure out what the problem was, this is a ridiculous revelation 🙂 Thank you!

I will run few more tests to ensure the CATX was the main culprit and if ok I will mark your answer as a solution.
FreelanceReinh
Jade | Level 19

Also, if you know that NOSIG will only contain strings of length 4, why don't you specify this or another reasonable length for this variable (e.g. ... as nosig length=4)? Currently, you let SAS assign the default length 200 (!) of results of the CATX or CATS function and then you try to counteract this waste of disk space by using the compress=yes dataset option.

PavelD
Obsidian | Level 7

Thanks, @FreelanceReinh, another valid point. I admit I never paid much attention to specifying LENGTH for the columns.

 

So I changed the SQL procs - CATX replaced by CATS where applicable, and specified LENGTH correctly for all columns.

Now both datasets, when filtered on 201712, yield the same data in total.

That is already good news! Still, the error persists... but this time the large database had correct NOSIG column but not the small dataset. See picture below (actual numeric data removed). Confusing!! As you can see sometimes it is just the first part of concatenation that was done.

 

It seems the results change every time I run the code. Sometimes the error shows for the big database, sometimes for the smaller one, sometimes for both. 

 

sas_compared.png

 

Here, part of the last log:

ERROR: Invalid value for width specified - width out of range

NOTE: Compressing data set ISP_LIB.TEST_B increased size by 5.00 percent.

Compressed is 42 pages; un-compressed would require 40 pages.

NOTE: Table ISP_LIB.TEST_B created, with 40254 rows and 12 columns.

100 QUIT;

NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.

NOTE: PROCEDURE SQL used (Total process time):

real time 1:36.50

cpu time 3.84 seconds

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would check a couple of things. 

First, start a new session, then run your code.  Errors have a tendancy to follow over into the next run, so you could see an error from previous submit impacts your current run.

Second, is it possible that in t1 or t2 the number isn't 3 decimals maximum?  Numbers can appear to be one thing - by the use of formats for instance - or could have a very tiny decimal part (0.000000000013 for instance), this is down to how the computer stores numbers and can happen after divisions and such like.  Maybe add:

cats(t1.entite,put(int(t1.nosig),z3.)) as nosig,

Or round() function.  Its pretty difficult for us to debug without seeing example data which causes the issue, SAS is a data based programming language. 

PavelD
Obsidian | Level 7

@RW9, @FreelanceReinh, you are both right.

 

To my great horror, it turned out that t1.nosig (the numeric part in the concatenation) can be, under some strange circumstances, a complete nonsense (picture):

 

negative_nosig.png

Only numbers from 1 to 400 are expected for t1.nosig.

What is worse, as mentioned before, these nonsensical numbers appear to show only after longer-running SQL queries (or in larger output tables, perhaps). What is even worse, the number of wrong observations and the nosig numbers are different each time the (same) query is run.

 

But that is perhaps a different topic. Your suggestions helped me to treat the original problem, which was the ERROR in the log.

Thank you!

 

EDIT: As I said I cannot share the data with you, apologies. Since the error only appears in large datasets, it would not make much sense to create some kind of toy dataset with hundreds of thousands of lines.. or would it?

FreelanceReinh
Jade | Level 19

@PavelD,

 

Your "nonsense" NOSIG values remind me of a bug in PROC SQL which I stumbled across last year (where, for example, the value 1 turned into -311E229, see this post). It may be something else, but do any of the variables involved in your data have length <8 bytes by any chance? Values "known" to range only from 1 to 400 would normally "qualify" for such a length reduction ...

PavelD
Obsidian | Level 7

That's an interesting bug. To be honest, I don't know if it is related. I am querying some kind of an SQL view (OLMBSPRD.V_DATA_ENTITE). I see neither the definition of the view, nor the database behind. So it is a bit of black box. I do not use any subqueries in my code. I tried to shorten t1.nosig but I get strange results for unshortened nosig as well.

 

A priori, I simply do not know whether the query will give me correct numbers or not. Sometimes it runs without issues and sometimes it does not. On session restart, it always runs correctly. On repeated runs, it sometimes gives incorrect numbers. I don't know how else to test under which conditions it yields wrong results. I will try to do more tests today.

FreelanceReinh
Jade | Level 19

For example, negative 3-digit integers would lead to the combination of blank numeric parts of NOSIG, the error message and the notorious "W.D format" note that you observed.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 10893 views
  • 5 likes
  • 3 in conversation