Warning in DI Studio
Contributor
Posts: 58

When I run a JOB in DI the following warning appears:

- WARNING: Variable FLAG has different lengths on BASE and DATA files (BASE 3 DATA 8).    

Anyone know how to resolve this warning?

This field is declared in the table as numeric (3).

I have physically deleted the table, but the warning still appears.

Respected Advisor
Posts: 3,799

Re: Warning in DI Studio

Posted in reply to DavidCaliman

Maybe you need to do something to the DATA file so that it will have FLAG with length 3 also.

Regular Contributor
Posts: 180

Re: Warning in DI Studio

Posted in reply to DavidCaliman

Please provide more information:

What transformation are you using?

Are you using SAS Tables? Oracle?

Contributor
Posts: 58

Re: Warning in DI Studio

CTorres,

I'm using the SCD Type 1 Loader.

My target is a SAS table.

The value for FLAG is 0 or 1.

Regular Contributor
Posts: 180

Re: Warning in DI Studio

Posted in reply to DavidCaliman

Well, I don't know much about the SCD Type 1 Loader, but the warning seems to be produced by a PROC APPEND with the FORCE option.

The following small SAS program generates the same Warning message:

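/* BASE table: FLAG stored with length 3 */
data base;
  length flag 3;
  do flag=1 to 5;
    output;
  end;
run;

/* DATA table: FLAG stored with the default length 8 */
data data;
  length flag 8;
  do flag=6 to 8;
    output;
  end;
run;

/* FORCE lets the append proceed despite the length mismatch */
proc append base=base data=data force;
run;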

17   data base;
18     length flag 3;
19     do flag=1 to 5;
20       output;
21     end;
22   run;

NOTE: The data set WORK.BASE has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


23   data data;
24     length flag 8;
25     do flag=6 to 8;
26       output;
27     end;
28   run;

NOTE: The data set WORK.DATA has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


29   proc append base=base data=data force;
30   run;

NOTE: Appending WORK.DATA to WORK.BASE.
WARNING: Variable flag has different lengths on BASE and DATA files (BASE 3 DATA 8).
NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 3 observations read from the data set WORK.DATA.
NOTE: 3 observations added.
NOTE: The data set WORK.BASE has 8 observations and 1 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Look at the code generated by the transformation, and if this is the case then you should follow the suggestion given by data_null_;
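As a minimal sketch of that suggestion, assuming the WORK.BASE and WORK.DATA tables from the small program above: rebuild the incoming table so that FLAG is stored with length 3 before appending. The table name data_fixed is made up for illustration, and the code a DI Studio transformation actually generates will look different.

```sas
/* Rebuild the incoming table with FLAG stored at length 3, */
/* matching the BASE table from the example above.          */
data data_fixed;      /* hypothetical table name */
  length flag 3;
  set data;
run;

/* With matching lengths the append should no longer need FORCE. */
proc append base=base data=data_fixed;
run;
```

This only makes sense when every value fits the shorter length; 0 and 1 are unproblematic.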

Regards,

Respected Advisor
Posts: 4,173

Re: Warning in DI Studio

Posted in reply to DavidCaliman

Even though it's possible, I wouldn't use anything other than 8 bytes for a numeric SAS variable. If it's a flag, you could instead define the column as character with a length of 1.

You shouldn't get the warning if:

- The length of the flag variable on metadata level is '3' from source to target

- You delete the underlying physical target table so that it gets recreated with the variable attributes as defined in the metadata target table.

If the warning doesn't go away: investigate the code generated by the transformation, especially the part that creates the target variable. If there is no LENGTH statement for the flag variable, it will be created with the default length of 8.
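One way to see which physical tables carry which length is to query the column metadata. This is only a sketch: it assumes the tables live in the WORK library and that the column is named FLAG, so adjust the library reference to wherever the job writes its source, intermediate and target tables.

```sas
/* List the stored type and length of FLAG in every table of a library. */
/* 'WORK' is an assumption; substitute the libref used by the job.      */
proc sql;
  select libname, memname, name, type, length
    from dictionary.columns
    where libname = 'WORK'
      and upcase(name) = 'FLAG';
quit;
```

Any table that reports length 8 where the target reports 3 is a candidate for the source of the warning.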

PROC Star
Posts: 1,167

Re: Warning in DI Studio

I disagree. I have had excellent experience using shorter SAS numeric variables for storing large numbers of small integers. However, you MUST know what you're doing.

Tom

Trusted Advisor
Posts: 3,214

Re: Warning in DI Studio

Posted in reply to DavidCaliman

@Tom I disagree with you about the freedom to choose the length of numeric variables. See: 41214 - Observation length, alignment, and padding of a SAS data set

The other reference is SAS(R) 9.4 Language Reference: Concepts, Second Edition (numeric precision), as most people do not even understand the basic mathematics behind it (partly two's complement). When that is not understood, results sometimes fail to meet expectations.

The worst thing I have seen is data being moved between mainframe and Windows and losing precision on values of only 4 digits.

As those were region numbers, never meant for calculations, they would have been better defined as character.

Storing account numbers or gender (F/M coded as 0/1) as numeric is already a common mistake, a misperception inherited from the Hollerith card that is still common practice. But the IBAN (the SEPA bank account number, see International Bank Account Number - Wikipedia, the free encyclopedia) makes a joke of the word "number", as it contains letters. Calculating its mod-97 check requires handling a precision of 30 digits (decimal, not binary). PROC DS2 to the rescue. No floating-point hardware or whatever.
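For illustration only, here is a sketch of the mod-97 check that sidesteps the 30-digit precision problem in a plain DATA step instead of PROC DS2. It is a swapped-in technique, not what the post proposes: the remainder is carried forward one digit at a time, so no intermediate value exceeds three digits. The sample IBAN is the commonly published example value, and the letter conversion through RANK assumes an ASCII host.

```sas
data _null_;
  length iban rearranged $34 digits $70 ch $1;
  iban = 'GB82WEST12345698765432';   /* commonly published example IBAN */

  /* Move the first four characters to the end. */
  rearranged = cats(substr(iban, 5), substr(iban, 1, 4));

  /* Replace letters by numbers: A=10 ... Z=35 (ASCII assumption). */
  digits = '';
  do i = 1 to length(rearranged);
    ch = substr(rearranged, i, 1);
    if anyalpha(ch) then digits = cats(digits, rank(upcase(ch)) - 55);
    else digits = cats(digits, ch);
  end;

  /* Digit-by-digit remainder: every intermediate value stays below 970. */
  remainder = 0;
  do i = 1 to length(digits);
    remainder = mod(remainder * 10 + input(substr(digits, i, 1), 1.), 97);
  end;

  /* A valid IBAN leaves a remainder of 1. */
  put iban= remainder=;
run;
```

The IBAN itself is kept as a character variable throughout, which is the point being made: it is an identifier, not a quantity.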

---->-- ja karman --<-----
PROC Star
Posts: 1,167

Re: Warning in DI Studio

Hi, and thank you for the reference to the SAS Note...very interesting!

And yes, you're pretty much correct on everything else in your note. I too had the "joy" of transferring large numbers of numeric variables between mainframe, Unix, and Windows. What fun!

How about if I amend my comment to the following:

I believe that there are circumstances in which using shorter SAS numeric variables can be very beneficial, such as instances of storing large numbers of small integers. However, before doing so ensure that you are VERY familiar with the underlying technologies, and the implications of doing this. If you're uncertain, use full length numerics.
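To make "know what you're doing" concrete, here is a small experiment. It is a sketch under the assumption of a Windows or UNIX host, where the SAS documentation gives 8,192 as the largest integer that a numeric of length 3 stores exactly. The table and variable names are made up.

```sas
/* Store the same integers in a 3-byte and an 8-byte numeric. */
data short;                 /* hypothetical table name */
  length n3 3 n8 8;
  do value = 8190 to 8194;
    n3 = value;             /* truncated to 3 bytes when written to disk */
    n8 = value;
    output;
  end;
run;

/* Compare N3 with N8 after reading the table back. */
proc print data=short;
run;
```

Values up to 8,192 should come back unchanged in N3, while 8,193 is not expected to survive the round trip. A 0/1 flag is far inside the safe range.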

Tom

Trusted Advisor
Posts: 3,214

Re: Warning in DI Studio

Posted in reply to DavidCaliman

Agree with that one.
And to add: do not confuse (floating-point) numbers with characters that are restricted to a dedicated range, as the latter are constraints.

---->-- ja karman --<-----
PROC Star
Posts: 1,167

Re: Warning in DI Studio

A topic to have some fun with!

Starting with the black and white ends of the spectrum:

  • A wavelength of light;
  • The mass of an object;

Both numbers, with potentially infinite numbers of digits and infinite numbers of decimal places, to the point of measurement error.

  • Gender, coded as 1 for female, 2 for male;
  • "Do you like the president?", coded as 1 for no, 2 for yes.

Classifications with values being limited to two; the fact that they are integers is only coincidental, they could also be alphabetic, funny characters, Greek letters, etc.

In the first cases, they can clearly be used as the subject of mathematical operations and analysis, and in the second they can't.

Moving into gray:

  • Age, represented as an integer (which is very frequently done, and fair);
  • Geographic lat/long references, and map coordinates; sometimes integers, sometimes they have decimal places;
  • One of my favourites is financial data, where dollar values exist only to two decimal places, and would always be integers if measured in cents.

These are frequently used both as classification variables and as subjects of analysis (in some cases both in the same statistical result).

One horrible misuse: using floating-point numbers to store keys that have decimal places, like library book numbers, thereby ensuring that they will NEVER match (yes, I got to live the adventure once. I just about died when I found out what they'd done.)

I've never tripped over a theoretical treatment of this topic, but I think that there are subtleties that go beyond having only two types of data under consideration.

I'd be interested in everyone's opinion on whether there is a conceptual treatment of this issue, or if we're doomed always to fly by the seat of our pants.

Tom

Super User
Posts: 5,429

Re: Warning in DI Studio

Assuming SAS data types:

  • A wavelength of light; NUM(8)
  • The mass of an object; NUM(8)
  • Gender, coded as 1 for female, 2 for male; CHAR(1), but rather M and F, which are more self-explanatory (and do not rank the sexes)
  • "Do you like the president?", coded as 1 for no, 2 for yes; CHAR(1), but rather N and Y for the same reasons as above.
  • Age, represented as an integer (which is very frequently done, and fair); NUM(3) (or 4 if you care about variable alignment). This could be used for calculations.
  • Geographic lat/long references, and map coordinates; sometimes integers, sometimes they have decimal places; NUM(8)
  • One of my favorites is financial data, where dollar values exist only to two decimal places, and would always be integers if measured in cents; NUM(8). I'm not sure I have ever encountered problems with two-decimal values, but perhaps there are situations where storing them as cents could be beneficial.
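Written out as a table definition, the list above might look like the following sketch. All table and variable names are invented for illustration.

```sas
/* Empty table whose columns carry the types and lengths suggested above. */
data measurements;                 /* hypothetical table name */
  length wavelength      8         /* continuous measurement          */
         mass            8
         gender          $1        /* 'M' / 'F'                       */
         likes_president $1        /* 'N' / 'Y'                       */
         age             3         /* small integer, in years         */
         latitude        8
         longitude       8
         amount          8;        /* currency with two decimals      */
  call missing(of _all_);          /* avoids "uninitialized" notes    */
  stop;                            /* write no observations           */
run;
```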
Data never sleeps
Trusted Advisor
Posts: 3,214

Re: Warning in DI Studio

Nice getting reactions.......

The first black-and-white points should be clear, yet they can fail terribly. NUM(8) simply does not support that many digits.

Suppose you were asked to print the first 100 digits of pi. Yes, for everyday purposes 5 digits would be sufficient, but in some areas you want more, for example in high-accuracy measurements (see History of the metre - Wikipedia, the free encyclopedia).

The ancient Greeks did not have numerals but used the letters of the alphabet for that. We have only been using them for a couple of centuries (French Revolution).

The Greeks and Romans did not use numbers as we know them. Those are an Arabic-Indian inheritance, including the masterpiece of the number 0.


It is the Hollerith card (Punched card - Wikipedia, the free encyclopedia) that gave many people the misperception that only the codes 0-9 are allowed. Indeed it is better to use a character type with letters for that, so you are not tempted into doing calculations, such as reporting the expected value of sex as 0.5 and then wondering why this value is not in the dataset (a question that was actually asked in real life).


For age, a numeric of length 4 would be sufficient, but which unit are you using: years or days? In healthcare, days would be more applicable for measuring effects. Years, on the other hand, would do for binning a person's age. Going to seconds and beyond, keep it at 8 bytes.

I have seen people introduce dates like 32 May as an indicator of a product status change, not aware that they could also use letters in the product status field.

Geography is a nice area for measures; I would spend my holidays navigating on a sailing boat. The Earth's great circle is about 40,000 km, as defined under Napoleon with 100 degrees for a right angle. But we use 90 degrees for that, with minutes (divided by 60) and seconds (divided by 60 again), giving 1 nautical mile (1852 m) per minute of arc, leaving out feet, fathoms, land miles, etc. Use the north-south lines for that, not the east-west ones, as those do not follow a great circle. In mathematics, angles are measured in radians, which makes for terrible calculations. SAS/GRAPH map data sets are given in radians. Projections transform everything: in a Mercator projection you will be off by about 10 m at the edges of a map representing 300 km.

Ah, financial data, also nice. Amounts are given with two decimal digits, for cents. But what happens when calculating interest? It should be calculated continuously, not in discrete intervals, and working with continuous interest means you are really processing floating-point values (see Compound interest - Wikipedia, the free encyclopedia, and What Every Computer Scientist Should Know About Floating-Point Arithmetic). There are some anecdotes about exploiting those roundings (snopes.com: The Salami Embezzlement Technique). In real life that is hard to prove.

I remember a conversion that was rejected because there was a difference of 3,67 after working on amounts running into the billions. After careful examination it really was wrong.

Another example was a complaint that an amount of 50,00 was not the same as 50,00 after some calculations (the values were not rounded before comparison).

Oh yes, even the obvious integer amounts still have their challenges. The same kind of calculations failed to meet human expectations on the first calculators (1970). Today's calculators mask that, so you are no longer aware of it.
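The difference between discrete and continuous compounding can be sketched in a few lines. The principal, rate and term below are made-up illustration values, and the formulas are the standard compound-interest ones.

```sas
data _null_;
  principal = 1000;   /* hypothetical amount        */
  rate      = 0.05;   /* hypothetical yearly rate   */
  years     = 10;

  yearly     = principal * (1 + rate) ** years;              /* compounded once a year */
  monthly    = principal * (1 + rate / 12) ** (12 * years);  /* compounded monthly     */
  continuous = principal * exp(rate * years);                /* continuous compounding */

  put yearly= 12.4 monthly= 12.4 continuous= 12.4;
run;
```

None of the three results is a whole number of cents, which is where the rounding questions start.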
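The same effect is easy to reproduce. The sketch below uses 0.1 + 0.2 versus 0.3 rather than the amounts from the complaint, which are not known, and shows the usual remedy of rounding both sides to the unit that matters before comparing.

```sas
data _null_;
  a = 0.1 + 0.2;   /* result of a calculation */
  b = 0.3;         /* value entered directly  */

  /* Direct comparison of the binary floating-point values. */
  if a = b then put 'Equal without rounding';
  else put 'Not equal without rounding: ' a= hex16. b= hex16.;

  /* Compare again after rounding both sides to cents. */
  if round(a, 0.01) = round(b, 0.01) then put 'Equal after ROUND';
  else put 'Still not equal after ROUND';
run;
```

The HEX16. output shows the stored bit patterns, which is what the equality test actually compares.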

---->-- ja karman --<-----