BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASKiwi
PROC Star

If you are relying on trailing blanks to enforce multiple rows/values in Teradata then you are treading on dangerous ground - just my 2 cents worth.

Perhaps you could explore the idea of adding an extra column to load into Teradata that is just a row number or count to cater for UPI. That way you can ensure you will always have the same number of rows/values in a much more obvious and reliable way. And it is still not clear to me what 026 versus 026 with blanks might actually represent. From a logical point of view I see no difference - if you print these out in a report there is no difference.

jakarman
Barite | Level 11

SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition finding the position of the last non-blank is length.  There are more functions like the c (character) m (bytes)

The length of missing character  all blank is either 0 or 1.  That could be a confusing one as is with NULLS.

---->-- ja karman --<-----
Hercules
Fluorite | Level 6

Yes Jaap, that is helpful.

But what if I want to select distinct values?

If I am using this:

Proc SQL;

select distinct col1 from data1;

quit;

Then the output is:

Col1

Anant

Which actually should be:

Col1

Anant

Anant...... (Considering dots as spaces)

Ksharp
Super User

You can't ask to do that . why not replace the trailing blanks with other blank character like TAB ?

if <condition is true>  then  column1=Tranwrd(column1,"20"x,"09"x)

Hercules
Fluorite | Level 6

That's what I was trying to do.

See my second post.

When I ma trying to replace spaces by # then its is giving me all the hashes.

Like for col1 length 10 (representing dots as spaces)

Col1

Anant...

Sas

Hello..

If I am using tranwrd then:

Col1

Anant#####

Sas#######

Hello######

But ideally it should be

Anant ###

Sas

Hello##

Ksharp
Super User

You need specify a condition . Which one of Anant should be distinguished ? sas take all of Anant as the same one .

Hercules
Fluorite | Level 6

Hmmmm ...

that can be a logic.

let me try out this one

thanks

Hercules
Fluorite | Level 6

No Xia,

Its still not working.

For eg:

If my data set is;

data saslib.tmp;

length a $ 20;

a='anant'; output;

a='anant';output;

a=  'anant       '  ;output;

run;

And I am using:

data test;

set saslib.tmp;

chck=anyspace(a);

keep a chck;

run;

Then the output is:

anant6
anant6
anant6

which ideally should be :

anant0
anant0
anant6
jakarman
Barite | Level 11

No anant it will not work that makes sense as it cannot work.
With SAS and Teradata the character fields are either extended with spaces or are all left out. There is no way you can change that behavior.
The reason is thos concept is that basic and everywhere implemented as being convention it is meaningless.  comparisions assignments everywhere.

That brings us back to SASKiwi's question. Why would you do that what is the source you have that it is meaningful information.   

If you have some source where it is meaningfull that could another type of column information.

---->-- ja karman --<-----
Hercules
Fluorite | Level 6

Ok so the whole scenario is, I have a table in Oracle.

I want to load its data in Teradata through SAS.

In Teradata, one of the column of my target table is a Unique Primary Index.

Now in oracle my source column has a value as

026 and 026 with a trailing space.

When I load the data in Teradata, beacause the target column is a UPI, it discards the other 026 with the space.

I want both 026 values.

How can I achieve that.

What I don't understand, why 026 and 026 with a space means same to SAS.

Tom
Super User Tom
Super User

First off you should complain to the owner of the Oracle table as this is NOT a good data design.  In reality it probably represents a data error that has been introduced by how the data was loaded originally into Oracle.

So along with the data from Oracle pull an extra column that has the length of the crazy variable.

Then load the data into a volatile table in Teradata.

Then execute on the Teradata side code to use SUBSTR() function to remove the un wanted blanks when inserting into the table with uniqueness constraint.

Hercules
Fluorite | Level 6

And yes..

I forgot to mention that both my source and target column are Varchar.

In character I can understand that its an universal behaviour, but Varchar occupy only the string length and not more than that, leaving the rest of the allocated buffer as not space but null.

jakarman
Barite | Level 11

Ok start with oracle Oracle Data Types varchar and varchar2 the varchar2 is a special one it tells the comparisons in Oracle are done not ignoring spaces.
How they got stored in Oracle is not mentioned.

Going form Oracle to SAS you have a similar one as to TD SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition SAS only knows numeric(floating) datetime(floauting) and char in a fixed way (can be compressed)

At that moment all spaces are padded to the full length of the dices maximum length. I didn't see the varchar2 type in TD either. TD mentioned it will remove trailing spaces in the varchar type.

The primary-index in Oracle has the intention to find data quickly in a OLTP (ACID) approach. TD is different as you should not try or think to use OLTP on it. It is an analytical DBMS optimized for that.

Do not translate the Oracle indexing to TD as you get very dissatisfied. The Primary Index in TD is used for spreading the across AMPs how many of those do you have?

As spreading the load (like map/reduce hadoop) is te goal you need a relative unique content, when there is something like a SSID or IBAN you could use that. When not having anything like that the best thing is use NOPRIMARYINDEX as TD will generate one in that case and will do an acceptable spread across those AMPs. Avoid the skweness on tables.  
 
The question on varchar2 nvarchar2 is one of a different type. Native Datatypes see the ANSI section. The ANSI can be seen as the standard all tools should follow.

Oracel excludes varchar2 types there as also the unpadded function. My question now: why are you trying to do something that very Larry Ellison specific and want to see that others would follow that.     
 

---->-- ja karman --<-----
Hercules
Fluorite | Level 6

Thank you Jaap for the explanation.

At least the picture is clear to me.

And yes, I do know that Teradata is an OLAP database, having an architecture of amps and Mvps, which obviously makes it a good database for Data warehousing whereas Oracle is an OLTP on the other hand.

I understood that SAS has characters which by default pads the spaces to its maximum length.

And the answer to your question is,I want two separate values, one with the space and one without space because I want to replicate the data from source to landing in teradata, which means ideally the data should be as it is in the source. And then I want to carry that data followed by cleaning to the staging area.

I also do understand the concept of primary index differ between the two. I know how when we execute a query, the query spreads on multiple amps, retrieve the data and then show it to us.

But what I am doing here is that I am making the column as a UPI and not just the PI, so that no two values should be equal in that column.

But if it can not be achieved I have to look for an alternate solution.

But thanks to all for their valuable time.

Tom
Super User Tom
Super User

You probably do NOT want to do what you are describing.  It will just cause more headaches that it solves.

SAS only has fixed length character fields. It does NOT have a VARCHAR() variable type.  You cannot store different numbers of trailing blanks.  If this is important to you then store the desired LENGTH in another field.

Teradata does have a VARCHAR() variable type.  So you could store 'X' and 'X ' as distinct values into a variable in Teradata.   But it you pull that into a SAS dataset it will be padded to the length of the variable since SAS does not have the VARCHAR() type.  So if it is important for you then pull the length (calculated on the Teradata side) along with the data.

Length Value

1 X

2 X

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 32 replies
  • 4773 views
  • 6 likes
  • 5 in conversation