If you are relying on trailing blanks to enforce multiple rows/values in Teradata then you are treading on dangerous ground - just my 2 cents worth.
Perhaps you could explore the idea of adding an extra column to load into Teradata that is just a row number or count to cater for UPI. That way you can ensure you will always have the same number of rows/values in a much more obvious and reliable way. And it is still not clear to me what 026 versus 026 with blanks might actually represent. From a logical point of view I see no difference - if you print these out in a report there is no difference.
SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition finding the position of the last non-blank is length. There are more functions like the c (character) m (bytes)
The length of missing character all blank is either 0 or 1. That could be a confusing one as is with NULLS.
Yes Jaap, that is helpful.
But what if I want to select distinct values?
If I am using this:
Proc SQL;
select distinct col1 from data1;
quit;
Then the output is:
Col1
Anant
Which actually should be:
Col1
Anant
Anant...... (Considering dots as spaces)
You can't ask to do that . why not replace the trailing blanks with other blank character like TAB ?
if <condition is true> then column1=Tranwrd(column1,"20"x,"09"x)
That's what I was trying to do.
See my second post.
When I ma trying to replace spaces by # then its is giving me all the hashes.
Like for col1 length 10 (representing dots as spaces)
Col1
Anant...
Sas
Hello..
If I am using tranwrd then:
Col1
Anant#####
Sas#######
Hello######
But ideally it should be
Anant ###
Sas
Hello##
You need specify a condition . Which one of Anant should be distinguished ? sas take all of Anant as the same one .
Hmmmm ...
that can be a logic.
let me try out this one
thanks
No Xia,
Its still not working.
For eg:
If my data set is;
data saslib.tmp;
length a $ 20;
a='anant'; output;
a='anant';output;
a= 'anant ' ;output;
run;
And I am using:
data test;
set saslib.tmp;
chck=anyspace(a);
keep a chck;
run;
Then the output is:
anant | 6 |
anant | 6 |
anant | 6 |
which ideally should be :
anant | 0 |
anant | 0 |
anant | 6 |
No anant it will not work that makes sense as it cannot work.
With SAS and Teradata the character fields are either extended with spaces or are all left out. There is no way you can change that behavior.
The reason is thos concept is that basic and everywhere implemented as being convention it is meaningless. comparisions assignments everywhere.
That brings us back to SASKiwi's question. Why would you do that what is the source you have that it is meaningful information.
If you have some source where it is meaningfull that could another type of column information.
Ok so the whole scenario is, I have a table in Oracle.
I want to load its data in Teradata through SAS.
In Teradata, one of the column of my target table is a Unique Primary Index.
Now in oracle my source column has a value as
026 and 026 with a trailing space.
When I load the data in Teradata, beacause the target column is a UPI, it discards the other 026 with the space.
I want both 026 values.
How can I achieve that.
What I don't understand, why 026 and 026 with a space means same to SAS.
First off you should complain to the owner of the Oracle table as this is NOT a good data design. In reality it probably represents a data error that has been introduced by how the data was loaded originally into Oracle.
So along with the data from Oracle pull an extra column that has the length of the crazy variable.
Then load the data into a volatile table in Teradata.
Then execute on the Teradata side code to use SUBSTR() function to remove the un wanted blanks when inserting into the table with uniqueness constraint.
And yes..
I forgot to mention that both my source and target column are Varchar.
In character I can understand that its an universal behaviour, but Varchar occupy only the string length and not more than that, leaving the rest of the allocated buffer as not space but null.
Ok start with oracle Oracle Data Types varchar and varchar2 the varchar2 is a special one it tells the comparisons in Oracle are done not ignoring spaces.
How they got stored in Oracle is not mentioned.
Going form Oracle to SAS you have a similar one as to TD SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition SAS only knows numeric(floating) datetime(floauting) and char in a fixed way (can be compressed)
At that moment all spaces are padded to the full length of the dices maximum length. I didn't see the varchar2 type in TD either. TD mentioned it will remove trailing spaces in the varchar type.
The primary-index in Oracle has the intention to find data quickly in a OLTP (ACID) approach. TD is different as you should not try or think to use OLTP on it. It is an analytical DBMS optimized for that.
Do not translate the Oracle indexing to TD as you get very dissatisfied. The Primary Index in TD is used for spreading the across AMPs how many of those do you have?
As spreading the load (like map/reduce hadoop) is te goal you need a relative unique content, when there is something like a SSID or IBAN you could use that. When not having anything like that the best thing is use NOPRIMARYINDEX as TD will generate one in that case and will do an acceptable spread across those AMPs. Avoid the skweness on tables.
The question on varchar2 nvarchar2 is one of a different type. Native Datatypes see the ANSI section. The ANSI can be seen as the standard all tools should follow.
Oracel excludes varchar2 types there as also the unpadded function. My question now: why are you trying to do something that very Larry Ellison specific and want to see that others would follow that.
Thank you Jaap for the explanation.
At least the picture is clear to me.
And yes, I do know that Teradata is an OLAP database, having an architecture of amps and Mvps, which obviously makes it a good database for Data warehousing whereas Oracle is an OLTP on the other hand.
I understood that SAS has characters which by default pads the spaces to its maximum length.
And the answer to your question is,I want two separate values, one with the space and one without space because I want to replicate the data from source to landing in teradata, which means ideally the data should be as it is in the source. And then I want to carry that data followed by cleaning to the staging area.
I also do understand the concept of primary index differ between the two. I know how when we execute a query, the query spreads on multiple amps, retrieve the data and then show it to us.
But what I am doing here is that I am making the column as a UPI and not just the PI, so that no two values should be equal in that column.
But if it can not be achieved I have to look for an alternate solution.
But thanks to all for their valuable time.
You probably do NOT want to do what you are describing. It will just cause more headaches that it solves.
SAS only has fixed length character fields. It does NOT have a VARCHAR() variable type. You cannot store different numbers of trailing blanks. If this is important to you then store the desired LENGTH in another field.
Teradata does have a VARCHAR() variable type. So you could store 'X' and 'X ' as distinct values into a variable in Teradata. But it you pull that into a SAS dataset it will be padded to the length of the variable since SAS does not have the VARCHAR() type. So if it is important for you then pull the length (calculated on the Teradata side) along with the data.
Length Value
1 X
2 X
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.