- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have a dataset that contains various numbers stored in a character format i.e.
123456
7890
23
I would like to pad each of these values to so that each string is 16 characters long using a PROC SQL statement i.e.
0000000000123456
0000000000007890
0000000000000023
This is what I have tried so far without much success:
PROC SQL;
SELECT FORMAT (X, '0000000000000000000')
FROM Table;
QUIT;
Any assistance would be greatly appreciated.
- Tags:
- proc sql
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Firstly, have you hard evidence that using proc sql will increase the processing speed? I can certainly pull out some posts which show SQL usage on large datasets to be far more resource consuming. Never restrict yourself to one technology.
Secondly, most datastep statement can be moved into a proc sql - of course if its passthrough then its a different matter - so:
data have; a="123456"; run; proc sql; create table WANT as select A, cats(repeat("0",20-lengthn(A)),A) as B, put(input(A,best.),z20.) as C from HAVE; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are many ways, concatenation, changing to number and then back again are two for example:
data want; a="123456"; b=cats(repeat("0",20-lengthn(a)),a); c=put(input(a,best.),z20.); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you however I was looking for a PROC SQL solution if possible? I need a PROC SQL version in order to possibly increase the processing speed on a large dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Firstly, have you hard evidence that using proc sql will increase the processing speed? I can certainly pull out some posts which show SQL usage on large datasets to be far more resource consuming. Never restrict yourself to one technology.
Secondly, most datastep statement can be moved into a proc sql - of course if its passthrough then its a different matter - so:
data have; a="123456"; run; proc sql; create table WANT as select A, cats(repeat("0",20-lengthn(A)),A) as B, put(input(A,best.),z20.) as C from HAVE; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your response. I am still fairly new to SAS and therefore I am not certain of the processing overhead of some data steps vs PROC SQL however I have been told by my colleagues that using data step "merge" vs a SQL "join" procedure is much slower. I would like to mark your answer as the solution however I am not sure what variables "B" and "C" refer to? Please could you perhaps clarify this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, I say they "could" be right in certain circumstances, but its very dependant on scenario and a wide ranging statement of SQL > datastep at merging is misleading at best.
B and C refer to variables in the output dataset, it takes the form of
<logical statement> as <new_variable>
or
<variable from contributing table> as <new_variable>
If you run the code in SAS you will see the output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@MattSan wrote:
Thank you however I was looking for a PROC SQL solution if possible? I need a PROC SQL version in order to possibly increase the processing speed on a large dataset.
PROC SQL will not speed up the processing; unless your dataset resides in a remote RDBMS that has a lot more computing power than your SAS host, where SQL pass-through will have a noticeable effect.
In pure SAS, SQL is at best just as fast as a data step merge. And often considerably (up to several orders of magnitude) slower.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My dataset does reside in a remote RDBMS and the extra effort to sort the data before merging the datasets is unwarranted.