Hi,
I have a database with Id's with strings of characters of different lengths whereas they must be normally identical. here is the example:
Id
CA000015432204
CA008552136654
CA8552136654
CA08552136654
CA000000300205
CA300205
The third and the fourth id are normally the same as the second, but with a lack of typing there are some missing zeros after the "CA" code. Similarly for the fifth and the sixth Id.
What I look for is How I can add as many zeros right after the "CA" code so that I will have
a total number of characters equal to 14 as for the first, second, and fifth id, and the result will look like this :
Id
CA000015432204
CA008552136654
CA008552136654
CA008552136654
CA000000300205
CA000000300205
My data contain more than a million of observations and many country codes lake "CA" for "Canada".
I think substr and compress can do the trick but I find no way to do it.
Thank you very much.
Under the assumption that the digits start at position 3 below code should do.
data sample;
infile datalines truncover;
input have_str:$14.;
length want_str $14;
want_str=substrn(have_str,1,2)||put(input(substr(have_str,3),f16.),z12.);
datalines;
CA000015432204
CA008552136654
CA8552136654
CA08552136654
CA000000300205
CA300205
;
run;
If you need to standardize you data on the data base side then use pass-through SQL. I'm pretty sure such standardization can also be done with database functions.
5 Oct 2017: Replaced PUTN() with PUT() in code above.
Under the assumption that the digits start at position 3 below code should do.
data sample;
infile datalines truncover;
input have_str:$14.;
length want_str $14;
want_str=substrn(have_str,1,2)||put(input(substr(have_str,3),f16.),z12.);
datalines;
CA000015432204
CA008552136654
CA8552136654
CA08552136654
CA000000300205
CA300205
;
run;
If you need to standardize you data on the data base side then use pass-through SQL. I'm pretty sure such standardization can also be done with database functions.
5 Oct 2017: Replaced PUTN() with PUT() in code above.
Nice solution, I did'nt get it to work in EG (sas 9.2), so I changed the putn:
data sample;
infile datalines truncover;
input have_str $14.;
length want_str $14;
want_str=substrn(have_str,1,2)||put(input(substr(have_str,3),f16.),z12.);
datalines;
CA000015432204
CA008552136654
CA8552136654
CA08552136654
CA000000300205
CA300205
;
run;
//Fredrik
Thank you both of you, very good solution,
Thank you FredriKE for the rectification you have made to let the Patrick's code work with my very old version of SAS. I'm sorry, I'll accept Patrick's solution for the SAS community to benefit from the latest updates. Thank you again for both of you.
Fw.d is just one of the available SAS formats you can use to write a number as a string of digits. With your data other SAS formats would return the same result, i.t. 16. or BEST16.
As for PUTN() vs. PUT: PUT() and PUTN() will return the same result in your case. Using PUT() is actually better here (performs better). PUTN() allows you to change a format during data step execution which can be very useful - but that's nothing we need to do here.
Using PUTN() was a "typo". I'm going to change the code I've posted.
Thank you @Patrickfor the very satisfactory clarifications.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.