Hi i am having a dataset having two vaiables no and name of city .i am having 30000 incremantal data of number if no is increasing then
As the city is increasing it should create a format as 1=Mumbai 2=chennai like that.... the no of city names are there.
The COLLATE function may work for your needs, or just create a static list of substring character values to reference / assign based on the offset, using SUBSTR and your count variable with a DATA step, as shown below:
retain a2z 'abcdefghijklmnopqrstuvwxyz';
do x=1 to length(a2z);
v = substr(a2z,x,1);
put v= x=;
Try this example. But it isn't clear for me that if there would be 30000 unique ids
then the new_format would be equals 'aaab', 'aaac' and so on. I mean that new_format equals two symbols or it may be more then two symbols.
The example is for two letters.
data n; do no=1 to 702 ; output;output;output; end; run;
data t; length f $1. new_format $3. fc $1.; retain fc ' ' f ' ' k 1 j 0 new_format ' '; set n; by no;
if first.no then do;
if j gt 26 then j=1;
if k gt 26 then k=1;
fc=scan('a b c d e f g h i j k l m n o p q r s t u v w x y z',k,' ');
f=scan('a b c d e f g h i j k l m n o p q r s t u v w x y z',j,' ');
if mod(no,26)=0 and no gt 26 then k=k+1;
if no le 26 then new_format=f; else new_format=fc||f;
keep no new_format;
> now i want the format like this in to new vaiable as
> i am having 30000 observations as no increase it
> should give format like this
> if no is 24566 new_format=za
> if no is 24567 new_format=zb
> if no is 24568 new_format=zc
Hmmmmm...I think format dataset would be a better choice for this kind question.
And as your said, there should be 702 obs(i.e. 26*26+26=702).Counld not be 'no is 24568 ';
And also your 'no' is numeric variable.
length label $2.;
retain fmtname 'fmt' type 'N' ;
do label='a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z' ;
do x='a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z' ;
do y='a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z' ;
drop x y;
data n; *To build up your original dataset;
do no=1 to 702 ;
proc format cntlin=format;
proc print data=temp(obs=40) noobs;run;
is this counting in "base 26" (like decimals are base=10, hex is base=16 and binary is base=2)?
But, I'm not sure what letter should take the place of a zero, perhaps it should be base27.
If so, it should be possible to derive the new_format "number" from a decimal number, with something (probably better with logs)
new_format = substr( '0abcdefghijklmnopqrstuvwxyz', 1+int( no/ 27), 1) !! substr( '0abcdefghijklmnopqrstuvwxyz', 1+mod( no, 27), 1) ;
Your 'base 26' would be some right.And something you thought would be further profound.
I also am confused that why this author want to use this mapping.
And your HyperLink,i readed,It is pretty useful. Learned it from you. :-) ;-)