Hi,
I'm trying to assign a unique 6 character variable to each observation in a dataset.
eg.
1="AAAAAA"
2="AAAAAB"
3="AAAAAC"
.
.
26="AAAAAZ"
27="AAAABA"
etc.
It needs to be dynamic based on the first observation which may not always be "AAAAAA".
What would be the best way of doing this?
Thanks
Hi @Luke_D and welcome to the SAS Support Communities!
Try this:
/* Create sample data for demonstration */
data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;
/* Continue the alphabetic series of strings */
data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
_i=6-findc(_s,'Z',-6,'k');
if _i>0 then substr(_s,7-_i)=repeat('A',6);
if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
string=_s;
end;
retain _s;
run;
@Reeza wrote:
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself. It would be much easier to use numbers......
I agree, but the variable is limited to 6 bytes and using letters gives us 300m+ more combinations than base 10.
Combining @Reeza's idea
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself.
with @ChrisNZ's magic %ConvertBase macro one can calculate the n-th value of the sequence (for an almost arbitrary positive integer n) directly from the first without looping through the intermediate values.
Example:
%let value1=ADBZZU;
%let n=19;
%let c=%sysfunc(collate(65,90)); /* assuming the ASCII collating sequence */
%let d=0123456789%sysfunc(collate(65,80));
%let value&n=%sysfunc(translate(%substr(%ConvertBase(10,26,%eval(%ConvertBase(26,10,%sysfunc(translate(&value1,&d,&c)))+&n-1+26**6)),2),&c,&d));
%put &&value&n;
The code looks a bit cumbersome because the mathematical base-26 system uses digits 0, ..., 9, A, ..., P rather than A, ..., Z. That's where the TRANSLATE and COLLATE functions come into play.
Hi @Luke_D and welcome to the SAS Support Communities!
Try this:
/* Create sample data for demonstration */
data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;
/* Continue the alphabetic series of strings */
data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
_i=6-findc(_s,'Z',-6,'k');
if _i>0 then substr(_s,7-_i)=repeat('A',6);
if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
string=_s;
end;
retain _s;
run;
Sir @FreelanceReinh Brilliant!!! I had to read your idea twice to get the hang of it. Yes, got it. Very neat indeed. 🙂
Just for fun.
%let dsn=sashelp.heart; %let string= ADBZYZ ; %let dsid=%sysfunc(open(&dsn)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsid=%sysfunc(close(&dsid)); %let n1=%sysfunc(rank(%substr(&string,1,1))); %let n2=%sysfunc(rank(%substr(&string,2,1))); %let n3=%sysfunc(rank(%substr(&string,3,1))); %let n4=%sysfunc(rank(%substr(&string,4,1))); %let n5=%sysfunc(rank(%substr(&string,5,1))); %let n6=%sysfunc(rank(%substr(&string,6,1))); %put _user_; data key; do n1=&n1 to rank('Z'); loop2+1; _n2=ifn(loop2=1,&n2,rank('A')); do n2=_n2 to rank('Z'); loop3+1; _n3=ifn(loop3=1,&n3,rank('A')); do n3=_n3 to rank('Z'); loop4+1; _n4=ifn(loop4=1,&n4,rank('A')); do n4=_n4 to rank('Z'); loop5+1; _n5=ifn(loop5=1,&n5,rank('A')); do n5=_n5 to rank('Z'); loop6+1; _n6=ifn(loop6=1,&n6,rank('A')); do n6=_n6 to rank('Z'); n+1;string=cats(byte(n1),byte(n2),byte(n3),byte(n4),byte(n5),byte(n6));output; if n = &nobs then stop; end;end;end;end;end;end; keep string; run; data want; merge &dsn key; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.