- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I'm trying to assign a unique 6 character variable to each observation in a dataset.
eg.
1="AAAAAA"
2="AAAAAB"
3="AAAAAC"
.
.
26="AAAAAZ"
27="AAAABA"
etc.
It needs to be dynamic based on the first observation which may not always be "AAAAAA".
What would be the best way of doing this?
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Luke_D and welcome to the SAS Support Communities!
Try this:
/* Create sample data for demonstration */
data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;
/* Continue the alphabetic series of strings */
data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
_i=6-findc(_s,'Z',-6,'k');
if _i>0 then substr(_s,7-_i)=repeat('A',6);
if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
string=_s;
end;
retain _s;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's a method to create an anonymized key data set if that's what you're looking for.
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
eg.
1=""ADBBAB"
2="ADBBAC"
3="ADBBAD"
Thanks for the link but I'm trying to get each observation to follow the sequence rather than just an anonymized key.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Separate each character into an array. Then increment each level as needed.
The BYTE() function will convert a number to a character, 65 to 90 are the capitalized values, but you'll have to manually do the increment for Z to A to get the value.
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself. It would be much easier to use numbers......
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Reeza wrote:
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself. It would be much easier to use numbers......
I agree, but the variable is limited to 6 bytes and using letters gives us 300m+ more combinations than base 10.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1 byte is not a character when it comes to numeric values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Combining @Reeza's idea
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself.
with @ChrisNZ's magic %ConvertBase macro one can calculate the n-th value of the sequence (for an almost arbitrary positive integer n) directly from the first without looping through the intermediate values.
Example:
%let value1=ADBZZU;
%let n=19;
%let c=%sysfunc(collate(65,90)); /* assuming the ASCII collating sequence */
%let d=0123456789%sysfunc(collate(65,80));
%let value&n=%sysfunc(translate(%substr(%ConvertBase(10,26,%eval(%ConvertBase(26,10,%sysfunc(translate(&value1,&d,&c)))+&n-1+26**6)),2),&c,&d));
%put &&value&n;
The code looks a bit cumbersome because the mathematical base-26 system uses digits 0, ..., 9, A, ..., P rather than A, ..., Z. That's where the TRANSLATE and COLLATE functions come into play.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Luke_D and welcome to the SAS Support Communities!
Try this:
/* Create sample data for demonstration */
data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;
/* Continue the alphabetic series of strings */
data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
_i=6-findc(_s,'Z',-6,'k');
if _i>0 then substr(_s,7-_i)=repeat('A',6);
if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
string=_s;
end;
retain _s;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sir @FreelanceReinh Brilliant!!! I had to read your idea twice to get the hang of it. Yes, got it. Very neat indeed. 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just for fun.
%let dsn=sashelp.heart; %let string= ADBZYZ ; %let dsid=%sysfunc(open(&dsn)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsid=%sysfunc(close(&dsid)); %let n1=%sysfunc(rank(%substr(&string,1,1))); %let n2=%sysfunc(rank(%substr(&string,2,1))); %let n3=%sysfunc(rank(%substr(&string,3,1))); %let n4=%sysfunc(rank(%substr(&string,4,1))); %let n5=%sysfunc(rank(%substr(&string,5,1))); %let n6=%sysfunc(rank(%substr(&string,6,1))); %put _user_; data key; do n1=&n1 to rank('Z'); loop2+1; _n2=ifn(loop2=1,&n2,rank('A')); do n2=_n2 to rank('Z'); loop3+1; _n3=ifn(loop3=1,&n3,rank('A')); do n3=_n3 to rank('Z'); loop4+1; _n4=ifn(loop4=1,&n4,rank('A')); do n4=_n4 to rank('Z'); loop5+1; _n5=ifn(loop5=1,&n5,rank('A')); do n5=_n5 to rank('Z'); loop6+1; _n6=ifn(loop6=1,&n6,rank('A')); do n6=_n6 to rank('Z'); n+1;string=cats(byte(n1),byte(n2),byte(n3),byte(n4),byte(n5),byte(n6));output; if n = &nobs then stop; end;end;end;end;end;end; keep string; run; data want; merge &dsn key; run;