BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Luke_D
Calcite | Level 5

Hi,

 

I'm trying to assign a unique 6 character variable to each observation in a dataset.

 

eg.

1="AAAAAA"

2="AAAAAB"

3="AAAAAC"

.

.

26="AAAAAZ"

27="AAAABA"

etc.

 

It needs to be dynamic based on the first observation which may not always be "AAAAAA".

 

What would be the best way of doing this?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Luke_D and welcome to the SAS Support Communities!

 

Try this:

/* Create sample data for demonstration */

data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;

/* Continue the alphabetic series of strings */

data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
  _i=6-findc(_s,'Z',-6,'k');
  if _i>0 then substr(_s,7-_i)=repeat('A',6);
  if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
  else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
  string=_s;
end;
retain _s;
run;

View solution in original post

10 REPLIES 10
Reeza
Super User
Please explain further what you mean by dynamic? Feel free to add more examples to illustrate the issue.

Here's a method to create an anonymized key data set if that's what you're looking for.
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
Luke_D
Calcite | Level 5
I mean that obs 1 won't always start "AAAAAA" so the next observation would need to add the next letter in the sequence based on the values of the previous observation.

eg.
1=""ADBBAB"
2="ADBBAC"
3="ADBBAD"

Thanks for the link but I'm trying to get each observation to follow the sequence rather than just an anonymized key.
Reeza
Super User
So you want each letter to be an alphabetic sequence?
Separate each character into an array. Then increment each level as needed.
The BYTE() function will convert a number to a character, 65 to 90 are the capitalized values, but you'll have to manually do the increment for Z to A to get the value.

Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself. It would be much easier to use numbers......
Luke_D
Calcite | Level 5

@Reeza wrote:
Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself. It would be much easier to use numbers......

I agree, but the variable is limited to 6 bytes and using letters gives us 300m+ more combinations than base 10.

Reeza
Super User
6 bytes for a numeric variable supports values up to 137,438,953,472 which is 444 times more values than using 6 characters.
1 byte is not a character when it comes to numeric values.
FreelanceReinh
Jade | Level 19

Combining @Reeza's idea

Essentially you're using a numbering system with base 26 instead of 10 so you have to do all the math calculations yourself.

with @ChrisNZ's magic %ConvertBase macro one can calculate the n-th value of the sequence (for an almost arbitrary positive integer n) directly from the first without looping through the intermediate values.

 

Example:

%let value1=ADBZZU;
%let n=19;
%let c=%sysfunc(collate(65,90)); /* assuming the ASCII collating sequence */
%let d=0123456789%sysfunc(collate(65,80));
%let value&n=%sysfunc(translate(%substr(%ConvertBase(10,26,%eval(%ConvertBase(26,10,%sysfunc(translate(&value1,&d,&c)))+&n-1+26**6)),2),&c,&d));
%put &&value&n;

The code looks a bit cumbersome because the mathematical base-26 system uses digits 0, ..., 9, A, ..., P rather than A, ..., Z. That's where the TRANSLATE and COLLATE functions come into play.

FreelanceReinh
Jade | Level 19

Hi @Luke_D and welcome to the SAS Support Communities!

 

Try this:

/* Create sample data for demonstration */

data have;
set sashelp.class;
if _n_=1 then string='ADBBAB';
run;

/* Continue the alphabetic series of strings */

data want(drop=_:);
set have;
if _n_=1 then _s=string;
else do;
  _i=6-findc(_s,'Z',-6,'k');
  if _i>0 then substr(_s,7-_i)=repeat('A',6);
  if _i<6 then substr(_s,6-_i,1)=byte(rank(char(_s,6-_i))+1);
  else put 'CAUTION: Overflow "ZZZZZZ" --> "AAAAAA" occurred in obs. ' _n_;
  string=_s;
end;
retain _s;
run;
novinosrin
Tourmaline | Level 20

Sir @FreelanceReinh  Brilliant!!! I had to read your idea twice to get the hang of it.  Yes, got it. Very neat indeed. 🙂

Luke_D
Calcite | Level 5
Thanks for the welcome and the fantastic solution!
Ksharp
Super User

Just for fun.

 

%let dsn=sashelp.heart;
%let string= ADBZYZ ;







%let dsid=%sysfunc(open(&dsn));
%let nobs=%sysfunc(attrn(&dsid,nlobs));
%let dsid=%sysfunc(close(&dsid));
%let n1=%sysfunc(rank(%substr(&string,1,1)));
%let n2=%sysfunc(rank(%substr(&string,2,1)));
%let n3=%sysfunc(rank(%substr(&string,3,1)));
%let n4=%sysfunc(rank(%substr(&string,4,1)));
%let n5=%sysfunc(rank(%substr(&string,5,1)));
%let n6=%sysfunc(rank(%substr(&string,6,1)));


%put _user_;

data key;
 do n1=&n1 to rank('Z');

  loop2+1;
   _n2=ifn(loop2=1,&n2,rank('A'));
 do n2=_n2 to rank('Z');

   loop3+1;
   _n3=ifn(loop3=1,&n3,rank('A'));
 do n3=_n3 to rank('Z');

   loop4+1;
   _n4=ifn(loop4=1,&n4,rank('A'));
 do n4=_n4 to rank('Z');

   loop5+1;
   _n5=ifn(loop5=1,&n5,rank('A'));
 do n5=_n5 to rank('Z');

   loop6+1;
   _n6=ifn(loop6=1,&n6,rank('A'));
 do n6=_n6 to rank('Z');
   n+1;string=cats(byte(n1),byte(n2),byte(n3),byte(n4),byte(n5),byte(n6));output;
   if n = &nobs then stop;
 end;end;end;end;end;end;
 keep string;
run;


data want;
 merge &dsn key;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 905 views
  • 13 likes
  • 5 in conversation