Hi,
I've been scratching my head for the past few hours and thought an expert can help me. I have a list of names that belong to the same common entity. I want to create unique 2 letter short codes using the names. I've written a program that should work (see result on row 2, where Harriet would create HA but because it was a duplicate of Ham, it removed A from the name to derive HR instead) but does not work for the subsequent record. Harry should have had A and R removed to derive HY
| short_code | nam_1 | iso_a2 | tmp_cd | last_tmp_cd | nam_1_s | in_prev | do_cnt | to_rm |
| HA | Ham | AL | HA | HAM | 0 | 1 | ||
| HR | Harriet | AL | HA,HR | HA | HRRIET | 0 | 2 | A |
| HA | Harry | AL | HA,HR,HA | HA,HR | HARRY | 0 | 1 | |
| HE | Hello | AL | HA,HR,HA,HE | HA,HR,HA | HELLO | 0 | 1 | |
| HI | Hitachi | AL | HA,HR,HA,HE,HI | HA,HR,HA,HE | HITACHI | 0 | 1 | |
| HO | Hog | AL | HA,HR,HA,HE,HI,HO | HA,HR,HA,HE,HI | HOG | 0 | 1 | |
| HG | Hogwarth | AL | HA,HR,HA,HE,HI,HO,HG | HA,HR,HA,HE,HI,HO | HGWARTH | 0 | 2 | O |
| HO | Holler | AL | HA,HR,HA,HE,HI,HO,HG,HO | HA,HR,HA,HE,HI,HO,HG | HOLLER | 0 | 1 | |
| HU | Hummer | AL | HA,HR,HA,HE,HI,HO,HG,HO,HU | HA,HR,HA,HE,HI,HO,HG,HO | HUMMER | 0 | 1 |
Here is a full working example that produced the results above. Apologies in advance as I had to put a lot of comments to make sure my head is working correctly! I also added a few failsafes and bits and bobs to troubleshoot.
/*list of names*/
data tmp;
length nam_1 $50;
input nam_1;
datalines;
Harry
Hummer
Harriet
Hog
Ham
Hello
Hogwarth
Hitachi
Holler
;
run;
/*add a common field*/
data tmp;
length short_code $5;
set tmp;
iso_a2="AL";
run;
/*objective: for the same common field (e.g. AL) short_code should be unique. short_code is derived from nam_1 (2 char)*/
/*short_code is all empty*/
proc sort data=tmp;
by iso_a2 descending short_code nam_1;
run;
data out;
set tmp;
by iso_a2 descending short_code nam_1;
length tmp_cd last_tmp_cd $500;
retain tmp_cd; /*i want to retain a list of short_codes that has been assigned for the same common field so i should not assign it again*/
if first.iso_a2 then do; /*for every unique common fieldm, reset my short_code list*/
tmp_cd=short_code;
end;
/*begin creating short_code*/
last_tmp_cd=tmp_cd; /*assign short_code from previous row to a new field to debug*/
nam_1_s=upcase(nam_1); /*assign name to temporary var to be manipulated if required*/
if short_code="" then do; /*assign short_code*/
in_prev=1; /*do at least once*/
do_cnt=0; /*failsafe for infinite loop*/
do until (in_prev=0 or do_cnt=6);
do_cnt=do_cnt+1; /*failsafe for infinite loop*/
if do_cnt=1 then short_code=upcase(substr(nam_1_s,1,2)); /*if first loop, then simple assignment. take first 2 char of nam_1_s as short code. this will introduce dups*/
else do; /*if not first loop it means there was a duplicate. now attempt to remove the char from nam_1_s*/
to_rm=substr(strip(reverse(short_code)),1,1); /*extract the 2nd char of short_code and use this to remove this char from nam_1_s*/
nam_1_s=cats(substr(nam_1_s,1,1),compress(substr(nam_1_s,2), to_rm, 'i')); /*keep first char and attempt to remove the next char*/
short_code=upcase(substr(nam_1_s,1,2));
put last_tmp_cd=;
put short_code=;
put nam_1_s=;
end;
/*as in you see in the logs, it works for the first duplicate short_code. but subsequent short_codes fail to get index value>0 even though th short_code value is in last_tmp_cd*/
in_prev=index(last_tmp_cd, short_code);
end;
end;
tmp_cd=catx(",",tmp_cd,short_code); /*add assigned short_code to the list so next short_code will not use the same value*/
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.