Dear SAS community,
I need your help. Image I have the following column with about 120 different observations
Column1
A
B
C
A
D
E
F
B
C
I want to create a numeric variable (Column2) based on the Column1 like this
Column1 Column2
A 1
B 2
C 3
A 1
D 4
E 5
F 6
B 2
C 3
How can I create the Column2 based on the values from the Column1? I want to have the same number for the same values from the character variable Column1, i.e. A gets the number 1, B the number 2, C the number 3 etc.
Many thanks and best regards
Alright. Try this
data have;
input Column1 $;
datalines;
A
B
C
A
D
E
F
B
C
;
data want(drop = c);
if _N_ = 1 then do;
dcl hash h();
h.definekey('Column1');
h.definedata('Column2');
h.definedone();
end;
set have;
if h.find() then do;
c + 1;
Column2 = c;
h.add();
end;
run;
Result:
Column1 Column2 A 1 B 2 C 3 A 1 D 4 E 5 F 6 B 2 C 3
Is this data representable? Is your actual data simply 'A', 'B', .. or more compex strings ?
@PeterClemmensen: Unfortunately, I have complex strings like HTALMHHGR or 2LBBUIIPN. As I said I have about 120 different observations, the mentioned two are just examples.
Ok. Does it matter what numeric value is the first? I.e. is it important that A = 1 and not 12345 ?
It is important that A gets 1 B 2 and so on (in ascending order).
Alright. Try this
data have;
input Column1 $;
datalines;
A
B
C
A
D
E
F
B
C
;
data want(drop = c);
if _N_ = 1 then do;
dcl hash h();
h.definekey('Column1');
h.definedata('Column2');
h.definedone();
end;
set have;
if h.find() then do;
c + 1;
Column2 = c;
h.add();
end;
run;
Result:
Column1 Column2 A 1 B 2 C 3 A 1 D 4 E 5 F 6 B 2 C 3
@PeterClemmensen: It works, thank you very much.
Anytime 🙂
Another approach is a custom informat to read the data into a fixed value.
proc format; invalue mystring (upcase) 'ABC' = 3 'PDQ' = 27 'ZZZ' = 18 other=_error_ ; data have; input x $; datalines; abc ABC aBc pdQ zzZ row ; data use; set have; y = input(x,mystring.); run;
The infomat approach may be more usable if you are going to have multiple data sets with similar values that you want to always map to the same number. If your data source has entry issues like inconsistent use of letter case the informat option (Upcase) as shown converts all the values to upper case before comparing to the values in the informat and assigning values. Plus, the other=_error_ option will let you know with invalid data messages if one of the values in the data is not defined in the informat (spelling issue or new value entirely). If there are going to be many different data sets the informat may become the preferred choice in the long run for data consistency.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.