Hello,
I am trying to create a class variable based on a variable that is currently character. By class variable I mean a variable that is populated by integers which can be linked to a certain character string, but there are over 100 character strings and they change monthly so I would like to not have to manually write out lines for every one.
My data looks like this:
id_var program_id
1 ABC
2 DEF
3 ABC
4 GHI
5 GHI
and I want it to look like this:
id_var program_id prog_ind
1 ABC 1
2 DEF 2
3 ABC 1
4 GHI 3
5 GHI 3
In the past, I had just been doing them by hand like so:
data want; set have;
if program_id="ABC" then prog_ind=1;
run;
But as I said, if there is a way to make the process go quicker (in coding, not in computation speed) that would be really helpful.
Thank You
Here is an example using your data to create an Informat and associated format.
proc sql;
create table work.temp as
select distinct program_id as Start
from Have;
quit;
data tempcntlin;
set work.temp;
length FmtName $ 10. ;
FmtName='Program';
label=put(_n_,best4.);
type='I';
output;
FmtName='ProgramTxt';
label=Start;
start=put(_n_,best4.);
type='N';
output;
run;
proc sort data=tempcntlin;
by Type FmtName start;
run;
Proc format library=work cntlin=tempcntlin out=work.cntlout;
run;
data want;
set have;
Prog_ind = input(program_id,Program.);
run;
proc print data=want;
var Prog_ind;
format Prog_ind Programtxt.;
run;
NOTE: this approach replaces the format constantly so you should not use it on any previously created data set.
I have provided an associated format to display the original value as needed.
A side effect of the Proc SQL step to select the names is that they will come out in alphabetic order and hence 1 will be assigne to the first ordered program_id. If that is not the desired result you will need to specify such.
I don't know how you are using this but you may not need the numeric variable at all.
Here is an example using your data to create an Informat and associated format.
proc sql;
create table work.temp as
select distinct program_id as Start
from Have;
quit;
data tempcntlin;
set work.temp;
length FmtName $ 10. ;
FmtName='Program';
label=put(_n_,best4.);
type='I';
output;
FmtName='ProgramTxt';
label=Start;
start=put(_n_,best4.);
type='N';
output;
run;
proc sort data=tempcntlin;
by Type FmtName start;
run;
Proc format library=work cntlin=tempcntlin out=work.cntlout;
run;
data want;
set have;
Prog_ind = input(program_id,Program.);
run;
proc print data=want;
var Prog_ind;
format Prog_ind Programtxt.;
run;
NOTE: this approach replaces the format constantly so you should not use it on any previously created data set.
I have provided an associated format to display the original value as needed.
A side effect of the Proc SQL step to select the names is that they will come out in alphabetic order and hence 1 will be assigne to the first ordered program_id. If that is not the desired result you will need to specify such.
I don't know how you are using this but you may not need the numeric variable at all.
You example seems to indicate that you want the prog_ind variable to be the rank of the sorted values of the program_id variable, regardless of its contents. If you don't mind sorting the data, do this:
data Have;
input id_var program_id $;
datalines;
1 ABC
2 DEF
3 ABC
4 GHI
5 GHI
;
proc sort data=have out=want;
by program_id;
run;
data want;
set want;
by program_id;
if first.program_id then
prog_ind + 1;
run;
proc print; run;
If you don't want to leave the data in sorted order, you can restore the original order by resorting on the id_var variable.
I don't understand your logic . Why GHI is 3 ?
OK. Assuming I understand what you mean. data have; input id_var program_id $; cards; 1 ABC 2 DEF 3 ABC 4 GHI 5 GHI ; run; data want; if _n_=1 then do; if 0 then set have; declare hash h(); h.definekey('program_id'); h.definedata('n'); h.definedone(); end; set have; if h.find() ne 0 then do; count+1;n=count;h.add(); end; drop count; run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.