Hi,
suppose I have the following small sample from a bigger data set:
ID | Category |
---|---|
1 | CC |
2 | Z1 |
3 | B+ |
4 | CC |
5 | Z2 |
6 | B |
What I would like is a new variable that assigns a unique number to each distinct category:
ID | Category | Var_Cat |
---|---|---|
1 | CC | 1 |
2 | Z1 | 2 |
3 | B+ | 3 |
4 | CC | 1 |
5 | Z2 | 4 |
6 | B | 5 |
Thank you!
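data have;
    infile cards truncover expandtabs;
    input ID Category $;
    cards;
1 CC
2 Z1
3 B+
4 CC
5 Z2
6 B
;
run;

data want;
    if _n_ = 1 then do;
        /* Put the variables of HAVE into the PDV without reading any rows */
        if 0 then set have;
        /* Hash table: key = Category, data = n (the number assigned to it) */
        declare hash h();
        h.definekey('Category');
        h.definedata('n');
        h.definedone();
    end;
    set have;
    /* Category not seen yet: give it the next number and store it */
    if h.find() ne 0 then do;
        count + 1;
        n = count;
        h.add();
    end;
    drop count;
run;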
Lots of questions need to be answered here ...
Is the "bigger" data set small enough that you can process it many times as needed to get the result?
Is the new variable supposed to be numeric or character?
If it is supposed to be character ... Why? (What difference would it make using the original values vs. the new variable?) Should "1" become "001" if there are hundreds of categories?
If it is supposed to be numeric ... Do the numbers have to be consecutive, or would any unique numeric value be sufficient? If they need to be consecutive, do they have to be assigned using the order that appears in the data, or can they be assigned using some other order (such as alphabetical order of CATEGORY)?
There are many ways to approach this, depending on the answers you provide. For example, if the new variable should be numeric, can take on any integer value, but needs to be a unique match for the original value, you can do this in a DATA step with one statement:
var_cat = input(put(category, $hex4.), hex4.);
But the requirements have to be explained a bit more to make sure the solution matches what you need.
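To make that one-liner concrete, here is a minimal sketch run against the sample data, assuming a data set named HAVE with a character variable Category. The values in the comments assume an ASCII system. Note that $HEX4. only looks at the first two characters, so the result is unique only if no two categories share the same first two characters.

```sas
data want;
    set have;
    /* $HEX4. writes the first two bytes of Category as four hex digits; */
    /* HEX4. reads those digits back as an integer.                       */
    var_cat = input(put(Category, $hex4.), hex4.);
run;

/* On ASCII:  CC -> 17219   Z1 -> 23089   B+ -> 16939   */
/*            Z2 -> 23090   B  -> 16928 ("B" is padded with a blank) */
```

The numbers are unique per category but neither consecutive nor in order of appearance, which is why the questions above matter.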
Good luck.
Give this a try. If your data is intrinsically sorted by ID, just get rid of all the "SeqNo" stuff and use ID instead.
Tom
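/* Remember the original row order */
data have;
    set have;
    SeqNo = _n_;
run;

proc sort data=have;
    by Category;
run;

/* Number the categories; the numbers follow the sorted order of Category */
data have;
    set have;
    by Category;
    retain Var_Cat 0;
    if first.Category then
        Var_Cat = Var_Cat + 1;
run;

/* Restore the original row order */
proc sort data=have out=have(drop=SeqNo);
    by SeqNo;
run;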
If you have a smallish number of values you could also make an informat that could create the new variable.
proc format;
    invalue Var_cat
        'CC' = 1
        'Z1' = 2
        'B+' = 3
        'Z2' = 4
        'B'  = 5;
run;

data want;
    set have;
    var_cat = input(category, var_cat.);
run;
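If there are too many values to type in by hand, the informat can probably be built from the data itself with the CNTLIN= option of PROC FORMAT. The following is only a sketch: the data set names CATS and CNTLIN and the informat name CATNUM are made up for illustration, and the numbers come out in alphabetical order of Category rather than in order of appearance.

```sas
/* One row per distinct category, sorted alphabetically */
proc sort data=have(keep=Category) out=cats nodupkey;
    by Category;
run;

/* Control data set: START = value to read, LABEL = number to return */
data cntlin;
    set cats;
    retain fmtname 'catnum' type 'I';   /* type 'I' = numeric informat */
    start = Category;
    label = _n_;
run;

proc format cntlin=cntlin;
run;

data want;
    set have;
    var_cat = input(Category, catnum.);
run;
```

With the sample data this should give B=1, B+=2, CC=3, Z1=4, Z2=5.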
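data want;
    /* Holds each distinct category in order of first appearance */
    array temp [1000] $ _temporary_;
    /* Pass 1: collect the distinct categories */
    do until (last);
        set have end=last;
        if whichc(category, of temp(*)) = 0 then do;
            n + 1;
            temp(n) = category;
        end;
    end;
    /* Pass 2: re-read every row and look up its category's position */
    do p = 1 to nobs;
        set have point=p nobs=nobs;
        count = whichc(category, of temp(*));
        output;
    end;
    stop;
    drop n;
run;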