Assigning numeric values to mixed character categories

Super Contributor
Posts: 413

Hi,

suppose I have the following small sample from a bigger data set:

ID  Category
1   CC
2   Z1
3   B+
4   CC
5   Z2
6   B

What I would like is a new variable, Var_Cat, that assigns a unique number to each unique category:

ID  Category  Var_Cat
1   CC        1
2   Z1        2
3   B+        3
4   CC        1
5   Z2        4
6   B         5

Thank you!


Accepted Solutions
Solution
06-13-2015 10:53 AM
Super User
Posts: 9,676

Re: Assigning numeric values to mixed character categories

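data have;
   infile cards truncover expandtabs;
   input ID Category $;
cards;
1 CC
2 Z1
3 B+
4 CC
5 Z2
6 B
;
run;

data want;
   if _n_ = 1 then do;
      if 0 then set have;   /* puts Category in the PDV without reading a row */
      declare hash h();
      h.definekey('Category');
      h.definedata('n');
      h.definedone();
   end;
   set have;
   /* find() returns 0 and loads n when Category is already in the hash; */
   /* otherwise take the next number and store the Category/n pair.      */
   if h.find() ne 0 then do;
      count + 1;
      n = count;
      h.add();
   end;
   drop count;
run;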
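
A quick way to confirm the result is to check that every category ended up with exactly one number. This is only a sketch: it assumes the WANT data set and the variable n produced by the step above, and the column aliases n_values and assigned_n are made up for illustration.

```sas
/* One row per category: n_values should be 1 everywhere */
proc sql;
   select Category,
          count(distinct n) as n_values,
          min(n)            as assigned_n
   from want
   group by Category;
quit;
```

If the new variable should be called Var_Cat, as in the question, renaming n afterwards (or using Var_Cat in place of n throughout the step) should be enough.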


All Replies
Super User
Posts: 5,081

Re: Assigning numeric values to mixed character categories

Lots of questions need to be answered here ...

Is the "bigger" data set small enough that you can process it many times as needed to get the result?

Is the new variable supposed to be numeric or character?

If it is supposed to be character ... Why?  (What difference would it make using the original values vs. the new variable?)  Should "1" become "001" if there are hundreds of categories?

If it is supposed to be numeric ... Do the numbers have to be consecutive, or would any unique numeric value be sufficient?  If they need to be consecutive, do they have to be assigned in the order in which the categories appear in the data, or can they be assigned in some other order (such as alphabetical order of CATEGORY)?

There are many ways to approach this, depending on the answers you provide.  For example, if the new variable should be numeric, can take on any integer value, but needs to be a unique match for the original value, you can do this in a DATA step with one statement:

var_cat = input(put(category, $hex4.), hex4.);

But the requirements have to be explained a bit more to make sure the solution matches what you need.
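
To make the one-statement idea concrete, here is a minimal sketch against the sample data. It assumes a data set HAVE with the character variable Category, as in the accepted solution; the output name want_hex is made up. It also assumes that categories can be told apart by their first two characters, because $hex4. only covers two bytes.

```sas
data want_hex;
   set have;
   /* $HEX4. writes the first two bytes of Category as four hex digits; */
   /* HEX4. then reads those four digits back as an integer.            */
   var_cat = input(put(Category, $hex4.), hex4.);
run;
```

The resulting numbers are unique per category under that assumption, but they are not consecutive. On an ASCII system, for example, 'CC' becomes hex 4343, which is 17219. Longer category values would need a wider format pair.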

Good luck.

PROC Star
Posts: 1,091

Re: Assigning numeric values to mixed character categories

Give this a try. If your data is intrinsically sorted by ID, just get rid of all the "SeqNo" stuff and use ID instead.

Tom

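/* Remember the original row order */
data have;
   set have;
   SeqNo = _n_;
run;

proc sort data=have;
   by Category;
run;

/* Number the categories in sorted order of Category,
   not in order of first appearance */
data have;
   set have;
   by Category;
   retain Var_Cat 0;

   if first.Category then
      Var_Cat = Var_Cat + 1;
run;

/* Restore the original row order and drop the helper variable */
proc sort data=have out=have(drop=SeqNo);
   by SeqNo;
run;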

Super User
Posts: 10,497

Re: Assigning numeric values to mixed character categories

If you have a smallish number of values you could also make an informat that could create the new variable.

proc format;
   invalue Var_cat
      'CC' = 1
      'Z1' = 2
      'B+' = 3
      'Z2' = 4
      'B'  = 5;
run;

data want;
   set have;
   var_cat = input(category, var_cat.);
run;
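
If typing the values by hand is not practical, the same informat can probably be built from the data with a CNTLIN= control data set. This is a sketch only: the data set names categories and ctrl are made up, and it reuses the informat name var_cat, so it would replace an informat of that name created earlier in the session. Note that the numbers follow the sorted order of Category, not the order of first appearance.

```sas
/* One row per distinct category, in sorted order */
proc sort data=have(keep=Category) out=categories nodupkey;
   by Category;
run;

/* Control data set: TYPE='I' asks PROC FORMAT for a numeric informat */
data ctrl;
   set categories;
   retain fmtname 'var_cat' type 'I';
   start = Category;
   label = cats(_n_);      /* the number to assign, stored as text */
   keep fmtname type start label;
run;

proc format cntlin=ctrl;
run;

data want;
   set have;
   var_cat = input(Category, var_cat.);
run;
```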

Super Contributor
Posts: 275

Re: Assigning numeric values to mixed character categories

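data want;
   array temp [1000] $ _temporary_;   /* room for up to 1000 distinct categories */

   /* Pass 1: collect each distinct category in order of first appearance */
   do until (last);
      set have end=last;
      if whichc(category, of temp(*)) = 0 then do;
         n + 1;
         temp(n) = category;
      end;
   end;

   /* Pass 2: reread every row and look up the position of its category */
   do p = 1 to nobs;
      set have point=p nobs=nobs;
      count = whichc(category, of temp(*));
      output;
   end;

   stop;
   drop n;
run;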
