BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ilikesas
Barite | Level 11

hi,

suppose I have the following small sample from a bigger data set:

IDCategory
1CC
2Z1
3B+
4CC
5Z2
6B

What I would like to have is for each unique category a unique variable category:

IDCategoryVar_Cat
1CC1
2Z12
3B+3
4CC1
5Z24
6B5

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Code: Program



data have;
infile cards truncover expandtabs;
input ID Category $;
cards;
1 CC
2 Z1
3 B+
4 CC
5 Z2
6 B
;
run;

data want;
if _n_ =1 then do;
if 0 then set have;
declare hash h();
h.definekey('Category');
h.definedata('n');
h.definedone();
end;
set have;
if h.find() ne 0 then do; count+1;n=count;h.add();end;
drop count;
run;

View solution in original post

5 REPLIES 5
Ksharp
Super User

Code: Program



data have;
infile cards truncover expandtabs;
input ID Category $;
cards;
1 CC
2 Z1
3 B+
4 CC
5 Z2
6 B
;
run;

data want;
if _n_ =1 then do;
if 0 then set have;
declare hash h();
h.definekey('Category');
h.definedata('n');
h.definedone();
end;
set have;
if h.find() ne 0 then do; count+1;n=count;h.add();end;
drop count;
run;
Astounding
PROC Star

Lots of questions need to be answered here ...

Is the "bigger" data set small enough that you can process it many times as needed to get the result?

Is the new variable supposed to be numeric or character?

If it is supposed to be character ... Why?  (What difference would it make using the original values vs. the new variable?)  Should "1" become "001" if there are hundreds of categories?

If it supposed to be numeric ... Do the numbers have to be consecutive, or would any unique numeric value be sufficient?  If they need to be consecutive, do they have to be assigned using the order that appears in the data, or can they be assigned using some other order (such as alphabetical order of CATEGORY)?

There are many ways to approach this, depending on the answers you provide.  For example, if the new variable should be numeric, can take on any integer value, but needs to be a unique match for the original value, you can do this in a DATA step with one statement:

var_cat = input(put(category, $hex4.), hex4.);

But the requirements have to be explained a bit more to make sure the solution matches what you need.

Good luck.

TomKari
Onyx | Level 15

Give this a try. If your data is intrinsically sorted by ID, just get rid of all the "SeqNo" stuff and use ID instead.

Tom

data have;
   set have;
   SeqNo = _n_;
run;

proc sort data=have;
   by Category;
run;

data have;
   set have;
   by Category;
   retain Var_Cat 0;

   if first.Category then
      Var_Cat = Var_Cat + 1;
run;

proc sort data=have out=have(drop=SeqNo);
   by SeqNo;
run;

ballardw
Super User

If you have a smallish number of values you could also make an informat that could create the new variable.

proc format;

invalue Var_cat

'CC' = 1

'Z1' = 2

'B+' = 3;

run;


data want ;

     set have;

     var_cat = input(category,var_cat.);

run;

slchen
Lapis Lazuli | Level 10

data want;

   do until (last);

   set have end=last;

   array temp [1000]$ _temporary_;

   if whichc(category,of temp(*))=0 then do;

       n+1;

       temp(n)=category;

   end;

   end;

   do p=1 to nobs;

   set have point=p nobs=nobs;

   count=whichc(category,of temp(*));

   output;

   end;

   stop;

   drop n;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1824 views
  • 9 likes
  • 6 in conversation