- Assigning numeric values to mixed character catego...

06-13-2015 10:16 AM

Hi,

Suppose I have the following small sample from a bigger data set:

ID | Category |
---|---|
1 | CC |
2 | Z1 |
3 | B+ |
4 | CC |
5 | Z2 |
6 | B |

What I would like to have is a new variable that assigns a unique number to each unique category:

ID | Category | Var_Cat |
---|---|---|
1 | CC | 1 |
2 | Z1 | 2 |
3 | B+ | 3 |
4 | CC | 1 |
5 | Z2 | 4 |
6 | B | 5 |

Thank you!

Accepted Solutions


Posted in reply to ilikesas

06-13-2015 10:53 AM

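data have;
  infile cards truncover expandtabs;
  input ID Category $;
cards;
1 CC
2 Z1
3 B+
4 CC
5 Z2
6 B
;
run;

/* Number the categories in order of first appearance.
   The number ends up in variable n (the Var_Cat of the question). */
data want;
  if _n_ = 1 then do;
    if 0 then set have;        /* define Category in the PDV without reading a row */
    declare hash h();
    h.definekey('Category');   /* look up by category */
    h.definedata('n');         /* store the number assigned to it */
    h.definedone();
  end;
  set have;
  /* find() returns 0 and loads n when the category is already known;
     otherwise assign the next number and store it in the hash. */
  if h.find() ne 0 then do;
    count + 1;
    n = count;
    h.add();
  end;
  drop count;
run;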

All Replies


Posted in reply to ilikesas

06-13-2015 11:02 AM

Lots of questions need to be answered here ...

Is the "bigger" data set small enough that you can process it as many times as needed to get the result?

Is the new variable supposed to be numeric or character?

If it is supposed to be character ... Why? (What difference would it make using the original values vs. the new variable?) Should "1" become "001" if there are hundreds of categories?

If it is supposed to be numeric ... Do the numbers have to be consecutive, or would any unique numeric value be sufficient? If they need to be consecutive, do they have to be assigned using the order that appears in the data, or can they be assigned using some other order (such as alphabetical order of CATEGORY)?

There are many ways to approach this, depending on the answers you provide. For example, if the new variable should be numeric and can take on any integer value, as long as it is a unique match for the original value, you can do this in a DATA step with one statement (assuming CATEGORY is never longer than two characters, as in your sample):

var_cat = input(put(category, $hex4.), hex4.);

But the requirements have to be explained a bit more to make sure the solution matches what you need.
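To show that statement in context, here is a minimal sketch. It assumes the data set names have and want used elsewhere in this thread, and that every CATEGORY value fits in two characters. The resulting numbers are unique per category but not consecutive.

```sas
data want;
  set have;
  /* $hex4. writes the first two bytes of Category as four hex digits;
     hex4. then reads those four digits back as an integer.
     Anything past the second character is ignored (assumption: none exists). */
  var_cat = input(put(category, $hex4.), hex4.);
run;
```

If longer values turn up later, both widths would have to grow together, for example $hex8. with hex8. for up to four characters.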

Good luck.


Posted in reply to ilikesas

06-13-2015 11:10 AM

Give this a try. If your data is intrinsically sorted by ID, just get rid of all the "SeqNo" stuff and use ID instead.

Tom

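/* Remember the original row order. */
data have;
  set have;
  SeqNo = _n_;
run;

proc sort data=have;
  by Category;
run;

/* Bump Var_Cat at the first row of each Category group, so the
   numbers follow the sorted order of Category. */
data have;
  set have;
  by Category;
  retain Var_Cat 0;
  if first.Category then Var_Cat = Var_Cat + 1;
run;

/* Restore the original order and drop the helper variable. */
proc sort data=have out=have(drop=SeqNo);
  by SeqNo;
run;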


Posted in reply to ilikesas

06-13-2015 05:58 PM

If you have a smallish number of values you could also make an informat that could create the new variable.

/* Map each category to its number; values follow the sample in the question. */
proc format;
  invalue Var_cat
    'CC' = 1
    'Z1' = 2
    'B+' = 3
    'Z2' = 4
    'B'  = 5;
run;

data want;
  set have;
  var_cat = input(category, var_cat.);
run;
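When the list of values is too long to type, the same informat can probably be generated from the data with PROC FORMAT's CNTLIN= option. The following is only a sketch under assumptions: the helper data sets cats and ctrl are hypothetical names, and the numbers follow the sorted order of Category rather than the order of first appearance.

```sas
/* One row per distinct category, in sorted order. */
proc sort data=have(keep=Category) out=cats nodupkey;
  by Category;
run;

/* Control data set: TYPE 'I' asks for a numeric informat,
   START is the text to match, LABEL is the number to return. */
data ctrl;
  set cats;
  retain fmtname 'var_cat' type 'I';
  start = Category;
  label = _n_;
  keep fmtname type start label;
run;

proc format cntlin=ctrl;
run;

data want;
  set have;
  var_cat = input(category, var_cat.);
run;
```

The informat has to be rebuilt whenever new categories appear in the data, since any value it does not list will not be mapped.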


Posted in reply to ilikesas

06-13-2015 08:24 PM

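data want;
  /* Holds up to 1000 distinct categories, in order of first appearance. */
  array temp [1000] $ _temporary_;

  /* First pass: store each category the first time it is seen. */
  do until (last);
    set have end=last;
    if whichc(category, of temp(*)) = 0 then do;
      n + 1;
      temp(n) = category;
    end;
  end;

  /* Second pass: re-read HAVE and use the array position as the number. */
  do p = 1 to nobs;
    set have point=p nobs=nobs;
    count = whichc(category, of temp(*));
    output;
  end;
  stop;
  drop n;
run;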