I am Heather L. Edwards, Taxonomy Developer at the Associated Press and co-moderator for the Text and Content Analytics Forum. At the AP, we use the SAS Content Categorization tool to classify the 100,000+ content items that pass through the pipeline each day. We first implemented the tool in 2006 to replace the minimal descriptive metadata applied manually by editors. I am one of four full-time taxonomy developers creating and maintaining rules for approximately 8,500 subject, geography, and organization categories. We also have a controlled vocabulary developer who manages our entity lists – approximately 93,000 people and 43,000 publicly traded companies. Our classification system must be fully automated and extremely accurate. The descriptive metadata applied by the SAS Content Categorization tool is used to funnel content into products and to power both AP and customer websites with no human intervention. Once an item passes through the pipeline, its classification cannot be changed, so accuracy is critical. I am very lucky to work on a team of developers where we can collaborate on rule-writing and vocabulary-management challenges. Even so, I share Julia’s desire to learn from a larger community of practice. I hope that this forum will be a resource where we can all learn from each other.