Text mining and content categorization

Introducing ourselves

New Contributor
Posts: 4

Introducing ourselves

Welcome to the new SAS Text and Content Analytics Community!  We are excited that you’ve joined us and look forward to learning from and with you.  Our hope is to build a vibrant community of practitioners where we can all share our successes and find help with our challenges.

The first step in building our community is getting to know one another.  Please introduce yourself to the group in the comments below.  We also have two running polls that help us to keep track of our members’ areas of interest.  Please let us know and for those tools.  You can also keep up with the community or individual discussions by clicking the green "Follow" link on each page.

We look forward to meeting you!

Julia Marshall

Heather L. Edwards

New Contributor
Posts: 4

Re: Introducing ourselves

I am Heather L. Edwards, Taxonomy Developer at the Associated Press and co-moderator for the Text and Content Analytics Forum.

At the AP, we use the SAS Content Categorization tool to classify the 100,000+ content items that pass through the pipeline each day.  We first implemented the tool in 2006 to replace the minimal descriptive metadata applied manually by editors.

I am one of four full-time taxonomy developers creating and maintaining rules for approximately 8500 subject, geography, and organization categories. We also have a controlled vocabulary developer who manages our entity lists – approximately 93,000 people and 43,000 publicly-traded companies.

Our classification system must be fully automated and extremely accurate.  The descriptive metadata applied by the SAS Content Categorization tool is used to funnel content into products and to power both AP and customer websites with no human intervention.  Once an item passes through the pipeline, its classification cannot be changed, so accuracy is critical.

I am very lucky to work in a team of developers where we can collaborate on rule-writing and vocabulary-management challenges.   Even so, I share Julia’s desire to learn from a larger community of practice.  I hope that this forum will be a resource for us all to learn from each other.

Contributor
Posts: 36

Re: Introducing ourselves

My name is Julia Marshall and I am the co-moderator for the Text and Content Analytics Forum.

I am also the team lead for the Development Experience Clearinghouse (DEC), the online repository for the United States Agency for International Development.

The DEC team began implementing the SAS Content Categorization tool in 2010, which has been quite a challenging experience. The DEC has 6.5 people dedicated to managing the documents submitted to it every month. We get an average of about 500 documents a month, submitted by both Agency staff and implementing partners. One full-time and one part-time cataloger manually add metadata such as personal authors, authoring organizations, sponsoring organizations, contract numbers, titles, etc. to a document record in the system. An indexer then manually adds topic descriptors from the USAID Thesaurus. However, while the budget (and the staff) for the DEC has shrunk, the number of documents submitted has only grown.

We knew we needed to automate some of these processes, so we chose the SAS Content Categorization tool on the recommendation of Denise Bedford who was at the World Bank at the time.

One thing that I wanted when we were starting out in 2010 was a community of practice to ask questions, seek advice, or to see what others were doing with this huge, complex application. I wanted a group of peers who were also trying to figure out how to get the most out of this application and who were willing to share their experiences. 

So now you know the impetus for this forum. I cannot in all honesty call myself an expert in using SAS Content Categorization. The DEC team is also still struggling to integrate the SAS Content Categorization module into its workflow. We are hoping to be able to do that this coming year, with any luck and hopefully with a little help from my new friends on this Forum.

Esteemed Advisor
Posts: 7,052

Re: Introducing ourselves

My name is Arthur Tabachneck and I am President of a company that is developing a web-based application using a combination of various products, including Content Categorization Studio, Text Miner, DataFlux and, of course, numerous other tools available in the various SAS suites.

While I have been a SAS user and manager of other SAS users for 40 years (yes, I know that is longer than SAS has existed as the SAS Institute), I am definitely not an expert regarding text analytics. I, for one, hope that this forum will turn out to be a useful way for all of us to obtain the information needed to make the best use of these products.

N/A
Posts: 1

Re: Introducing ourselves

My name is Steve Villa and I am an epidemiology research assistant who is interested in learning the various functions of SAS for data analysis, including text analysis. I am specifically interested in learning how to obtain data for practicing statistical programming.

Frequent Contributor
Posts: 138

Re: Introducing ourselves

Hi, I am Manoj Bansal and I am a SAS professional.

Contributor
Posts: 71

Re: Introducing ourselves

Hi folks.  My name is Jared Prins.  I work for the Province of Alberta (in Canada) in the Alberta Tourism, Parks and Recreation Ministry (Parks Division).  I use SAS Text Miner on surveys with open-ended comments.  We've been using Text Miner since 2008.  I hope to expand the use of text mining to other areas, such as public consultations and letters we receive from the public.

New Contributor
Posts: 2

Re: Introducing ourselves

Hi, I'm Susan Doran. I started working with taxonomies and controlled vocabs in 1989 as a Graduate Assistant at Syracuse University's School of Information Studies, with the US Dept of Ed's ERIC Clearinghouse, tweaking and adding to the ERIC Thesaurus, and doing indexing/abstracting. Subsequently, in every position I've had (from running the info mgmt group at the US Government Accountability Office, to being a nonprofit director, running a consulting firm, and being a hands-on practitioner/information architect) I've created, built, worked with, overhauled, and maintained taxonomies. From 1993 to 1998 I launched an information services dept for a national nonprofit. My first order of business was setting up the systems for and then overseeing the creation of a massive [for that time] documents database, requiring scanning, OCR, and text analysis of 20 years of the organization's information assets (books, newsletters, white papers, all contents of the library, 10 file cabinets), as well as content of its key members. It was perhaps primitive compared to what's available now, but I'm still proud of that effort, which added tremendous value to thousands of users and plugged me in at the ground floor. At GAO, in the 2000s, my department was responsible for scanning, text analysis, and indexing all of GAO's documents, integrating GAO's taxonomy with the web information architecture, and leveraging taxonomy throughout the agency, as well as fine-tuning the text analysis tool and writing rules to, among other things, prepopulate content on GAO's web site and send out topical alerts to Congress. There are many other examples, but more recently I've worked with taxonomies that are *associated* with text mining and analysis, and to say I'm fascinated would be an understatement; however, I haven't been able to get into the guts of it as much as I yearn to.
In starting a new company, Living Archives, I have a high-level sense of how I would like to make use of programs like SAS, but need to learn more about how that might be possible, and how people are using SAS now. From what I read, even in these intros, I can see I'll learn a lot. I don't know how much value I can add immediately, but look forward to being part of this community. Thanks, SAS and Heather!

SAS Employee
Posts: 2

Re: Introducing ourselves

Hi and welcome to the forum!

My name is Dan Zaratsian and I'm an Analytics Consultant, specializing in Text Analytics at SAS.

I am responsible for pre-sales consulting and technology enablement across industries, as well as within SAS. I work closely with customers of all levels (analysts through executives) to provide analytics consulting/strategy, proof-of-concepts, and post-sale training and support. As a specialist in text analytics, I focus on areas where the client can leverage textual data (such as Twitter or Facebook content, blogs and forums, online reviews, call center notes, survey responses, emails, document collections).

I use SAS Text Analytics (and a wide variety of SAS software) on a daily basis and would be happy to help address challenges and/or brainstorm on new opportunities. I have a B.S. in Electrical Engineering from the University of Akron and an M.S. in Analytics from the Institute of Advanced Analytics at North Carolina State University.

Thanks!

SAS Employee
Posts: 7

Re: Introducing ourselves

Hi there and welcome all to the SAS Text and Content Analytics Community!

My name is Michael D. Wallis and I am a SAS Analytics Consultant, operating out of the Advanced Analytics division, specifically within the Text Analytics group.  In this role, I provide direct support for both internally and externally facing personnel (e.g., SAS pre-sales, consulting, PSD) regarding any and all things "Text Analytics."  Coming from the perspective of R&D, my work focuses on the applied use of all SAS Text Analytics products and on understanding the underlying algorithms, technological requirements, and research efforts that go into our products and solutions.

My background includes 15 years of active software development and research in the areas of natural language processing, computational linguistics, machine learning, and data mining.  I completed both my B.S. and M.S. programs in Computer Science at North Carolina State University.

I look forward to actively participating in this forum and to the opportunities for engaging one another for both collaborative learning and support.

Thank you,

Michael D. Wallis

Contributor
Posts: 24

Re: Introducing ourselves

Hi,

I am Abhijit and I work on SAS Text Analytics tools.

New Contributor
Posts: 3

Re: Introducing ourselves

Hi, I am Bharath

I am new to this community and I am learning SAS tools for Text analytics.
