What is it really all about?

Vocabulary Building

Enterprise vocabularies are typically built to suit the content used by the organization. In the process of creating or maintaining a vocabulary, terms are added or modified over time as new concepts are discovered or needs are identified. While text analytics can be viewed as an exploratory tool for content, analyzing known content for terminology to include in enterprise vocabularies is a way of reinforcing established concepts.

Deliberately selecting a small body of known training content to match to existing vocabulary terms is a way of ensuring the taxonomy is still providing adequate coverage for existing content. By the same token, known content can be used to identify and extract unknown concepts to build out the taxonomy in areas which are undeveloped. Either method is a way to build and maintain the enterprise taxonomy for use in categorizing content.
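As a rough illustration of both methods, the sketch below checks how many training documents match at least one vocabulary term and lists frequent words the vocabulary does not yet cover. The vocabulary, the stopword list, and the function names are hypothetical, and plain word matching stands in for whatever concept extraction a real text analytics tool performs.

```python
import re
from collections import Counter

# Hypothetical vocabulary and stopword list, for illustration only.
VOCABULARY = {"interest rate", "inflation", "central bank"}
STOPWORDS = {"the", "a", "of", "to", "and", "as", "on", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def coverage_report(documents, vocabulary, top_n=3):
    """Return (share of documents matching a term, frequent uncovered words)."""
    matched_docs = 0
    candidates = Counter()
    # Words that already occur inside some vocabulary term.
    vocab_words = {word for term in vocabulary for word in term.split()}
    for doc in documents:
        words = tokenize(doc)
        # Pad with spaces so terms only match on whole-word boundaries.
        padded = " " + " ".join(words) + " "
        if any(f" {term} " in padded for term in vocabulary):
            matched_docs += 1
        # Count words that are neither stopwords nor part of the vocabulary.
        for word in words:
            if word not in STOPWORDS and word not in vocab_words:
                candidates[word] += 1
    coverage = matched_docs / len(documents) if documents else 0.0
    return coverage, [word for word, _ in candidates.most_common(top_n)]


if __name__ == "__main__":
    training_docs = [
        "The central bank raised the interest rate.",
        "Inflation slowed as mortgage lending fell.",
        "Mortgage lending rules tightened.",
    ]
    coverage, suggestions = coverage_report(training_docs, VOCABULARY, top_n=2)
    print(f"coverage: {coverage:.2f}")
    print("candidate terms:", suggestions)
```

A low coverage figure suggests the taxonomy no longer fits the content, while the candidate list is only a starting point for a taxonomist to review, not a set of terms to add automatically.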

Auto-Categorization

One of the most common enterprise use cases for text analytics is the auto-categorization of content. While using controlled vocabularies offers benefits in applying consistent metadata to content for identification and retrieval, the barrier to taxonomy adoption has often been the labor-intensive work of building, maintaining, and applying taxonomies and their values to content. Tools may be getting better at the automatic creation of taxonomies, but few, if any, generate a taxonomy in a complete and usable state. What is automated, however, is the application of taxonomy values as metadata to content.

Auto-categorization works best with known vocabularies and known content. For example, a news publisher may write and publish news articles which need to be discoverable on a publicly accessible website. Onsite search or web search applications index content and make it discoverable. Most of these applications work from the content itself and use various methods to rank the page for returned results. The most notable, of course, is Google’s PageRank. Although modern search engines are far more sophisticated than simple keyword matching, embedded meta tags which describe the document as a whole are best supplied from a common vocabulary so they are consistent across all content on the site and even between sites. The rapid velocity of news story generation, publication, and sharing requires meta tags to be applied more quickly than is practical by hand. Thus, using a text analytics tool to identify and match concepts appearing in the content to controlled vocabulary concepts speeds the application of metadata. As content changes between versions and over periods of time, the tagging taxonomy is also continually updated to cover new concepts. Likewise, the text analytics tool continues to evolve its rules-based categorization so concepts which are directly or even indirectly found in text can be tagged.
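A minimal sketch of rules-based tagging, assuming each controlled vocabulary concept is stored as a preferred label with a list of surface forms that may appear in text. The labels, surface forms, and function names are invented for illustration; commercial text analytics tools use far richer rules than the simple pattern matching shown here.

```python
import re

# Hypothetical controlled vocabulary: preferred label -> surface forms.
VOCABULARY = {
    "Monetary policy": ["interest rate", "interest rates", "central bank"],
    "Elections": ["election", "ballot", "polling station"],
}


def compile_rules(vocabulary):
    """Build one case-insensitive, whole-word pattern per preferred label."""
    rules = {}
    for label, forms in vocabulary.items():
        # Try longer forms first so multi-word phrases are preferred.
        ordered = sorted(forms, key=len, reverse=True)
        alternatives = "|".join(re.escape(form) for form in ordered)
        rules[label] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return rules


def tag(text, rules):
    """Return the sorted preferred labels whose rule matches the text."""
    return sorted(label for label, pattern in rules.items() if pattern.search(text))


if __name__ == "__main__":
    rules = compile_rules(VOCABULARY)
    article = "The central bank held interest rates steady ahead of the election."
    print(tag(article, rules))
```

Because every surface form maps back to one preferred label, an article is tagged "Monetary policy" even when that exact phrase never appears in it, which is how a concept found only indirectly in text can still receive a consistent tag.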

Content analytics can be defined as unlocking business value from unstructured content via semantic technologies, in order to find answers to important questions or discover the causes of certain trends. Companies can use content analytics to understand the content that is created, how it is used, the context it is in, and the nature of that content. Content analytics is all about unstructured data, and it can be used to explain trends in structured data and provide valuable insights to organisations.

Content analytics is especially relevant for organisations where knowledge is at the core of their business. When that is the case, ordinary business intelligence is no longer sufficient. Knowing who read what content, when, how often it was shared, the number of clicks, the location of visitors, and so on (so-called metadata) is simply not sufficient for a deep understanding. You should also care about whether the content is actually useful to your audiences and whether it serves the objective it was intended for. Content-related trends or insights, such as which content caused a drop in sales, can help make better content and thus drive growth or revenue. It is not about having more content; it is about having better content.

All About Text Analytics

Understanding Content Analytics