Tutorial on Semantic Annotation of Web Content

Ebrahim Bagheri, Jelena Jovanovic, Semantic Annotation of Web Content, Tutorial 2014


The World Wide Web is a vast collection of interconnected (linked) content accessible through the Internet. Such content covers information on every imaginable topic. However, the semantics of the knowledge it contains has remained, to a large extent, inaccessible to the machines tasked with its storage, indexing, and retrieval. This is because a large portion of this collection is unstructured, free-form text, a format that is readable by humans but not natively usable by computers. Unlocking this content would allow for the development of smarter tools capable of reasoning, intelligent search, and factoid-based question answering. Semantic technologies attempt to represent information in a formal structure that machines can use. While techniques for knowledge manipulation and reasoning have been around for some time, methods for transforming large-scale unstructured textual content from the Web into structured knowledge bases have only recently gained wide attention. Such methods use ontologies, natural language processing (NLP), and machine learning techniques to map unstructured content into structured data suitable for automated reasoning. In this tutorial, we introduce the audience to the state of the art in semanticizing unstructured textual content: what it is, why it is important, what tools are available, and how it can be used to build next-generation web applications.
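
As a rough illustration of what mapping unstructured text into structured data can look like, the sketch below links mentions in a sentence to entries in a tiny hand-written dictionary and emits subject-predicate-object triples. It is a minimal toy under stated assumptions, not one of the tools covered in the tutorial: the names KNOWLEDGE_BASE, annotate, and to_triples are hypothetical, the DBpedia-style URIs are used only as examples, and exact string matching stands in for the NLP and machine learning steps (mention detection, disambiguation) that real annotators perform.

```python
import re

# Hypothetical mini knowledge base: surface form -> (entity URI, type URI).
# The DBpedia-style URIs are illustrative assumptions, not tutorial material.
KNOWLEDGE_BASE = {
    "Belgrade": ("http://dbpedia.org/resource/Belgrade",
                 "http://dbpedia.org/ontology/City"),
    "Serbia": ("http://dbpedia.org/resource/Serbia",
               "http://dbpedia.org/ontology/Country"),
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotate(text):
    """Return (surface form, start offset, entity URI) for each dictionary match."""
    annotations = []
    for surface, (uri, _type_uri) in KNOWLEDGE_BASE.items():
        # Whole-word, case-sensitive matching; real annotators also disambiguate.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            annotations.append((surface, match.start(), uri))
    # Order annotations by their position in the text.
    return sorted(annotations, key=lambda a: a[1])


def to_triples(annotations):
    """Turn annotations into (subject, predicate, object) type statements."""
    triples = set()
    for surface, _start, uri in annotations:
        type_uri = KNOWLEDGE_BASE[surface][1]
        triples.add((uri, RDF_TYPE, type_uri))
    return sorted(triples)


text = "The University of Belgrade is located in Serbia."
annotations = annotate(text)
triples = to_triples(annotations)

for surface, start, uri in annotations:
    print(f"{surface!r} at offset {start} -> {uri}")
for subject, predicate, obj in triples:
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

The resulting triples are the kind of machine-usable statements that a reasoner or a semantic search engine could consume, whereas the original sentence is only readable by humans.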


Part 1: Semantic Web technologies and Linked (Open) Data

  • Today’s Web: social and syntactic; problems and challenges
  • Ontologies and semantic markup (RDF/RDFS/OWL)
  • Embedded semantics (RDFa, Microdata)
  • Semantic Web and its applications
  • Linked (Open) Data and structured knowledge bases on the Web

Part 2: Automated semantic annotation of text-based content

  • Kinds of semantic annotators and annotation tasks
  • Typical semantic annotation process
  • State-of-the-art semantic annotation tools
  • Open questions and challenges
  • Other relevant approaches to text comprehension


Ebrahim Bagheri is an Assistant Professor at Ryerson University and has been active in the areas of the Semantic Web and Software Engineering. He was one of the research theme leaders of a national project on radiation emission monitoring supported by the National Research Council Canada, where he was responsible for leading the development of the project's Semantic Web and Knowledge Engineering components. He has also been involved in projects that apply NLP and Semantic Web technologies to e-commerce and business process modeling. He currently leads projects worth over $3M CAD in the area of semantics-enabled information retrieval and software engineering. Ebrahim also has extensive teaching experience, both online and in the classroom, in core areas of Software Engineering and Computer Science. He is an IBM Faculty Fellow and a Senior Member of the IEEE. For more information, please see http://www.ee.ryerson.ca/people/Bagheri.html

Jelena Jovanović is an Associate Professor at the Department of Software Engineering, University of Belgrade, Serbia. Her research is primarily in the areas of knowledge representation and semantic technologies, and their application in the domains of technology-enhanced learning and knowledge management. She has participated in a number of projects and published numerous refereed papers in these research areas. She has also been teaching semantic technologies, knowledge representation, and related AI technologies at both the undergraduate and postgraduate levels. She can be reached at http://jelenajovanovic.net