PhD thesis abstracts

September 2011

Doreen Boehnstedt

Semantic Tagging for Managing Web-based Learning Resources: Models, Methods and a Platform for Supporting Resource-based Learning

The knowledge explosion, changing circumstances due to new forms of work, and many technical developments mean that the knowledge acquired in education no longer suffices for a lifetime. Therefore, self-directed learning in the workplace is becoming increasingly important. This is a form of learning in which a current information need is met through self-directed interaction with a wide range of digital resources, and it is therefore called Resource-based Learning. The Web is growing in importance as an information source because it provides many resources that can be used for learning purposes.

However, self-directed Resource-based Learning also poses many challenges to learners. First, digital resources on the Web are usually not didactically prepared and therefore are not intended to be used as learning materials. In addition, the relevant information is often distributed across many different websites. Further, there is already a very large and still rapidly increasing amount of information available on the Web, which can lead to information overload. In the scenario of self-directed learning considered here, there is no teacher who structures the learning process. Therefore, learners have to determine their information needs independently and plan how to proceed. They have to identify, annotate and organize relevant resources for future use. This makes appropriate management of resources necessary. However, the majority of learners are dissatisfied with the currently available options for organizing Web resources.

The goal of this thesis is therefore the design and development of a tool to support learners in Resource-based Learning. In particular, the tool should support the management of resources and thereby address the challenges mentioned above.

Self-directed Resource-based Learning requires learners to manage their personal information and knowledge. In the literature, several models exist for managing information and knowledge in organizational and personal scopes. For self-directed Resource-based Learning, such a model has been missing so far. Therefore, a model for Resource-based Learning is developed based on the existing models and on a questionnaire survey conducted in the context of this thesis. This model encompasses several process steps that should be supported by the tool.

The management of resources requires learners to store the resources appropriately, for example by topic of interest or by task to be executed. Tagging is a simple and accepted way to manage any resource on the Web, but its expressive power is restricted. Other forms of resource management can be found in the area of formal knowledge organization (e.g. modeling of a semantic network); however, expert knowledge is usually required to build a semantic network. As a basis for the tool developed in the context of this work, a combination of both forms is therefore proposed, i.e. a semantic network that is created and expanded by the learners using tagging. Core components of this network are resources and tags. Additionally, a learner is able to assign a type to each tag, so that the information whether the tag is, e.g., a topic or a task can be stored. As part of this thesis, the types of tags that are necessary for the scenario of Resource-based Learning have been analyzed and evaluated.
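
As an illustration only, the combination of tagging and a semantic network described above can be sketched as a small data structure. This is a minimal sketch, not the data model of the thesis: the class name `KnowledgeNetwork`, its methods, the example URLs and the type names "topic" and "task" are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeNetwork:
    """A learner's personal network of resources and typed tags (hypothetical model)."""
    # tag name -> tag type, e.g. "topic" or "task" (type names are assumptions)
    tag_types: dict = field(default_factory=dict)
    # tag name -> set of resource URLs carrying that tag
    resources_by_tag: dict = field(default_factory=lambda: defaultdict(set))

    def tag(self, url, tag, tag_type):
        """Attach a typed tag to a resource; an existing tag keeps its first type."""
        self.tag_types.setdefault(tag, tag_type)
        self.resources_by_tag[tag].add(url)

    def resources(self, tag):
        """Return all resources carrying the given tag, sorted for stable output."""
        return sorted(self.resources_by_tag.get(tag, ()))

    def tags_of_type(self, tag_type):
        """Return all tags of one type, e.g. every task the learner has recorded."""
        return sorted(t for t, ty in self.tag_types.items() if ty == tag_type)


# Hypothetical usage: two web resources tagged by topic and by task.
net = KnowledgeNetwork()
net.tag("https://example.org/intro-to-rdf", "semantic web", "topic")
net.tag("https://example.org/intro-to-rdf", "prepare seminar talk", "task")
net.tag("https://example.org/sparql-tutorial", "semantic web", "topic")
```

Keeping the type on the tag rather than on the tag assignment means a learner types each tag once, which matches the description that a type is assigned "to each tag".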

Furthermore, an algorithm for the automatic detection of these tag types is presented, as such an algorithm can reduce the manual maintenance effort for the management of resources. An evaluation on various corpora shows that the knowledge-based algorithm can classify a tag during the tagging process with an accuracy that is sufficient for the scenario.

Based on the developed model of Resource-based Learning and its requirements for the management of resources, different tools and systems are analyzed with regard to their support of Resource-based Learning. None of the related tools fulfills the requirements appropriately.

Therefore, on the basis of the model's process steps and the derived functional requirements, a concept for a supporting tool is developed. Based on the technical requirements, a system is designed, consisting of a browser add-on, a backend for the management of the knowledge networks and a web-based frontend. Finally, the tool is implemented and evaluated in user studies.

The user studies conducted in this work show that the extended form of tagging, based on tag types, is well accepted and allows for appropriate management of resources. Furthermore, the studies show that the implemented tool addresses the challenges of self-directed Resource-based Learning adequately. The present work thus creates a basis for optimizing the approach to self-directed interaction with resources in order to meet an information need.

Advisor(s): Ralf Steinmetz, Ulrik Schroeder

SIG MM member(s): Ralf Steinmetz


Multimedia Communications Lab

Jun Wang

Semi-Supervised Learning for Scalable and Robust Visual Search

Unlike textual document retrieval, searching of visual data is still far from satisfactory. There exist major gaps between the available solutions and practical needs in both accuracy and computational cost. This thesis aims at the development of robust and scalable solutions for visual search and retrieval. Specifically, we investigate two classes of approaches: graph-based semi-supervised learning and hashing techniques. The graph-based approaches are used to improve accuracy, while hashing approaches are used to improve efficiency and cope with large-scale applications. A common theme shared between these two subareas of our work is the focus on the semi-supervised learning paradigm, in which a small set of labeled data is complemented with large unlabeled datasets.

Graph-based approaches have emerged as methods of choice for general semi-supervised tasks when no parametric information is available about the data distribution. They treat both labeled and unlabeled samples as vertices in a graph and then instantiate pairwise edges between these vertices to capture the affinity between the corresponding samples. A quadratic regularization framework has been widely used for label prediction over such graphs. However, most of the existing graph-based semi-supervised learning methods are sensitive to the graph construction process and the initial labels. We propose a new bivariate graph transduction formulation and an efficient solution via an alternating minimization procedure. Based on this bivariate framework, we also develop new methods to filter unreliable and noisy labels. Extensive experiments over diverse benchmark datasets demonstrate the superior performance of our proposed methods.
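
For readers unfamiliar with label prediction over graphs, the following is a minimal sketch of a standard baseline, iterative label propagation with symmetric normalization, whose fixed point minimizes a quadratic smoothness-plus-fitting objective. It is not the bivariate formulation proposed in the thesis. The function name, the toy graph and the parameters `alpha` and `iterations` are assumptions made for this example.

```python
def propagate_labels(weights, labels, alpha=0.9, iterations=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y on a small dense graph.

    weights: symmetric n x n affinity matrix (list of lists, zero diagonal)
    labels:  n x c matrix; row i is one-hot for a labeled sample, all zeros otherwise
    Returns the predicted class index for every vertex.
    """
    n = len(weights)
    c = len(labels[0])
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}, with D the degree matrix.
    degree = [sum(row) for row in weights]
    s = [[weights[i][j] / (degree[i] * degree[j]) ** 0.5 if weights[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = [row[:] for row in labels]
    for _ in range(iterations):
        # Each vertex mixes its neighbors' scores with its own initial label.
        f = [[alpha * sum(s[i][j] * f[j][k] for j in range(n))
              + (1 - alpha) * labels[i][k]
              for k in range(c)] for i in range(n)]
    return [max(range(c), key=lambda k: f[i][k]) for i in range(n)]


# Toy graph: two tight chains (0-1-2 and 3-4-5) joined by one weak edge (2-3).
W = [[0, 1, 0, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [0, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 0],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 0, 1, 0]]
# Only vertex 0 (class 0) and vertex 5 (class 1) are labeled.
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]
predictions = propagate_labels(W, Y)
```

In this toy graph each unlabeled vertex takes the class of the labeled vertex in its own chain. The result depends directly on the edge weights and on the two initial labels, which is the sensitivity the paragraph above refers to.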

However, graph-based approaches suffer from a critical scalability bottleneck, since graph construction has quadratic complexity and the inference procedure costs even more. The widely used graph construction method relies on nearest neighbor search, which is prohibitive for large-scale applications. In addition, most large-scale visual search problems involve handling high-dimensional visual descriptors, which poses another challenge in the form of excessive storage requirements. To handle the scalability issue of both computation and storage, the second part of the thesis focuses on efficient techniques for conducting approximate nearest neighbor (ANN) search, which is key to many machine learning algorithms, including graph-based semi-supervised learning and clustering. Specifically, we propose Semi-Supervised Hashing (SSH) methods that leverage semantic similarity over a small set of labeled data while preventing overfitting. We derive a rigorous formulation in which a supervised term minimizes the empirical errors on the labeled data and an unsupervised term provides effective regularization by maximizing variance and independence of individual bits. Experiments on several large datasets demonstrate a clear performance gain over several state-of-the-art methods without a significant increase in computational cost.
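
To make the role of hashing in ANN search concrete, here is a minimal sketch of how binary codes are computed and compared, assuming hash functions of the linear-projection-and-threshold form. It does not show how SSH learns its projections from labeled and unlabeled data: the projection vectors below are hand-picked, and the function names and toy descriptors are hypothetical.

```python
def hash_code(vector, projections):
    """Map a real-valued descriptor to a tuple of bits: bit k is 1 if w_k . x > 0."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, vector)) > 0 else 0
        for w in projections
    )


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query, database, projections, k=1):
    """Return the indices of the k database items closest to the query in Hamming space."""
    q = hash_code(query, projections)
    codes = [hash_code(x, projections) for x in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))[:k]


# Hypothetical 2-D descriptors and three hand-picked projection directions.
projections = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0)]
database = [(2.0, 1.0), (-1.0, 3.0), (-2.0, -1.0)]
nearest = search((3.0, 0.5), database, projections)
```

Storing a few bits per item instead of a high-dimensional descriptor addresses the storage problem, and comparing codes by Hamming distance replaces expensive distance computations. A real system would also precompute the database codes once rather than on every query, as this sketch does for brevity.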

The main contributions of the thesis include the following.

1) a) a bivariate formulation for graph-based semi-supervised learning with an efficient solution by alternating optimization; b) theoretical analysis of the bivariate optimization procedure from the graph cut point of view; c) novel applications of the proposed techniques, such as interactive image retrieval, automatic re-ranking for text-based image search, and a brain-computer interface (BCI) for image retrieval.

2) a) a rigorous semi-supervised paradigm for hash function learning with a tradeoff between empirical fitness on pair-wise label consistency and an information-theoretic regularizer; b) several efficient solutions for deriving semi-supervised hash functions, including an orthogonal solution using eigen-decomposition, a revised strategy for learning non-orthogonal hash functions, a sequential learning algorithm to derive boosted hash functions, and an extension to unsupervised cases by using pseudo labels.

The two parts of the thesis - bivariate graph transduction and semi-supervised hashing - are complementary and can be combined to achieve significant performance improvement in both speed and accuracy. Hash methods can help build sparse graphs in linear time and greatly reduce the data size, but they lack sufficient accuracy. Graph-based methods provide unique capabilities to handle non-linear data structures with noisy labels but suffer from high computational complexity. The synergistic combination of the two offers great potential for advancing the state of the art in large-scale visual search and many other applications.

Advisor(s): Shih-Fu Chang

SIG MM member(s): Shih-Fu Chang



The DVMM Lab at Columbia University is dedicated to research of new theories, algorithms, and systems for multimedia content analysis, search, communication, and forensics, with a primary focus on digital video. It hosts faculty, students, and visiting researchers, conducting research as well as development of multimedia technologies, testbeds, and standards.

Our current research activities focus on five areas: multimedia search and retrieval; pervasive media and mobile communication; machine learning and object recognition; media security and forensics; and multimedia standards, testbeds, and benchmarking.

Philipp Scholl

Semantic and Structural Analysis of Web-based Learning Resources - Supporting Self-directed Resource-based Learning

In the knowledge-based society, the maintenance and acquisition of new knowledge are vital for each individual. Changed living and working conditions and the rapid development of technology cause the half-life of knowledge to decrease. Therefore, the knowledge that is acquired in educational institutions is no longer sufficient for an entire lifetime. Thus, self-directed learning at the workplace and in private life is becoming more and more important. At the same time, the Web has become a very important source for knowledge acquisition, as it provides a huge amount of resources containing information that can be utilized for learning purposes. This form of self-directed learning that often involves learning with web resources is commonly referred to as Resource-Based Learning. In particular, it is characterized by a high degree of freedom in the choice of resources and the execution of the learning process.

When utilizing web resources as learning materials, learners face novel challenges. First, relevant information that covers the specific information need of a learner is often distributed over several web resources. This challenge can be addressed by providing adequate retrieval strategies where retrieval is not restricted to a web search but also involves content that learners have already considered to be relevant. However, the so-called vocabulary gap - the fact that information can be expressed in completely different terminology, e.g. in technical terms or colloquial language - makes retrieval difficult. Further, in contrast to Learning Objects that are often used in educational institutions, web resources rarely include well-structured metadata. As Resource-Based Learning using web resources requires learners to handle and organize a large number of web resources efficiently, the availability of relevant metadata is vital. Finally, in the majority of self-directed learning settings, there is no teacher or tutor. These authorities usually set learning goals according to a curriculum, structure the learning process and assess the learning result. In self-directed learning, the learner has to take over these tasks, which would otherwise have been accomplished by the teacher.

This thesis examines this form of Resource-Based Learning and derives adequate mechanisms to support this kind of learning. The requirements of supporting Resource-Based Learning are deduced and, based on these requirements, the design and the implementation of a tool called ELWMS.KOM is presented. ELWMS.KOM is a tool that enables learners to organize their self-directed learning process and the contributing learning resources in a personal knowledge network by applying semantically typed tags. The focus is on web resources in particular. Web resources are primarily not intended to be used for learning and thus are rarely didactically adapted to learning scenarios. Further, they infrequently expose metadata that are relevant for learners. ELWMS.KOM is designed to attenuate these shortcomings and the resulting challenges for learners by providing an appropriate level of support. The contributions of this thesis comprise the derivation and implementation of paradigms and technologies that enable such a supporting functionality in ELWMS.KOM. Based on an examination of Learning Objects that are commonly used in learning scenarios in educational institutions, the peculiarities and differences to self-directed learning paradigms are analysed and design decisions for ELWMS.KOM are inferred. These design decisions represent a foundation for the supporting functionalities that are proposed in this thesis.

First, the technologies are presented that enable ELWMS.KOM to recommend tags and learning resources to the learner based on a semantic representation of their content. A user study based on ELWMS.KOM shows the need to support monolingual as well as cross-lingual approaches to recommend semantically related tags and resources. An analysis of the approach that has been chosen to determine semantic relatedness is presented. Based on this analysis, several strategies are compared that show potential to reduce the computational complexity of this approach without considerably reducing its quality. Additionally, several extensions that improve the quality of this approach by incorporating supplementary semantic properties of a reference corpus are presented and evaluated.

Furthermore, this thesis presents an approach to automatically segment web resources in order to support learners in the selection of relevant fragments of a web resource. This segmentation is based on a structural and visual analysis of web resources and yields a set of coherent segments. A user study confirms the quality of this approach. In addition, an approach is introduced that supports learners in the consistent creation of their tagging vocabulary in ELWMS.KOM for the semantic tag type Type. This approach automatically recognizes the web genre of a web resource and is language-independent. Novel features have been developed that allow a reliable classification of web genres. Several evaluations using different feature sets and corpora are presented. Finally, this thesis introduces the tag type Goal, which supports learners in planning, executing and evaluating their overall learning process. This support feature has been derived from the theory of Self-Regulated Learning and has been implemented accordingly in ELWMS.KOM. The benefits are shown in two large-scale user studies that have been executed with ELWMS.KOM and the implemented goal setting mechanisms.

Advisor(s): Ralf Steinmetz, Wolfgang Effelsberg

SIG MM member(s): Ralf Steinmetz, Wolfgang Effelsberg


Multimedia Communications Lab

Wanmin Wu

Human-centric Control of Video Functions and Underlying Resources in 3D Tele-immersive Systems

3D tele-immersion (3DTI) has the potential of enabling virtual-reality-like interaction among remote people with real-time 3D video. However, today's 3DTI systems still suffer from various performance issues, limiting their broader deployment, due to the enormous demand on temporal (computing) and spatial (networking) resources. Past research focused on system-centric approaches for technical optimization, without taking human users into the loop. We argue that human factors (including user preferences, semantics, limitations, etc.) are an important and integral part of the cyber-physical 3DTI systems, and should not be neglected.

This thesis proposes a novel, comprehensive, human-centric framework for improving the qualities of 3DTI throughout its video function pipeline. We make three major contributions at different phases of the pipeline. At the sending side, we develop an intra-stream data adaptation scheme that reduces the level of detail within each stream without users being aware of it. This human-centric approach exploits limitations of human vision and excludes details that are imperceptible. It effectively alleviates the data load for computation-intensive operations and thus improves the temporal efficiency of the systems. Yet even with intra-stream data reduced, spatial efficiency is still a problem due to the multi-stream/multi-site nature of 3DTI collaboration. We thus develop an inter-stream data adaptation scheme at the networking phase to reduce the number of streams with minimal disruption to the visual quality. This human-centric approach prioritizes streams based on user views and excludes less important streams from transmission. It considerably reduces the data load for networking and thus enhances the spatial resource efficiency. The above two approaches (level-of-detail reduction within a video stream and view-based differentiation among streams) work seamlessly together to bring both temporal and spatial resource demands under control, and are shown to improve various qualities of the systems. Finally, at the receiving side, we take a holistic approach to study the "quality" concept in 3DTI environments. Our human-centric quality framework focuses on the Quality-of-Experience (QoE) concept that models users' perceptions, emotions, performance, etc. It investigates how the traditional Quality-of-Service (QoS) impacts QoE, and reveals how QoS should be improved for the best user experience. This thesis essentially demonstrates the importance of bringing human-awareness into the design, execution, and evaluation of complex resource-constrained 3DTI environments.
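
The idea of view-based differentiation among streams can be illustrated with a minimal sketch. It assumes that each stream comes from a camera with a known viewing direction and that a stream's importance is the cosine similarity between that direction and the user's current view. This scoring rule, the function name and the stream identifiers are assumptions for illustration and are not taken from the thesis.

```python
import math


def select_streams(camera_dirs, view_dir, budget):
    """Keep the `budget` streams whose camera direction best matches the user's view.

    camera_dirs: stream id -> (x, y, z) direction the camera looks along
    view_dir:    (x, y, z) direction the remote user is currently looking along
    """
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm_a = math.sqrt(sum(p * p for p in a))
        norm_b = math.sqrt(sum(q * q for q in b))
        return dot / (norm_a * norm_b)

    # Highest similarity first; streams beyond the budget are not transmitted.
    ranked = sorted(camera_dirs,
                    key=lambda s: cosine(camera_dirs[s], view_dir),
                    reverse=True)
    return ranked[:budget]


# Hypothetical four-camera site; the user looks mostly forward and slightly right.
cameras = {
    "front": (0.0, 0.0, 1.0),
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "back": (0.0, 0.0, -1.0),
}
selected = select_streams(cameras, (0.5, 0.0, 1.0), budget=2)
```

With a budget of two streams, only the front and right cameras are sent for this view, which halves the network load of the hypothetical site while keeping the streams the user is most likely to see.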

Advisor(s): Klara Nahrstedt

SIG MM member(s): Wanmin Wu


Multimedia Operating Systems and Networking (MONET)

Research in the MONET research group focuses on system software issues to provide services and protocols for end-to-end Quality of Service (QoS) guarantees for distributed multimedia applications, leveraging the best effort services provided by the underlying operating system and networks. Toward this goal, we are doing research in a broad area including (but not limited to):

- Multimedia operating systems

- Multimedia communication protocols

- QoS middleware and large scale distributed systems

- Multimedia security and trustworthy computing systems

- Advanced tele-immersive and multimedia applications

- High speed QoS routing and ad hoc networks
