December 2010
PhD thesis abstracts
Chunxi Liu
A unified user preference based framework for video content personalization
Nowadays, there are many ways for users to access video resources, and the number of videos is growing rapidly. At the same time, users' needs are becoming more diversified and personalized. However, people's capacity to use and manage video data has not kept pace with the growth in video volume. The conflict between users' requirements and the available technologies results in an "intention gap" between users and video data. To meet users' diverse needs and overcome this "intention gap", video content personalization technologies are required. Compared with traditional video services, a personalization system can better meet the needs of users, improve service quality, and enhance the user experience. Video content personalization technologies have broad applications and market demand, which makes this research important. Traditional personalized recommendation systems, such as online book recommendation systems, mostly employ collaborative filtering. However, that algorithm considers only the similarity between users and items for recommendation and does not consider the content of the data. Therefore, it is not suitable for video content personalization. Some work has addressed video content personalization. However, due to the diversity and complexity of video data, this work is limited to specific application environments. This thesis proposes a unified video content personalization model. In the model, the structure of the videos is analyzed first. Then, the contents of the videos are analyzed with the user's requirements in mind. Finally, video content personalization is achieved by ranking the video contents according to the user's preferences. To verify the validity and generality of the model, the thesis tests it on three different types of videos: news video, online video and sports video.
The experimental results show that the model is valid and the generalization capacity is good. The research results of the thesis have strong practical application value, and set up a guideline in the video content personalization domain.
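The final step of the model, ranking video contents by user preference, can be illustrated with a minimal sketch. This is not the thesis's method: the feature axes, the segment names, and the choice of cosine similarity are all assumptions made purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_segments(segments, preference):
    """Return segment ids sorted by descending similarity to the preference vector."""
    scored = [(cosine(features, preference), seg_id) for seg_id, features in segments.items()]
    return [seg_id for _, seg_id in sorted(scored, key=lambda pair: -pair[0])]

# Hypothetical feature axes: (sports, politics, weather).
segments = {
    "goal_replay": [0.9, 0.0, 0.1],
    "election_report": [0.0, 1.0, 0.0],
    "forecast": [0.1, 0.1, 0.8],
}
preference = [0.7, 0.1, 0.2]  # hypothetical user preference vector
ranking = rank_segments(segments, preference)
print(ranking)
```

A real system would derive the segment features from the structure and content analysis steps described in the abstract, rather than from hand-written vectors.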
Frank Hopfgartner
Personalised Video Retrieval: Application of Implicit Feedback and Semantic User Profiles
A challenging problem in the user profiling domain is to create profiles of users of retrieval systems. This problem is even harder in the multimedia domain. Due to the Semantic Gap, the difference between the low-level data representation of videos and the higher-level concepts users associate with videos, it is not trivial to understand the content of multimedia documents and to find other documents that the users might be interested in. A promising approach to ease this problem is to set multimedia documents into their semantic contexts. The semantic context can lead to a better understanding of personal interests. Knowing the context of a video is useful for recommending videos that match users' information needs. By exploiting these contexts, videos can also be linked to other, contextually related videos. From a user profiling point of view, these links can be of high value for recommending semantically related videos, hence creating a semantic-based user profile. This thesis introduces a semantic user profiling approach for news video retrieval, which exploits a generic ontology to put news stories into their context. Major challenges that inhibit the creation of such semantic user profiles are the identification of users' long-term interests and the adaptation of retrieval results based on these personal interests. Most personalisation services rely on users explicitly specifying preferences, a common approach in the text retrieval domain. By giving explicit feedback, users are forced to update their need, which can be problematic when their information need is vague. Furthermore, users tend not to provide enough feedback on which to base an adaptive retrieval algorithm. In contrast to explicitly asking the user to rate the relevance of retrieval results, implicit feedback techniques help by learning user interests unobtrusively. The main advantage is that users are relieved from providing feedback.
A disadvantage is that information gathered using implicit techniques is less accurate than information based on explicit feedback. This thesis focuses on three main research questions. First of all, implicit relevance feedback, which is provided while interacting with a video retrieval system, is studied as an information source to bridge the Semantic Gap. To this end, implicit indicators of relevance are identified by analysing representative video retrieval interfaces. To study whether these indicators can be exploited as implicit feedback within short retrieval sessions, video documents are recommended based on implicit actions performed by a community of users. Secondly, implicit relevance feedback is studied as a potential source for building user profiles and hence for identifying users' long-term interests in specific topics. This includes studying the identification of different aspects of interest and storing these interests in dynamic user profiles. Finally, this feedback is exploited to adapt retrieval results or to recommend related videos that match the users' interests. The research questions are analysed by performing both simulation-based and user-centred evaluation studies. The results suggest that implicit relevance feedback can be employed in the video domain and that semantic-based user profiles have the potential to improve video exploration.
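As a rough illustration of how implicit actions might feed a dynamic user profile, the sketch below accumulates weighted evidence per topic and decays older interests between sessions. The action names, the weights, and the decay factor are hypothetical; the abstract does not specify which indicators or weighting scheme the thesis uses.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each implicit action suggests interest.
ACTION_WEIGHTS = {"click": 0.2, "play": 0.5, "play_to_end": 1.0, "skip": -0.3}

def update_profile(profile, events, decay=0.9):
    """Decay existing interests, then add weighted evidence from new events.

    profile: dict mapping topic -> interest score
    events: list of (topic, action) pairs observed in one session
    """
    updated = defaultdict(float)
    for topic, score in profile.items():
        updated[topic] = score * decay
    for topic, action in events:
        # Unknown actions contribute nothing rather than raising an error.
        updated[topic] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(updated)

profile = {}
profile = update_profile(profile, [("football", "click"), ("football", "play_to_end")])
profile = update_profile(profile, [("elections", "play"), ("football", "skip")])
top_topic = max(profile, key=profile.get)
print(top_topic, profile)
```

The decay step is one simple way to let long-term interests fade when they are no longer reinforced; other ageing schemes would fit the same interface.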
Ingo Kofler
In-Network Adaptation of Scalable Video Content
This thesis investigates mechanisms and applications for in-network adaptation of scalable video bit streams based on the recent H.264/Scalable Video Coding (SVC) standard. In-network adaptation refers to the adaptation of a video stream by a network element during the stream's transport through the network. The advantages of performing adaptation directly in the network are the availability of local monitoring data and a higher responsiveness according to the current networking conditions. In contrast to previous work in this field, this thesis focuses on the feasibility and realization of in-network adaptation on existing home router platforms. In this context this thesis addresses the following six research objectives. Initially, the relevant transport mechanisms for H.264/SVC and their implications on in-network adaptation (1) were analysed. In the context of this work three different Linux-based router platforms which cover a representative range of residential router devices were used as a basis for further studies and evaluations. In general these platforms can be characterized by rather modest processing capabilities and networking performance. The hardware limitations were identified and quantified in evaluations (2) using both different benchmarks and real network traffic. The offered processing power and memory throughput are roughly 10 to 100 times lower than those of a modern desktop PC. Although their application-layer networking performance is not that low, all platforms fail in fully utilizing their nominal link capacities of 100 and 1000 Mbps, respectively. Based on the known limitations the thesis proposes a stateful, packet-based adaptation mechanism for adapting scalable video bit streams (3). The approach utilizes the RTP payload format for H.264/SVC and represents a light-weight approach for in-network adaptation on the application layer. 
It further meets the important requirements for a media-aware network element (MANE) to be signaling-aware and to operate statefully. The mechanism was integrated into a proxy service that was deployed on all three platforms to prove its feasibility. Experimental evaluations with different video bit streams in standard-definition quality demonstrate the scalability of the approach (4). The results indicate that the proxy service is able to adapt up to 16 concurrent video streams, depending on the platform and video bit stream. On two of the three evaluated platforms, the proposed approach even allows handling and adapting video streams in high-definition quality at bit rates around 15 Mbps. In addition to the proposed H.264/SVC-specific adaptation mechanism, the applicability of generic metadata-driven adaptation on home router platforms was also investigated. In particular, a proof-of-concept study of an XML-metadata-driven approach based on the MPEG-21 generic Bitstream Syntax Description (gBSD) was conducted on the platforms (5). In contrast to earlier evaluations on PC-based platforms, the obtained results indicate that the use of this generic adaptation cannot be recommended on such resource-limited network devices. The benefits of using in-network adaptation on home router platforms are finally demonstrated in the context of high-definition streaming over IEEE 802.11 wireless networks (6). Monitoring information on the queueing delay, which is available only on the router, is used to control the adaptation of the video according to the varying throughput of the wireless link. This makes it possible to react in a timely manner to changing conditions, particularly in the case of mobile clients.
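The idea of a stateful, packet-based filter steered by queueing delay can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's proxy: packets are assumed to be already labelled with their SVC layer identifiers (a real MANE would parse them from the RTP payload), only the spatial layer is used for filtering, and the delay thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    dependency_id: int  # spatial layer (D)
    quality_id: int     # quality layer (Q)
    temporal_id: int    # temporal layer (T)
    size: int           # payload size in bytes

class LayerFilter:
    """Forwards packets at or below a target layer; the target follows queueing delay."""

    def __init__(self, max_dependency_id, high_ms=50.0, low_ms=10.0):
        self.max_dependency_id = max_dependency_id
        self.target = max_dependency_id  # state kept across packets
        self.high_ms = high_ms  # hypothetical threshold for dropping a layer
        self.low_ms = low_ms    # hypothetical threshold for restoring a layer

    def observe_delay(self, delay_ms):
        """Lower the target layer under congestion, raise it again when the queue drains."""
        if delay_ms > self.high_ms and self.target > 0:
            self.target -= 1
        elif delay_ms < self.low_ms and self.target < self.max_dependency_id:
            self.target += 1

    def forward(self, packet):
        """Return True if the packet belongs to a layer that is currently forwarded."""
        return packet.dependency_id <= self.target

layer_filter = LayerFilter(max_dependency_id=2)
base = Packet(dependency_id=0, quality_id=0, temporal_id=0, size=1200)
enhancement = Packet(dependency_id=2, quality_id=0, temporal_id=0, size=1200)
layer_filter.observe_delay(80.0)  # congestion on the wireless link
print(layer_filter.forward(base), layer_filter.forward(enhancement))
```

The gap between the two thresholds provides hysteresis, so the filter does not oscillate between layers when the delay hovers around a single value.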
Jia Li
Learning-based Visual Saliency Computation
With the rapid development of the Internet, the number of images and videos is growing explosively, leading to many new challenges in image/video processing. On the one hand, the processing capability of computers is limited, and computational resources should be allocated with high priority to the important visual information. On the other hand, the analysis results given by computers should be consistent with human cognition. To address these two problems, this thesis focuses on learning-based visual saliency computation; its main objective is to predict, locate and mine the important visual information that is consistent with human cognition. The main contributions of this thesis can be summarized as follows: Firstly, this thesis presents a probabilistic multi-task learning approach for computing visual saliency by simultaneously integrating the bottom-up and top-down factors. To the best of our knowledge, it is the first approach that explores the problem of visual saliency computation with a multi-task learning algorithm. In our approach, the bottom-up and the top-down factors are considered simultaneously in a probabilistic framework. In this framework, a bottom-up component simulates the low-level processes in the human vision system using multi-scale wavelet decomposition, while a top-down component simulates the high-level processes to bias the competition of the input visual stimuli. Moreover, we propose a multi-task learning algorithm to optimize the models and model fusion strategies for various scenes. Extensive experiments on several datasets show that this approach demonstrates high robustness and effectiveness in computing visual saliency. Secondly, this thesis proposes a cost-sensitive rank learning approach for visual saliency computation. To the best of our knowledge, it is the first approach that formulates the problem of visual saliency computation in a rank learning framework.
For the video dataset with sparse eye-fixations, this approach avoids the explicit selection of reliable positive and negative samples. Instead, all the positive and unlabeled data are directly integrated into a cost-sensitive rank learning framework. Experimental results show that the rank learning framework can simultaneously take the influences of local visual attributes and pair-wise ``target-distractor'' correlations into account, resulting in better performance on the video dataset with sparse eye fixations. Thirdly, this thesis presents a multi-task rank learning approach for visual saliency computation. In this approach, the problem of visual saliency computation is formulated in a multi-task rank learning framework to infer multiple saliency models that apply to different scene clusters. In the training process, this approach can infer multiple visual saliency models simultaneously. With an appropriate sharing of information across models, the generalization ability of each model can be greatly improved. Extensive experiments on the eye-fixation dataset show that our approach is highly effective in computing visual saliency in various scenes. Fourthly, the thesis proposes a novel approach for salient object extraction by using complementary saliency maps. Then a video advertising system is developed to demonstrate its feasibility. This system consists of mainly two modules: the pull advertising module and the push advertising module. In these two modules, the interesting/salient objects are extracted through simple user interactions or complementary saliency maps, respectively. These interesting/salient objects, along with the user preferences, are used to provide content-related and user-targeted ads in a low-intrusive way. In the future, this system will be integrated by HuaWei, a worldwide well-known telecommunication company, into their intelligent streaming media service products. 
In summary, this thesis investigates three important issues in learning-based visual saliency computation. Moreover, tentative studies have been carried out on salient object extraction and its application in saliency-based video advertising. To the best of our knowledge, this thesis presents a systematic study on how to apply machine learning into visual saliency computation for the first time. Moreover, this thesis demonstrates the feasibility and effectiveness of learning-based visual saliency computation. This will spark a great interest of research in the related communities in years to come.
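To make the rank learning formulation more concrete, here is a minimal sketch of generic cost-weighted pairwise ranking with a linear scorer and a hinge loss. It is not the thesis's algorithm: the two features, the pair costs, and the training settings are hypothetical, and the thesis's treatment of positive and unlabeled data is not reproduced.

```python
def score(w, x):
    """Linear saliency score of feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_ranker(pairs, dim, epochs=50, lr=0.1):
    """Fit w so that each target outscores its distractor by a margin of 1.

    pairs: list of (target_features, distractor_features, cost);
    cost scales the penalty for misordering that pair.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg, cost in pairs:
            margin = score(w, pos) - score(w, neg)
            if margin < 1.0:  # hinge loss is active: take a cost-weighted gradient step
                w = [wi + lr * cost * (p - n) for wi, p, n in zip(w, pos, neg)]
    return w

# Hypothetical features per location: (local contrast, motion energy).
pairs = [
    ([0.9, 0.8], [0.2, 0.1], 1.0),
    ([0.7, 0.9], [0.3, 0.2], 2.0),  # misordering this pair is penalized twice as much
]
w = train_pairwise_ranker(pairs, dim=2)
print(w, [score(w, pos) > score(w, neg) for pos, neg, _ in pairs])
```

Learning from pairs rather than from individually labelled samples is what lets a ranking formulation sidestep the explicit selection of reliable negatives.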
Kalman Graffi
Monitoring and Management of Peer-to-Peer Systems
The peer-to-peer paradigm has had large success in content distribution and multimedia communication applications on the Internet. In a peer-to-peer network, the participating nodes create an infrastructure to provide a desired functionality and offer their resources to host an application in a distributed manner. Besides the functional requirements of an application, the non-functional requirements to achieve a high service quality are also an important part of successful peer-to-peer networks and a major challenge is to meet these requirements in networks with unreliable nodes. In contrast to traditional centralized approaches where the quality can be measured and controlled, in a distributed environment it is challenging both to capture the status and performance of the whole distributed system in one point of time and to control its general behavior. In this dissertation, we focus on the monitoring and management of peer-to-peer systems. We systematically engineer SkyEye.KOM, a fully decentralized monitoring mechanism that provides both a precise status snapshot of the peer-to-peer system and enables queries for peer capacities, such as bandwidth or storage capacities, in a large-scale peer-to-peer system. It considers individual load limits of the peers and ensures that no peer is overloaded. The core tree topology of SkyEye.KOM is established and maintained solely with protocol-relevant messages. It is based on local peer identifier calculations and using the underlying peer-to-peer overlay. As a second step, we focus on the management of peer-to-peer systems and introduce P3R3O.KOM and SkyNet.KOM, two solutions to manage both the reservation of available capacities in the peer-to-peer system and the system behavior in a fully decentralized and efficient manner. 
P3R3O.KOM is a peer-to-peer protocol for reliable long-term resource reservation that overcomes the limitations of traditional peer-to-peer services, which are typically hosted by a single peer and cease once that peer fails. Resource reservations are fulfilled with adjustable guarantees (even 100%) in the presence of strong churn through the automated and fully decentralized management of the resource provision redundancy. With SkyNet.KOM, we present a fully decentralized approach for automated management of peer-to-peer systems following the principles of autonomic computing. It allows the user or system provider to set service quality goals for the peer-to-peer system, which are automatically verified by the monitoring solution SkyEye.KOM and analyzed, aligned and enforced by the other components of SkyNet.KOM. Preset quality goals for the peer-to-peer system are reached and held through automated systematic re-configuration of the individual components of the peer-to-peer system. Finally, we present LifeSocial.KOM, a peer-to-peer-based platform for online social networks that incorporates the proposed monitoring mechanism to show the feasibility and application scope of the monitoring and management solutions. The impact of the thesis lies in extending the applicability of the peer-to-peer paradigm to quality-critical applications and scenarios. Through the monitoring approach, a system provider is able to observe and judge the quality of the peer-to-peer system. Through capacity-based peer search, the capacities in a peer-to-peer system can be addressed and used to their full extent, allowing for the creation of applications with rich functionality using a wide set of capacities.
Through the proposed management mechanisms, these capacities can also be used reliably in the presence of churn to host services and to establish the peer-to-peer paradigm as a serious and reliable alternative to traditional IT architectures. Additionally, through the automated quality control proposed with SkyNet.KOM, quality-controlled peer-to-peer applications may be created and operated, despite being hosted on a large-scale network of unreliable nodes. Lastly, peer-to-peer-based online social networks show the potential to become the next large application area for the peer-to-peer paradigm. LifeSocial.KOM is one of the first in this category and presents a viable approach for quality-aware peer-to-peer applications that satisfies the needs of both users and system providers.
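Tree-based monitoring of the kind described above can be illustrated with a small aggregation sketch. It does not reproduce the SkyEye.KOM protocol: the mapping from peer identifiers to tree positions (sorted order in an implicit binary tree) and the bandwidth metric are assumptions, and a non-empty set of peers is assumed.

```python
def aggregate(stats_list):
    """Merge (count, total, minimum, maximum) tuples into one summary."""
    count = sum(s[0] for s in stats_list)
    total = sum(s[1] for s in stats_list)
    return (count, total, min(s[2] for s in stats_list), max(s[3] for s in stats_list))

def tree_snapshot(bandwidth_by_peer):
    """Aggregate per-peer bandwidth up an implicit binary tree and return the root's view."""
    peers = sorted(bandwidth_by_peer)  # position in sorted ID order defines the tree
    summaries = [
        (1, bandwidth_by_peer[p], bandwidth_by_peer[p], bandwidth_by_peer[p]) for p in peers
    ]
    # Walk from the last position to the first, so every child has merged
    # its own subtree before it reports to its parent.
    for i in range(len(peers) - 1, 0, -1):
        parent = (i - 1) // 2
        summaries[parent] = aggregate([summaries[parent], summaries[i]])
    return summaries[0]

bandwidth = {"peer_a": 10, "peer_b": 40, "peer_c": 25, "peer_d": 5}  # hypothetical values
count, total, lowest, highest = tree_snapshot(bandwidth)
print(count, total / count, lowest, highest)
```

Because each peer forwards only a fixed-size summary, the per-peer load stays bounded no matter how many peers sit below it in the tree.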
Razib Iqbal
An Architecture for Federated Video Processing and Online Streaming
Today, access to video is available via numerous multimedia-enabled devices through a wide variety of network types. What is required is a mechanism to ensure that users can receive video at a quality matched to their device capabilities and network conditions. In this thesis, we propose an online adaptive video streaming approach which uses the Peer-to-Peer (P2P) paradigm not only to distribute the content using peers' bandwidth, but also to adapt the video using peers' processing power, while taking into account receiver heterogeneity, watermarking, and perceptual encryption. The proposed adaptive video streaming architecture aims at online video adaptation with streaming in P2P overlays to serve heterogeneous devices, including small handhelds. Participating peers therefore contribute both bandwidth and CPU power. We used the MPEG-21 generic Bitstream Syntax Description (gBSD) as a content metadata format and implemented a 3-in-1 adaptation-watermarking-encryption system for compressed-domain adaptation of video in a P2P fashion. Simulation is used to demonstrate that the design is robust, reliable, and suitable for multi-participant real-time collaboration and real-life deployment. System performance is validated against an analytical model also developed in the thesis. The specific contributions made in this thesis are: