Open Source Projects

This section lists multimedia open source projects. Feel free to contribute to this list!

  • Advene: aims at providing a model and a format to share annotations about digital video documents
  • Amalia.js: is an extensible and versatile HTML5 multimedia player that allows you to view any type of metadata along with your video or audio streams.
  • Ambulant: an open-source media player with support for SMIL 3.0
  • Aurio: is a .NET library that focuses on audio processing, analysis, media synchronization and media retrieval, and implements various audio fingerprinting methods
  • Bob: a free signal processing and machine learning toolbox
  • Caffe: is a deep learning framework developed with cleanliness, readability, and speed in mind
  • Caliph & Emir: are MPEG-7 based Java prototypes for digital photo and image annotation and retrieval, supporting graph-like annotation for semantic metadata and content-based image retrieval using MPEG-7 descriptors
  • Clam: is a full-fledged software framework for research and application development in the audio and music domain
  • ClassX: is an interactive lecture streaming system developed by the Image, Video, and Multimedia Systems (IVMS) research group at Stanford University
  • ChucK: is a new audio programming language for real-time synthesis, composition, performance, and analysis
  • CUDA Surf: a GPU implementation of the SURF algorithm
  • DisplayCast: a high-performance screen sharing system for intranets
  • DASH: MPEG-DASH (Dynamic Adaptive Streaming over HTTP) implementation, including DASHEncoder, DASHPlayer (a VLC plugin and the libdash library), and a dataset
  • DirectFB: a thin library that provides hardware graphics acceleration, input device handling and abstraction, integrated windowing system with support for translucent windows and multiple display layers, not only on top of the Linux Framebuffer Device
  • DruCall: is the WebRTC calling module for Drupal
  • eRS: A system to facilitate emotion recognition in movies 
  • ESSENTIA: is an open-source C++ library for audio analysis and audio-based music information retrieval. It contains an extensive collection of reusable algorithms
  • FALCON: an open source tool for music identification, written in Java and based on the popular search engine library Lucene
  • FCam: an open-source C++ API for easy and precise control of digital cameras
  • Fertilized Forests: aims to provide an easy-to-use, easy-to-extend, yet fast library for decision forests
  • FLANN: a library for performing fast approximate nearest neighbor searches in high dimensional spaces
  • Flavor: is an object-oriented media representation language designed for simplifying the development of applications that involve a significant media processing component 
  • Fobs (inactive): is a set of object-oriented APIs to deal with media. It relies on the FFmpeg library, but provides developers with a much simpler programming interface
  • GamingAnywhere: is an open-source cloud gaming platform that is highly extensible, portable, and reconfigurable. It currently supports Windows and Linux, and can be ported to other operating systems, including OS X and Android
  • Ginga: the middleware of the Japanese-Brazilian Digital TV System (ISDB-TB) and an ITU-T Recommendation for IPTV services. Ginga is made up of a set of standardized technologies and Brazilian innovations that, according to the project, make it the most advanced middleware specification
  • GoalBit: a free open source video streaming platform capable of distributing high-bandwidth live video content to everyone while preserving its quality
  • Golden Retriever: is a light-weight but complete framework for implementing CBIR (Content Based Image Retrieval) methods
  • GPAC: an Open Source multimedia framework for research and academic purposes. The project covers different aspects of multimedia, with a focus on presentation technologies (graphics, animation and interactivity)
  • GpuCV: is an open-source GPU-accelerated image processing and Computer Vision library
  • Hop: is a multi-tier programming language for the Web 2.0 and the so-called diffuse Web
  • ImageTerrier: is an open-source, scalable, high-performance search engine platform for content-based image retrieval applications
  • ImproveMyCity: ImproveMyCityMobile is an open source smartphone application for Android devices that allows citizens to report all types of issues related to their neighborhood
  • IRRLICHT: an open source high performance realtime 3D engine written and usable in C++ and also available for .NET languages
  • ITK: an open-source, cross-platform system that provides developers with an extensive suite of software tools for image analysis
  • Jitsi: an audio/video Internet phone and instant messenger written in Java. It supports some of the most popular instant messaging and telephony protocols
  • jReality: is a Java-based, open-source, full-featured 3D scene graph package designed for 3D visualization and specialized in mathematical visualization
  • JSCommunicator: a SIP communication tool developed in HTML and JavaScript
  • Kaltura: open source video solutions
  • libdash: libdash is a C++ library that provides an object oriented (OO) interface to the MPEG-DASH standard. It’s the official reference software of the ISO/IEC MPEG-DASH standard and supports the full range of DASH264 test vectors of the DASH Industry Forum
  • libLDB: is a C++ library for extracting an ultrafast and distinctive binary feature, LDB (Local Difference Binary), from an image patch. libLDB is suitable for vision apps that require real-time performance
  • LibPMK: a C++ implementation of Grauman and Darrell's Pyramid Match algorithm. This toolkit provides a flexible framework with which you can quickly match sets of image features and run experiments
  • Lire: provides a simple way to retrieve images and photos based on their color and texture characteristics
  • Live555: Internet streaming media, wireless, and multicast technology, services, & standards
  • Loki+Lire: is a framework for the creation of web-based interfaces for search, annotation and presentation of multimedia data
  • Lumicall: an open-source SIP / ENUM client for Android
  • LVL: a small, fast, extensible C library for early computer vision that does not use OpenCV
  • MatConvNet: is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications
  • Menpo: is a statistical modelling toolkit, providing all the tools required to build, fit, visualize, and test deformable models like Active Appearance Models, Constrained Local Models and Supervised Descent Method
  • Miro: open source music and video player
  • Mixare: is a free open source augmented reality browser, which is published under the GPLv3
  • NMM: Network-Integrated Multimedia Middleware
  • MPEG-7 Library: is a set of C++ classes, implementing the MPEG-7 standard (ISO/IEC 15938:2001 and 15938:2004)
  • MXM: the purpose of the MPEG eXtensible Middleware (MXM) standard is to promote the extended use of digital media content through increased interoperability and accelerated development of components, solutions and applications
  • openBliSSART: is a C++ framework and toolbox that provides Blind Source Separation for Audio Recognition Tasks
  • OpenCast Matterhorn: is a free, open-source platform to support the management of educational audio and video content
  • OpenCV: has more than 500 algorithms, documentation and sample code for real-time computer vision
  • openFrameworks: is an open source C++ toolkit for creative coding
  • OpenIMAJ: is a collection of libraries for multimedia analysis written in the Java programming language
  • OpenIP: is an open source project to create a C++ library providing the most common methods and algorithms in the field of image processing and computer vision
  • Open Media Foundation: Putting the power of the media and technology in the hands of the people
  • OpenMusic: http://repmus.ircam.fr/openmusic/home
  • OpenNI: is an industry-led, not-for-profit organization formed to certify and promote the compatibility and interoperability of Natural Interaction (NI) devices, applications and middleware
  • OpenSim: is an open source multi-platform, multi-user 3D application server. It can be used to create a virtual environment (or world)
  • OpenSMILE: is a feature extraction tool that enables you to extract large audio feature spaces in real time
  • OpenSURF: aims to find salient regions in images which can be found under a variety of image transformations. SURF is a faster alternative to SIFT
  • Open SVC Decoder: is an open source decoder for Scalable Video Coding (SVC), the extension of the H.264/MPEG-4 AVC video compression standard developed jointly by ITU-T and ISO/IEC JTC 1
  • Open Video Alliance: is the movement to promote free expression and innovation in online video
  • Orcc: the Open RVC-CAL Compiler (Orcc) can generate code for any platform, including hardware (Verilog, VHDL), software (C, Java), heterogeneous platforms (mixed hardware/software), and multi-softcore platforms, from a platform-agnostic, high-level description
  • Popcorn.js: is an HTML5 media framework written in JavaScript for filmmakers, web developers, and anyone who wants to create time-based interactive media on the web
  • Processing: is an open source programming language and environment for people who want to create images, animations, and interactions
  • PyCVF: Python computer vision framework
  • Rainbow (inactive): provides native audio and video recording capabilities in the browser through a JavaScript API
  • realXtend: offers a free open source virtual world platform
  • reSIProcate: C++ implementation of SIP, ICE, TURN and related protocols
  • SCRAPE: is an open source project that enables transferring 3D worlds developed in Processing to a CAVE Automatic Virtual Environment (CAVE)
  • SDL: is a cross-platform multimedia library designed to provide low-level access to audio, keyboard, mouse, joystick, 3D hardware via OpenGL, and 2D video framebuffer
  • Sirannon (inactive): aims at being a modular multimedia streamer and receiver
  • SIVA Suite: is an open source framework for the creation, playback and administration of hypervideos
  • SELab: Sensory Experience Lab including SEVino, SESim, SEMP, and AmbientLib for creating, parsing, and rendering sensory effects compliant to MPEG-V
  • SemAuth: is an XSLT script that translates a FreeMind mind map into a website (i.e., a set of XHTML pages)
  • SIFT: is a method to detect distinctive, invariant image feature points, which can easily be matched between images to perform tasks such as object detection and recognition, or to compute geometrical transformations between images
  • SiftGPU: a GPU implementation of the SIFT algorithm
  • SINGA: a general distributed deep learning platform for training big deep learning models over large datasets
  • Social Signal Interpretation (SSI): it is a framework that offers tools to record, analyse and recognize human behavior in real-time, such as gestures, mimics, head nods, and emotional speech
  • Sonic Visualiser: its aim is to be the first program you reach for when you want to study a musical recording rather than simply listen to it
  • Sirikata: is a BSD-licensed open source platform for virtual worlds. It aims to provide a set of open libraries and protocols that anyone can use to deploy a virtual world
  • Stage Framework: is an open source project that uses a web application to deliver HTML5 based magazine content. Each published magazine consists of HTML resources which represent the magazine pages
  • Streeme: helps to turn your home computer into a full featured music server with minimal effort
  • swfmill: is a tool for processing Adobe Flash files. It can transform binary Flash files into an XML-based representation (swf2xml) for further processing. In addition, XML files adhering to the swfmill format can be turned back into the binary Flash format.
  • TAPESTREA: is a unified framework for interactively analyzing, transforming and synthesizing complex sounds 
  • Theia: a computer vision library aimed at providing efficient and reliable algorithms for Structure from Motion
  • TimeSheets: relies on declarative W3C standards (SMIL Timing and SMIL Timesheets) to synchronize HTML content
  • Time Style Sheets (TSS): a set of document extensions that allow timing and synchronization of HTML elements within a Web page to be specified with CSS 
  • TOP-SURF: is an image descriptor that combines interest points with visual words, resulting in a high performance yet compact descriptor that is designed with a wide range of content-based image retrieval applications in mind
  • Torch 3 Vision: It's a machine vision library, written in simple C++ and based on the Torch machine-learning library
  • Torch 5: provides a Matlab-like environment for state-of-the-art machine learning algorithms. Torch5 is the official successor of Torch3
  • Trackmate: is an open source initiative to create an inexpensive, do-it-yourself tangible tracking system, allowing any computer to recognize tagged objects when placed on a surface
  • Tribler: is an application that enables its users to find, enjoy and share content, including video, audio, pictures, and much more
  • UltraGrid: is a software implementation of high-quality low-latency video and audio transmissions using commodity PC and Mac hardware
  • VideoJS: HTML5 video player
  • Video LAN (VLC): is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVD, Audio CD, VCD, and various streaming protocols
  • VideoLat: is a tool to help you analyse video delays, mainly aimed at conferencing applications. It provides an innovative approach to understand glass-to-glass video delays and speaker-to-microphone audio delays
  • Vireo-VH: Video Hyperlinking provides end-to-end support for threading and visualizing large video collections. It includes components for near-duplicate keyframe retrieval, partial near-duplicate video alignment, and Galaxy visualization
  • VLFeat: is an open source library that implements popular computer vision algorithms including SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift
  • XFace: is a set of open source tools for creation of MPEG-4 and keyframe based 3D talking heads
  • Xkin: is an open source library that enables users to build hand gesture based applications using the Kinect sensor
  • Yael: is a library implementing computationally intensive functions used in large scale image retrieval, such as neighbor search, clustering and inverted files
  • Waisda?: is a video labeling game in the cultural heritage field. The goal is to collect user tags that can help bridge the semantic gap, to collect time-related metadata, and to offer people a new way of interacting with television programs
  • WATSS: a web based annotation tool to create ground truth for datasets related to visual surveillance and behavior understanding