HyFORGE – Hypermedia Edition and Navigation
This research focuses on organizing and navigating multimedia content, both audio and visual, that may be heterogeneous. Is the objective therefore to establish a structural framework and common semantics for this type of data through indexing in a standardized, universal language?
No. Defining a common language for describing multimedia content and formulating search queries is one thing; applying that language is another. Several attempts have certainly been made in recent years to define such languages. They were initiated with the specific goal of introducing new semantic elements into the next version of the Web. The MPEG-7 initiative and the MXF specification are examples. However, the use of these languages remains limited. This is essentially explained by the difficulty of automatically deriving the descriptors, or metadata, from the raw data. This problem is known as the Semantic Gap, and it explains why it is hard to develop artificial intelligence algorithms whose results can be interpreted at a semantic level. With the technologies currently available, it is not yet feasible to use a universal language to index multimedia content.
What are the difficulties linked to indexing unstructured content such as sounds or images? What approach can be proposed to overcome the possible pitfalls?
The first difficulty with the classical indexing of multimedia content such as sounds and videos is to define a descriptive language that takes into account its temporal and composite nature, in other words the fact that it contains constantly evolving data arising from a combination of sources. This language must be complete: it must cover all the content and be understandable to the user. Natural language is certainly an ideal candidate from the user’s point of view, but it is too far removed from computational realities. The second difficulty is to automate the description of multimedia content in this language, extracting the metadata that provide the entries in the index. We could ask why humans are so successful at interpreting audiovisual data. Put briefly, humans apply various cognitive mechanisms that enable them to separate sources based on perceived stimuli (bottom-up mechanism) and to compare the identified elements with existing models (top-down mechanism). Our understanding of these mechanisms is still only partial, and applying them is only possible under controlled conditions (for example, automatic word recognition). Moreover, multimedia data show extreme variability, linked as much to the type of source as to the acquisition procedure. As a consequence, software solutions based on this indexing approach are neither reliable nor robust. In the HyFORGE project, we are adopting another approach, based on the “Human-In-The-Loop” paradigm. The software tools must exploit the fact that the user is the best search tool. Rather than straining to extract semantic index keys from the content, search tools must give the user a global view of the overall content and let him refine his search progressively and interactively through navigation mechanisms within the content itself.
Copyright © WilmaScope
This project involves “peer-to-peer navigation” and classification by data similarity. What do these concepts mean more precisely?
Classifying data by similarity is a method for structuring different contents. By organizing similar contents into clusters in a topological, hierarchical way, we obtain a partition of the content that is easier to navigate. The user can then move more rapidly through this structure by searching the clusters that match his query and skipping the others. We intend to group contents by similarity, based on various characteristics extracted from them, and to create hypermedia links between similar contents by applying self-organizing algorithms.
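As a rough illustration of linking contents by similarity, the sketch below connects each item to its most similar neighbours using cosine similarity on feature vectors. It is a deliberately simple stand-in, not the self-organizing algorithms the project refers to; the file names, the three-dimensional descriptors and the function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_similarity_links(features, k=2):
    """Link every item to its k most similar neighbours.

    `features` maps an item name to a feature vector.
    Returns a dict: item name -> list of neighbour names, most similar first.
    """
    links = {}
    for name, vec in features.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in features.items()
            if other != name
        ]
        # Sort by descending similarity; ties are broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        links[name] = [other for _, other in scored[:k]]
    return links

# Hypothetical descriptors for four media items (values invented for the example).
features = {
    "drums.wav": [0.9, 0.1, 0.0],
    "bass.wav": [0.8, 0.2, 0.1],
    "birds.wav": [0.0, 0.2, 0.9],
    "rain.wav": [0.1, 0.1, 0.8],
}
links = build_similarity_links(features, k=1)
```

Following the entries of `links` from one item to the next is one possible reading of navigating from content to similar content; a real system would extract the descriptors from the audio or video itself.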
The interface often plays a decisive role in user choice. What is proposed for HyFORGE applications? What are the challenges for interfaces using 3D or touch-screen navigation, for example?
The design of a software interface is a complex problem that must take into account the numerous constraints imposed by user needs and computational realities. Not everything is possible, and what is developed must meet user demands as far as possible and match users’ skills. An interface must therefore be functional, intuitive, ergonomic and efficient. In the HyFORGE project, we have identified basic technologies to this end, such as 2D and 3D data visualization, zoomable interfaces, vector graphics, and also ambisonics and time compression. Moreover, it is important to design interfaces to suit the hardware available. Although we are concentrating on developments for conventional hardware, specific installations offer interesting prospects (for example, multi-touch interfaces on large touch screens).
Copyright © PerceptivePixel
This work mentions an inter-media approach. How specifically do you intend to progress in this direction, and how will that influence our way of thinking and our navigation habits?
One line of research within the HyFORGE project measures the similarity between heterogeneous multimedia contents. In artistic composition, the user often wants to find audio and visual content that is esthetically compatible. In addition, a sequence of images can become a search criterion in audio data, and a piece of music can be a search key in video data. The aims in this field are still a little vague, but artists, especially in the field of VJing, are calling for tools that will enable them to reinforce the emotions they wish to communicate by generating audio and visual atmospheres. This reinforcement can be achieved in particular by synchronizing the data in time, for example by adjusting the projection of images to match the tempo of the music. But it can also be obtained by ensuring coherence between the data sources themselves, for example between the spectral components of sound and image. Let’s be clear that this line of research is highly exploratory and is the artistic branch of the project. It is still too early to exploit an inter-media approach that would combine the elements observed in the different modalities (for example the sound of a guitar and a photo of the instrument).
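To make the idea of matching image projection to musical tempo concrete, here is a minimal sketch that assumes the tempo is constant and already known in beats per minute. The function names, image file names and parameters are hypothetical, and tempo estimation from the audio is left out entirely.

```python
def beat_times(bpm, duration_s):
    """Return the beat onset times (in seconds) for a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    period = 60.0 / bpm  # seconds between two consecutive beats
    times = []
    n = 0
    while n * period < duration_s:
        times.append(n * period)
        n += 1
    return times

def schedule_images(images, bpm, duration_s, beats_per_image=1):
    """Assign images to beats, cycling through the list.

    Returns a list of (time_in_seconds, image_name) pairs.
    """
    beats = beat_times(bpm, duration_s)
    schedule = []
    # Keep one beat out of every `beats_per_image` as an image change point.
    for i, t in enumerate(beats[::beats_per_image]):
        schedule.append((t, images[i % len(images)]))
    return schedule

# Hypothetical example: three images over two seconds of music at 120 BPM.
schedule = schedule_images(["a.png", "b.png", "c.png"], bpm=120, duration_s=2.0)
```

At 120 BPM a beat falls every 0.5 s, so the image changes four times in the two seconds and the list of images wraps around after the third one.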
Who stands to benefit from the final applications developed in the context of the HyFORGE project? What kind of usage would be favoured and what advantages would there be for the end user?
The tools that we are developing are suited to users who need to structure their multimedia content in order to facilitate its subsequent use. This ranges from the individual user who wishes to organize his videos or personal music files to the professional user who is aiming to structure the content of his commercial operation (for example libraries of sound effects or images). We have also identified potential applications in the indexing of medical imagery or video surveillance data.
Copyright © Apple Inc.
Is it intended to expand this work via a community of external developers by proposing an Open Source development platform, for example?
Setting up a development community requires a solid foundation and a general consensus on the direction of development work. The HyFORGE project is still too immature in this respect and requires a lot of research, not merely development. However, we will call on existing communities for the specific components that will be integrated into our developments, for example facilities for 3D visualization (e.g. OpenGL) or sound synthesis (e.g. PortAudio).
What kind of artistic collaboration do you envisage in the context of M15/Numediart for the development of your project?
At its core, the HyFORGE project is not artistic, but it can serve artists such as composers or video jockeys. However, we are convinced that the approach we are proposing can become an esthetic search process and thereby create video or audio emotions that artists could use. For example, we have been developing an audio signal processing tool for temporal compression or expansion; in other words, the audio data can be heard at an accelerated rate while minimizing distortion and maximizing perceptual quality. Although this tool was initially intended for navigating and searching sound content, it could be adapted to generate sound effects in real time on stage from actors’ voices or instrument sounds. Besides, its visual abstraction can be somewhat esthetic. Let us conclude by saying that we are open to all ideas or proposals from artists, whose creative imagination often surpasses the narrow awareness of engineers.
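For readers curious about what temporal compression or expansion of audio involves, the sketch below uses plain overlap-add (OLA) with a Hann window. This is a textbook baseline chosen for brevity, not the project's tool: it changes duration without resampling but can produce audible artifacts that more careful methods are designed to reduce. The function names, frame size and test tone are assumptions made for the example.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def time_stretch_ola(samples, rate, frame=256):
    """Change the duration of `samples` by plain overlap-add.

    rate > 1 shortens the signal (accelerated playback),
    rate < 1 lengthens it. Frames are read every `ana_hop` samples
    and written every `synth_hop` samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    synth_hop = frame // 2
    ana_hop = max(1, int(round(synth_hop * rate)))
    window = hann(frame)
    n_frames = max(0, (len(samples) - frame) // ana_hop + 1)
    if n_frames == 0:
        return []
    out_len = (n_frames - 1) * synth_hop + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight per output sample
    for f in range(n_frames):
        a = f * ana_hop    # read position in the input
        s = f * synth_hop  # write position in the output
        for i in range(frame):
            out[s + i] += samples[a + i] * window[i]
            norm[s + i] += window[i]
    # Divide by the accumulated weight so overlapping frames do not change the gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

# Hypothetical test tone: 440 Hz sine sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(4096)]
fast = time_stretch_ola(tone, rate=2.0)       # roughly half the duration
unchanged = time_stretch_ola(tone, rate=1.0)  # same duration as the input
```

With `rate=1.0` the read and write hops coincide, so the output reproduces the input, which makes a convenient sanity check before trying other rates.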