HyFORGE — Hypermedia Edition and Navigation

 

This research focuses on the organization of, and navigation through, possibly heterogeneous multimedia content, both audio and visual. Is the objective therefore to establish a structural framework and common semantics for this type of data through indexation in a normalized, universal language?

No. Defining a common language that can describe multimedia content and formulate search queries is one thing; applying that language is another. Certainly, several attempts have been made over recent years to define such languages, initiated with the specific goal of introducing new semantic elements into the next version of the Web. We could cite the MPEG-7 initiative or the MXF specification. However, the use of these languages remains limited. That can be explained, in essence, by the difficulty of automatically deriving the descriptors, or metadata, from the raw data. This problem is known as the Semantic Gap, and it accounts for the difficulty of developing artificial-intelligence algorithms whose output can be interpreted at a semantic level. Given the technologies currently available, it is not yet feasible to use a universal language to index multimedia content.

 

What are the difficulties linked to the indexation of unstructured content such as sounds or images? What approach can be proposed in order to overcome possible pitfalls?

The first difficulty with the classical indexation of multimedia content such as sounds and videos is to define a descriptive language for these contents that takes into account their temporal and composite nature, in other words the fact that they contain constantly evolving data arising from a combination of sources. This language must be complete; it must cover all the content and be understandable to the user. Natural language is certainly an ideal candidate from the user's point of view, but it is too far removed from computational realities. The second difficulty is to automate the description of multimedia contents using this language, extracting the metadata that provides entries in the index. We might ask why humans are so successful at interpreting audiovisual data. To be succinct, we can say that humans apply various cognitive mechanisms that enable them to differentiate sources based on perceived stimuli (a bottom-up mechanism) and to compare the identified elements with existing models (a top-down mechanism). Our understanding of these mechanisms is still only partial, and their application is only possible under controlled conditions (for example, automatic word recognition). Moreover, multimedia data shows extreme variability, linked as much to the type of source as to the acquisition procedure. As a consequence, software solutions based on this indexation approach are neither reliable nor robust. In the context of the HyFORGE project, we are adopting another approach, based on the “Human-In-The-Loop” paradigm. The software tools must exploit the fact that the user is the best search tool. Rather than straining to extract semantic indexation keys from the content, search tools must provide the user with a global view of the overall content and enable him to refine his search in a progressive, interactive way through navigation mechanisms within the content itself.

 

WilmaScope [http://wilma.sourceforge.net/]

Copyright © WilmaScope

 

This project is about “peer-to-peer navigation” and classification of data by similarity; what do these concepts represent, more precisely?

Classification of data by similarity is a method for structuring different contents. By organizing similar contents into clusters, in a topological and hierarchical way, we obtain a separation of the content that is then easier to navigate. The user can advance more rapidly within this structure by exploring the clusters that match his query and avoiding the others. We intend to group contents by similarity, based on various characteristics extracted from them, and to create hypermedia links between similar contents by applying self-organizing algorithms.
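To make this idea concrete, here is a minimal sketch of clustering contents by feature similarity and linking each item to its nearest neighbours. The feature vectors are invented, and the use of scikit-learn's KMeans is an assumption for illustration; HyFORGE itself relies on self-organizing algorithms whose details are not given here.

```python
# Hypothetical sketch: cluster multimedia items by feature similarity and
# propose "hypermedia links" from each item to its nearest neighbours.
# The feature vectors (e.g. averaged audio or image descriptors) are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))   # 20 items, 8-dimensional descriptors
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)

def nearest_neighbours(i, k=3):
    """Indices of the k items closest to item i (candidate hypermedia links)."""
    dists = np.linalg.norm(features - features[i], axis=1)
    order = np.argsort(dists)
    return [int(j) for j in order if j != i][:k]

for i in range(len(features)):
    print(f"item {i:2d} | cluster {labels[i]} | links -> {nearest_neighbours(i)}")
```

In a navigation interface, the cluster labels would drive the topological grouping shown to the user, while the neighbour lists would become the links followed when browsing from one content to a similar one.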

 

The interface often plays a determining role in the user's choice; what is proposed for HyFORGE applications? What are the challenges for interfaces using 3D, or navigation on touch screens, for example?

The design of a software interface is a complex problem that must take into account the numerous constraints imposed by user needs and computational realities. Not everything is possible, and what is developed must meet user demands as far as possible and match their skill set. An interface must therefore be functional, intuitive, ergonomic and efficient. In the context of the HyFORGE project, we have identified basic technologies along these lines, such as the visualization of 2D and 3D data, zoomable interfaces, vector graphs, as well as ambisonics and time compression. Moreover, it is important to design interfaces in line with the hardware available. Although we are concentrating on developments for conventional hardware, there are interesting perspectives with specific installations (for example, multi-touch interfaces on large touch screens).

 

PerceptivePixel [http://www.perceptivepixel.com]

Copyright © PerceptivePixel

 

This work mentions an inter-media approach; how specifically do you intend to progress in this direction, and how will that influence our way of thinking and our navigation habits?

One line of research within the HyFORGE project measures the similarity between heterogeneous multimedia contents. In artistic composition, the user often wants to find audio and visual content that is esthetically compatible. In addition, a sequence of images can become a search criterion for audio data, and a piece of music can be a search key within video data. The aims in this field are still a little vague, but artists, especially in the field of VJing, are calling for tools that will enable them to reinforce the emotions they wish to communicate by generating audio and visual atmospheres. This reinforcement can be achieved in particular by the temporal synchronization of data, for example by adjusting the projection of images to match the tempo of the music. But it can also be obtained by ensuring coherence between the data sources themselves, for example between the spectral components of sound and image. Let's be clear that this line of research is highly exploratory and constitutes the artistic branch of the project. It is still too early to exploit an inter-media approach to combine the elements observed within the different modalities (for example, the sound of a guitar and a photo of the instrument).
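To give a concrete flavour of the temporal synchronization mentioned above, here is a minimal sketch, assuming the librosa library and placeholder file names, in which the beat times of a music track drive the display schedule of an image sequence. It illustrates the principle only and is not the project's actual tooling.

```python
# Hypothetical sketch of audio/visual synchronization: detect beats in a music
# file and assign one image of a sequence to each beat time.
# File names and the choice of librosa are assumptions for illustration.
import numpy as np
import librosa

y, sr = librosa.load("music.wav")                         # placeholder audio file
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
tempo = float(np.atleast_1d(tempo)[0])                    # global tempo as a scalar BPM

images = [f"frame_{i:03d}.png" for i in range(len(beat_times))]  # placeholder images
schedule = list(zip(beat_times, images))                  # (time in seconds, image) pairs

print(f"estimated tempo: {tempo:.1f} BPM")
for t, img in schedule[:8]:
    print(f"show {img} at {t:.2f} s")
```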

 

Who stands to benefit from the final applications developed in the context of the HyFORGE project? What kind of usage would be favoured and what advantages would there be for the end user?

The tools that we are developing are intended for users who need to structure their multimedia content in order to facilitate its subsequent use. That ranges from the individual user who wishes to organize his personal videos or music files to the professional user who aims to structure the content he exploits commercially (for example, libraries of sound effects or images). We have also identified potential applications in the indexation of medical imagery and video-surveillance data.

 

Apple Core Animation [http://www.apple.com/macosx/technology/coreanimation.html]

Copyright © Apple Inc.

 

Is it intended to expand this work via a community of external developers by proposing an Open Source development platform, for example?

Setting up a development community requires a solid foundation and a general consensus on the direction of the development work. The HyFORGE project is still too immature in this sense and requires a lot of research, not merely development. However, we will call on existing communities for specific components that will be integrated into our developments, for example facilities for 3D visualization (e.g. OpenGL) or sound synthesis (e.g. PortAudio).

 

What kind of artistic collaboration do you envisage in the context of M15/Numediart for the development of your project?

At its core, the HyFORGE project is not artistic, but it can serve artists such as composers or video jockeys. However, we are convinced that the approach we are proposing can become an esthetic search process and thereby create video or audio emotions which artists could use. For example, we have been developing a tool for the processing of audio signals that enables temporal compression or expansion: in other words, the audio data can be listened to at an accelerated rate while minimizing distortion and maximizing perceptual quality. Although this tool was initially dedicated to navigating and searching sound content, it could be adapted to generate sound effects in real time on stage from actors' voices or instrument sounds. Besides, its visual abstraction can be somewhat esthetic in itself. Let us conclude by saying that we are open to all ideas or propositions coming from artists, whose creative imagination often surpasses the narrow awareness of engineers.
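For illustration only, here is a minimal sketch of such time-scale modification using a generic phase-vocoder time stretch from librosa; it is a stand-in for the tool described above, not the project's own implementation, and the file names are placeholders.

```python
# Hypothetical sketch: listen to audio at an accelerated rate without changing
# its pitch, using a generic phase-vocoder time stretch (a stand-in for the
# project's own tool). File names are placeholders.
import librosa
import soundfile as sf

y, sr = librosa.load("voice.wav")                    # placeholder input file
y_fast = librosa.effects.time_stretch(y, rate=2.0)   # twice as fast, pitch preserved
sf.write("voice_2x.wav", y_fast, sr)                 # roughly half the original duration
print(f"original: {len(y)/sr:.2f} s, compressed: {len(y_fast)/sr:.2f} s")
```

The same kind of processing, applied with a rate below 1.0, expands the signal instead of compressing it, which is what makes it usable both for quickly skimming sound content and for generating effects on live voices or instruments.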

 

Numediart Audio Skimming [http://www.numediart.org/]