ISO/IEC 14496-11:2005 specifies the coded representation of interactive audio-visual scenes and applications. It specifies the following tools:
- the coded representation of the spatio-temporal positioning of audio-visual objects as well as their behaviour in response to interaction (scene description);
- the coded representation of synthetic two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly and/or visually;
- the Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML); and
- a system-level description of an application engine (format, delivery, lifecycle, and behaviour of downloadable Java byte code applications).