The challenge is on! The 3DLife Challenge 2010, "Sports Activity Analysis in Camera Networks", is now live within the ACM Multimedia Grand Challenge 2010. More details of the 3DLife annual research challenge can be found at the 3DLife website. A short description of the challenge follows.
Advances in the availability and utility of cameras is rapidly changing the sporting landscape. In professional sports we are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast. High profile examples include the Hawk-Eye Officiating System as used in tennis and cricket or ESPN’s recent announcement to showcase 3D broadcast in its coverage of the 2010 FIFA World Cup. Whilst extremely valuable to the viewing experience, such technologies are really only feasible for high profile professional sports. On the other hand, advances in camera technology coupled with falling prices means that reasonable quality visual capture is now within reach of most local and amateur sporting and leisure organizations. Thus it becomes feasible for every field sports club, whether tennis, soccer, cricket or hockey, to install their own camera network at their local ground. In fact, the same goes for other leisure activities like dance, aerobics and performance art that take place in a constrained environment and that would benefit from visual capture. In these cases, the motivation is usually not for broadcast purposes, or for the technology to act as a “video referee” or adjudicator, but rather to facilitate coaches and mentors to provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.
This challenge focuses on exploring the limits of what is possible in terms of 2D and 3D data extraction from a low-cost camera network for sports. Tennis is chosen as a case study as it is a sporting environment that is relatively easy to instrument with cheap cameras and features a small number of actors (players) who exhibit explosive and rapid sophisticated motion. Video data from an AV network, corresponding to 9 cameras with built in mics, installed around an indoor court capturing real athletes is provided for experimentation purposes. The capture infrastructure is deliberately set-up to model what is feasible for a local tennis club using commercial off-the shelf components i.e. 720 x 680, MPEG-4 25Hz cameras that are not calibrated or synchronized and that share only limited overlapping fields of view. We are interested in submissions that explore the limits of what is possible from such a real-world capture scenario in terms of:
- Player localization on court and tracking through multiple camera views;
- Event-based analysis and human behaviour modeling using multiple views of the same event / activity: one example is robustly classifying every stroke as a serve, backhand, forehand, etc considering fusion across multiple camera views; another example is detecting the game structure automatically (point, game, match).
- 3D reconstruction of the playing arena and/or the players or their actions; an example is using player location and stroke classification to animate an avatar of the player, even coarsely;
- Longitudinal analysis of player activity and motion over an entire training session;
- Novel visualization and feedback mechanisms of any analysis results.