
Durham Research Online

A machine learning driven solution to the problem of perceptual video quality metrics

Katsigiannis, Stamos and Rabah, Hassan and Ramzan, Naeem (2020) 'A machine learning driven solution to the problem of perceptual video quality metrics', in AI for Emerging Verticals: Human-robot computing, sensing and networking.


The advent of high-speed internet connections, advanced video coding algorithms, and consumer-grade computers with high computational capabilities has made video streaming over the internet the majority of network traffic. This has led to a continuously expanding video streaming industry that seeks to offer enhanced quality-of-experience (QoE) to its users at the lowest possible cost. Video streaming services can now adapt to the hardware and network restrictions that each user faces and thus provide the best experience possible under those restrictions. The most common way to adapt to network bandwidth restrictions is to offer the video stream at the highest visual quality attainable at the maximum bitrate that the user’s network connection can sustain. This is achieved by storing several pre-encoded versions of the video content with different bitrate and visual quality settings. Visual quality is measured by means of objective quality metrics, such as the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Visual Information Fidelity (VIF), and others, which can be easily computed analytically. Nevertheless, it is widely accepted that although these metrics provide an accurate estimate of statistical quality degradation, they do not accurately reflect the viewer’s perception of visual quality. As a result, acquiring user ratings in the form of Mean Opinion Scores (MOS) remains the most accurate way to capture human-perceived video quality; however, it is very costly and time-consuming, and thus cannot practically be employed by video streaming providers with hundreds or thousands of videos in their catalogues. A recent and very promising approach to addressing this limitation is the use of machine learning techniques to train models that represent human video quality perception more accurately.
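To make the analytically computed metrics mentioned above concrete, here is a minimal sketch of MSE and PSNR for a single pair of frames: MSE is the mean of the squared pixel differences over all N pixels, and PSNR = 10 · log10(MAX² / MSE) in decibels. It assumes frames given as nested lists of pixel (e.g. luma) values with a peak value of 255 (8-bit samples); the function and type names are illustrative and are not taken from the chapter or from any particular library. Real tools operate on decoded video and typically aggregate per-frame scores, for example by averaging.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[float]]  # rows of pixel (e.g. luma) values


def mse(reference: Frame, distorted: Frame) -> float:
    """Mean Squared Error: average of squared pixel differences."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same height")
    total, count = 0.0, 0
    for ref_row, dist_row in zip(reference, distorted):
        if len(ref_row) != len(dist_row):
            raise ValueError("frames must have the same width")
        for r, d in zip(ref_row, dist_row):
            total += (r - d) ** 2
            count += 1
    return total / count


def psnr(reference: Frame, distorted: Frame, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; max_value=255 assumes 8-bit samples."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf  # identical frames: no noise, so PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    ref = [[52, 55], [61, 59]]
    dist = [[54, 55], [60, 59]]
    print(mse(ref, dist))             # 1.25
    print(round(psnr(ref, dist), 2))  # 47.16
```

For the two toy 2×2 frames, the pixel differences are 2, 0, 1 and 0, so MSE = 5/4 = 1.25. Note that both metrics depend only on per-pixel error statistics, which is exactly why they can diverge from perceived quality.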
In such machine learning approaches, regression techniques are used to map objective quality metrics to human video quality ratings acquired for a large number of diverse video sequences. Results have been very promising, with approaches such as the Video Multimethod Assessment Fusion (VMAF) metric achieving higher correlations with user-acquired MOS than traditional, widely used objective quality metrics.
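The regression step described above can be illustrated with a small sketch. It assumes a plain linear least-squares fit from per-video objective scores (here, hypothetically, PSNR and SSIM) to MOS values; VMAF itself uses support vector regression over its own feature set, so this is a simpler stand-in and not the chapter's or VMAF's method. The helper names (`fit_linear_fusion`, `predict`, `pearson`) are hypothetical, and the data in the usage block is synthetic.

```python
import math
from statistics import mean
from typing import List, Sequence


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """MOS for one video: the arithmetic mean of its individual viewer ratings."""
    return mean(ratings)


def fit_linear_fusion(features: Sequence[Sequence[float]],
                      mos: Sequence[float]) -> List[float]:
    """Least-squares weights [bias, w1, ..., wk] so that bias + sum(wj * xj) ~ MOS.

    Solves the normal equations (X^T X) w = X^T y with Gauss-Jordan
    elimination and partial pivoting. A linear stand-in for the regressor.
    """
    rows = [[1.0, *map(float, x)] for x in features]  # prepend a bias column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y], one row per unknown weight.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, mos))]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit a unique model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [v - factor * p for v, p in zip(aug[k], aug[col])]
    # The left block is now diagonal, so each weight is a single division.
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted MOS for one video's objective metric scores."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between predictions and MOS."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Synthetic toy data (not from the chapter): [PSNR in dB, SSIM] per video,
    # with "MOS" generated as 0.1 * PSNR + 2 * SSIM - 2, so the fit should
    # recover those weights and correlate perfectly with the targets.
    feats = [[30, 0.80], [35, 0.90], [40, 0.95], [28, 0.70], [45, 0.99]]
    scores = [0.1 * p + 2 * s - 2 for p, s in feats]
    w = fit_linear_fusion(feats, scores)
    preds = [predict(w, x) for x in feats]
    print([round(v, 3) for v in w], round(pearson(preds, scores), 3))
```

In practice such a model would be fitted on one set of videos and its correlation with MOS reported on held-out videos, since correlation measured on the training data overstates how well the model generalises to unseen content.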

Item Type: Book chapter
Full text: (AM) Accepted Manuscript
Date accepted: No date available
Date deposited: 13 January 2021
Date of first online publication: 15 December 2020
Date first made open access: 13 January 2021
