A NOVEL RATE-BASED MACROBLOCK CLASSIFICATION FOR FAST MODE DECISION IN H.264 VIDEO CODING

Md. Salah Uddin Yusuf; Mohiuddin Ahmad

Assistant Professor, Dept. of EEE, Khulna University of Engineering & Technology (KUET), Khulna, Bangladesh
Professor, Dept. of EEE, Khulna University of Engineering & Technology (KUET), Khulna, Bangladesh

Visit for more related articles at International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Abstract

In this paper, a novel rate-based macroblock classification is proposed for fast mode decision in the H.264 video coding standard. The main idea is to classify each macroblock as having simple or complex motion content based on the bit-rate of the Inter16x16 mode's residue block, and then to apply a different mode search order with a distinct early termination scheme to each class. The algorithm is simple to implement in both hardware and software and needs no extra computational module. To speed up the intra mode decision, a fast Intra 4x4 mode selection algorithm is also proposed, which shortlists the most likely modes using the low-complexity SATD cost as a screening function. Experimental results show that the proposed fast algorithm reduces the total H.264 encoding time by 47% to 65% with negligible degradation in rate-distortion performance. Combining the rate-based algorithm with the fast intra mode selection method reduces the encoding time by a further 5% to 10% with only slight rate-distortion degradation.

Keywords

Macroblock, Mode Decision, Early Termination Scheme, Video Coding, H.264/AVC.

INTRODUCTION

The newest video coding standard, H.264/AVC [1], [2], greatly outperforms the previous MPEG-1/2/4 and H.261/263 standards [3]-[6] in terms of both picture quality and compression efficiency. To achieve the highest coding efficiency, H.264 employs rate-distortion optimization (RDO) to find the coding result that maximizes coding quality while minimizing the resulting data bits [7, 8]. It therefore has to encode all possible modes, including intra and inter prediction modes with variable-size blocks in motion estimation, which contributes greatly to prediction accuracy. The basic idea behind offering many prediction modes is that blocks with more motion detail are better encoded with smaller blocks, while blocks with less motion detail can be encoded with larger block modes [2].

Although the variable block-size technique significantly improves prediction accuracy, it also brings a heavy computational load, since for each macroblock a total of 9 modes must be evaluated to determine the best mode with the lowest rate-distortion cost. To reduce this computation, several fast mode decision algorithms have been developed, mainly focused on eliminating unnecessary modes. Using the Sobel operator, an edge direction map has been used successfully for fast H.264 mode selection to reduce the computation in intra prediction [9]. In [10], an objective measure between the intra and inter modes is proposed to decide between spatial and temporal correlation. Recently, a fast mode decision algorithm based on motion content classification was proposed in [11], which uses spatial mode correlation to predict the most probable mode of the current macroblock. Most of these fast mode decision algorithms involve additional computation, such as an edge direction map, coefficient costs or a mode predictor, to eliminate less probable modes, which may increase complexity in practical implementations, especially in hardware. In this paper, the bit-rate for encoding the Inter16x16 mode is proposed as the measure for classifying simple and complex motion blocks, and different mode search orders and termination schemes are employed to eliminate unnecessary modes. Because encoding the Inter16x16 mode is already part of the rate-distortion optimized mode selection process, no additional operation module is required for the proposed fast algorithm.

The rest of the paper is organized as follows. Section II gives the reasons for using rate-based motion content classification. The proposed fast algorithm and threshold selection are described in Section III. A new fast intra selection algorithm that further speeds up the decision process is presented in Section IV. Simulation results are presented in Section V and the conclusion is given in Section VI.

RATE-BASED MACROBLOCK MOTION CONTENT CLASSIFICATION

In H.264, most of the computation is spent on modes that do not end up as the best mode; a good way to reduce computation is therefore to exclude unlikely modes in advance. Macroblocks (MBs) can be divided into two types: simple motion macroblocks (SMBs) and complex motion macroblocks (CMBs). This is a rough motion content classification, since the best mode cannot be determined precisely in advance [11]. If an MB is considered an SMB, only the large block-size modes (SKIP, Inter16×16, Inter16×8 and Inter8×16) are candidates for the best mode. Similarly, a CMB only covers the small block-size modes (Inter8×8, Inter8×4, Inter4×8, Inter4×4, Intra4×4 and Intra16×16). This search strategy avoids a lot of computation, especially when an MB is predicted to be an SMB: in that case the rate-distortion cost of the small block-size modes, which is more computationally intensive than that of the large block-size modes, need not be computed. To take advantage of this approach, we still need an efficient and accurate measure for SMB and CMB classification. In H.264, the rate-distortion cost used to decide the best mode is defined as

$$J_{RD} = SSD + \lambda \cdot (R_{motion} + R_{residue}) \qquad (1)$$

where λ is the Lagrange multiplier, SSD is the sum of squared differences between the original MB and the reconstructed MB, Rmotion is the number of bits for encoding the motion vectors and header information, and Rresidue is the number of bits for encoding the residue MB. Based on this definition, JRD, SSD, Rmotion and Rresidue are all candidates for motion content classification. Since we want an efficient classifier, the total cost JRD and SSD are not good candidates: computing the SSD is very expensive, as it requires forward/inverse quantization, forward/inverse transformation and pixel reconstruction [12]. Besides its high computational requirement, SSD does not provide very decisive information for mode decision. Similarly, Rmotion, the number of bits for encoding the motion vectors, is not highly related to the final mode decision.

H.264 introduces variable block-size modes because in some cases the current MB cannot be well predicted by the large modes, which means that the difference between the current and the predicted MB is quite large. We then have to resort to the small modes for better estimation. Rresidue is the number of bits after entropy coding of the residue blocks and is directly related to the difference between the original and the predicted MB. Thus, Rresidue for a large block size, such as the Inter16x16 mode, could be a good measure for classifying the motion content of the MB for fast mode selection. For example, if Rresidue of the Inter16x16 mode is small, the large block-size mode predicts the current MB well, so the current MB is likely an SMB and the best mode is among the large block-size modes; otherwise, we should focus on the small block-size modes to determine the best mode. Based on this idea, we choose the bit-rate for encoding the Inter16x16 residue block (RInter16x16) as the measure for SMB and CMB classification and then use different mode search orders to skip unnecessary mode computations. The classification can be based on a pre-defined threshold as follows:
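As a small worked illustration of the cost definition, the sketch below assumes the usual Lagrangian form in which the distortion (SSD) is added to the λ-weighted sum of the motion and residue bits. The function name `rd_cost` and all numbers are hypothetical and are not taken from the paper's experiments.

```python
def rd_cost(ssd, r_motion, r_residue, lam):
    """Lagrangian rate-distortion cost (assumed form): SSD + lambda * total bits."""
    return ssd + lam * (r_motion + r_residue)


# Hypothetical values for two candidate modes of one macroblock.
candidates = {
    "Inter16x16": rd_cost(ssd=1200, r_motion=12, r_residue=90, lam=10.0),
    "Inter8x8": rd_cost(ssd=1000, r_motion=40, r_residue=100, lam=10.0),
}

# The encoder keeps the mode with the lowest cost.
best_mode = min(candidates, key=candidates.get)
print(candidates, best_mode)
```

In this made-up case the smaller blocks lower the distortion but spend more bits on motion vectors and residue, so the larger mode still has the lower total cost.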

If RInter16x16 < Threshold, then the current MB is an SMB;
if RInter16x16 ≥ Threshold, then the current MB is a CMB.

To demonstrate that RInter16x16 provides very good classification accuracy, the classification accuracy for some QCIF sequences is shown in Table I with quantization parameter (QP) = 32 and threshold = 140.

The Correct Ratio is the proportion of MBs whose classification agrees with the result of the exhaustive mode selection; the SMB Error Ratio is the proportion of SMBs mistakenly categorized as CMBs, and the opposite is called the CMB Error Ratio. From Table I, we can see that when the thresholds are properly chosen, the classification correct ratio is very high, up to 98%.
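The three ratios can be computed as in the following minimal sketch. It assumes that each ratio is taken over the total number of macroblocks (the text does not state the denominator) and that the ground truth for each macroblock is whether the exhaustive search picked a large block-size mode. The helper names and the sample data are hypothetical.

```python
LARGE_MODES = {"SKIP", "Inter16x16", "Inter16x8", "Inter8x16"}


def classify_mb(r_inter16x16, threshold):
    """Rate-based classification: below the threshold is SMB, otherwise CMB."""
    return "SMB" if r_inter16x16 < threshold else "CMB"


def classification_stats(samples, threshold):
    """samples: list of (R_Inter16x16, best mode found by exhaustive search)."""
    correct = smb_err = cmb_err = 0
    for rate, best_mode in samples:
        truth = "SMB" if best_mode in LARGE_MODES else "CMB"
        predicted = classify_mb(rate, threshold)
        if predicted == truth:
            correct += 1
        elif truth == "SMB":
            smb_err += 1  # a true SMB was classified as CMB
        else:
            cmb_err += 1  # a true CMB was classified as SMB
    n = len(samples)
    return correct / n, smb_err / n, cmb_err / n


# Hypothetical macroblocks: (residue bits of Inter16x16, exhaustive best mode).
samples = [
    (20, "SKIP"),
    (100, "Inter16x8"),
    (150, "Inter16x16"),
    (300, "Inter8x8"),
    (120, "Intra4x4"),
]
stats = classification_stats(samples, threshold=140)
print(stats)
```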

PROPOSED FAST INTER 16X16 RATE-BASED MODE SELECTION ALGORITHM

With RInter16x16 as the motion content classifier, we can use different mode search orders to avoid unnecessary mode calculations and achieve fast mode selection. The proposed fast Inter16x16 rate-based mode decision algorithm first calculates RInter16x16 of the current MB and then compares it with a pre-defined threshold. If RInter16x16 is less than the threshold, the current block is determined to be an SMB, which covers the large modes; otherwise, it is a CMB, which covers the small modes. The fast algorithm can be summarized as:

Step 1: Compute the bit-rate for the Inter16x16 mode residue block, RInter16x16.

Step 2: If RInter16x16 < Threshold, then select the best mode from the DIRECT/SKIP, Inter16x16, Inter8x16 and Inter16x8 modes based on their rate-distortion costs.

Step 3: If RInter16x16 ≥ Threshold, then select the best mode from the InterP8×8 (Inter8×4, Inter4×8, Inter4×4), Intra16×16 and Intra4×4 modes based on their rate-distortion costs.
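The three steps can be sketched as follows. This is only an illustration of the control flow, not the JM encoder code: `rd_cost_of` is a hypothetical callback standing in for the encoder's rate-distortion cost evaluation, and the cost values are made up.

```python
SMB_MODES = ["SKIP", "Inter16x16", "Inter16x8", "Inter8x16"]
CMB_MODES = ["Inter8x8", "Inter8x4", "Inter4x8", "Inter4x4",
             "Intra16x16", "Intra4x4"]


def fast_mode_decision(r_inter16x16, threshold, rd_cost_of):
    """Pick the best mode from the candidate set chosen by the rate test."""
    # Step 1 is assumed done: r_inter16x16 is the Inter16x16 residue bit count.
    if r_inter16x16 < threshold:
        candidates = SMB_MODES   # Step 2: large block-size modes only
    else:
        candidates = CMB_MODES   # Step 3: small block-size and intra modes
    return min(candidates, key=rd_cost_of)


# Hypothetical rate-distortion costs for one macroblock.
costs = {
    "SKIP": 900, "Inter16x16": 850, "Inter16x8": 870, "Inter8x16": 880,
    "Inter8x8": 860, "Inter8x4": 910, "Inter4x8": 905, "Inter4x4": 950,
    "Intra16x16": 1200, "Intra4x4": 1100,
}
evaluated = []


def rd_cost_of(mode):
    evaluated.append(mode)  # record which modes were actually evaluated
    return costs[mode]


best = fast_mode_decision(r_inter16x16=60, threshold=140, rd_cost_of=rd_cost_of)
print(best, evaluated)
```

For this macroblock only the four large modes are evaluated; the six small block-size and intra modes are never costed.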

The flow chart of the proposed fast mode decision algorithm is shown in Fig. 1. With this search order, a lot of computation is saved because unnecessary and computationally intensive mode calculations, especially those of the intra modes, are avoided. In addition, the proposed algorithm is easy to implement in both hardware and software and requires no extra module, because calculating RInter16x16 is already a necessary step of the rate-distortion cost computation. The Intra16x16 mode is placed in the CMB path because, when the prediction accuracy of the Inter16x16 mode is good enough, there is no need to resort to the Intra16x16 mode, which only serves as a complement to the large block-size modes.

To apply the proposed algorithm efficiently, we have to select good threshold values for different QPs, since they may strongly influence the computation time and the rate-distortion performance. The classification accuracy and rate-distortion coding performance of the proposed fast algorithm for various QPs and threshold values are shown in Table II. From Table II, we can see that the larger the QP value, the larger the correct ratio, which results in very similar coding performance between the original algorithm and the proposed fast algorithm. In addition, the RDO performance is not very sensitive to the choice of threshold. Taking QP = 36 as an example, as long as the threshold is chosen in the range [90, 150], the RDO performance is quite stable. This property is significant, since little effort is needed to determine the best thresholds precisely. With extensive simulations on different kinds of video sequences and QPs, we formulated the relationship between the threshold and QP as the following equation:

(2)

The formula is reasonable: as QP increases, the quantization step increases, which leads to a smaller RInter16x16, so the threshold has to decrease to match the smaller RInter16x16.

A FAST INTRA 4X4 MODE SELECTION ALGORITHM

Intra prediction means that the prediction of the current macroblock comes from the spatial information of already encoded macroblocks in the same image [13]. It mainly contributes to I-frame encoding and is also an important prediction method in P-frames and B-frames when inter prediction does not work well. The 16x16 intra prediction works well in smoothly changing areas. In Intra 4x4 mode, a total of nine prediction modes are supported, as shown in Fig. 2. Eight of them correspond to a specific prediction direction and one is the DC prediction mode. In the H.264 standard, a full search (FS) examines all 9 modes of the Intra 4×4 prediction to find the one with the smallest RD cost as the best mode. Therefore, for the luminance components in a macroblock, we have to perform 16×9 = 144 RD cost calculations, whereas inter prediction in FS requires seven RD cost calculations. This comparison shows that the computation load of the Intra 4x4 mode is quite large, since there are too many modes to cover. Therefore, if some unnecessary prediction modes can be excluded in advance, much computation time can be saved. To accelerate the coding process, the JVT reference software version JM 6.1d provides a fast SAD-based cost function [14]:

$$J_{SAD} = SAD + \lambda \cdot 4K \qquad (3)$$

where SAD is the sum of absolute differences between the original block S and the predicted block P, and K equals 0 for the most probable mode and 1 for the other modes. The SAD is expressed by

$$SAD = \sum_{i,j} |s_{ij} - p_{ij}| \qquad (4)$$

where sij and pij are the (i, j)th elements of the current original block S and the predicted block P, respectively. This SAD-based cost function saves a lot of computation, as image block transformation, quantization and reconstruction are all skipped. Also, the number of bits is estimated by a constant equal to either 4 or 0, so variable length coding using CAVLC or CABAC is skipped as well. However, this reduction in computation usually comes with quite significant degradation of coding efficiency. To achieve better rate-distortion performance, JM 6.1d also provides an alternative SATD-based cost function [14]:

$$J_{SATD} = SATD + \lambda \cdot 4K \qquad (5)$$

where SATD is the sum of absolute Hadamard-transformed differences between the original block S and the predicted block P, which is given by

$$SATD = \sum_{i,j} |h_{ij}| \qquad (6)$$

where hij is the (i, j)th element of the Hadamard-transformed block H, obtained from the difference between the original block S and the predicted block P. The Hadamard-transformed block H is defined as

(7)
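The SAD and SATD measures can be sketched in plain Python as below. The sketch makes several assumptions that are not confirmed by the text: the 4x4 Hadamard matrix `T_H` and its row ordering, the two-sided transform `T_H * (S - P) * T_H^T`, and the absence of any normalization factor (real encoders may scale the sum). The helper names and the test blocks are hypothetical.

```python
# Assumed 4x4 Hadamard matrix; the row ordering is a common choice,
# not necessarily the one used in the paper.
T_H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def sad(s, p):
    """Sum of absolute differences between original block s and prediction p."""
    return sum(abs(s[i][j] - p[i][j]) for i in range(4) for j in range(4))


def satd(s, p):
    """Sum of absolute Hadamard-transformed differences (no normalization)."""
    d = [[s[i][j] - p[i][j] for j in range(4)] for i in range(4)]
    h = matmul(matmul(T_H, d), transpose(T_H))
    return sum(abs(v) for row in h for v in row)


def screening_cost(distortion, lam, is_most_probable):
    """Distortion plus lambda * 4K, with K = 0 for the most probable mode."""
    k = 0 if is_most_probable else 1
    return distortion + lam * 4 * k


p = [[10] * 4 for _ in range(4)]        # flat prediction
s_flat = [[12] * 4 for _ in range(4)]   # constant error of 2 per pixel
s_spike = [row[:] for row in p]
s_spike[0][0] += 4                      # single-pixel error of 4

print(sad(s_flat, p), satd(s_flat, p))
print(sad(s_spike, p), satd(s_spike, p))
```

A constant error concentrates in a single transform coefficient, whereas a single-pixel error spreads over all sixteen coefficients, which is why the two measures can rank prediction modes differently.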

Experimental results show that JSATD achieves better rate-distortion performance than JSAD, but it requires more computation because of the Hadamard transformation. However, the performance degradation of SATD is still large; therefore, we propose a new fast Intra 4x4 mode selection algorithm. In many cases, the best intra prediction mode found by full search is not the same as the one chosen by the SATD criterion; nevertheless, the real best mode is likely to be among the K modes with the smallest SATD cost. We define this probability as P, as shown in Table III. From the table, we can see that when K = 3, the probability P is up to 0.85. Based on this result, we propose a fast Intra 4×4 mode selection algorithm as follows:

Step 1: Calculate the SATD cost of the nine Intra 4x4 prediction modes;

Step 2: Single out the K modes with the smallest SATD cost and collect them in a candidate set;

Step 3: Calculate the RD cost of the modes in the candidate set and determine the best prediction mode.

In the proposed algorithm, the computational complexity of SATD is much lower than that of the RD cost, and only K modes need to be examined instead of a full search. The computation time can therefore be reduced considerably.
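The shortlist-then-refine idea can be sketched as follows. The function name, the `rd_cost_of` callback and all cost values are hypothetical; in a real encoder the SATD costs and RD costs would come from the prediction and coding stages.

```python
def fast_intra4x4_mode(satd_costs, rd_cost_of, k=3):
    """Shortlist the k modes with the smallest SATD cost, then pick by RD cost.

    satd_costs: dict mapping each of the nine Intra 4x4 modes to its SATD cost.
    rd_cost_of: callable returning the full RD cost of a mode.
    """
    shortlist = sorted(satd_costs, key=satd_costs.get)[:k]
    best = min(shortlist, key=rd_cost_of)
    return best, shortlist


# Hypothetical SATD costs for modes 0..8 of one 4x4 block.
satd_costs = {0: 40, 1: 35, 2: 30, 3: 80, 4: 75, 5: 60, 6: 90, 7: 55, 8: 70}
# Hypothetical full RD costs, only needed for the shortlisted modes.
rd_costs = {0: 500, 1: 520, 2: 510}

best, shortlist = fast_intra4x4_mode(satd_costs, rd_costs.get, k=3)
print(best, shortlist)

# RD cost evaluations per macroblock (16 luma 4x4 blocks).
full_search_evals = 16 * 9
fast_evals = 16 * 3
print(full_search_evals, fast_evals)
```

In this made-up case the mode with the smallest SATD cost (mode 2) is not the one with the smallest RD cost (mode 0), but the true best mode is still inside the shortlist, which is the situation the probability P in Table III measures.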

SIMULATION RESULTS

The proposed mode selection algorithm was tested using the first 50 frames of different kinds of video sequences (Akiyo, Foreman, Stefan, Carphone, Trevor, Salesman, etc.), all in QCIF format, covering slow, medium and fast motion. The experiments were carried out with the JVT JM12 encoder [14], with the following test parameters:

- Number of previous frames used for inter motion search is 1,

- CAVLC is enabled,

- GOP structure is IPPPIPPP…,

- Max search range is 32,

- QP values are 24, 28, 32, 36 and 40, and the corresponding thresholds calculated with the empirical formula are 55, 81, 120, 169, 230 and 303,

- In the fast intra mode selection algorithm, experiments are run for K = 2, 3 and 4.

Compared with the original H.264/AVC encoder with rate-distortion optimization, the proposed algorithm achieves a large reduction in computation time on average together with very similar R-D performance, as listed in Table IV (Akiyo), Table V (Foreman) and Table VI (Stefan). When only the fast inter mode selection algorithm is employed, the average computation reduction for Akiyo and Foreman reaches 62% or more, with an average PSNR reduction of 0.04 dB and an average bitrate increase of 0.2%; for Stefan, the average computation reduction reaches 47%, with an average PSNR reduction of 0.02 dB and an average bitrate increase of 0.12%. When the fast inter mode and intra mode selection algorithms are combined, with K = 2, 3 and 4, the computation time is reduced further, with a small degradation of R-D performance.

The results shown in the tables indicate that K = 3 can lead to a better tradeoff between computation load and R-D performance than K = 2 or 4.
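Comparison figures of this kind are commonly derived from the reference and fast encoder runs as in the sketch below. The exact formulas used for the tables are not stated in the text, so the definitions here (relative time saving, relative bitrate increase, absolute PSNR drop) are assumptions, and the numbers are hypothetical rather than results from the paper.

```python
def compare_to_reference(ref, fast):
    """Return (time saving %, bitrate increase %, PSNR drop in dB)."""
    time_saving = 100.0 * (ref["time"] - fast["time"]) / ref["time"]
    bitrate_increase = 100.0 * (fast["bitrate"] - ref["bitrate"]) / ref["bitrate"]
    psnr_drop = ref["psnr"] - fast["psnr"]
    return time_saving, bitrate_increase, psnr_drop


# Hypothetical measurements: encoding time (s), bitrate (kbit/s), PSNR (dB).
ref = {"time": 200.0, "bitrate": 100.0, "psnr": 36.5}
fast = {"time": 80.0, "bitrate": 100.25, "psnr": 36.25}

result = compare_to_reference(ref, fast)
print(result)
```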

From the simulation results, we can summarize the following:

- The threshold values determined by equation (2) adapt well to different kinds of video sequences: slow motion, medium motion and high motion. This also indicates that the thresholds are not sensitive to QP values.

- The proposed algorithm reduces more computation time in slow motion sequences than in higher motion sequences. That is because slow motion sequences contain more large-mode macroblocks, so much of the computation on small modes can be skipped, especially the intra modes, which are quite computationally intensive.

- When the QP value decreases, the degradation of R-D performance increases.

From Table II, we can see that for large QP values the correct ratio of classification is higher, and this ratio determines the degradation of R-D performance. Therefore, a smaller QP leads to a smaller correct ratio of classification and worse R-D performance. Tables VII, VIII and IX show the R-D performance and computation time of three conventional fast mode selection algorithms: the skip mode early termination algorithm, the fast intra mode decision algorithm and the fast inter mode decision algorithm [10], [11], [12]. The results indicate that the proposed algorithm offers a better tradeoff between R-D performance and computation load.

CONCLUSION

In this paper, efficient fast inter and intra mode selection algorithms for the H.264 standard are proposed. In the fast inter mode algorithm, the motivation is to skip unlikely modes by classifying each macroblock as a CMB or an SMB based on the bit-rate for encoding the Inter16x16 mode residue block (RInter16x16). The threshold used for the classification is formulated as an equation in terms of QP, which maintains a good tradeoff between computational complexity and rate-distortion performance. The contribution of the fast intra mode algorithm is to examine the three most probable intra prediction modes rather than performing a full search. Experimental results demonstrate that the proposed algorithms achieve large computation savings while maintaining very similar rate-distortion performance.

References

[1] Iain E. G. Richardson, “H.264 and MPEG-4 video compression: video coding for next-generation multimedia”, Chichester: John Wiley and Sons, 2003.

[2] T. Wiegand, G. J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, “Overview of the H.264/AVC video coding standard”, IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560-576, July 2003.

[3] Watkinson, John, “MPEG-2”, Oxford, Boston: Focal Press, 1999.

[4] Pereira, Fernando C. N., “The MPEG-4 book”, Upper Saddle River, N.J.: Prentice Hall PTR, 2002.

[5] V Roden, T., “H.261 and MPEG1-a comparison”, Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference, 27-29 March 1996, pp. 65-71.

[6] Video coding for low bitrate communication (in ITU-T Recommendation H.263 version 1, 1995), Sep. 1997.

[7] T. Wiegand, M. Lightstone, D. Mukherjee, T. G. Campbell, and S. K. Mitra, “Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard”, IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 2, pp. 182-190, Apr. 1996.

[8] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression”, IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74-90, Nov. 1998.

[9] F. Pan, X. Lin, R. Susano, K. P. Lim, Z. G. Li, G. N. Feng, D. J. Wu, and S. Wu, “Fast mode decision for intra prediction”, Joint Video Team (JVT), Mar. 2003, Doc. JVT-G013.

[10] K. P. Lim et al., “Fast inter mode selection”, Joint Video Team (JVT), Sept. 2003, Doc. JVT-120.

[11] B. Jeon and J. Lee, “Fast mode decision for H.264”, Joint Video Team (JVT), Dec. 2003, Doc. JVT-J033.

[12] Ming Yang and Wensheng Wang, “Fast macroblock mode selection based on motion content classification in H.264/AVC”, IEEE Proc. ICIP, pp. 741-744, Oct. 2004.

[13] A. Hallapuro and M. Karczewicz, “Low complexity transform and quantization - Part I: Basic implementation”, ITU-T Q.6/SG16 Document JVT-B038, Jan. 2002.

[14] Joint Video Team (JVT) Reference Software, http://iphome.hhi.de/suehring/tml/index.htm