[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, 2012.
[3] B. Bross et al., "Overview of the versatile video coding (VVC) standard and its applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736-3764, 2021.
[4] G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, "DVC: An end-to-end deep video compression framework," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11006-11015.
[5] J. Li, B. Li, and Y. Lu, "Deep contextual video compression," Advances in Neural Information Processing Systems, vol. 34, pp. 18114-18125, 2021.
[6] Z. Hu, G. Lu, and D. Xu, "FVC: A new framework towards deep video compression in feature space," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1502-1511.
[7] X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, "Temporal context mining for learned video compression," IEEE Transactions on Multimedia, vol. 25, pp. 7311-7322, 2022.
[8] H. Chen, B. He, H. Wang, Y. Ren, S. N. Lim, and A. Shrivastava, "NeRV: Neural representations for videos," Advances in Neural Information Processing Systems, vol. 34, pp. 21557-21568, 2021.
[10] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
[11] I. E. Richardson, The H.264 Advanced Video Compression Standard. John Wiley & Sons, 2011.
[12] A. Habibian, T. van Rozendaal, J. M. Tomczak, and T. S. Cohen, "Video compression with rate-distortion autoencoders," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7033-7042.
[13] J. Pessoa, H. Aidos, P. Tomás, and M. A. Figueiredo, "End-to-end learning of video compression using spatio-temporal autoencoders," in 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020: IEEE, pp. 1-6.
[14] Z. Li, M. Wang, H. Pi, K. Xu, J. Mei, and Y. Liu, "E-NeRV: Expedite neural video representation with disentangled spatial-temporal context," in European Conference on Computer Vision, 2022: Springer, pp. 267-284.
[15] J. C. Lee, D. Rho, J. H. Ko, and E. Park, "FFNeRV: Flow-guided frame-wise neural representations for videos," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7859-7870.
[16] B. He, C. Zhu, G. Lu, Z. Zhang, Y. Chen, and L. Song, "GNeRV: A Global Embedding Neural Representation For Videos."
[17] X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501-1510.
[18] H. Chen, M. Gwilliam, S.-N. Lim, and A. Shrivastava, "HNeRV: A hybrid neural representation for videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10270-10279.
[19] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976-11986.
[20] H. Chen, M. Gwilliam, B. He, S.-N. Lim, and A. Shrivastava, "CNeRV: Content-adaptive neural representation for visual data," arXiv preprint arXiv:2211.10421, 2022.
[21] J. Kim, J. Lee, and J.-W. Kang, "SNeRV: Spectra-Preserving Neural Representation for Video," in European Conference on Computer Vision, 2025: Springer, pp. 332-348.
[22] J. E. Saethre, R. Azevedo, and C. Schroers, "Combining Frame and GOP Embeddings for Neural Video Representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9253-9263.
[23] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874-1883.
[24] D. Hendrycks and K. Gimpel, "Gaussian error linear units (GELUs)," arXiv preprint arXiv:1606.08415, 2016.
[25] X. Zhang et al., "Boosting Neural Representations for Videos with a Conditional Decoder," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2556-2566.
[26] V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein, "Implicit neural representations with periodic activation functions," Advances in Neural Information Processing Systems, vol. 33, pp. 7462-7473, 2020.
[27] Z. Liu et al., "FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2713-2722.
[28] H. Zhu et al., "FINER++: Building a Family of Variable-periodic Functions for Activating Implicit Neural Representation," arXiv preprint arXiv:2407.19434, 2024.
[29] Y. Bai, C. Dong, C. Wang, and C. Yuan, "PS-NeRV: Patch-wise stylized neural representations for videos," in 2023 IEEE International Conference on Image Processing (ICIP), 2023: IEEE, pp. 41-45.
[30] C. Gomes, R. Azevedo, and C. Schroers, "Video compression with entropy-constrained neural representations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18497-18506.
[31] G. Gao, H. M. Kwan, F. Zhang, and D. Bull, "PNVC: Towards practical INR-based video compression," arXiv preprint arXiv:2409.00953, 2024.
[32] H. M. Kwan, G. Gao, F. Zhang, A. Gower, and D. Bull, "HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation," Advances in Neural Information Processing Systems, vol. 36, 2024.
[33] Q. Chang, H. Yu, S. Fu, Z. Zeng, and C. Chen, "MNeRV: A Multilayer Neural Representation for Videos," arXiv preprint arXiv:2407.07347, 2024.
[34] M. Tarchouli, T. Guionnet, M. Riviere, W. Hamidouche, M. Outtas, and O. Deforges, "Res-NeRV: Residual Blocks For A Practical Implicit Neural Video Decoder," in 2024 IEEE International Conference on Image Processing (ICIP), 2024: IEEE, pp. 3751-3757.
[35] Q. Cao, D. Zhang, and X. Zhang, "Saliency-Based Neural Representation for Videos," in International Conference on Pattern Recognition, 2025: Springer, pp. 389-403.
[36] J. Chen et al., "Run, Don't walk: Chasing higher FLOPS for faster neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12021-12031.
[37] D. Oktay, J. Ballé, S. Singh, and A. Shrivastava, "Scalable model compression by entropy penalized reparameterization," arXiv preprint arXiv:1906.06624, 2019.
[38] H. Yan, Z. Ke, X. Zhou, T. Qiu, X. Shi, and D. Jiang, "DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 23019-23029.
[39] L. Tang, J. Zhu, X. Zhang, L. Zhang, S. Ma, and Q. Huang, "CANeRV: Content Adaptive Neural Representation for Video Compression," arXiv preprint arXiv:2502.06181, 2025.
[40] A. Radford et al., "Learning transferable visual models from natural language supervision," in International Conference on Machine Learning, 2021: PMLR, pp. 8748-8763.
[41] P. Wang et al., "OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework," in International Conference on Machine Learning, 2022: PMLR, pp. 23318-23340.
[42] B. He et al., "Towards scalable neural representation for diverse videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6132-6142.
[43] B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704-2713.
[44] Y. Bengio, N. Léonard, and A. Courville, "Estimating or propagating gradients through stochastic neurons for conditional computation," arXiv preprint arXiv:1308.3432, 2013.
[45] J. Shi and C. Tomasi, "Good features to track," in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994: IEEE, pp. 593-600.
[46] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in IJCAI'81: 7th International Joint Conference on Artificial Intelligence, 1981, vol. 2, pp. 674-679.
[47] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
[48] A. Mercat, M. Viitanen, and J. Vanne, "UVG dataset: 50/120fps 4K sequences for video codec analysis and development," in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297-302.
[49] H. Wang et al., "MCL-JCV: a JND-based H.264/AVC video quality assessment dataset," in 2016 IEEE International Conference on Image Processing (ICIP), 2016: IEEE, pp. 1509-1513.
[50] P. Goyal et al., "Accurate, large minibatch SGD: Training ImageNet in 1 hour," arXiv preprint arXiv:1706.02677, 2017.
[51] I. Loshchilov and F. Hutter, "SGDR: Stochastic gradient descent with warm restarts," in Proceedings of the 5th International Conference on Learning Representations, 2017, pp. 1-16.
[52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[53] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU SG16 Doc. VCEG-M33, 2001.