In recent years, Implicit Neural Representation (INR) has become a popular research direction because it resolves the slow decoding that hampers learned video compression. However, existing INR methods still cannot match state-of-the-art learned video compression methods in compression performance. We therefore aim to improve INR models, with compression performance as the primary goal.
In terms of model architecture, we design an enhanced image feature module that dynamically adjusts the Group of Pictures (GOP) interval according to the degree of feature variation in the video. We also introduce lightweight temporal embeddings so that features shared within a GOP carry additional time information, and we select relatively close key features as the image features instead of relying on the fixed initial frame of each GOP as the feature source. In addition, we adopt the attention module CBAM (Convolutional Block Attention Module) to strengthen the focus on key features.
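The abstract does not specify the exact rule used to adapt the GOP interval, so the following is only a minimal PyTorch sketch of one plausible realization: the mean absolute inter-frame difference serves as a cheap proxy for feature variation, and a new GOP begins wherever it exceeds a threshold. The function name `gop_boundaries` and the threshold value are illustrative assumptions, not the author's implementation.

```python
import torch

def gop_boundaries(frames: torch.Tensor, threshold: float = 0.08) -> list[int]:
    """Return frame indices that start a new GOP (hypothetical rule).

    frames: (T, C, H, W) tensor with pixel values in [0, 1].
    A new GOP starts wherever the mean absolute inter-frame
    difference, a cheap proxy for feature variation, exceeds
    `threshold`.
    """
    boundaries = [0]
    for t in range(1, frames.shape[0]):
        diff = (frames[t] - frames[t - 1]).abs().mean().item()
        if diff > threshold:
            boundaries.append(t)
    return boundaries
```

Under such a rule, dynamic videos get shorter GOPs (more key features), while static videos amortize one key feature over many frames.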
Moreover, in the decoder we employ weighted skip connections to improve gradient flow and to balance auxiliary and primary features.
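To make the weighted skip connection concrete, here is a minimal PyTorch sketch of a decoder block in which a learnable scalar balances the auxiliary (skip) features against the main path. The class name and the convolution/GELU layout are assumptions for illustration; the abstract states only that the skip connections are weighted.

```python
import torch
import torch.nn as nn

class WeightedSkipBlock(nn.Module):
    """Decoder block with a learnable weighted skip connection (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Learnable balance between the main path and auxiliary features;
        # a direct additive path like this also eases gradient flow.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        return self.conv(x) + self.alpha * skip
```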
In terms of model compression, we apply different pruning strategies to static and dynamic videos according to the degree of motion in the video, retaining the layers that are critical to the decoding process. In the loss design, we introduce a stage-wise frequency-domain loss so that both local and global feature representations are optimized.
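The exact form of the stage-wise frequency-domain loss is not given in the abstract; the sketch below shows one common construction matching that description: an L1 distance between 2-D FFT magnitudes (global structure) blended with a pixel-wise L1 term (local detail), with a stage-dependent weight that ramps the frequency term up as training proceeds. The schedule and the 0.1 scale are assumptions.

```python
import torch
import torch.nn.functional as F

def frequency_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance between 2-D FFT magnitudes; penalizes global errors."""
    pred_f = torch.fft.rfft2(pred, norm="ortho")
    tgt_f = torch.fft.rfft2(target, norm="ortho")
    return (pred_f.abs() - tgt_f.abs()).abs().mean()

def stage_wise_loss(pred: torch.Tensor, target: torch.Tensor,
                    epoch: int, total_epochs: int) -> torch.Tensor:
    """Blend local (pixel) and global (frequency) terms by training stage.

    Hypothetical schedule: the frequency weight ramps from 0 to 1 over
    the first half of training; the thesis states only that the
    frequency-domain loss is applied in stages.
    """
    w = min(1.0, epoch / (0.5 * total_epochs))
    return F.l1_loss(pred, target) + 0.1 * w * frequency_loss(pred, target)
```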
For the video representation task, our method reduces model size by 2% relative to the Deng model on which it builds, while improving PSNR by 0.28 dB. For video compression, our method outperforms the baseline, the traditional codec H.265, and the learned video compression method DCVC in PSNR.
Notably, unlike previous INR-based video compression methods, which use a fixed training procedure, this study analyzes the characteristics of each video and adjusts the training strategy accordingly; we validate the effectiveness of this approach through ablation experiments.
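As an illustration of what such a video-adaptive strategy can look like, the sketch below scores a clip's motion with the mean absolute inter-frame difference and maps it to a pruning ratio. Both the metric and the thresholds are hypothetical; the abstract says only that the training strategy is adjusted from an analysis of the video's characteristics.

```python
import torch

def motion_score(frames: torch.Tensor) -> float:
    """Mean absolute inter-frame difference over a (T, C, H, W) clip."""
    return (frames[1:] - frames[:-1]).abs().mean().item()

def pick_pruning_ratio(frames: torch.Tensor) -> float:
    """Map the motion score to a pruning ratio (illustrative thresholds).

    Static clips tolerate heavier pruning because fewer weights are
    needed to fit slowly varying content; dynamic clips keep more.
    """
    score = motion_score(frames)
    if score < 0.02:    # mostly static
        return 0.4
    if score < 0.06:    # moderate motion
        return 0.2
    return 0.1          # highly dynamic
```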
Abstract

2.1 Explicit Video Representation and Implicit Neural Representation
2.2 Embedding Types in Implicit Neural Representation
2.3 Decoder Architectures in Implicit Neural Representation
2.4 Model Compression Pipelines for Implicit Neural Representation
2.5 Training Strategies for Implicit Neural Representation
2.6 Summary of Surveyed Methods
Chapter 3: Proposed Method
3.1 Model Architecture
3.1.1 Enhanced Image Feature Module
3.1.2 Multi-Resolution Temporal Grid
3.1.3 Feature Fusion Module
3.1.4 Attention Module (CBAM)
3.1.5 Optical-Flow-Guided Frame Aggregation
3.1.6 Decoder
3.2 Video Compression Pipeline
3.2.1 Quantization-Aware Training
3.2.2 Feature Quantization
3.2.3 Pruning
3.2.4 Weight Encoding
3.3 Loss Functions
3.3.1 Enhanced Loss Function
3.3.2 Stage-Wise Loss Function
Chapter 4: Experimental Results
4.1 Experimental Setup
4.1.1 Experimental Environment
4.1.2 Datasets
4.1.3 Hyperparameters
4.1.4 Evaluation Metrics
4.2 Video Representation Results
4.3 Video Compression Results
4.4 Ablation Studies
4.4.1 Model Architecture
4.4.2 Loss Function
4.4.3 Model Compression
Chapter 5: Conclusion
References


[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, 2012.
[3] B. Bross et al., "Overview of the versatile video coding (VVC) standard and its applications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736-3764, 2021.
[4] G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, "DVC: An end-to-end deep video compression framework," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11006-11015.
[5] J. Li, B. Li, and Y. Lu, "Deep contextual video compression," Advances in Neural Information Processing Systems, vol. 34, pp. 18114-18125, 2021.
[6] Z. Hu, G. Lu, and D. Xu, "FVC: A new framework towards deep video compression in feature space," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1502-1511.
[7] X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, "Temporal context mining for learned video compression," IEEE Transactions on Multimedia, vol. 25, pp. 7311-7322, 2022.
[8] H. Chen, B. He, H. Wang, Y. Ren, S. N. Lim, and A. Shrivastava, "NeRV: Neural representations for videos," Advances in Neural Information Processing Systems, vol. 34, pp. 21557-21568, 2021.

[10] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
[11] I. E. Richardson, The H.264 Advanced Video Compression Standard. John Wiley & Sons, 2011.
[12] A. Habibian, T. van Rozendaal, J. M. Tomczak, and T. S. Cohen, "Video compression with rate-distortion autoencoders," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7033-7042.
[13] J. Pessoa, H. Aidos, P. Tomás, and M. A. Figueiredo, "End-to-end learning of video compression using spatio-temporal autoencoders," in 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020, pp. 1-6.
[14] Z. Li, M. Wang, H. Pi, K. Xu, J. Mei, and Y. Liu, "E-NeRV: Expedite neural video representation with disentangled spatial-temporal context," in European Conference on Computer Vision, 2022, pp. 267-284.
[15] J. C. Lee, D. Rho, J. H. Ko, and E. Park, "FFNeRV: Flow-guided frame-wise neural representations for videos," in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 7859-7870.
[16] B. He, C. Zhu, G. Lu, Z. Zhang, Y. Chen, and L. Song, "GNeRV: A global embedding neural representation for videos."
[17] X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501-1510.
[18] H. Chen, M. Gwilliam, S.-N. Lim, and A. Shrivastava, "HNeRV: A hybrid neural representation for videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10270-10279.
[19] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976-11986.
[20] H. Chen, M. Gwilliam, B. He, S.-N. Lim, and A. Shrivastava, "CNeRV: Content-adaptive neural representation for visual data," arXiv preprint arXiv:2211.10421, 2022.
[21] J. Kim, J. Lee, and J.-W. Kang, "SNeRV: Spectra-preserving neural representation for video," in European Conference on Computer Vision, 2025, pp. 332-348.
[22] J. E. Saethre, R. Azevedo, and C. Schroers, "Combining frame and GOP embeddings for neural video representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9253-9263.
[23] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874-1883.
[24] D. Hendrycks and K. Gimpel, "Gaussian error linear units (GELUs)," arXiv preprint arXiv:1606.08415, 2016.
[25] X. Zhang et al., "Boosting neural representations for videos with a conditional decoder," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2556-2566.
[26] V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein, "Implicit neural representations with periodic activation functions," Advances in Neural Information Processing Systems, vol. 33, pp. 7462-7473, 2020.
[27] Z. Liu et al., "FINER: Flexible spectral-bias tuning in implicit neural representation by variable-periodic activation functions," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2713-2722.
[28] H. Zhu et al., "FINER++: Building a family of variable-periodic functions for activating implicit neural representation," arXiv preprint arXiv:2407.19434, 2024.
[29] Y. Bai, C. Dong, C. Wang, and C. Yuan, "PS-NeRV: Patch-wise stylized neural representations for videos," in 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 41-45.
[30] C. Gomes, R. Azevedo, and C. Schroers, "Video compression with entropy-constrained neural representations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18497-18506.
[31] G. Gao, H. M. Kwan, F. Zhang, and D. Bull, "PNVC: Towards practical INR-based video compression," arXiv preprint arXiv:2409.00953, 2024.
[32] H. M. Kwan, G. Gao, F. Zhang, A. Gower, and D. Bull, "HiNeRV: Video compression with hierarchical encoding-based neural representation," Advances in Neural Information Processing Systems, vol. 36, 2024.
[33] Q. Chang, H. Yu, S. Fu, Z. Zeng, and C. Chen, "MNeRV: A multilayer neural representation for videos," arXiv preprint arXiv:2407.07347, 2024.
[34] M. Tarchouli, T. Guionnet, M. Riviere, W. Hamidouche, M. Outtas, and O. Deforges, "Res-NeRV: Residual blocks for a practical implicit neural video decoder," in 2024 IEEE International Conference on Image Processing (ICIP), 2024, pp. 3751-3757.
[35] Q. Cao, D. Zhang, and X. Zhang, "Saliency-based neural representation for videos," in International Conference on Pattern Recognition, 2025, pp. 389-403.
[36] J. Chen et al., "Run, don't walk: Chasing higher FLOPS for faster neural networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12021-12031.
[37] D. Oktay, J. Ballé, S. Singh, and A. Shrivastava, "Scalable model compression by entropy penalized reparameterization," arXiv preprint arXiv:1906.06624, 2019.
[38] H. Yan, Z. Ke, X. Zhou, T. Qiu, X. Shi, and D. Jiang, "DS-NeRV: Implicit neural video representation with decomposed static and dynamic codes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 23019-23029.
[39] L. Tang, J. Zhu, X. Zhang, L. Zhang, S. Ma, and Q. Huang, "CANeRV: Content adaptive neural representation for video compression," arXiv preprint arXiv:2502.06181, 2025.
[40] A. Radford et al., "Learning transferable visual models from natural language supervision," in International Conference on Machine Learning, 2021, pp. 8748-8763.
[41] P. Wang et al., "OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework," in International Conference on Machine Learning, 2022, pp. 23318-23340.
[42] B. He et al., "Towards scalable neural representation for diverse videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6132-6142.
[43] B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704-2713.
[44] Y. Bengio, N. Léonard, and A. Courville, "Estimating or propagating gradients through stochastic neurons for conditional computation," arXiv preprint arXiv:1308.3432, 2013.
[45] J. Shi and C. Tomasi, "Good features to track," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593-600.
[46] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), 1981, vol. 2, pp. 674-679.
[47] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
[48] A. Mercat, M. Viitanen, and J. Vanne, "UVG dataset: 50/120fps 4K sequences for video codec analysis and development," in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297-302.
[49] H. Wang et al., "MCL-JCV: A JND-based H.264/AVC video quality assessment dataset," in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 1509-1513.
[50] P. Goyal et al., "Accurate, large minibatch SGD: Training ImageNet in 1 hour," arXiv preprint arXiv:1706.02677, 2017.
[51] I. Loshchilov and F. Hutter, "SGDR: Stochastic gradient descent with warm restarts," in Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017, pp. 1-16.
[52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[53] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," ITU-T SG16 Doc. VCEG-M33, 2001.

Electronic full text (publicly available online from 2030-05-22).


Source: https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22113TIT00441025%22.&searchmod