Reading notes on "The challenge of realistic music generation: modelling raw audio at scale"

Date: 2018-07-07 18:39:57

The challenge of realistic music generation: modelling raw audio at scale

Authors: three researchers from DeepMind

Venue: NIPS 2018

Abstract

The paper first points out that music generation based on high-level representations (such as scores or MIDI) has problems of its own: the heavy abstraction discards some of the detail in the music, which in turn costs perceived musicality and realism. The paper therefore generates music in the raw audio domain. Autoregressive models do well on speech waveforms, but on music "we find them biased towards capturing local signal structure at the expense of modelling long-range correlations". The paper proposes autoregressive discrete autoencoders (ADAs) to help autoregressive (AR) models capture long-range correlations in waveforms.

Introduction

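The introduction stresses that music exhibits structure at many different timescales, and lists the limitations of representations such as MIDI, chiefly the loss of details tied to musicality and to the instruments themselves.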

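1.1 Raw audio signal

This part talks up the advantages of the waveform signal, compares it with the MIDI representation mentioned above, and points out that modelling at the waveform level is harder.

1.2 Related generative models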

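Compared with symbolic representations, generative models of audio waveforms have a short research history. The reason given is: "This was long thought to be infeasible due to the scale of the problem, as audio signals are often sampled at rates of 16 kHz or higher" (I was not sure why at first and guessed that sampling is expensive; the quoted sentence points to scale, since at 16 kHz every second of audio is 16,000 time steps to model). Recent AR models such as WaveNet, VRNN, WaveRNN and SampleRNN generate the signal one step at a time and have made the problem tractable. The paper also mentions using GANs to generate waveforms.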
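To make the scale argument concrete, here is a small arithmetic sketch. The helper name `num_samples` and the chosen durations are illustrative assumptions, not taken from the paper; only the 16 kHz rate comes from the quote above.

```python
def num_samples(duration_s: float, sample_rate_hz: int = 16_000) -> int:
    """Number of time steps a sample-by-sample model has to generate."""
    return int(duration_s * sample_rate_hz)


# Illustrative durations: one second, ten seconds, one minute.
for seconds in (1, 10, 60):
    print(f"{seconds:>3} s at 16 kHz -> {num_samples(seconds):,} samples")
```

A single minute of audio is already close to a million time steps, which is why structure spanning many seconds is hard to reach with a model that works one sample at a time.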

Contributions:

1. The paper addresses music generation in the raw audio domain, which has received little attention in the literature, and proposes it as a benchmark for testing the "ability of a model to capture long-range structure in data".

2. We investigate the capabilities of autoregressive models for this task, and demonstrate a computationally efficient method to enlarge their receptive fields using autoregressive discrete autoencoders (ADAs).

3. We introduce the argmax autoencoder (AMAE) as an alternative to vector quantisation variational autoencoders (VQ-VAE).
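These notes do not spell out how the two quantisers work, so the following is only a rough sketch under stated assumptions. It assumes VQ-VAE-style quantisation picks the nearest entry of a learned codebook, and that the AMAE encoder emits a vector on the probability simplex which is quantised with a plain argmax. All function names and the toy values are hypothetical.

```python
def vq_quantise(query, codebook):
    """VQ-style: index of the codebook vector closest to the query."""
    def sq_dist(code):
        return sum((q - c) ** 2 for q, c in zip(query, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def argmax_quantise(query):
    """Argmax-style: index of the largest component of the query."""
    return max(range(len(query)), key=lambda i: query[i])


def nearest_one_hot(query):
    """Index of the one-hot vector closest to the query (Euclidean)."""
    def sq_dist(i):
        return sum((q - (1.0 if j == i else 0.0)) ** 2
                   for j, q in enumerate(query))
    return min(range(len(query)), key=sq_dist)


# Toy query on the simplex (assumed encoder output, k = 3).
query = [0.1, 0.7, 0.2]
print(argmax_quantise(query), nearest_one_hot(query))

# Toy 2-D codebook for the VQ case.
print(vq_quantise([0.9, 0.8], [[0.0, 0.0], [1.0, 1.0]]))
```

The squared distance from a query q to the i-th one-hot vector is ||q||^2 - 2 q_i + 1, so the nearest one-hot vector is always the one at the largest component. Under these assumptions an argmax quantiser behaves like a VQ quantiser whose codebook is fixed to the one-hot vectors, with no codebook to learn.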

Scaling up autoregressive models for music

To model long-range structure, the receptive field has to be enlarged. WaveNet and SampleRNN each propose their own way of doing this, but memory limits are quickly reached.
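As a rough illustration of why the ceiling is hit so quickly, the sketch below computes the receptive field of a stack of dilated causal convolutions in the style of WaveNet. The kernel size of 2, the dilation pattern 1, 2, ..., 512, and the three repeats are assumed example values, not the configuration used in the paper.

```python
def receptive_field(dilations, kernel_size=2):
    """Receptive field, in samples, of stacked dilated causal convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical stack: dilations 1, 2, 4, ..., 512, repeated three times.
one_block = [2 ** i for i in range(10)]
dilations = one_block * 3

rf = receptive_field(dilations)
print(rf, "samples, i.e.", rf / 16_000, "seconds at 16 kHz")
```

With these assumed values, thirty layers cover 3,070 samples, or about 0.19 seconds of audio at 16 kHz. Repeating the block only grows the field linearly, while every extra layer adds activations that must be held in memory during training.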

(To be continued)

Key references:

A recurrent latent variable model for sequential data

Experiments in musical intelligence

Synthesizing audio with generative adversarial networks

SampleRNN: An unconditional end-to-end neural audio generation model


Original post: https://www.cnblogs.com/punkcure/p/9277681.html
