An efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet framework for Music Source Separation
Thomas Sgouros, Angelos Bousis and Nikolaos Mitianoudis
Department of Electrical and Computer Engineering,
Democritus University of Thrace,
67100 Xanthi, Greece
tsgouros@ee.duth.gr,
bousis.ang@gmail.com,
nmitiano@ee.duth.gr
Abstract
The music source separation problem, where the task is to estimate the audio components present in a mixture, has been the centre of research activity for a long time. More recent frameworks tackle the problem with deep learning models that extract information about each component from Short-Time Fourier Transform (STFT) spectrograms used as input. Most approaches assume that only one source is active at each time-frequency point, which allows that point of the mixture to be allocated to the desired source. Because this assumption is very strong and is reported not to hold in practice, using the STFT magnitude as network input raises a further problem: the Fourier phase of each source is absent when the separated sources are reconstructed (reusing the mixture phase is only exact when a single source occupies each point). Recovering the Fourier phase is neither easily tractable nor computationally efficient. In this paper, we propose a novel Attentive MultiResUNet architecture that uses real-valued Short-Time Discrete Cosine Transform data as input. Since these data are real-valued, this choice avoids the phase recovery problem: the appropriate values are estimated within the network itself, rather than by complex estimation or post-processing algorithms.
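To illustrate why a short-time DCT sidesteps phase recovery: the DCT of a real frame is real-valued and signed, so a network that predicts these coefficients (or a real mask over them) never has to estimate a separate phase, and the inverse transform is exact. The sketch below is a minimal illustration, not the paper's implementation. It assumes an orthonormal DCT-II per frame, a square-root periodic Hann window for both analysis and synthesis, and 50% overlap; the frame size, window and padding actually used by the authors are not given here, and the names `stdct`, `istdct`, `dct_matrix` and `sqrt_hann` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (assumed variant): row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    m = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    d[0, :] /= np.sqrt(2.0)  # rescale the DC row so that d @ d.T == identity
    return d


def sqrt_hann(n: int) -> np.ndarray:
    """Square root of a periodic Hann window; its square overlap-adds to 1 at hop n/2."""
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))


def stdct(x: np.ndarray, n: int = 1024) -> np.ndarray:
    """Hypothetical short-time DCT: windowed frames with hop n/2, one DCT-II per frame."""
    hop = n // 2
    w, d = sqrt_hann(n), dct_matrix(n)
    # Pad so every original sample is covered by exactly two overlapping frames.
    xp = np.concatenate([np.zeros(hop), x, np.zeros(hop + (-len(x)) % hop)])
    frames = np.stack([xp[s:s + n] * w for s in range(0, len(xp) - n + 1, hop)])
    return frames @ d.T  # shape (num_frames, n), real-valued coefficients


def istdct(c: np.ndarray, length: int, n: int = 1024) -> np.ndarray:
    """Inverse of stdct: inverse DCT (d.T), synthesis window, then overlap-add."""
    hop = n // 2
    w, d = sqrt_hann(n), dct_matrix(n)
    frames = (c @ d) * w  # d is orthonormal, so its inverse is its transpose
    out = np.zeros(hop * (len(c) + 1))
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n] += f
    return out[hop:hop + length]  # drop the padding added in stdct
```

Because analysis and synthesis both use the square-root Hann window, the squared windows sum to one at 50% overlap and the round trip reconstructs the input exactly, with no phase term anywhere in the pipeline.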
The proposed network has a U-Net-type structure with residual skip connections and an attention mechanism that correlates each skip connection with the decoder output at the previous level.
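The attention mechanism is described here in a single sentence. One plausible reading, sketched below purely as an assumption, is an additive attention gate in the style of Attention U-Net (Oktay et al., 2018): the skip-connection features and the decoder output from the previous level are projected, summed, and turned into a per-pixel weight in (0, 1) that rescales the skip features. The paper's exact formulation may differ. The sketch assumes the decoder output has already been upsampled to the skip resolution, implements 1x1 convolutions as `np.einsum`, omits the residual path applied to the skip connection, and uses hypothetical names (`attention_gate`, `w_x`, `w_g`, `psi`).

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(skip, gate, w_x, w_g, b, psi, b_psi):
    """Hypothetical additive attention gate with 1x1 convolutions written as einsum.

    skip: (C_x, H, W) encoder features arriving through the skip connection
    gate: (C_g, H, W) decoder output of the previous level (assumed upsampled to H, W)
    w_x: (C_int, C_x), w_g: (C_int, C_g), b: (C_int,), psi: (C_int,), b_psi: scalar
    """
    # Project both inputs into a shared C_int-channel space and add them.
    q = np.einsum("ic,chw->ihw", w_x, skip) + np.einsum("ic,chw->ihw", w_g, gate)
    q = np.maximum(q + b[:, None, None], 0.0)  # ReLU on the joint projection
    # Collapse channels to one attention coefficient per time-frequency pixel.
    alpha = sigmoid(np.einsum("i,ihw->hw", psi, q) + b_psi)
    return skip * alpha[None, :, :], alpha  # gated skip features and the weights


rng = np.random.default_rng(0)
skip = rng.standard_normal((16, 8, 8))
gate = rng.standard_normal((32, 8, 8))
out, alpha = attention_gate(
    skip, gate,
    rng.standard_normal((8, 16)), rng.standard_normal((8, 32)),
    np.zeros(8), rng.standard_normal(8), 0.0,
)
```

The gated output keeps the skip tensor's shape, so it can be concatenated with the decoder features exactly as an ungated skip connection would be; regions where the decoder context drives `alpha` toward zero are suppressed.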
Audio Samples
Example 1:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 2:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 3:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 4:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 5:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 6:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 7:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 8:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 9:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |
Example 10:
Vocals: | Vocals GT: |
Bass: | Bass GT: |
Drums: | Drums GT: |
Other: | Other GT: |