Neural Engine Sound Synthesis with Physics-Informed Inductive Biases and Differentiable Signal Processing

Publication date

2026-02-06T12:45:16Z

2025



Abstract

Master's thesis for: Master in Sound and Music Computing


Supervisor: Lonce Wyse


Engine sound synthesis is increasingly important in automotive audio and interactive media, yet it presents challenges for neural audio generation that distinguish it from musical audio paradigms. Unlike sustained musical tones, where periodic oscillation is inherent in the acoustic vibration, engine sounds emerge from sequential combustion events that generate sharp pressure transients, recurring at engine speeds from 600 to over 8000 RPM. The resulting sounds exhibit significant inharmonicity, extremely low fundamental frequencies (down to 5 Hz), and rapid pulse sequences with intervals below 2 milliseconds. They therefore demand approaches that model both precise timing and complex timbral evolution, beyond conventional musical audio assumptions. While existing differentiable digital signal processing (DDSP) methods have demonstrated success across various audio synthesis tasks, they often rely on generic synthesis modules that do not explicitly incorporate the acoustic principles and physical mechanisms underlying engine sounds. This thesis presents a novel approach to engine sound synthesis through systematic integration of physics-informed inductive biases across the entire differentiable synthesis pipeline. It proposes the Procedural Engines Model (PRCE), a deep learning architecture that combines time-varying embeddings of RPM and torque parameters (including their temporal derivatives) and derived conditioning signals (throttle position and deceleration fuel cutoff, DFCO) with specialized model heads for physics-informed parameter conversion. These heads drive two custom differentiable synthesizer configurations that incorporate domain-specific acoustic principles. To guide learning toward accurate engine timbre reproduction, a custom loss function is introduced that prioritizes spectral energy near engine-order harmonics, drawing inspiration from Campbell diagrams commonly used in noise, vibration, and harshness (NVH) analysis.
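The frequency and timing figures quoted above follow from simple crankshaft arithmetic. The sketch below is illustrative only and is not taken from the thesis: it assumes a four-stroke engine with evenly spaced firing (each cylinder fires once every two crankshaft revolutions), and the function names and cylinder counts are hypothetical.

```python
def cycle_frequency_hz(rpm: float) -> float:
    """Frequency of one full four-stroke cycle (two crank revolutions per cycle)."""
    return rpm / 120.0


def firing_frequency_hz(rpm: float, n_cylinders: int) -> float:
    """Combustion events per second, assuming evenly spaced firing."""
    return n_cylinders * rpm / 120.0


def firing_interval_ms(rpm: float, n_cylinders: int) -> float:
    """Time between consecutive combustion events, in milliseconds."""
    return 1000.0 / firing_frequency_hz(rpm, n_cylinders)


def engine_order_frequency_hz(rpm: float, order: float) -> float:
    """Frequency of an engine order (order 1 = one event per crank revolution)."""
    return order * rpm / 60.0


if __name__ == "__main__":
    # Assumed example operating points, chosen to match the ranges in the abstract.
    print(cycle_frequency_hz(600))          # cycle fundamental at idle speed
    print(firing_interval_ms(8000, 8))      # pulse spacing for a hypothetical 8-cylinder engine
    print(engine_order_frequency_hz(3000, 2))
```

Under these assumptions, an engine idling at 600 RPM has a cycle fundamental of 5 Hz, and a hypothetical eight-cylinder engine at 8000 RPM fires roughly every 1.9 ms, consistent with the ranges stated in the abstract.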
Engine sounds present a fundamental duality: although they are physically a sum of structured, noise-like pressure pulses, they manifest as distinctly harmonic acoustic phenomena. This motivates two complementary synthesis strategies that provide contrasting optimization pathways toward the same acoustic target: direct spectral-temporal reconstruction that implicitly reflects the underlying pulse structure, and explicit pulse sequence modeling through acoustic simulation of individual combustion events, their temporal alignment, and exhaust system propagation. The PRCE framework implements both perspectives as two configurations. The Harmonic-Plus-Noise (HPN) variant employs modified harmonic synthesis with systematic inharmonicity and temporal-spectral structuring of noise components to model observable acoustic characteristics. The Pulse-Train-Resonator (PTR) configuration directly models physical-acoustic phenomena by composing combustion pulses aligned to engine firing patterns and propagating them through differentiable resonator networks that simulate exhaust acoustics. Evaluation on procedurally generated engine sound datasets totaling 2.5 hours across varied operating conditions reveals complementary strengths between the two synthesis approaches. PTR achieves modestly superior validation performance (5.7% improvement in total loss) and demonstrates more consistent training-validation transfer, while HPN shows greater flexibility across diverse engine configurations and robustness to harmonic irregularities. Both variants successfully capture authentic engine acoustic behaviors despite their distinct synthesis strategies and audible signatures. This research demonstrates systematic integration of physics-informed inductive biases into differentiable synthesis architectures, providing a methodological framework applicable to physically constrained audio generation beyond automotive contexts.
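To make the pulse-train-through-resonator idea concrete, the following is a minimal NumPy sketch of the general signal flow. It is not the PRCE implementation: it is not differentiable, it uses a single fixed two-pole resonator in place of a learned resonator network, and it assumes constant RPM with evenly spaced firing. All names and parameter values are hypothetical.

```python
import numpy as np


def pulse_train(rpm, n_cylinders, duration_s, sample_rate=16000):
    """Unit impulses at evenly spaced firing instants (four-stroke, constant RPM assumed)."""
    n_samples = int(round(duration_s * sample_rate))
    firing_hz = n_cylinders * rpm / 120.0
    period_samples = sample_rate / firing_hz
    # Round each firing instant to the nearest sample index.
    idx = np.round(np.arange(0.0, n_samples, period_samples)).astype(int)
    idx = idx[idx < n_samples]
    x = np.zeros(n_samples)
    x[idx] = 1.0
    return x


def two_pole_resonator(x, freq_hz, radius, sample_rate=16000):
    """y[n] = x[n] + 2 r cos(w) y[n-1] - r^2 y[n-2]; stable for 0 <= radius < 1."""
    w = 2.0 * np.pi * freq_hz / sample_rate
    a1 = 2.0 * radius * np.cos(w)
    a2 = -(radius ** 2)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y


if __name__ == "__main__":
    # Assumed example: 4 cylinders at 3000 RPM gives 100 firings per second.
    excitation = pulse_train(rpm=3000, n_cylinders=4, duration_s=0.1)
    # Hypothetical resonance standing in for an exhaust-pipe mode.
    audio = two_pole_resonator(excitation, freq_hz=180.0, radius=0.99)
    print(int(excitation.sum()), audio.shape)
```

In a differentiable setting, the pulse shapes and resonator parameters would be predicted by the network and optimized through the synthesis chain; here they are fixed by hand purely to show how a harmonic-sounding output arises from a sequence of discrete pulses.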
The work reveals that domain-specific biases produce distinct acoustic signatures that influence both optimization strategies and perceptual outcomes. To support future research, we openly publish the complete PRCE model pipeline together with the Procedural Engines Dataset, a comprehensive collection of procedurally generated engine audio with time-aligned control annotations.

Document Type

Master's final project

Language

English

Subjects and keywords

Signal processing

Rights

Creative Commons license Attribution-NonCommercial-NoDerivs 4.0 International

Attribution-NonCommercial-NoDerivatives 4.0 International

https://creativecommons.org/licenses/by-nc-nd/4.0/
