This note expands on H4 from the Four Research Hypotheses.
The Observation
The first thing that struck me when I began studying noise was that everything is discussed in terms of decibels (dB).
A 70 dB construction sound and a 70 dB modified motorcycle exhaust are numerically identical, yet the stress induced by the latter can be far greater. This is because the modified motorcycle sound carries the context of "someone intentionally producing it," "arriving without warning," and "occurring even in the middle of the night."
dB is an indicator of the physical energy of sound, not an indicator of human stress.
Three Limitations of Current Indicators
1. The Trap of Time-Averaging
LAeq (equivalent continuous sound level), used in environmental standards, is a time-averaged value. A sudden blast of noise at 2:00 a.m. is diluted and disappears within a 24-hour average. The very moment when harm is most acute ceases to exist in the indicator.
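The dilution effect is easy to make concrete. A minimal sketch (the background level of 45 dB and the one-minute 80 dB blast are assumed illustrative values, not measurements) computes the energy-based 24-hour average:

```python
import math

def laeq(segments):
    """Equivalent continuous sound level over a list of (dB, minutes) segments.

    LAeq = 10 * log10( (1/T) * sum(minutes * 10^(dB/10)) )
    """
    total_minutes = sum(m for _, m in segments)
    energy = sum(m * 10 ** (db / 10) for db, m in segments)
    return 10 * math.log10(energy / total_minutes)

# 23 h 59 min of quiet 45 dB background, plus a one-minute 80 dB blast at 2:00 a.m.
day = [(45.0, 1439), (80.0, 1)]
print(round(laeq(day), 1))  # → 50.0: the 80 dB blast nearly vanishes in the daily average
```

Even though the blast was 80 dB, the 24-hour LAeq lands around 50 dB, comfortably under typical daytime limits; the event that caused the harm is invisible in the indicator.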
2. The Semantic Content of Sound Is Lost
dB carries no information about "what the sound is." Natural sound at 80 dB (a waterfall) and artificial sound at 80 dB (a truck) produce entirely different stress responses, as numerous studies have demonstrated. Yet regulatory limits do not distinguish between sound sources.
3. Individual Differences Are Ignored
The same 60 dB may be a "sound that makes you want to flee" for a sensory-hypersensitive individual, and a "sound you do not notice" for a typical member of the public. Current indicators assume an "average person" and cannot account for differences arising from sensory traits.
Hypothesis
Even at the same dB level, stress responses vary substantially depending on the type of sound (modified motorcycle exhaust, loudspeaker announcements, construction noise, etc.) and the context of occurrence (time of day, whether the sound is anticipated, the listener's state). Current dB-averaged indicators are insufficient as proxy measures for sensory stress.
A Proposed New Metric: The dB-Stress Divergence Index
We aim to develop a "dB-Stress Divergence Index" that models the gap between measured dB and subjective stress.
- Measured dB = 60, but subjective stress = 90 equivalent --> divergence score +30 (high context-dependent stress)
- Measured dB = 70, but subjective stress = 40 equivalent --> divergence score -30 (familiar or anticipated sound)
By identifying the sound sources, locations, and times of day with the highest divergence scores, it becomes possible to visualize the "situations where people are truly struggling" that dB-based regulation cannot capture.
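A minimal sketch of how such an index could be computed. The dB-equivalent stress scale, the report fields, and the example values are all assumptions for illustration, not a finalized design:

```python
from collections import defaultdict

def divergence_score(measured_db, subjective_stress):
    """Gap between subjective stress (rated on a dB-equivalent scale) and measured dB."""
    return subjective_stress - measured_db

# Hypothetical field reports: (sound source, hour of day, measured dB, subjective stress)
reports = [
    ("modified motorcycle", 2, 60, 90),   # divergence +30
    ("construction", 10, 70, 40),         # divergence -30
    ("modified motorcycle", 23, 65, 92),  # divergence +27
]

# Average divergence per (source, hour) surfaces the contexts where
# people are struggling most, which dB averages alone cannot show.
by_context = defaultdict(list)
for source, hour, db, stress in reports:
    by_context[(source, hour)].append(divergence_score(db, stress))

ranked = sorted(by_context.items(), key=lambda kv: -sum(kv[1]) / len(kv[1]))
```

Here the top-ranked context is late-night modified motorcycle exhaust, exactly the kind of situation a 24-hour LAeq average dilutes away.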
Existing Alternative Metrics: The Psychoacoustic Approach
The limitations of dB are recognized by researchers, and the field of psychoacoustics has developed several alternative metrics.
Loudness (sone/phon): A metric for "perceived magnitude" that reflects human auditory characteristics. It corrects for frequency-dependent sensitivity differences but does not account for the semantic content of sound sources.
Sharpness (acum): A metric reflecting the prevalence of high-frequency components. It can quantify the sensation of "piercing" sound but cannot address suddenness or context.
Roughness and fluctuation strength: Metrics that capture temporal variation patterns in sound. Irregular sounds such as modified motorcycle exhaust produce high roughness values, but the psychological context of "who intentionally produced it" lies outside their scope.
Each of these metrics is closer to human perception than dB, yet none can incorporate "context": the time of day, the cause of the sound, or the state of the listener. The "dB-Stress Divergence Index" envisioned by this project aims to go one step further, using these psychoacoustic metrics as a foundation while explicitly modeling the divergence from subjective evaluations.
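As an example of the first metric, the standard phon-to-sone relation (above roughly 40 phon, perceived loudness in sones doubles for every 10 phon increase) can be sketched as:

```python
def phon_to_sone(phon):
    """Stevens' relation: 40 phon = 1 sone; each +10 phon doubles loudness.

    Valid above roughly 40 phon; below that, loudness falls off more steeply.
    """
    return 2 ** ((phon - 40) / 10)

print(phon_to_sone(40))  # 1.0
print(phon_to_sone(60))  # 4.0: 60 phon sounds about four times as loud as 40 phon
```

Note how this captures perceived magnitude only: two sounds with identical sone values can still differ completely in semantic content and context.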
Technical Approach: Automated Sound Source Classification via TinyML
To identify the "type" of sound source, we are exploring automated sound source classification using TinyML technology.
Specifically, we plan to run Google's YAMNet model (AudioSet 521-class classification) on a Raspberry Pi for real-time sound source classification. YAMNet can identify categories such as "engine noise," "car horn," "birdsong," and "human conversation," which can then be connected to analyses such as "at this location during this time period, modified motorcycle exhaust is the primary contributor to subjective stress."
The advantage of TinyML (machine learning on resource-constrained edge devices) is that audio can be processed on-device, with only the classification results transmitted. This is critically important for privacy protection: street-level audio recording could capture conversations, whereas storing only sound source classification labels avoids collecting conversational content.
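The label-only pipeline can be sketched as follows. The classifier call is a stand-in (a real deployment would invoke a model such as YAMNet on the device), and the label set and threshold are illustrative assumptions; the key property is that only a timestamp and a label ever leave the function, never the waveform:

```python
import time

def classify_frame(frame):
    """Stand-in for an on-device classifier such as YAMNet.

    A real model would map an audio frame to confidence scores over
    sound-event classes; here the scores are hard-coded for illustration.
    """
    return {"Motorcycle": 0.81, "Speech": 0.07, "Bird": 0.02}

def process_frame(frame, threshold=0.5):
    """Return only (timestamp, label) for confident detections; discard the audio."""
    scores = classify_frame(frame)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    del frame  # the raw audio is dropped here and is never transmitted
    if confidence >= threshold:
        return (time.time(), label)
    return None

event = process_frame([0.0] * 16000)  # one second of dummy 16 kHz audio
```

Because the raw samples are discarded inside `process_frame`, the data that accumulates downstream is a time series of labels ("Motorcycle at 02:13"), which is what the divergence analysis needs and all the privacy model permits.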
Using platforms such as Edge Impulse, it is also possible to fine-tune the YAMNet base model to specialize in the sound environment of Bunkyo Ward (文京区). For example, additional categories specific to the Japanese urban environment, such as "modified motorcycle," "loudspeaker advertisement," "construction noise," and "ambulance," can be trained.
Context-dependent noise research targeting the general population does exist, but research specifically targeting sensory-hypersensitive individuals remains a gap. This is where the Quiet City Project's originality lies.
References
World Health Organization (WHO), Regional Office for Europe. Environmental Noise Guidelines for the European Region.
Basner, M., et al. Auditory and non-auditory effects of noise on health. The Lancet, 383(9925), 1325-1332.
Google Research. YAMNet: Yet Another Audio Model for Sound Event Detection. TensorFlow Hub.
Fastl, H., & Zwicker, E. Psychoacoustics: Facts and Models (3rd ed.). Springer.