Institute for Social Vision Design
ISVD-LAB-001 · Hypothesis
1.6.4

What dB Cannot Measure — The Concept of Context-Dependent Stress

Naoya Yokota

Even at identical dB levels, stress responses vary dramatically depending on the type of sound, time of day, and whether the sound is anticipated. This note examines the limitations of current dB-averaged indicators and argues for new metrics tailored to sensory-hypersensitive individuals.

This note expands on H4 from the Four Research Hypotheses.

The Observation

The first thing that struck me when I began studying noise was that everything is discussed in terms of "dB (decibels)."

A 70 dB construction sound and a 70 dB modified motorcycle exhaust are numerically identical, yet the latter can induce far greater stress. The difference lies in context: the modified motorcycle sound is produced intentionally by someone, arrives without warning, and occurs even in the middle of the night.

dB is an indicator of the physical energy of sound, not an indicator of human stress (Basner et al., 2014).

Three Limitations of Current Indicators

1. The Trap of Time-Averaging

LAeq (equivalent continuous sound level), used in environmental standards, is a time-averaged value. A sudden blast of noise at 2:00 a.m. is diluted and disappears within a 24-hour average. The very moment when harm is most acute ceases to exist in the indicator.
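To make the dilution concrete, the following is a minimal sketch of the standard energy-averaging behind LAeq. The background level, event level, and event duration are hypothetical values chosen for illustration, not measurements.

```python
import math

def laeq(segments):
    """Energy-average a list of (level_dB, duration_s) segments into one LAeq value."""
    total_time = sum(duration for _, duration in segments)
    # Convert each level to relative sound energy, weight by duration, then average.
    mean_energy = sum(10 ** (level / 10) * duration for level, duration in segments) / total_time
    return 10 * math.log10(mean_energy)

DAY_S = 24 * 60 * 60
BACKGROUND_DB = 55.0  # hypothetical steady background level
EVENT_DB = 90.0       # hypothetical blast of noise at 2:00 a.m.
EVENT_S = 10          # hypothetical duration of the blast in seconds

quiet_day = laeq([(BACKGROUND_DB, DAY_S)])
day_with_blast = laeq([(BACKGROUND_DB, DAY_S - EVENT_S), (EVENT_DB, EVENT_S)])

print(f"24 h LAeq without the event: {quiet_day:.2f} dB")
print(f"24 h LAeq with the event:    {day_with_blast:.2f} dB")
print(f"The event itself exceeds the background by {EVENT_DB - BACKGROUND_DB:.0f} dB")
```

Under these assumptions the event stands 35 dB above the background, yet the 24-hour figure moves by less than 2 dB.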

2. The Semantic Content of Sound Is Lost

dB carries no information about "what the sound is." Natural sound at 80 dB (a waterfall) and artificial sound at 80 dB (a truck) produce entirely different stress responses, as numerous studies have demonstrated. Yet regulatory limits do not distinguish between sound sources.

3. Individual Differences Are Ignored

The same 60 dB may be a "sound that makes you want to flee" for a sensory-hypersensitive individual, and a "sound you do not notice" for a typical member of the public. Current indicators assume an "average person" and cannot account for differences arising from sensory traits.

Hypothesis

Even at the same dB level, stress responses vary substantially depending on the type of sound (modified motorcycle exhaust, loudspeaker announcements, construction noise, etc.) and the context of occurrence (time of day, whether the sound is anticipated, the listener's state). Current dB-averaged indicators are insufficient as proxy measures for sensory stress.

A Proposed New Metric: The dB-Stress Divergence Index

We aim to develop a "dB-Stress Divergence Index" that models the gap between measured dB and subjective stress.

  • Measured dB = 60, but subjective stress = 90 equivalent --> divergence score +30 (high context-dependent stress)
  • Measured dB = 70, but subjective stress = 40 equivalent --> divergence score -30 (familiar or anticipated sound)

By identifying the sound sources, locations, and times of day with the highest divergence scores, it becomes possible to visualize the "situations where people are truly struggling" that dB-based regulation cannot capture.
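As a rough illustration of the scoring and ranking described above, the sketch below computes the divergence score as subjective stress (expressed in dB-equivalent units) minus measured dB, then ranks contexts by mean score. The observations and the grouping by source and time of day are hypothetical; how subjective stress would actually be mapped to a dB-equivalent scale is not yet defined.

```python
from collections import defaultdict
from statistics import mean

def divergence(measured_db, stress_equivalent):
    """Divergence score: subjective stress (dB-equivalent) minus measured dB."""
    return stress_equivalent - measured_db

# Hypothetical observations: (source, time of day, measured dB, subjective stress equivalent)
observations = [
    ("modified motorcycle", "night", 60, 90),
    ("modified motorcycle", "night", 65, 90),
    ("construction", "day", 70, 40),
    ("loudspeaker", "day", 62, 75),
]

scores_by_context = defaultdict(list)
for source, time_of_day, measured_db, stress in observations:
    scores_by_context[(source, time_of_day)].append(divergence(measured_db, stress))

# Rank contexts from highest to lowest mean divergence.
ranking = sorted(
    ((context, mean(scores)) for context, scores in scores_by_context.items()),
    key=lambda item: item[1],
    reverse=True,
)
for (source, time_of_day), score in ranking:
    print(f"{source:20s} {time_of_day:6s} {score:+.1f}")
```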

Existing Alternative Metrics: The Psychoacoustic Approach

The limitations of dB are recognized by researchers, and the field of psychoacoustics has developed several alternative metrics.

Loudness (sone/phon): A metric for "perceived magnitude" that reflects human auditory characteristics. It corrects for frequency-dependent sensitivity differences but does not account for the semantic content of sound sources.

Sharpness (acum): A metric reflecting the prevalence of high-frequency components. It can quantify the sensation of "piercing" sound but cannot address suddenness or context.

Roughness and fluctuation strength: Metrics that capture temporal variation patterns in sound. Irregular sounds such as modified motorcycle exhaust produce high roughness values, but the psychological context of "who intentionally produced it" lies outside their scope.

Each of these metrics is closer to human perception than dB, yet none can incorporate "context": time of day, the cause of the sound, or the state of the listener. The "dB-Stress Divergence Index" envisioned by this project aims to go one step further by using these psychoacoustic metrics as a foundation while explicitly modeling the divergence from subjective evaluations.
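As a small illustration of how a perceptual scale departs from a level scale, the sketch below uses the conventional phon-to-sone relation, in which each 10 phon increase doubles loudness above roughly 40 phon. The function name is our own, and the relation covers perceived magnitude only; it says nothing about context.

```python
def phon_to_sone(phon):
    """Convert loudness level (phon) to loudness (sone) using the doubling-per-10-phon rule."""
    if phon < 40:
        raise ValueError("this approximation is only applied here at 40 phon and above")
    return 2 ** ((phon - 40) / 10)

for level in (40, 50, 60, 70, 80):
    print(f"{level} phon -> {phon_to_sone(level):g} sone")
```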

Note that the "dB-Stress Divergence Index" outlined above is a regression-residual-based design measuring the gap between objective dB and subjective stress. Separately, our Phase 2 research plan explores an event-based formulation that models acoustic events directly; the following section records the initial sketch of that approach.
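The regression-residual idea can be sketched as follows: fit subjective stress against measured dB, then treat each observation's residual as its divergence. This is a minimal single-predictor least-squares illustration under our own assumptions (hypothetical data, a 0 to 100 stress rating, a linear fit); the actual model specification is not fixed in this note.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical paired data: measured dB and subjective stress rating (0-100).
measured_db = [50, 55, 60, 60, 65, 70, 70, 75]
stress = [30, 38, 42, 85, 50, 58, 35, 66]

intercept, slope = fit_line(measured_db, stress)
# Residual = observed stress minus the stress predicted from dB alone.
residuals = [s - (intercept + slope * d) for d, s in zip(measured_db, stress)]

for d, s, r in zip(measured_db, stress, residuals):
    print(f"{d} dB, stress {s}: residual {r:+.1f}")
```

In this toy data the fourth observation (60 dB, stress 85) has the largest positive residual, which is the kind of case the index is meant to surface.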

Research Note: An Initial Sketch of the Contrast Index (Conceptual Stage)

This section records a conceptual-stage sketch developed as part of the Phase 2 research plan; this note is its first full presentation. In the current multiplicative form, none of the parameters β, γ, or M has been estimated (α appears below only as a residual reference to a discarded additive form); substantial revisions are anticipated based on the Bunkyo Ward fieldwork (planned for autumn 2026 through spring 2027). This is not an established "ISVD-proposed Contrast Index." This article is a research-plan note, and any citation should refer to it as "ISVD Quiet City Project Phase 2 Research Planning Note (provisional sketch stage)."

Why "Contrast"?

dB measures the total quantity of sound. However, Park et al. (2017) demonstrated in a large-scale population survey that the strongest predictor of non-auditory health effects (depression, insomnia, anxiety) is not the objective noise exposure level (Ldn) but individual noise sensitivity. The premise "dB is not sound" is consistent with this finding.

Our working hypothesis is therefore that stress arises not from absolute dB but from the combination of (i) how much an acoustic event contrasts with the predictable background, (ii) how often the same event repeats, (iii) when it occurs, and (iv) what it is. We provisionally call this combination the Contrast Index (CI).

CI is a different line of inquiry from the "dB-Stress Divergence Index" presented in the preceding section. The divergence index handles "subjective minus objective" residuals via regression, whereas CI handles the perceptual contrast of individual acoustic events on an event basis. These two formulations may eventually be unified, but at this stage they are positioned as parallel sketches.

Initial Sketch of CI

CI is written as a composite of four variables:

CI = M × g(ΔdB) × R^β × T^γ
  • M (Sound Source Mask): a semantic gate weighting sound-source categories by their contribution to annoyance.
  • g(ΔdB) (Contrast Trigger): a threshold function returning 1 when the gap from background L90 exceeds a threshold θ_dB and 0 otherwise. The CI value itself is therefore decoupled from the continuous dB scale.
  • R (Repetition): the number of occurrences of the same sound-source type within a time window.
  • T (Temporal weight): larger at night, smaller during the day.
  • β, γ: unestimated exponents.
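The formula above can be sketched in a few lines. Since β, γ, θ_dB, and M are all unestimated, the default values and example inputs below are placeholders chosen only to make the arithmetic visible, and the function names are our own.

```python
def contrast_trigger(event_db, background_l90_db, theta_db):
    """g(delta dB): 1 if the event exceeds background L90 by more than theta_db, else 0."""
    return 1 if (event_db - background_l90_db) > theta_db else 0

def contrast_index(m, event_db, background_l90_db, repetitions, temporal_weight,
                   theta_db=10.0, beta=0.5, gamma=1.0):
    """CI = M * g(delta dB) * R**beta * T**gamma; theta_db, beta, gamma are placeholders."""
    g = contrast_trigger(event_db, background_l90_db, theta_db)
    return m * g * repetitions ** beta * temporal_weight ** gamma

# Hypothetical night-time event: 4 repetitions, temporal weight 2.0, M = 1.5.
night_event = contrast_index(m=1.5, event_db=75, background_l90_db=45,
                             repetitions=4, temporal_weight=2.0)
# Same event, 20 dB louder: CI is unchanged because dB enters only through the trigger.
louder_event = contrast_index(m=1.5, event_db=95, background_l90_db=45,
                              repetitions=4, temporal_weight=2.0)
# Same source, but too close to the background to trigger.
faint_event = contrast_index(m=1.5, event_db=50, background_l90_db=45,
                             repetitions=4, temporal_weight=2.0)

print(night_event, louder_event, faint_event)
```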

The key point is that CI takes as input not a dB value but an event detection ("did a contrast event occur?"). This structural choice is required to keep CI consistent with the lab's core thesis that "dB is not sound."

Distinguishing M=0 from g(ΔdB)=0

In a multiplicative form, both M=0 and g(ΔdB)=0 yield CI=0, but the two zeros mean fundamentally different things:

  • M=0: the sound source falls outside the classified categories (natural sounds such as birdsong or rustling leaves, or cases in which source identification failed). This is a research-design decision not to compute a stress contribution for this category.
  • g(ΔdB)=0: the event does not contrast sufficiently with the background (below threshold θ_dB). Even for the same source category, an event is not detected when contrast is small.

While both produce CI=0 in the arithmetic, observation logs flag them separately so that downstream analyses can distinguish them. This is an important branching point for future refinement of the observation feedback loop.
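One way to keep the two zeros apart in an observation log is to store an explicit reason alongside the CI value. The record layout, enum names, and the rule that an excluded source takes precedence when both factors are zero are our own assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class ZeroReason(Enum):
    NONE = "none"                        # CI is non-zero
    SOURCE_EXCLUDED = "source_excluded"  # M = 0
    BELOW_THRESHOLD = "below_threshold"  # g(delta dB) = 0

@dataclass
class EventLog:
    source: str
    ci: float
    zero_reason: ZeroReason

def log_event(source, m, g, r_term, t_term):
    """Build a log record; r_term and t_term stand for R**beta and T**gamma."""
    if m == 0:
        reason = ZeroReason.SOURCE_EXCLUDED  # assumed to take precedence over g = 0
    elif g == 0:
        reason = ZeroReason.BELOW_THRESHOLD
    else:
        reason = ZeroReason.NONE
    return EventLog(source=source, ci=m * g * r_term * t_term, zero_reason=reason)

birdsong = log_event("birdsong", m=0.0, g=1, r_term=2.0, t_term=1.0)
faint_motorcycle = log_event("motorcycle", m=1.5, g=0, r_term=2.0, t_term=2.0)
print(birdsong)
print(faint_motorcycle)
```

Both records carry CI = 0, but a downstream analysis can filter on zero_reason to separate design exclusions from sub-threshold events.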

Grounds for M — From Literature Prior to Field-Based Posterior

We deliberately avoid setting M heuristically (e.g., "ambulance = 0.1, modified motorcycle = 1.5"). Instead, M is estimated in two stages.

Stage 1: Literature prior

Hou et al. (2023) used a hierarchical graph representation learning (HGRL) model to measure correlations between sound-source categories and subjective annoyance ratings in urban soundscapes. Anthropogenic sounds such as horns, buses, screech brakes, and motorcycles show positive correlations with annoyance, while natural sounds such as rustling leaves and bird tweets show negative correlations (the specific correlation values will be incorporated into the Phase 2 plan after careful reading of the original paper).

These measured correlations, normalized to a scale of roughly [0, 2], serve as the literature prior for M. Natural sounds with negative correlations correspond to M < 1, while strong positive correlations (horns, modified exhaust) correspond to M > 1. The exact transformation function (ρ → M, linear vs. logarithmic) will be finalized in the Phase 2 plan after reviewing the literature values.
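For orientation, the linear candidate for the ρ → M transformation could look like the sketch below, which maps a correlation in [-1, 1] onto [0, 2] so that ρ = 0 gives the neutral weight M = 1. This is only one of the options under consideration, and the correlation values shown are placeholders, not the figures reported by Hou et al. (2023).

```python
def correlation_to_mask(rho):
    """Linear candidate for rho -> M: maps [-1, 1] onto [0, 2], with rho = 0 giving M = 1."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 1.0 + rho

# Placeholder correlations for illustration only; NOT values from the literature.
placeholder_rho = {"horn": 0.5, "motorcycle": 0.25, "bird tweet": -0.25, "rustling leaves": -0.5}
prior_m = {source: correlation_to_mask(rho) for source, rho in placeholder_rho.items()}

for source, m in prior_m.items():
    print(f"{source:16s} M = {m:.2f}")
```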

The independence of dose-response curves across source types in transportation noise is established by Miedema & Oudshoorn (2001) via a meta-analysis (aircraft > road > rail at the same Lden). This finding also provides grounds for designing M as a "category-specific weight."

Stage 2: Field-based posterior update

Using subjective annoyance data collected during the Bunkyo Ward fieldwork, M is updated as a Bayesian posterior. Specifically, annoyance is measured with the ICBEN five-point verbal scale (ISO/TS 15666:2021), and a hierarchical Bayesian regression with the CI components as predictors updates M.
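The hierarchical Bayesian regression itself is beyond a short example. As a deliberately simplified stand-in, the sketch below shows the prior-to-posterior logic for a single source category using a conjugate normal update with a known observation spread. The prior, the field values, and the observation standard deviation are all hypothetical, and this is not the planned model.

```python
import math

def update_normal(prior_mean, prior_sd, observations, obs_sd):
    """Posterior (mean, sd) of a normal mean, given a normal prior and known obs_sd."""
    n = len(observations)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_sd ** 2
    post_precision = prior_precision + data_precision
    sample_mean = sum(observations) / n
    # Precision-weighted average of the prior mean and the sample mean.
    post_mean = (prior_precision * prior_mean + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

prior_mean, prior_sd = 1.5, 0.5               # hypothetical literature prior for M
field_estimates = [1.1, 1.3, 1.2, 1.0, 1.4]   # hypothetical field-derived values of M
post_mean, post_sd = update_normal(prior_mean, prior_sd, field_estimates, obs_sd=0.4)

print(f"prior:     mean {prior_mean:.2f}, sd {prior_sd:.2f}")
print(f"posterior: mean {post_mean:.2f}, sd {post_sd:.2f}")
```

The posterior mean is pulled from the literature prior toward the field data, and the posterior spread narrows as observations accumulate.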

Structural Response to the α·ΔdB Dominance Problem

An earlier draft of this sketch considered an additive form CI = M × (α·ΔdB + β·log(R) + γ·T). However, in this additive form, when α·ΔdB dominates, CI effectively collapses to a linear function of ΔdB, betraying the core thesis that "dB is not sound." Such a structure also conflicts with the Miedema & Oudshoorn meta-analytic finding that dose-response curves are independent across source types — if the same dB level produces different annoyance for different sources, a linear dB term is structurally indefensible.

For this reason, we abandoned the additive form and adopted the multiplicative form above. ΔdB appears only inside the threshold function g(·), and the CI value itself is expressed as a function of M, R, and T.
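The dominance problem can be seen numerically. The sketch below compares the discarded additive form with the adopted multiplicative form for two hypothetical events that differ only in ΔdB. All coefficients (α, β, γ, θ_dB) and inputs are placeholders; with different coefficients the size of the effect would differ.

```python
import math

def additive_ci(m, delta_db, r, t, alpha=1.0, beta=1.0, gamma=1.0):
    """Discarded additive form: M * (alpha*delta_dB + beta*log(R) + gamma*T)."""
    return m * (alpha * delta_db + beta * math.log(r) + gamma * t)

def multiplicative_ci(m, delta_db, r, t, theta_db=10.0, beta=0.5, gamma=1.0):
    """Adopted form: M * g(delta_dB) * R**beta * T**gamma."""
    g = 1 if delta_db > theta_db else 0
    return m * g * r ** beta * t ** gamma

# Two hypothetical events, identical except for loudness above background.
quieter = dict(m=1.5, delta_db=15, r=4, t=2.0)
louder = dict(m=1.5, delta_db=35, r=4, t=2.0)

# Share of the additive sum contributed by the dB term alone (alpha = beta = gamma = 1).
db_share = louder["delta_db"] / (louder["delta_db"] + math.log(louder["r"]) + louder["t"])

print(f"additive:       {additive_ci(**quieter):.1f} vs {additive_ci(**louder):.1f}")
print(f"multiplicative: {multiplicative_ci(**quieter):.1f} vs {multiplicative_ci(**louder):.1f}")
print(f"dB term share of additive sum: {db_share:.0%}")
```

With these placeholders the dB term accounts for over 90% of the additive sum, whereas the multiplicative form gives both events the same CI once the threshold is crossed.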

Position Relative to ISO 12913 and Zwicker PA

CI is not a derivative of ISO 12913. ISO 12913 is a framework for perception by listeners; CI is an indicator on the objective measurement side. They sit at different layers.

Likewise, CI is not a derivative of the Zwicker psychoacoustic annoyance formula (composite of loudness, sharpness, roughness, and fluctuation strength). Zwicker PA computes an instantaneous unpleasantness for a single acoustic sample; it does not include semantic content, repetition, or time-of-day. CI sits independently by incorporating these event-based variables.

In Phase 2 we will verify CI in three directions:

Axis | Comparator | Expectation
Independence from objective perceptual model | Zwicker PA | Does CI explain variance not captured by PA?
Alignment with subjective evaluation | ISO 12913-2 eight perceptual attributes (annoying / chaotic) | Positive correlation between CI and annoying score
Alignment with socio-acoustic survey | ICBEN five-point scale | Positive correlation between CI and %HA

Retraction Conditions

The lab will retract the CI proposal if any of the following hold:

  1. CI offers no independent information beyond what Zwicker PA or LAeq already explain.
  2. CI shows no significant correlation with the "annoying" perceptual-attribute score of ISO 12913-2.
  3. The posterior distribution of M is too wide (95% credible interval exceeding ±50%) to yield practically meaningful estimates.

The ±50% threshold in retraction condition 3 is a provisional standard at the current stage; its practical applicability will be reassessed using Phase 2 field data. This threshold is not a confirmed value and will be finalized in the Phase 2 plan based on Bayesian estimation conventions and implementation experience.
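Retraction condition 3 could be checked mechanically along the following lines. The note does not yet define what the ±50% is measured against, so this sketch assumes it means the half-width of the 95% credible interval relative to the posterior mean; that reading, the function names, and the example intervals are all our own assumptions.

```python
def relative_half_width(lower, upper, center):
    """Half-width of an interval as a fraction of its central estimate."""
    return (upper - lower) / 2 / abs(center)

def too_wide(lower, upper, center, limit=0.5):
    """True when the interval exceeds +/- limit around the central estimate."""
    return relative_half_width(lower, upper, center) > limit

# Hypothetical posterior summaries for M: (2.5th percentile, 97.5th percentile, posterior mean)
narrow = (1.0, 1.6, 1.3)
wide = (0.2, 2.4, 1.3)

for name, (lower, upper, center) in {"narrow": narrow, "wide": wide}.items():
    width = relative_half_width(lower, upper, center)
    print(f"{name}: +/-{width:.0%} -> retract: {too_wide(lower, upper, center)}")
```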

By recording these retraction conditions ex ante, we aim to reduce the risk that this provisional sketch takes on a life of its own before being established as an independent indicator.

Technical Approach: Automated Sound Source Classification via TinyML

To identify the "type" of sound source, we are exploring automated sound source classification using TinyML technology.

Specifically, we plan to run Google's YAMNet model (AudioSet 521-class classification) on a Raspberry Pi for real-time sound source classification. YAMNet can identify categories such as "engine noise," "car horn," "birdsong," and "human conversation," which can then be connected to analyses such as "at this location during this time period, modified motorcycle exhaust is the primary contributor to subjective stress."

The advantage of TinyML (machine learning on microcontrollers and similarly small edge devices) is that audio data can be processed on-device, so that only the classification results are transmitted. This is critically important from a privacy protection standpoint. Street-level audio recording could lead to the interception of conversations, but recording only sound source classification labels does not constitute the collection of personal information.
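The label-only principle can be sketched as a function that passes a frame to a classifier and returns nothing but a timestamped label. The classifier here is a dummy stand-in, not YAMNet, and the record fields and the score cutoff are hypothetical choices for illustration.

```python
from datetime import datetime, timezone

def classify_and_discard(audio_frame, classifier, min_score=0.5):
    """Return a label-only record for one frame; the raw samples are never stored."""
    label, score = classifier(audio_frame)
    if score < min_score:
        return None  # low-confidence frames produce no record at all
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "label": label,
        "score": round(score, 2),
    }

def dummy_classifier(audio_frame):
    """Stand-in for an on-device model; always returns the same fixed guess."""
    return "engine noise", 0.87

frame = [0.0] * 16000  # placeholder samples standing in for captured audio
record = classify_and_discard(frame, dummy_classifier)
print(record)
```

Only the returned record would leave the device; the frame itself goes out of scope once the function returns.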

Using platforms such as Edge Impulse, it is also possible to fine-tune the YAMNet base model to specialize in the sound environment of Bunkyo Ward (文京区). For example, additional categories specific to the Japanese urban environment (such as "modified motorcycle," "loudspeaker advertisement," "construction noise," and "ambulance") can be trained.

Context-dependent noise research targeting the general population does exist, but research specifically targeting sensory-hypersensitive individuals remains largely unexplored. This is where the Quiet City Project's originality lies.


Related guides: For research hypothesis design methodology, see Logic Model Guide. For an introduction to evidence-based policy making, see Introduction to EBPM.

References

World Health Organization (WHO). Environmental Noise Guidelines for the European Region. WHO Regional Office for Europe.

Basner, M. et al. (2014). Auditory and non-auditory effects of noise on health. The Lancet, 383(9925), 1325-1332.

Google Research. YAMNet: Yet Another Audio Model for Sound Event Detection. TensorFlow Hub.

Fastl, H. & Zwicker, E. Psychoacoustics: Facts and Models (3rd ed.). Springer.

Hou, Y. et al. (2023). Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning. Proc. INTERSPEECH 2023, arXiv:2308.11980.

Park, J. et al. (2017). Noise sensitivity, rather than noise level, predicts the non-auditory effects of noise in community samples. BMC Public Health.

Miedema, H. M. E. & Oudshoorn, C. G. M. (2001). Annoyance from transportation noise: relationships with exposure metrics DNL and DENL and their confidence intervals. Environmental Health Perspectives, 109(4), 409-416.

ISO TC 43/SC 1. Acoustics — Soundscape — Part 1: Definition and conceptual framework (ISO 12913-1:2014). International Organization for Standardization.

ISO TC 43/SC 1. Acoustics — Assessment of noise annoyance by means of social and socio-acoustic surveys (ISO/TS 15666:2021). International Organization for Standardization.
