IKANDY Smart Auto — Per-Process Audio Dominance Detection with Metadata Backend Arbitration

01 — Abstract

What This Is

This document discloses a novel method implemented in IKANDY — a desktop music visualizer for Windows — for automatically detecting which audio-producing process on a host system is the dominant source, and silently switching the application's metadata retrieval backend to the most appropriate provider for that process. The method, referred to internally as Smart Auto, combines per-process audio session state inspection with backend arbitration, hysteresis dwell timing, candidate persistence across brief silent passages, and a denylist gate that prevents communication and broadcasting applications from incorrectly being promoted as the music source.

Purpose of this publication

This document is intended to establish prior art. Any subsequent patent claim covering this method or substantially similar methods is anticipated by this disclosure. IKANDY's source code repository provides timestamped evidence of invention predating this publication.

02 — Problem

The Listening Environment Is Rarely Uniform

Music visualizer applications that display track metadata (title, artist, album art, synced lyrics) face a fundamental challenge: a user's listening environment is rarely uniform within a session. A user may listen via Spotify, then switch to VLC for a local file, then open a browser to stream a DJ set. Each source exposes metadata through a different mechanism with different limitations:

Source App	Native Metadata API	Limitations
Spotify	PKCE OAuth Web API	Requires user-supplied client ID; fails silently if another app is producing audio
VLC	HTTP API (Lua interface)	Must be manually enabled in VLC preferences; reports minimal metadata for most local files
foobar2000	Beefweb REST plugin	Component must be installed; port is user-configurable
Browser / other	Windows SMTC	Best-effort; rich for some apps (YouTube Music, Spotify Web), sparse or empty for others

Prior art (Windows SMTC widgets, last.fm scrobblers, unified Now Playing daemons) reads from a single metadata source or polls all sources indiscriminately with no awareness of which process is currently producing audio. No existing system arbitrates metadata backends based on per-process audio production state.

03 — Method

The Novel Method

Smart Auto operates through six discrete steps, each adding a layer of robustness against the failure modes of naïve dominance detection.

STEP 01 Per-Process Audio Session Enumeration

The Windows Core Audio API family is queried via the electron-loopback-audio bridge (which invokes IAudioSessionManager2 and IAudioSessionEnumerator) to enumerate every audio session on the default render endpoint, not only the active ones. For each session, the owning process ID is captured along with the session's current state (Active or Inactive). The process name is then resolved from the PID via the Windows process table.

The enumeration is polled on an interval of approximately 500 milliseconds.
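A minimal sketch of the Step 01 snapshot follows. It assumes the bridge yields raw records of the form `{ pid, state }`, where `state` is the numeric WASAPI `AudioSessionState` value, and that a PID-to-image-name map has already been read from the process table. The record shape and the names `normalizeSessions` and `startSessionPolling` are hypothetical; they are not the electron-loopback-audio API.

```javascript
// WASAPI AudioSessionState values (audiosessiontypes.h)
const SESSION_STATE = { 0: 'Inactive', 1: 'Active', 2: 'Expired' };

// Hypothetical inputs: rawSessions is [{ pid, state }] from the audio bridge,
// processTable is a Map of PID -> image name (e.g. 'vlc.exe').
function normalizeSessions(rawSessions, processTable) {
  const out = [];
  for (const { pid, state } of rawSessions) {
    const name = processTable.get(pid);
    // Skip sessions whose owner cannot be resolved (e.g. it has exited)
    // and sessions reported as Expired
    if (!name || SESSION_STATE[state] === 'Expired') continue;
    // Unrecognised state values are treated conservatively as Inactive
    out.push({ pid, name, state: SESSION_STATE[state] ?? 'Inactive' });
  }
  return out;
}

// Poll roughly every 500 ms. enumerate() and readProcessTable() are
// hypothetical async wrappers around the bridge and the process list.
function startSessionPolling(enumerate, readProcessTable, onSnapshot, intervalMs = 500) {
  const timer = setInterval(async () => {
    try {
      const [raw, table] = await Promise.all([enumerate(), readProcessTable()]);
      onSnapshot(normalizeSessions(raw, table));
    } catch {
      // A failed tick is skipped; the next tick retries
    }
  }, intervalMs);
  return () => clearInterval(timer); // caller stops polling with this
}
```

Keeping normalization separate from the timer makes the snapshot logic testable without a live audio device.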

STEP 02 Dominance Detection via Session State

Rather than relying on audio peak level (which fluctuates dramatically during natural quiet passages, dynamic mixes, and silent intros), Smart Auto uses the WASAPI session activity flag as the primary signal. A session reports Active while at least one of its owning process's output streams is running, regardless of the amplitude of the audio in that stream. Sessions in the Inactive state are excluded from dominance evaluation.

When multiple sessions are simultaneously Active, dominance is resolved by an allowlist-driven priority order: applications known to be primary music producers (Spotify, foobar2000, VLC) outrank applications known to be ancillary audio producers.

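// Simplified dominance evaluation (illustrative list entries)
const IGNORE_LIST = new Set(['discord.exe', 'ms-teams.exe', 'zoom.exe', 'slack.exe', 'obs64.exe']);
const PRIORITY = ['spotify.exe', 'foobar2000.exe', 'vlc.exe'];  // lower index wins
const rank = name => { const i = PRIORITY.indexOf(name); return i === -1 ? PRIORITY.length : i; };
const byAllowlistPriority = (a, b) => rank(a.name.toLowerCase()) - rank(b.name.toLowerCase());

function getDominantProcess(sessions) {
  // Inactive sessions are excluded (STEP 02)
  const active = sessions.filter(s => s.state === 'Active');
  // Denylist gate is applied before priority ordering (STEP 03)
  const filtered = active.filter(s => !IGNORE_LIST.has(s.name.toLowerCase()));
  return filtered.sort(byAllowlistPriority)[0] ?? null;
}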

STEP 03 Denylist Gate (False-Promotion Guard)

A small ignore list of processes that frequently produce audio but are never the user's intended music source is applied before allowlist priority ordering. These include communication clients (Discord, Microsoft Teams, Zoom, Slack) and broadcasting/recording software (OBS Studio variants).

Without this gate, a Discord call or notification arriving while VLC is paused or between tracks (when VLC is not Active, so allowlist priority no longer keeps it on top) could trigger a backend switch to Discord, a process whose audio metadata is meaningless to a music visualizer. The denylist ensures these processes can still be selected manually by a user but are never promoted automatically to the dominant source.
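The manual-versus-automatic distinction can be sketched as a small resolver. This is an illustration under stated assumptions: the denylist entries (especially the per-version image names for Teams and OBS) and the names `isAutoPromotable` and `resolveSource` are hypothetical, and the sketch focuses on who may override the gate rather than on ordering.

```javascript
// Illustrative denylist of lower-cased image names (entries are assumptions)
const AUTO_PROMOTION_DENYLIST = new Set([
  'discord.exe', 'ms-teams.exe', 'teams.exe', 'zoom.exe', 'slack.exe',
  'obs64.exe', 'obs32.exe',
]);

// True if Smart Auto may promote this process without user action
function isAutoPromotable(processName) {
  return !AUTO_PROMOTION_DENYLIST.has(processName.toLowerCase());
}

// Hypothetical resolver: a manual choice always wins, even for a denylisted
// process; in auto mode a denylisted process is never promoted.
function resolveSource(mode, manualChoice, autoDominant) {
  if (mode === 'manual') return manualChoice;
  if (autoDominant && isAutoPromotable(autoDominant.name)) return autoDominant;
  return null; // nothing eligible: leave the current source unchanged
}
```

Matching on lower-cased image names avoids misses caused by inconsistent casing in process names.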

STEP 04 Candidate Persistence Across Silence

WASAPI can flip a session's state from Active to Inactive during silent passages (intros, breakdowns, song-to-song gaps) when the playing application stops or suspends its output stream. A naïve dominance detector would drop the candidate during these passages and re-detect it when audio resumes, producing thrashing UI state and incorrect backend selection during quiet moments.

To address this, Smart Auto retains the previously dominant candidate across short silent passages: if no process is currently Active but the previously dominant process is still running, the candidate is kept for a grace period instead of being dropped immediately. This keeps behavior stable across natural quiet passages while still releasing the candidate when the user genuinely stops playback, either because the grace period expires or because the process exits.
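A minimal sketch of candidate persistence, placed between dominance detection and the hysteresis stage. The grace length is not stated in the text, so `GRACE_MS` is a hypothetical value, and `isRunning(pid)` stands in for a process-table check.

```javascript
// Hypothetical grace period; the text does not specify its length
const GRACE_MS = 10000;

// Returns a stateful filter: persist(dominant, now, isRunning) -> candidate or null
function createCandidatePersistence(graceMs = GRACE_MS) {
  let last = null;         // last process that was genuinely Active and dominant
  let silentSince = null;  // when the current silent stretch began

  return function persist(dominant, now, isRunning) {
    if (dominant) {
      // Audio is flowing: adopt the dominant process and reset the silence clock
      last = dominant;
      silentSince = null;
      return dominant;
    }
    // Nothing Active: release at once if there is no prior candidate
    // or its process has exited
    if (!last || !isRunning(last.pid)) {
      last = null;
      silentSince = null;
      return null;
    }
    if (silentSince === null) silentSince = now;
    // Keep the previous candidate while the silence lasts no longer than the grace period
    if (now - silentSince <= graceMs) return last;
    last = null;
    silentSince = null;
    return null;
  };
}
```

Because the filter only reacts when no process is Active, a different process becoming Active still replaces the candidate immediately; the hysteresis dwell then decides whether that change is committed.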

STEP 05 Hysteresis Dwell (Anti-Thrash)

A new candidate must sustain dominance for a minimum dwell period of approximately 1500 milliseconds before a backend switch is committed. This prevents rapid toggling when two applications produce audio briefly in succession.

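// Hysteresis dwell guard
const DWELL_MS = 1500;
let candidate = null;      // process currently being timed
let candidateSince = 0;    // when it became the candidate
let committedName = null;  // process whose backend is already active

// Called on every poll (~500 ms); commitBackendSwitch is supplied by the backend layer.
function evaluateSwitch(dominant, now, commitBackendSwitch) {
  if (dominant?.name !== candidate?.name) {
    // New candidate (or none): restart the dwell timer
    candidate = dominant;
    candidateSince = now;
    return;
  }
  // Commit once per change, after the candidate has held dominance for DWELL_MS
  if (candidate && candidate.name !== committedName &&
      now - candidateSince >= DWELL_MS) {
    committedName = candidate.name;
    commitBackendSwitch(candidate);
  }
}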

STEP 06 Metadata Backend Mapping & Silent Switch

The committed dominant process is mapped to a metadata backend via a mapping table, shown below. The switch is performed without interrupting playback or visualization, and a transient UI indicator briefly identifies the new source (e.g., "foobar2000 metadata active") before dismissing itself automatically.

Detected Process	Backend Selected	Fallback if Unavailable
`Spotify.exe`	Spotify PKCE Web API	SMTC
`vlc.exe`	VLC HTTP Lua API	SMTC
`foobar2000.exe`	Beefweb REST API	SMTC
`chrome.exe`, `msedge.exe`, `firefox.exe`	SMTC	—
Unknown / unmatched	SMTC	—

If the selected backend's prerequisite is not configured (e.g., VLC's HTTP interface is disabled), the system falls back to SMTC automatically and surfaces a non-blocking UI hint indicating the configuration step the user can take to unlock the richer backend.
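The table and the fallback rule can be sketched as a lookup plus a configuration check. The backend identifiers, hint strings, and the `isConfigured` probe are hypothetical names chosen for illustration.

```javascript
// Mapping from the table above; keys are lower-cased image names
const BACKENDS = new Map([
  ['spotify.exe', 'spotify-web-api'],
  ['vlc.exe', 'vlc-http'],
  ['foobar2000.exe', 'beefweb'],
  ['chrome.exe', 'smtc'],
  ['msedge.exe', 'smtc'],
  ['firefox.exe', 'smtc'],
]);

// Hypothetical hint strings for the non-blocking UI nudge
const SETUP_HINTS = {
  'spotify-web-api': 'Add a Spotify client ID to enable the Spotify backend',
  'vlc-http': 'Enable the Web interface in VLC preferences',
  'beefweb': 'Install the Beefweb component in foobar2000',
};

// isConfigured(backend) is a hypothetical probe, e.g. a settings lookup
function selectBackend(processName, isConfigured) {
  // Unknown or unmatched processes go straight to SMTC
  const preferred = BACKENDS.get(processName.toLowerCase()) ?? 'smtc';
  if (preferred === 'smtc' || isConfigured(preferred)) {
    return { backend: preferred, hint: null };
  }
  // Prerequisite missing: fall back to SMTC and surface a setup hint
  return { backend: 'smtc', hint: SETUP_HINTS[preferred] };
}
```

Returning the hint alongside the backend keeps the UI nudge a side channel, so the fallback never blocks the switch itself.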

Tier 2 SMTC Discovery

A parallel mechanism queries Windows' GlobalSystemMediaTransportControlsSessionManager via PowerShell to discover applications that register with SMTC but are not in the static allowlist (for example, Plex, Kodi, Apple Music, and various streaming clients). When such an application is detected as the active SMTC source AND a corresponding process is running on the host, it bypasses the allowlist gate and is eligible for dominance promotion. This provides extensibility without requiring an exhaustive allowlist of every media-aware application on Windows.
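The cross-referencing half of Tier 2 discovery can be sketched as a pure function over the PowerShell helper's output. The JSON shape (`aumid`, `isCurrent`), the optional AUMID-to-image map, and the rule that a bare `*.exe` AUMID names the image are all assumptions; `ExamplePlayer.exe` is a placeholder, not a real application.

```javascript
// Hypothetical helper output: [{ "aumid": "<SourceAppUserModelId>", "isCurrent": true }, ...]
function findTier2Candidate(smtcJson, runningImages, allowlist, aumidToImage = new Map()) {
  let sessions;
  try {
    sessions = JSON.parse(smtcJson);
  } catch {
    return null; // malformed helper output: skip this tick
  }
  // ConvertTo-Json in a pipeline emits a bare object, not an array, for one session
  if (!Array.isArray(sessions)) sessions = [sessions];
  const current = sessions.find(s => s && s.isCurrent === true);
  if (!current || typeof current.aumid !== 'string') return null;

  // Resolve the AUMID to an image name: explicit map first, then a bare "*.exe" AUMID
  const mapped = aumidToImage.get(current.aumid)
    ?? (current.aumid.toLowerCase().endsWith('.exe') ? current.aumid : null);
  if (!mapped) return null;
  const image = mapped.toLowerCase();

  const running = new Set(runningImages.map(n => n.toLowerCase()));
  if (!running.has(image)) return null;  // SMTC source must match a live process
  if (allowlist.has(image)) return null; // allowlisted apps use the normal path
  return { name: image, aumid: current.aumid };
}
```

Requiring both an SMTC registration and a live process guards against stale SMTC sessions left behind by applications that have already closed.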

Backend Pause on Switch

When Smart Auto commits a backend switch, it sends a pause command to the previous backend through that backend's native control API: Spotify's PUT /me/player/pause, VLC's ?command=pl_pause, or foobar2000's Beefweb POST /api/player/pause. This prevents the previous source from continuing to play alongside the newly selected source. The pause is fire-and-forget: its failure (e.g., the Spotify Web API returning 404 when no active Spotify Connect device is found) does not block the switch.
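A fire-and-forget pause can be sketched with an injectable `fetch` (global in Node 18+). The endpoint paths follow the text; the hosts, default ports (VLC 8080, Beefweb 8880), the `cfg` fields, and the backend identifiers are assumptions for illustration.

```javascript
// Builds the pause request for the backend being switched away from
function buildPauseRequest(backend, cfg) {
  switch (backend) {
    case 'spotify-web-api':
      return {
        url: 'https://api.spotify.com/v1/me/player/pause',
        init: { method: 'PUT', headers: { Authorization: `Bearer ${cfg.spotifyToken}` } },
      };
    case 'vlc-http':
      // VLC's HTTP interface uses Basic auth with an empty user name.
      // pl_pause toggles, so it would resume an already-paused VLC.
      return {
        url: `http://127.0.0.1:${cfg.vlcPort ?? 8080}/requests/status.json?command=pl_pause`,
        init: { headers: { Authorization: 'Basic ' + Buffer.from(':' + cfg.vlcPassword).toString('base64') } },
      };
    case 'beefweb':
      return {
        url: `http://127.0.0.1:${cfg.beefwebPort ?? 8880}/api/player/pause`,
        init: { method: 'POST' },
      };
    default:
      return null; // SMTC and unknown backends: nothing to pause here
  }
}

// Fire-and-forget: the request is not awaited, its status is never inspected,
// and rejections or synchronous throws are swallowed so the switch proceeds.
function pausePreviousBackend(backend, cfg, fetchImpl = globalThis.fetch) {
  const req = buildPauseRequest(backend, cfg);
  if (!req || typeof fetchImpl !== 'function') return;
  try {
    Promise.resolve(fetchImpl(req.url, req.init)).catch(() => {});
  } catch {
    // synchronous failure from fetchImpl: ignored as well
  }
}
```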

04 — Novelty

What Makes This Novel

Claimed Novel Combination

Existing solutions (SMTC widgets, last.fm scrobblers, unified Now Playing daemons, browser media-key handlers) read from a single metadata source or poll all sources equally with no audio-production awareness. No prior system uses per-process WASAPI audio session state as the arbitration signal for selecting a metadata retrieval backend, combines it with candidate persistence across silent passages, applies a process-identity-based denylist to prevent false promotion from communication apps, and governs backend switching via hysteresis in a desktop music visualizer context.

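The combination of these elements (per-process session-state arbitration, candidate persistence across silent passages, the process-identity denylist, and hysteresis-governed backend switching) constitutes the claimed novel method.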

05 — Implementation

Implementation Context

Smart Auto is implemented in IKANDY, an Electron-based music visualizer running on Windows. The audio session enumeration layer uses the electron-loopback-audio package, which provides a Node.js binding to the Windows Core Audio API. The metadata backend layer communicates via IPC between the Electron main process and the renderer. The visualizer itself renders via Butterchurn, a WebGL implementation of the MilkDrop preset format.

The Tier 2 SMTC discovery layer invokes PowerShell to query GlobalSystemMediaTransportControlsSessionManager, parses the resulting JSON, and cross-references the reported application user model IDs against the running process table.

Source code is maintained at github.com/IKANDYapp/IKANDY. Commit history provides timestamped evidence of invention predating this publication.

06 — Prior Art Search

Prior Art Search Summary

A search of npm, GitHub, and general web sources conducted May 2026 found no implementation combining the elements listed in §04 above. Existing packages such as electron-loopback-audio address system-level audio capture but do not perform process-aware metadata routing. Windows SMTC provides a unified metadata surface but does not select backends based on process identity, audio session state, or hysteresis-governed dominance.

Last.fm scrobblers and similar daemons consume metadata from a single source per session and require manual configuration to switch sources. They do not perform automatic arbitration based on audio production state.