Propagation-Enhanced Acoustic Biometrics [Under Development]

Noise-Resistant Mobile PIN Authentication against Shoulder-Surfing and Spoofing

Abstract

We present I-PIN, the first PIN authentication system that exploits structure-borne propagation acoustics to achieve robust resistance against visual eavesdropping, replay attacks, and environmental noise. The core innovation lies in modeling each PIN tap as a Location-coupled Acoustic Fingerprint (LocAF) that jointly encodes the finger's anatomical traits and the keypad's structural attributes, with path-dependent frequency suppression during structural propagation naturally amplifying fine-grained biometric distinctions.

By exploiting propagation-enhanced acoustic biometrics and introducing a Log-Energy Scaling Ratio (LESR) to model frequency attenuation, we design a novel LESR–wavelet denoising method that preserves fine-grained identity features while resisting noise and replay attacks. A deep learning framework combining LESR-enhanced PANNs with contrastive learning further disentangles user identity from behavioral and environmental variations.

Key Contributions:

  • LocAF Technology: Novel acoustic fingerprinting based on structure-borne sound propagation
  • LESR Denoiser: Advanced noise reduction achieving 59.6 dB SNR improvement
  • Multi-LocAF Fusion: Enhanced accuracy through multi-location acoustic analysis
  • Robust Security: 100% resistance to shoulder-surfing and replay attacks

I-PIN System Overview

I-PIN illustration: I-PIN authenticates users via tap-induced acoustic biometrics, effectively preventing PIN leakage from shoulder-surfing and replay attacks.

I-PIN Application Demo Video

I-PIN Application Demo: Demonstrating the acoustic biometric-based PIN authentication system in real-world usage scenarios

Experimental Results: Our system achieves a 99.4% authentication success rate (ASR) with a 0.9% false positive rate (FPR) across diverse environments (mall, office, subway, taxi), evaluated with 27 participants and 31,045 LocAF samples. It outperforms the existing acoustic and visual authentication methods we compared against.

LocAF Data Demonstration

This section demonstrates the core LocAF (Location-coupled Acoustic Fingerprint) signals captured from real user interactions. We present data from 3 users tapping 2 different PIN digits (0, 2), with each digit tapped twice to show consistency and variability.

LocAF Data Overview

The following table presents the complete dataset of LocAF signals captured from our dual-microphone array system. Each row represents a specific user-PIN-attempt combination, with corresponding visualizations for time-domain analysis and frequency-domain CWT spectrograms.

LocAF Data Visualization

Schematic of LocAF generation and propagation in a smartphone. A fingertip tap on each PIN keypad excites a LocAF signal that propagates through distinct structure-borne paths to the device's built-in top and bottom microphones.

[Table: 12 rows, one per combination of PIN digit (0, 2), user (u1, u2, u3), and attempt (c1, c2). Columns: Raw Audio, Time Domain Analysis, Top Mic CWT Spectrogram, Bottom Mic CWT Spectrogram.]

Data Collection Details

  • Total Samples: 12 audio recordings (3 users × 2 PINs × 2 attempts)
  • Audio Format: Dual-channel WAV files (8.0KB each)
  • Sampling: Structure-borne acoustic propagation via dual-microphone array

Visualization Components

Each data entry includes four key components:

  • Raw Audio: Interactive HTML5 audio player for the original dual-channel WAV file recording
  • Time Domain Analysis: Combined waveform visualization showing both top and bottom microphone signals
  • Top Mic CWT Spectrogram: Frequency-time representation (0-16kHz, log scale) of top microphone signal
  • Bottom Mic CWT Spectrogram: Frequency-time representation (0-16kHz, log scale) of bottom microphone signal

LESR Analysis

This section presents the Log-Energy Scaling Ratio (LESR) calculations for the LocAF data shown above, demonstrating how our novel LESR model extracts user-specific biometric features from the acoustic propagation patterns.

LESR Feature Extraction

Mathematical Foundation

The LESR exploits asymmetric propagation paths between top and bottom microphones:

$$\Delta LESR_k(t) = \ln\left(\frac{R_{top}(k,t)}{R_{bot}(k,t)}\right)$$

Where $R_{top}(k,t)$ and $R_{bot}(k,t)$ represent the energy in frequency subband $k$ at time $t$ for top and bottom microphones respectively.
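
As a minimal sketch of the formula above, the following NumPy function computes $\Delta LESR_k(t)$ from the two microphone channels. The default framing values (5 ms window, 1 ms hop, 16 bands, 32 kHz) follow the analysis parameters listed on this page. The Hann window, the equal-width subband layout, and the `eps` guard are assumptions made for illustration and are not confirmed to match the I-PIN implementation.

```python
import numpy as np

def lesr(top, bot, fs=32_000, win_ms=5.0, hop_ms=1.0, n_bands=16, eps=1e-12):
    """Return a (n_bands, n_frames) array of ln(R_top / R_bot)."""
    n = min(len(top), len(bot))                 # use the common length of both channels
    top = np.asarray(top[:n], dtype=float)
    bot = np.asarray(bot[:n], dtype=float)
    win = int(round(fs * win_ms / 1000))        # 160 samples at 32 kHz
    hop = int(round(fs * hop_ms / 1000))        # 32 samples at 32 kHz
    n_frames = 1 + (n - win) // hop
    window = np.hanning(win)                    # assumed analysis window
    # Assumed band layout: equal-width groups of rFFT bins.
    edges = np.linspace(0, win // 2 + 1, n_bands + 1).astype(int)

    def band_energy(x):
        frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_bins)
        return np.stack([power[:, lo:hi].sum(axis=1)
                         for lo, hi in zip(edges[:-1], edges[1:])])

    # eps keeps the ratio finite when a subband is silent in one channel.
    return np.log((band_energy(top) + eps) / (band_energy(bot) + eps))
```

If the bottom channel is an attenuated copy of the top channel with half the amplitude, every entry equals $\ln 4$; identical channels give zero everywhere, which matches the noise-aware behavior described for LESR.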

LESR-Based User Discrimination

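The LESR model captures user-specific biometric signatures through path-dependent frequency attenuation. Its key properties are that it is position-dependent, behavior-resilient, and noise-aware: environmental noise yields $\Delta LESR \approx 0$, i.e. near-equal subband energy at both microphones.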

LESR Stability Analysis

Analysis shows LESR consistency across multiple taps of the same digit by the same user, demonstrating the reliability of our biometric approach. The stability is quantified through correlation coefficients and temporal variance measurements.
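
One way such a stability check could be computed is sketched below. The function name `lesr_stability` is hypothetical, and reading "temporal variance" as the per-band variance of the frame-wise difference between two taps is an assumption, since the exact measure is not specified here.

```python
import numpy as np

def lesr_stability(lesr_a, lesr_b):
    """Compare two (n_bands, n_frames) LESR maps from repeated taps."""
    a = np.asarray(lesr_a, dtype=float)
    b = np.asarray(lesr_b, dtype=float)
    n = min(a.shape[1], b.shape[1])           # crop to the common number of frames
    a, b = a[:, :n], b[:, :n]
    # Pearson correlation over all band/frame cells.
    corr = float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    # Assumed definition: variance over time of the difference, per band.
    temporal_var = np.var(a - b, axis=1)
    return corr, temporal_var
```

A correlation close to 1 with small per-band variance would indicate that two taps of the same digit by the same user produce nearly the same LESR map.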

LESR Data Overview

The following table presents the complete dataset of LESR analysis results for all LocAF signals. Each row represents a specific user-PIN-attempt combination, with corresponding visualizations showing the LESR heatmap (frequency-time analysis) and detailed LESR time series.

LESR Analysis Parameters: 5ms window, 1ms hop, 16 frequency bands, 32kHz sampling rate

[Table: 12 rows, one per combination of PIN digit (0, 2), user (u1, u2, u3), and attempt (c1, c2). Columns: LESR Heatmap (Frequency-Time Analysis), LESR Time Series (Detailed View).]

LESR Analysis Details

  • Total Samples: 12 LESR analysis results (3 users × 2 PINs × 2 attempts)
  • Analysis Parameters: 5ms window, 1ms hop, 16 frequency bands, 32kHz sampling rate

LESR Visualization: Two views are shown. (1) The LESR heatmap is a frequency-time view of LESR values across the frequency bands over time. (2) The LESR time series is a detailed temporal view with a moving average.

I-PIN System Workflow

System Overview

The I-PIN system operates through a three-stage pipeline that transforms raw acoustic signals into reliable biometric authentication decisions.

System overview of I-PIN. The pipeline consists of three main components: (i) LESR-based Pre-processing, (ii) Identity Feature Extraction and Disentanglement, and (iii) Authentication with Multiple LocAF Samples.

Stage 1: LESR-based Pre-processing

This stage leverages the on-device acoustic field to suppress noise and extract reliable LocAF segments:

  • Dual-microphone signal acquisition and synchronization
  • LESR calculation and frequency subband analysis
  • LESR-wavelet denoising with adaptive thresholding
  • Multi-resolution Stationary Wavelet Transform (SWT) processing

Stage 2: Identity Feature Extraction

A PANNs encoder extracts fine-grained physiological features, and contrastive learning disentangles identity from behavioral variations:

  • PANNs-based acoustic feature extraction (2048-dimensional)
  • LESR structure-propagation biometric features
  • Contrastive learning with triplet loss and binary cross-entropy
  • Identity-behavioral variation disentanglement
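
The combination of triplet loss and binary cross-entropy listed above can be illustrated with a small NumPy sketch. The squared Euclidean distance, the `margin` value, and the `weight` that balances the two terms are assumptions for illustration; the actual loss configuration of I-PIN is not given on this page.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared Euclidean distances, averaged over the batch."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # anchor vs. same user
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # anchor vs. other user
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

def bce_loss(prob, label, eps=1e-7):
    """Binary cross-entropy on predicted same-user probabilities."""
    p = np.clip(prob, eps, 1.0 - eps)                  # avoid log(0)
    return float(-np.mean(label * np.log(p) + (1 - label) * np.log(1 - p)))

def combined_loss(anchor, positive, negative, prob, label, weight=1.0):
    """Assumed combination: triplet term plus a weighted BCE term."""
    return triplet_loss(anchor, positive, negative) + weight * bce_loss(prob, label)
```

The triplet term pulls embeddings of the same user together and pushes other users at least `margin` away, while the BCE term supervises a same-user/different-user decision.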

Stage 3: Multi-LocAF Authentication

This stage aggregates multiple LocAF samples during PIN entry for robust user authentication:

  • Feature vector comparison with enrolled templates
  • Multi-LocAF aggregation (N=5, M=3 parameters)
  • Similarity scoring and threshold comparison
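
A hedged sketch of how these steps could fit together follows. The roles of N and M are not defined on this page, so the sketch assumes an M-of-N vote (accept when at least M of N taps match). Cosine similarity, the `threshold` value, and the function names are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def authenticate(tap_features, templates, threshold=0.8, n=5, m=3):
    """Assumed rule: accept if at least m of the first n taps match their template."""
    scores = [cosine_similarity(f, t)
              for f, t in zip(tap_features[:n], templates[:n])]
    votes = sum(s >= threshold for s in scores)   # per-tap threshold comparison
    return votes >= m, scores
```

Under this reading, a single noisy or atypical tap does not reject a genuine user, while an attacker must match several taps to be accepted.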

Technical Components

1. Location-coupled Acoustic Fingerprint (LocAF)

Each PIN tap generates unique LocAF signals encoding finger anatomical traits and keypad structural attributes.

2. Log-Energy Scaling Ratio (LESR) Model

LESR exploits asymmetric propagation paths with key properties: Position-dependent, behavior-resilient, and noise-aware (environmental noise → $\Delta LESR \approx 0$).

3. LESR-Wavelet Denoising Framework

Self-adaptive algorithm: Multi-resolution SWT analysis → Band alignment → Stability scoring (MAD) → Selective reconstruction. Achieves 59.6 dB SNR improvement vs. 33.3 dB best baseline.
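
The chain of steps above can be sketched in a simplified form. The code below swaps in an undecimated Haar ("a trous") transform in place of whatever wavelet I-PIN actually uses, scores each level by the median of its per-frame log-energy ratio relative to its MAD, and rebuilds the top-microphone signal from the levels that pass. The scoring rule, `frame`, `levels`, and `min_score` are assumptions for illustration only.

```python
import numpy as np

def haar_swt(x, levels):
    """Undecimated Haar transform satisfying x == approx + sum(details)."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        shifted = np.roll(approx, -(2 ** j))    # circular shift by 2^j samples
        details.append((approx - shifted) / 2.0)
        approx = (approx + shifted) / 2.0
    return details, approx

def mad(v):
    """Median absolute deviation."""
    v = np.asarray(v, dtype=float)
    return float(np.median(np.abs(v - np.median(v))))

def level_lesr(d_top, d_bot, frame=32, eps=1e-12):
    """Per-frame log-energy ratio between the two channels for one level."""
    n = (len(d_top) // frame) * frame
    e_top = (d_top[:n].reshape(-1, frame) ** 2).sum(axis=1)
    e_bot = (d_bot[:n].reshape(-1, frame) ** 2).sum(axis=1)
    return np.log((e_top + eps) / (e_bot + eps))

def lesr_swt_denoise(top, bot, levels=4, min_score=3.0):
    """Keep only levels whose log-energy ratio is stable and away from zero."""
    d_top, a_top = haar_swt(top, levels)
    d_bot, _ = haar_swt(bot, levels)
    out = a_top.copy()                          # approximation kept unconditionally here
    kept = []
    for j in range(levels):
        r = level_lesr(d_top[j], d_bot[j])
        score = abs(np.median(r)) / (mad(r) + 1e-6)
        if score >= min_score:
            out += d_top[j]                     # selective reconstruction
            kept.append(j)
    return out, kept
```

Because the transform is additive, keeping every level returns the input exactly, and a level whose ratio hovers around zero (as expected for environmental noise) is dropped.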

4. PANNs+LESR Integration

Hybrid architecture combines PANNs fine-grained acoustic features (2048-dim) with LESR structure-propagation biometric features. Contrastive learning with triplet loss + binary cross-entropy disentangles identity from behavioral variations.

Experimental Results

Overall Performance Metrics

  • Authentication Success Rate: 99.4%
  • False Positive Rate: 0.9%
  • F1-Score: 99.3%
  • Equal Error Rate: 0.9%
  • Participants: 27
  • LocAF Samples: 31,045
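
For readers who want to reproduce metrics of this kind on their own similarity scores, a minimal sketch follows. It assumes that ASR is the fraction of genuine attempts accepted, that higher scores mean a better match, and that the EER is read off at the threshold where the false positive and false rejection rates are closest. These definitions are common conventions, not a confirmed description of the I-PIN evaluation code.

```python
import numpy as np

def metrics_at(genuine, impostor, threshold):
    """Return (ASR, FPR, F1) for one decision threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    tp = np.sum(genuine >= threshold)        # genuine attempts accepted
    fn = genuine.size - tp                   # genuine attempts rejected
    fp = np.sum(impostor >= threshold)       # impostor attempts accepted
    asr = tp / genuine.size
    fpr = fp / impostor.size
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(asr), float(fpr), float(f1)

def equal_error_rate(genuine, impostor):
    """Scan observed scores as thresholds; return the rate where FPR and FRR meet."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, best_eer = None, None
    for t in thresholds:
        asr, fpr, _ = metrics_at(genuine, impostor, t)
        frr = 1.0 - asr
        gap = abs(fpr - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (fpr + frr) / 2.0
    return best_eer
```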

Performance Analysis Across Different Scenarios

The following analysis demonstrates I-PIN's robust performance across various temporal conditions, environmental scenarios, user postures, and force levels.

Temporal Performance

Performance consistency over time, showing stable authentication rates across different time periods.

Environmental Scenarios

Authentication performance across different environmental conditions including mall, office, subway, and taxi scenarios.

User Postures

Performance analysis across different user interaction postures: tabletop, palm, and grip positions.

Force Levels

Authentication accuracy across different tap force levels: hard, medium, and light pressure.

Environment-Specific Performance

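| Condition | ASR (%) | FPR (%) | F1-Score (%) | EER (%) |
|---|---|---|---|---|
| Mall | 100.0 | 0.0 | 100.0 | 0.0 |
| Office | 100.0 | 1.0 | 99.5 | 0.5 |
| Subway | 99.8 | 1.4 | 99.2 | 1.3 |
| Taxi | 98.2 | 0.4 | 99.0 | 0.2 |
| Hard Tap | 99.8 | 0.4 | 99.7 | 0.2 |
| Light Tap | 99.4 | 0.8 | 99.3 | 0.7 |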

Technical Component Performance

| Component | Metric | I-PIN Value | Best Baseline |
|---|---|---|---|
| LESR Denoiser | SNR Improvement | 59.6 dB | 33.3 dB |
| Feature Quality | Correlation | 99.99% | 99.98% |
| Multi-LocAF | ASR Improvement | 99.4% | 92.6% (single) |
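
For context on the SNR improvement figure, the sketch below shows one conventional way to compute it: SNR in dB after denoising minus SNR in dB before. It assumes a clean reference signal is available, which is typical for synthetic-noise evaluations; how the reported 59.6 dB was measured is not described on this page.

```python
import numpy as np

def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(observed, dtype=float) - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))

def snr_improvement_db(clean, noisy, denoised):
    """Gain in SNR (dB) from denoising, relative to the noisy input."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)
```

For example, reducing a constant 0.1 offset on a unit-amplitude signal to 0.01 raises the SNR from 20 dB to 40 dB, an improvement of 20 dB.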

Security Analysis

I-PIN provides comprehensive defense against multiple attack vectors through its propagation-enhanced acoustic biometric approach:

| Attack Vector | Traditional PIN | I-PIN Defense Mechanism | Effectiveness |
|---|---|---|---|
| Shoulder Surfing | ❌ Vulnerable | Visual-independent biometric authentication | 100% resistant |
| Zero-effort Attack | ❌ Vulnerable | User-specific LocAF signatures | FPR: 0.06% |
| Impersonation | ❌ Vulnerable | Anatomical trait encoding in LocAF | FPR: $4.6 \times 10^{-4}$ |
| Replay Attacks | ❌ Vulnerable | Structure-borne vs. airborne detection | FPR: 0% |
| Acoustic Eavesdropping | ❌ Vulnerable | On-device structure-borne signal isolation | 99.1% resistant |
| Environmental Noise | ❌ Affected | LESR-wavelet denoising (59.6 dB SNR improvement) | Robust operation |

Comprehensive Evaluation

Experimental Setup:

  • Scale: 27 participants, 31,045 LocAF samples, IRB-approved study
  • Devices: 7 smartphones (Vivo X100, Huawei Pura70Pro, Xiaomi 8, Redmi Note 13, Huawei P20, iPhone 13, Xiaomi 14)
  • Environments: Mall, Office, Subway, Taxi + various postures (Tabletop, Palm, Grip) + force levels (Hard, Medium, Light)
  • Data Collection: 44.1 kHz dual-microphone recording across diverse conditions

User Study Results:

  • Device Generalization: Stable performance across 7 different smartphones (ASR: 0.95-0.96, FPR: ~0.01-0.03)
  • User Generalization: Average ASR > 0.98 across all participants with FPR < 0.02
  • PIN Robustness: Consistent performance across all digits (0-9), most achieving perfect success (1.0) with zero false accepts
  • Temporal Stability: 3-month evaluation shows sustained performance with minimal temporal drift

Component-wise Validation:

  • LESR Denoiser Impact: Markedly improves discriminability under strict security budgets (ASR >0.97 at FPR=0.001)
  • Full vs. Partial Components: Removing LESR features degrades performance (F1=0.62, ROC-AUC=0.95)
  • Multi-LocAF Benefit: Aggregation raises ASR from 92.6% (single LocAF) to 99.4%
  • Parameter Optimization: K=16 subbands, L=8 SWT levels, (N=5, M=3) aggregation achieve optimal performance