Propagation-Enhanced Acoustic Biometrics | Noise-Resistant Mobile PIN Authentication [Under Development]

Abstract

We present I-PIN, the first PIN authentication system that exploits structure-borne propagation acoustics to achieve robust resistance against visual eavesdropping, replay attacks, and environmental noise. The core innovation lies in modeling each PIN tap as a Location-coupled Acoustic Fingerprint (LocAF) that jointly encodes the finger's anatomical traits and the keypad's structural attributes, with path-dependent frequency suppression during structural propagation naturally amplifying fine-grained biometric distinctions.

By exploiting propagation-enhanced acoustic biometrics and introducing a Log-Energy Scaling Ratio (LESR) to model frequency attenuation, we design a novel LESR–wavelet denoising method that preserves fine-grained identity features while resisting noise and replay attacks. A deep learning framework combining LESR-enhanced PANNs with contrastive learning further disentangles user identity from behavioral and environmental variations.

Key Contributions:

LocAF Technology: Novel acoustic fingerprinting based on structure-borne sound propagation
LESR Denoiser: Noise reduction achieving a 59.6 dB SNR improvement (vs. 33.3 dB for the best baseline)
Multi-LocAF Fusion: Aggregating multiple LocAF samples per PIN entry raises ASR from 92.6% (single LocAF) to 99.4%
Robust Security: 100% resistance to shoulder-surfing and replay attacks

I-PIN illustration: I-PIN authenticates users via tap-induced acoustic biometrics, effectively preventing PIN leakage from shoulder-surfing and replay attacks.

I-PIN Application Demo Video

I-PIN Application Demo: Demonstrating the acoustic biometric-based PIN authentication system in real-world usage scenarios

Experimental Results: Our system achieves a 99.4% authentication success rate with a 0.9% false positive rate across diverse environments (mall, office, subway, taxi) using 27 participants and 31,045 LocAF samples. The system demonstrates superior performance compared to existing acoustic and visual authentication methods.

LocAF Data Demonstration

This section demonstrates the core LocAF (Location-coupled Acoustic Fingerprint) signals captured from real user interactions. We present data from 3 users tapping 2 PIN digits (0 and 2), with each digit tapped twice to illustrate both consistency and variability.

LocAF Data Overview

The following table lists all 12 LocAF demonstration samples, captured with the smartphone's built-in top and bottom microphones. Each row corresponds to one user-PIN-attempt combination, accompanied by a time-domain plot and frequency-domain CWT spectrograms.

Schematic of LocAF generation and propagation in a smartphone. A fingertip tap on a PIN key excites a LocAF signal that propagates along distinct structure-borne paths to the device's built-in top and bottom microphones.

PIN	User	Attempt
0	u1	c1
0	u1	c2
0	u2	c1
0	u2	c2
0	u3	c1
0	u3	c2
2	u1	c1
2	u1	c2
2	u2	c1
2	u2	c2
2	u3	c1
2	u3	c2

Data Collection Details

Total Samples: 12 audio recordings (3 users × 2 PINs × 2 attempts)
Audio Format: Dual-channel WAV files (8.0 KB each)
Capture: Structure-borne tap acoustics recorded by the smartphone's built-in top and bottom microphones

Visualization Components

Each data entry includes four key components:

Raw Audio: The original dual-channel WAV recording
Time Domain Analysis: Combined waveform showing both top and bottom microphone signals
Top Mic CWT Spectrogram: Time-frequency representation (0-16 kHz, log scale) of the top microphone signal
Bottom Mic CWT Spectrogram: Time-frequency representation (0-16 kHz, log scale) of the bottom microphone signal

LESR Analysis

This section presents the Log-Energy Scaling Ratio (LESR) calculations for the LocAF data shown above, demonstrating how our novel LESR model extracts user-specific biometric features from the acoustic propagation patterns.

LESR Feature Extraction

Mathematical Foundation

The LESR exploits asymmetric propagation paths between top and bottom microphones:

$$\Delta LESR_k(t) = \ln\left(\frac{R_{top}(k,t)}{R_{bot}(k,t)}\right)$$

Where $R_{top}(k,t)$ and $R_{bot}(k,t)$ represent the energy in frequency subband $k$ at time $t$ for top and bottom microphones respectively.
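A minimal NumPy sketch of the ΔLESR computation follows. It takes the 5 ms window, 1 ms hop, 16 subbands, and 32 kHz rate from the LESR analysis parameters in this section; the Hann window, the equal-width subband split, and the small `eps` guard are assumptions for illustration, and `subband_energy` and `delta_lesr` are hypothetical helper names, not the authors' code.

```python
import numpy as np

FS = 32_000              # sampling rate used for LESR analysis (32 kHz)
WIN = int(0.005 * FS)    # 5 ms window -> 160 samples
HOP = int(0.001 * FS)    # 1 ms hop -> 32 samples
K = 16                   # number of frequency subbands


def subband_energy(x, win=WIN, hop=HOP, k=K):
    """Return energy per subband per frame, shape (k, n_frames).

    Assumption: Hann-windowed STFT, k equal-width subbands from 0 Hz to fs/2.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, win // 2 + 1)
    edges = np.linspace(0, power.shape[1], k + 1).astype(int)
    bands = [power[:, edges[b]:edges[b + 1]].sum(axis=1) for b in range(k)]
    return np.stack(bands)


def delta_lesr(top, bot, eps=1e-12):
    """Delta LESR_k(t) = ln(R_top(k, t) / R_bot(k, t)) for a top/bottom mic pair."""
    r_top = subband_energy(top)
    r_bot = subband_energy(bot)
    return np.log((r_top + eps) / (r_bot + eps))
```

Feeding the same signal to both channels gives ΔLESR = 0 in every subband, mirroring the noise-aware property: energy that arrives equally at both microphones cancels in the ratio, whereas a channel scaled by 2× in amplitude shifts ΔLESR by ln 4.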

LESR-Based User Discrimination

The LESR model captures user-specific biometric signatures through path-dependent frequency attenuation. Key properties include:

Position-dependent: Different tap locations exhibit distinct LESR patterns
Behavior-resilient: Consistent across varying tap forces and angles
Noise-aware: Ambient airborne noise reaches both microphones with comparable energy, so $R_{top}(k,t) \approx R_{bot}(k,t)$ and therefore $\Delta LESR_k(t) \approx 0$

LESR Stability Analysis

Analysis shows LESR consistency across multiple taps of the same digit by the same user, demonstrating the reliability of our biometric approach. The stability is quantified through correlation coefficients and temporal variance measurements.
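The page does not spell out how the correlation coefficients and temporal variance are computed. The sketch below assumes Pearson correlation between two flattened LESR maps (subbands × frames) and the per-subband variance over time, averaged across subbands; `lesr_stability` is a hypothetical helper.

```python
import numpy as np


def lesr_stability(lesr_a, lesr_b):
    """Compare two LESR maps (subbands x frames) from repeated taps of one digit.

    Returns (pearson_correlation, mean_temporal_variance_a, mean_temporal_variance_b).
    Maps are cropped to a common number of frames before comparison.
    """
    a = np.asarray(lesr_a, dtype=float)
    b = np.asarray(lesr_b, dtype=float)
    n = min(a.shape[1], b.shape[1])
    a, b = a[:, :n], b[:, :n]
    corr = float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    # Variance over time within each subband, averaged across subbands.
    var_a = float(np.var(a, axis=1).mean())
    var_b = float(np.var(b, axis=1).mean())
    return corr, var_a, var_b
```

A high correlation together with low, similar temporal variance would indicate that two taps of the same digit produce a stable LESR pattern.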

LESR Data Overview

The following table presents the LESR analysis results for all 12 LocAF demonstration samples. Each row corresponds to one user-PIN-attempt combination, with visualizations of the LESR heatmap (frequency-time analysis) and the detailed LESR time series.

LESR Analysis Parameters: 5 ms window, 1 ms hop, 16 frequency bands, 32 kHz sampling rate

PIN	User	Attempt
0	u1	c1
0	u1	c2
0	u2	c1
0	u2	c2
0	u3	c1
0	u3	c2
2	u1	c1
2	u1	c2
2	u2	c1
2	u2	c2
2	u3	c1
2	u3	c2

LESR Analysis Details

Total Samples: 12 LESR analysis results (3 users × 2 PINs × 2 attempts)

LESR Visualization: Two types of analysis are shown: (1) the LESR heatmap, a frequency-time view of LESR values across subbands, and (2) the LESR time series, a detailed temporal view with a moving average.

I-PIN System Workflow

System Overview

The I-PIN system operates through a three-stage pipeline that transforms raw acoustic signals into reliable biometric authentication decisions.

System overview of I-PIN. The pipeline consists of three main components: (i) LESR-based Pre-processing, (ii) Identity Feature Extraction and Disentanglement, and (iii) Authentication with Multiple LocAF Samples.

Stage 1: LESR-based Pre-processing

Leverages the on-device acoustic field to suppress noise and extract reliable LocAF segments:

Dual-microphone signal acquisition and synchronization
LESR calculation and frequency subband analysis
LESR-wavelet denoising with adaptive thresholding
Multi-resolution Stationary Wavelet Transform (SWT) processing
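Stage 1 extracts LocAF segments, but the page does not say how individual taps are located in the recording. One simple option, assumed here purely for illustration, is a robust energy threshold: frames whose two-channel energy exceeds median + k·MAD are treated as tap onsets, and a fixed-length segment is cut from each. The 1 ms frame, 20 ms segment length, `k`, and the name `extract_tap_segments` are all hypothetical.

```python
import numpy as np


def extract_tap_segments(stereo, fs=32_000, frame_ms=1.0, seg_ms=20.0, k=10.0):
    """Cut fixed-length LocAF candidate segments from a (n_samples, 2) recording.

    Hypothetical detector: a frame marks a tap onset when its energy, summed over
    both channels, exceeds median + k * MAD of all frame energies. Onsets closer
    than one segment length to the previous kept onset are skipped.
    """
    stereo = np.asarray(stereo, dtype=float)
    frame = max(1, int(fs * frame_ms / 1000))
    seg = int(fs * seg_ms / 1000)
    n_frames = stereo.shape[0] // frame
    frames = stereo[:n_frames * frame].reshape(n_frames, frame, -1)
    energy = (frames ** 2).sum(axis=(1, 2))
    med = np.median(energy)
    mad = np.median(np.abs(energy - med))
    threshold = med + k * mad
    segments, last_start = [], -seg
    for idx in np.flatnonzero(energy > threshold):
        start = int(idx) * frame
        # Keep the onset only if it starts a new segment that fits in the recording.
        if start - last_start >= seg and start + seg <= stereo.shape[0]:
            segments.append(stereo[start:start + seg])
            last_start = start
    return segments
```

The median and MAD are used instead of mean and standard deviation so that the few loud tap frames do not inflate the threshold they are compared against.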

Stage 2: Identity Feature Extraction

A PANNs encoder extracts fine-grained physiological features, and contrastive learning disentangles user identity from behavioral variations:

PANNs-based acoustic feature extraction (2048-dimensional)
LESR structure-propagation biometric features
Contrastive learning with triplet loss and binary cross-entropy
Identity-behavioral variation disentanglement
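The two training objectives named above can be sketched in NumPy as follows. The squared Euclidean distance on L2-normalized embeddings, the 0.2 margin, the loss weighting, and the assumption that BCE is applied to a same-user/different-user logit are illustrative choices; the page does not give these details, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge triplet loss on L2-normalized embeddings of shape (batch, dim)."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=1)   # anchor-positive distance (same user)
    d_an = np.sum((a - n) ** 2, axis=1)   # anchor-negative distance (other user)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


def bce_loss(logits, labels):
    """Numerically stable binary cross-entropy on raw logits, labels in {0, 1}."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return float(np.mean(np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))))


def total_loss(anchor, positive, negative, logits, labels, margin=0.2, weight=1.0):
    """Hypothetical combined objective: triplet loss + weight * BCE."""
    return triplet_loss(anchor, positive, negative, margin) + weight * bce_loss(logits, labels)
```

The triplet term pulls LocAF embeddings of the same user together regardless of behavior, while the BCE term trains a direct accept/reject output.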

Stage 3: Multi-LocAF Authentication

Aggregates multiple LocAF samples during PIN entry for robust user authentication:

Feature vector comparison with enrolled templates
Multi-LocAF aggregation (N=5, M=3 parameters)
Similarity scoring and threshold comparison
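The parameters N=5 and M=3 are not defined on this page. One plausible reading, assumed here and not confirmed by the authors, is an M-of-N vote: score up to N LocAF samples from one PIN entry against the enrolled templates for the same key, and accept if at least M samples clear a similarity threshold. The cosine similarity, the 0.8 threshold, and the per-digit template dictionary are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def authenticate(entry, templates, threshold=0.8, n=5, m=3):
    """Hypothetical M-of-N decision for one PIN entry.

    entry: list of (digit, feature_vector) pairs, one per LocAF sample.
    templates: dict mapping digit -> list of enrolled feature vectors.
    Each of the first n samples is scored by its best match among the templates
    of the same digit (LocAF is location-coupled); accept if >= m samples pass.
    """
    scores = []
    for digit, feature in entry[:n]:
        enrolled = templates.get(digit, [])
        scores.append(max((cosine_similarity(feature, t) for t in enrolled), default=-1.0))
    votes = sum(score >= threshold for score in scores)
    return votes >= m, scores
```

Voting lets a single noisy or atypical tap fail without rejecting the whole PIN entry, which is one way aggregation could raise ASR over single-LocAF decisions.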

Technical Components

1. Location-coupled Acoustic Fingerprint (LocAF)

Each PIN tap generates unique LocAF signals encoding finger anatomical traits and keypad structural attributes.

2. Log-Energy Scaling Ratio (LESR) Model

LESR exploits asymmetric propagation paths with key properties: Position-dependent, behavior-resilient, and noise-aware (environmental noise → $\Delta LESR \approx 0$).

3. LESR-Wavelet Denoising Framework

Self-adaptive algorithm: Multi-resolution SWT analysis → Band alignment → Stability scoring (MAD) → Selective reconstruction. It achieves a 59.6 dB SNR improvement, versus 33.3 dB for the best baseline.
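The denoising steps can be illustrated with a simplified, dependency-free sketch; it is not the authors' algorithm. As a stand-in for the SWT it uses an undecimated Haar (à trous) decomposition, whose levels stay full length, so top and bottom coefficients are aligned sample by sample. At each of L=8 levels it computes a local log-energy ratio between channels, estimates a noise scale with the MAD-style estimator median(|d|)/0.6745, and keeps a coefficient only if the ratio is far from 0 (not common-mode) and the coefficient exceeds that scale. The thresholds `k_mad` and `lesr_tau`, the 32-sample energy window, and both function names are assumptions.

```python
import numpy as np


def atrous_haar(x, levels):
    """Undecimated (a trous) Haar decomposition with circular boundaries.

    Returns the final approximation and a list of detail signals, all the same
    length as x, such that x == approx + sum(details) up to rounding.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        smooth = 0.5 * (approx + np.roll(approx, -(2 ** j)))
        details.append(approx - smooth)
        approx = smooth
    return approx, details


def lesr_wavelet_denoise(top, bot, levels=8, k_mad=3.0, lesr_tau=0.5, win=32, eps=1e-12):
    """Illustrative LESR-gated wavelet denoiser for a top/bottom microphone pair."""
    a_top, d_top = atrous_haar(top, levels)
    a_bot, d_bot = atrous_haar(bot, levels)
    kernel = np.ones(win) / win
    out_top, out_bot = a_top.copy(), a_bot.copy()   # approximation is always kept
    for dt, db in zip(d_top, d_bot):
        # Local energies of the aligned coefficients and their log ratio.
        e_top = np.convolve(dt ** 2, kernel, mode="same")
        e_bot = np.convolve(db ** 2, kernel, mode="same")
        lesr = np.log((e_top + eps) / (e_bot + eps))
        structural = np.abs(lesr) > lesr_tau        # common-mode energy gives lesr ~ 0
        for d, out in ((dt, out_top), (db, out_bot)):
            sigma = np.median(np.abs(d)) / 0.6745   # robust (MAD-style) noise scale
            out += np.where(structural & (np.abs(d) > k_mad * sigma), d, 0.0)
    return out_top, out_bot
```

In this sketch, noise that is identical on both channels is stripped down to the coarse approximation, while a tap that reaches the bottom microphone attenuated (a non-zero log-energy ratio) is reconstructed almost unchanged.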

4. PANNs+LESR Integration

Hybrid architecture combines PANNs fine-grained acoustic features (2048-dim) with LESR structure-propagation biometric features. Contrastive learning with triplet loss + binary cross-entropy disentangles identity from behavioral variations.
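The page states that PANNs and LESR features are combined but not how. A minimal sketch, assuming simple early fusion: L2-normalize the 2048-dimensional PANNs embedding (obtained elsewhere, for example from a pretrained PANNs model), summarize the LESR map by the per-subband mean and standard deviation over time, normalize that descriptor, and concatenate. The pooling, the `lesr_weight` parameter, and the name `fuse_features` are hypothetical.

```python
import numpy as np


def fuse_features(panns_embedding, lesr_map, lesr_weight=1.0, eps=1e-12):
    """Hypothetical early fusion of a PANNs embedding with a pooled LESR descriptor."""
    panns = np.asarray(panns_embedding, dtype=float)
    if panns.shape != (2048,):
        raise ValueError("expected a 2048-dimensional PANNs embedding")
    lesr = np.asarray(lesr_map, dtype=float)                  # (subbands, frames)
    # Summarize each subband over time with its mean and standard deviation.
    lesr_desc = np.concatenate([lesr.mean(axis=1), lesr.std(axis=1)])
    # Normalize each part so neither dominates the concatenated vector by scale.
    panns = panns / (np.linalg.norm(panns) + eps)
    lesr_desc = lesr_desc / (np.linalg.norm(lesr_desc) + eps)
    return np.concatenate([panns, lesr_weight * lesr_desc])
```

With 16 subbands the fused vector has 2048 + 32 = 2080 dimensions, and `lesr_weight` controls how strongly the structure-propagation part contributes to distances.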

Experimental Results

Overall Performance Metrics

Authentication Success Rate (ASR): 99.4%
False Positive Rate (FPR): 0.9%
F1-Score: 99.3%
Equal Error Rate (EER): 0.9%
Participants: 27
LocAF Samples: 31,045
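For readers reproducing these metrics on their own scores, a minimal sketch follows. It assumes ASR is the genuine-accept rate at a fixed similarity threshold, F1 treats genuine attempts as the positive class, and EER is approximated by sweeping thresholds over the observed scores; the function names are hypothetical, and the exact evaluation protocol behind the reported numbers is not specified here.

```python
import numpy as np


def auth_metrics(genuine_scores, impostor_scores, threshold):
    """Return (ASR, FPR, F1) at a fixed threshold; genuine = positive class."""
    g = np.asarray(genuine_scores, dtype=float)
    i = np.asarray(impostor_scores, dtype=float)
    tp = np.sum(g >= threshold)      # genuine attempts accepted
    fn = g.size - tp                 # genuine attempts rejected
    fp = np.sum(i >= threshold)      # impostor attempts accepted
    asr = tp / g.size
    fpr = fp / i.size
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    return float(asr), float(fpr), float(f1)


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: threshold where false accept and false reject rates are closest."""
    g = np.asarray(genuine_scores, dtype=float)
    i = np.asarray(impostor_scores, dtype=float)
    best = None
    for t in np.unique(np.concatenate([g, i])):
        far = np.mean(i >= t)        # false accept rate
        frr = np.mean(g < t)         # false reject rate
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return float(best[1])
```

Sweeping only the observed scores keeps the sketch simple; a finer EER estimate would interpolate between neighboring thresholds.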

Performance Analysis Across Different Scenarios

The following analysis demonstrates I-PIN's robust performance across various temporal conditions, environmental scenarios, user postures, and force levels.

Temporal Performance

Performance consistency over time, showing stable authentication rates across different time periods.

Environmental Scenarios

Authentication performance across different environmental conditions including mall, office, subway, and taxi scenarios.

User Postures

Performance analysis across different user interaction postures: tabletop, palm, and grip positions.

Force Levels

Authentication accuracy across different tap force levels: hard, medium, and light pressure.

Environment-Specific Performance

Condition	ASR (%)	FPR (%)	F1-Score (%)	EER (%)
Mall	100.0	0.0	100.0	0.0
Office	100.0	1.0	99.5	0.5
Subway	99.8	1.4	99.2	1.3
Taxi	98.2	0.4	99.0	0.2
Hard Tap	99.8	0.4	99.7	0.2
Light Tap	99.4	0.8	99.3	0.7

Technical Component Performance

Component	Metric	I-PIN Value	Best Baseline
LESR Denoiser	SNR Improvement	59.6 dB	33.3 dB
Feature Quality	Correlation	99.99%	99.98%
Multi-LocAF	ASR	99.4%	92.6% (single LocAF)

Security Analysis

I-PIN provides comprehensive defense against multiple attack vectors through its propagation-enhanced acoustic biometric approach:

Attack Vector	Traditional PIN	I-PIN Defense Mechanism	Effectiveness
Shoulder Surfing	❌ Vulnerable	Visual-independent biometric authentication	100% resistant
Zero-effort Attack	❌ Vulnerable	User-specific LocAF signatures	FPR: 0.06%
Impersonation	❌ Vulnerable	Anatomical trait encoding in LocAF	FPR: $4.6 \times 10^{-4}$
Replay Attacks	❌ Vulnerable	Structure-borne vs. airborne detection	FPR: 0%
Acoustic Eavesdropping	❌ Vulnerable	On-device structure-borne signal isolation	99.1% resistant
Environmental Noise	❌ Affected	LESR-wavelet denoising (59.6 dB SNR improvement)	Robust operation

Comprehensive Evaluation

Experimental Setup:

Scale: 27 participants, 31,045 LocAF samples, IRB-approved study
Devices: 7 smartphones (Vivo X100, Huawei Pura70Pro, Xiaomi 8, Redmi Note 13, Huawei P20, iPhone 13, Xiaomi 14)
Environments: Mall, Office, Subway, Taxi + various postures (Tabletop, Palm, Grip) + force levels (Hard, Medium, Light)
Data Collection: 44.1 kHz dual-microphone recording across diverse conditions

User Study Results:

Device Generalization: Stable performance across 7 different smartphones (ASR: 0.95-0.96, FPR: ~0.01-0.03)
User Generalization: Average ASR > 0.98 across all participants with FPR < 0.02
PIN Robustness: Consistent performance across all digits (0-9), most achieving perfect success (1.0) with zero false accepts
Temporal Stability: 3-month evaluation shows sustained performance with minimal temporal drift

Component-wise Validation:

LESR Denoiser Impact: Markedly improves discriminability under strict security budgets (ASR >0.97 at FPR=0.001)
Full vs. Partial Components: Removing LESR features degrades performance (F1=0.62, ROC-AUC=0.95)
Multi-LocAF Benefit: 92.6% → 99.4% ASR improvement with aggregation
Parameter Optimization: K=16 subbands, L=8 SWT levels, (N=5, M=3) aggregation achieve optimal performance
