Generating Realistic Sound with Prosthetic Hand: A Reinforcement Learning Approach

Korea University, University of Illinois Urbana-Champaign, DGIST
IEEE Engineering in Medicine and Biology Society (EMBC) 2024

Abstract

In this study, we tackle the task of enabling prosthetic hands to reproduce sounds accurately, which is crucial for distinguishing between materials through auditory feedback. Sound identification, such as telling a tap on drywall from a tap on a brick wall, enhances the functionality and user experience of prosthetic devices. However, achieving this level of auditory feedback in prosthetic hands poses considerable challenges. We use reinforcement learning (RL) to train a prosthetic hand to emulate human-like sound characteristics, focusing on key auditory features such as amplitude and onset timing. Our approach uses an analysis of these sound attributes to direct the prosthetic hand's movements so that the generated sound mimics that of natural human actions. We developed a tailored reward function incorporating amplitude, onset strength, and timing criteria so that the prosthetic hand's movements align closely with the intended human-like sound output.

Overview

(A) We first record a human-produced tapping sound and extract key features such as amplitude and onset. These features serve as the reference for the prosthetic hand to learn from and imitate. (B) The prosthetic hand then interacts with the environment, in this case, a drum pad, to generate sounds. It learns to adjust its finger movements based on the feedback it receives from the generated sound. We employ a reinforcement learning framework, specifically Proximal Policy Optimization (PPO), to train the hand's control policy. The policy maps the hand's current state to actions that produce the desired tapping motion. (C) To guide the learning process, we define a reward function that incorporates multiple sound features. The reward function compares the generated sound with the reference sound in terms of amplitude, onset strength, onset timing, and the number of hits. By maximizing this reward, the prosthetic hand learns to produce tapping sounds that closely resemble the human-produced reference. The policy is iteratively updated based on the earned rewards using the PPO algorithm until convergence.
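
As a rough illustration of panel (C), the sketch below scores a generated sound against a reference using the four criteria named above: amplitude, onset strength, onset timing, and number of hits. It is not the authors' implementation. The frame-energy onset envelope, the peak-picking threshold, the equal weights, the synthetic `synth_taps` signal, and every function name are assumptions made for this example.

```python
import math

def synth_taps(onset_times, sr=8000, duration=1.0, amp=0.8):
    """Hypothetical stand-in for a recording: one decaying 200 Hz burst per tap."""
    n = int(sr * duration)
    signal = [0.0] * n
    for t0 in onset_times:
        start = round(t0 * sr)
        for i in range(start, n):
            t = (i - start) / sr
            signal[i] += amp * math.exp(-40.0 * t) * math.sin(2 * math.pi * 200.0 * t)
    return signal

def frame_rms(signal, frame_len=256):
    """RMS energy of consecutive, non-overlapping frames."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [math.sqrt(sum(x * x for x in signal[s:s + frame_len]) / frame_len)
            for s in starts]

def onset_envelope(rms):
    """Onset strength, taken here as the positive frame-to-frame rise in energy."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(rms, rms[1:])]

def detect_onsets(envelope, threshold=0.1):
    """Frame indices of local envelope peaks above an assumed threshold."""
    onsets = []
    for i, v in enumerate(envelope):
        prev = envelope[i - 1] if i > 0 else 0.0
        nxt = envelope[i + 1] if i + 1 < len(envelope) else 0.0
        if v > threshold and v >= prev and v > nxt:
            onsets.append(i)
    return onsets

def sound_features(signal, frame_len=256, threshold=0.1):
    """Peak amplitude plus the frame index and strength of each detected onset."""
    rms = frame_rms(signal, frame_len)
    env = onset_envelope(rms)
    onsets = detect_onsets(env, threshold)
    return {
        "amplitude": max((abs(x) for x in signal), default=0.0),
        "onset_frames": onsets,
        "onset_strengths": [env[i] for i in onsets],
        "n_frames": len(rms),
    }

def reward(ref, gen, weights=(1.0, 1.0, 1.0, 1.0)):
    """Negative weighted sum of the four mismatches; 0 is a perfect match."""
    w_amp, w_str, w_time, w_hits = weights
    amp_err = abs(ref["amplitude"] - gen["amplitude"])
    hit_err = abs(len(ref["onset_frames"]) - len(gen["onset_frames"]))
    # Onsets are paired in order; unmatched ones are covered by hit_err.
    pairs = list(zip(ref["onset_frames"], gen["onset_frames"]))
    if pairs:
        time_err = sum(abs(r - g) for r, g in pairs) / (len(pairs) * ref["n_frames"])
        str_err = sum(abs(r - g) for r, g in
                      zip(ref["onset_strengths"], gen["onset_strengths"])) / len(pairs)
    elif ref["onset_frames"] or gen["onset_frames"]:
        time_err = str_err = 1.0  # one side has taps, the other is silent
    else:
        time_err = str_err = 0.0
    return -(w_amp * amp_err + w_str * str_err + w_time * time_err + w_hits * hit_err)

reference = sound_features(synth_taps([0.3]))
late_tap = sound_features(synth_taps([0.5]))
print(reward(reference, reference), reward(reference, late_tap))
```

In an RL loop, a score of this kind would be computed once per 1-second episode and passed to PPO as the episode reward. How the individual terms are scaled and weighted against each other is a tuning decision that this sketch does not attempt to settle.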

Hardware Setting

We used a PSYONIC Ability Hand, a prosthetic hand with six degrees of freedom. The hand was mounted on a 6-DOF PAPRAS robot arm, and a single finger was controlled while the arm was held in a fixed pose. Sound was recorded with a ZOOM H6 recorder in 1-second intervals. Only mono audio was used, although the device supports stereo recording. An 8-inch Eastar drum practice pad served as the tapping object and was struck directly by the Ability Hand to produce sound. During tapping, the wrist position and the height of the drum pad were kept fixed.
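
The audio handling described above (mono only, 1-second intervals) might look like the following minimal sketch. The source does not say how the mono signal was obtained, so averaging the two channels is an assumption, as are the function names and the choice to drop any trailing partial second.

```python
def to_mono(stereo_frames):
    """Average left and right samples into one channel (an assumed down-mix)."""
    return [(left + right) / 2.0 for left, right in stereo_frames]

def one_second_segments(samples, sample_rate):
    """Split a mono signal into consecutive 1-second segments.

    Any trailing samples that do not fill a whole second are dropped.
    """
    n_segments = len(samples) // sample_rate
    return [samples[i * sample_rate:(i + 1) * sample_rate]
            for i in range(n_segments)]

# Toy example with a deliberately tiny, hypothetical sample rate of 4 Hz.
stereo = [(1.0, 0.0)] * 10
segments = one_second_segments(to_mono(stereo), sample_rate=4)
print(len(segments), segments[0])
```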


The hardware setup includes a PSYONIC Ability Hand mounted on a 6-DOF PAPRAS robot arm. Sound is recorded with a ZOOM H6 recorder. An 8-inch Eastar drum practice pad is used as the tapping object.

Experimental Results

Tapping Motion Sequence

Snapshots of the prosthetic hand's movements during sound generation. (Top) The movement of the trained prosthetic hand when given a single-beat sound of one-second duration. (Bottom) The movement of the trained prosthetic hand when given a double-beat sound of one-second duration.

Learning Curve

Learning curves of the prosthetic hand for single-beat and double-beat tapping sounds. The blue line is the smoothed reward, and the light blue area indicates the variance.
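
A smoothed curve with a variance band of this kind can be computed from raw per-episode rewards, for example with a trailing window. The sketch below is one possible way to do it. The source does not state the smoothing method, the window size, or whether the variance is taken over time or across runs, so all of those choices and the function name are assumptions.

```python
from statistics import fmean, pvariance

def smooth_with_variance(rewards, window=10):
    """Trailing-window mean and population variance for each episode."""
    means, variances = [], []
    for i in range(len(rewards)):
        chunk = rewards[max(0, i - window + 1):i + 1]
        means.append(fmean(chunk))
        variances.append(pvariance(chunk))
    return means, variances

# Hypothetical reward trace, for illustration only.
means, variances = smooth_with_variance([0.0, 2.0, 4.0, 6.0], window=2)
print(means, variances)
```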

Waveform Comparison

The waveforms of the reference sound and the generated sound for single beats (Top) and double beats (Bottom). Although the generated sound contains some motor noise, the two waveforms are closely aligned along the time axis.

Onset Strength and Timing

The onset strength and timing of the reference sound and the generated sound for single beats (Top) and double beats (Bottom). In the double-beat case, the interval between the two beats in the generated sound closely matches the interval in the reference.
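
The timing comparison for double beats can be expressed as a difference of inter-onset intervals. The sketch below shows that calculation. The onset times in the example are made-up values, not measurements from the paper, and the function names are hypothetical.

```python
def inter_onset_intervals(onset_times):
    """Time between each pair of consecutive onsets, in seconds."""
    return [b - a for a, b in zip(onset_times, onset_times[1:])]

def interval_errors(ref_onsets, gen_onsets):
    """Absolute difference between matching reference and generated intervals."""
    ref_ioi = inter_onset_intervals(ref_onsets)
    gen_ioi = inter_onset_intervals(gen_onsets)
    if len(ref_ioi) != len(gen_ioi):
        raise ValueError("reference and generated sounds have different hit counts")
    return [abs(r - g) for r, g in zip(ref_ioi, gen_ioi)]

# Illustrative values only: both sounds have two beats 0.5 s apart,
# with the generated sound starting later overall.
print(interval_errors([0.25, 0.75], [0.5, 1.0]))
```

Comparing intervals rather than absolute onset times makes the measure insensitive to a constant delay before the first tap. Whether that is desirable depends on whether absolute timing is already scored separately.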

Demo Video

This video shows the prosthetic hand learning to generate single-beat tapping sounds on a drum pad, from the initial stages of training to the final learned policy.

Observe how the finger movements improve over the course of training. At the beginning, the tapping motion appears erratic and inconsistent. As learning progresses, the hand adjusts its finger trajectories and timing to produce a single-beat pattern that closely resembles the reference sound.

By the end of the training, the prosthetic hand exhibits a smooth and precise tapping motion, accurately reproducing the desired single beat sound on the drum pad. This demonstration highlights the effectiveness of our reinforcement learning approach in enabling the prosthetic hand to learn and generate realistic tapping sounds.
