LenslessMic: Lensless Microphone

About · Installation · Pre-trained Checkpoints · Dataset · How To Use · Citation · Credits · License

About

This repository contains an official implementation of "LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging".

We represent audio as a time-varying array of images, which is captured by a lensless camera for encryption. Lensless reconstruction algorithms are then used to recover the audio signal. The method works with different types of audio (speech/music) and different codecs, and a codec-agnostic model trained on random data can also be used. LenslessMic serves as a robust audio encryption tool with a physical layer of security and as an authentication method.
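
As a rough, purely illustrative sketch (not the repository's code; the 16x16 latent-frame size is only an assumption borrowed from the checkpoint names below), the pipeline can be pictured as a neural codec compressing audio into a sequence of latent frames, each of which is displayed and captured as a small image:

# Conceptual sketch only (not the repository's code): a codec latent with,
# say, 256 values per frame can be viewed as a 16x16 grayscale image, so the
# whole signal becomes a time-varying stack of small images for the camera.
import numpy as np

rng = np.random.default_rng(0)
codes = rng.integers(0, 1024, size=(100, 256))  # 100 dummy latent frames

# Normalize and reshape each latent frame into a 16x16 "image".
frames = codes.reshape(-1, 16, 16).astype(np.float32) / 1023.0
print(frames.shape)  # (100, 16, 16)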

Demo samples are provided on the project page together with additional experiments. Models and datasets are stored in the HuggingFace Collection.

We provide a demo notebook here. You can also open it in Colab.

Installation

Install the environment and dependencies:

  1. (Optional) Create and activate a new environment using conda or venv (+pyenv). We used PYTHON_VERSION=3.11.7.

    a. conda version:

    # create env
    conda create -n project_env python=PYTHON_VERSION
    
    # activate env
    conda activate project_env

    b. venv (+pyenv) version:

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env
    
    # alternatively, using default python version
    python3 -m venv project_env
    
    # activate env
    source project_env/bin/activate
  2. Install all required packages:

    pip install -r requirements.txt

    Also follow the VISQOL repo instructions to install that metric.

  3. Install pre-commit:

    pre-commit install

NeMo ASR toolkit installation:

  1. Create a new environment:

    conda create --name nemo python==3.11.7
    conda activate nemo
  2. Install required packages:

    pip install "nemo_toolkit[asr]"

UTMOS installation:

  1. Create a new environment:

    conda create --name utmos python==3.9.7
    conda activate utmos
  2. Install required packages:

    git clone https://huggingface.co/spaces/sarulab-speech/UTMOS-demo
    cd UTMOS-demo
    pip install -r requirements.txt

Raspberry Pi installation (for the dataset collection):

  1. Install requirements:

    pip install -r rpi_requirements.txt
  2. Install rawpy from source:

    git clone https://github.com/LibRaw/LibRaw.git libraw
    git clone https://github.com/LibRaw/LibRaw-cmake.git libraw-cmake
    cd libraw
    git checkout 0.20.0
    cp -R ../libraw-cmake/* .
    cmake .
    sudo make install
    sudo ldconfig
    
    pip install "cython<3"
    git clone --branch v0.16.0 https://github.com/letmaik/rawpy.git
    cd rawpy
    CFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" pip install --no-cache-dir .
  3. Install extra packages:

    pip install git+https://github.com/LCAV/LenslessPiCam.git
    pip install git+https://github.com/pvigier/perlin-numpy.git@5e26837db14042e51166eb6cad4c0df2c1907016
    pip install git+https://github.com/ebezzam/slm-controller.git
    
    sudo apt install --reinstall libcamera0 python3-libcamera python3-picamera2
    sudo apt install --reinstall libcamera0 libcamera-apps
    # add symlinks to your env
    ln -s /usr/lib/python3/dist-packages/picamera2 ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/picamera2
    ln -s /usr/lib/python3/dist-packages/picamera2-0.3.12.egg-info ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/
    ln -s /usr/lib/python3/dist-packages/libcamera ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/libcamera
    ln -s /usr/lib/python3/dist-packages/v4l2.py ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/v4l2.py
    ln -s /usr/lib/python3/dist-packages/prctl.py ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/prctl.py
    ln -s /usr/lib/python3/dist-packages/_prctl.cpython-39-arm-linux-gnueabihf.so ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/
    ln -s /usr/lib/python3/dist-packages/piexif ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/piexif
    ln -s /usr/lib/python3/dist-packages/pidng ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/pidng
    ln -s /usr/lib/python3/dist-packages/simplejpeg ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/simplejpeg
    ln -s /usr/lib/python3/dist-packages/pykms ROOT_DIR/LenslessMic/env/lib/python3.9/site-packages/pykms
    sudo ldconfig

Pre-trained checkpoints

To download the pre-trained Descript Audio Codec (DAC), use the following command:

cd scripts
python3 download_dac.py --remote-path ""

Add --download_original to download the original DAC weights. You can also download only a specific version of DAC by indicating its path in the HF repo, like this:

cd scripts
python3 download_dac.py --remote-path "16x16_130_16khz/latest/dac/weights.pth"

The weights will be saved to data/dac_exps/. Use download_xcodec.py instead for the X-Codec.

Note

The configs for the custom DAC, fine-tuned on LibriSpeech, are located here.
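
To sanity-check a downloaded checkpoint, a minimal sketch along the following lines should work, assuming the weights follow the standard descript-audio-codec format (the path below is the one from the example above):

import dac

# Load the checkpoint downloaded above; adjust the path to the version you chose.
model = dac.DAC.load("data/dac_exps/16x16_130_16khz/latest/dac/weights.pth")
model.eval()
print(sum(p.numel() for p in model.parameters()), "parameters")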

To download pre-trained LenslessMic models, use the following command:

cd scripts
python3 download_models.py --remote-path ""

If you want a specific checkpoint, use its folder name with / at the end. For example:

python3 download_models.py --remote-path "32x32_librispeech_mse_ssim_raw_ssim_PSF_Unet4M_U5_Unet4M/"

The description of the models is provided in the model card on HuggingFace.

Dataset

To download a ready-to-use dataset, use the following command:

cd scripts
python3 download_dataset.py --repo-id "Blinorot/dataset_name" \
   --remote-path "" \
   --local-dir "MANDATORY_LOCAL_DIR"

As with the models script, you can download only specific subfolders by setting remote-path. You must indicate the local directory; we recommend downloading to data/datasets/dataset_name. By default, our code assumes that the datasets are renamed to:

  • librispeech for Blinorot/lensless_mic_librispeech.
  • songdescriber for Blinorot/lensless_mic_songdescriber.
  • random for Blinorot/lensless_mic_random.

The descriptions of the datasets are provided in the corresponding dataset cards on HuggingFace.
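
For reference, after downloading and renaming, the paths used later in this README assume a local layout roughly like the following (shown for LibriSpeech with the 16x16_130_16khz codec; the exact subfolders depend on the codec, partition, and which parts of the dataset you downloaded):

data/datasets/librispeech/
└── test-clean/
    ├── audio/
    └── 16x16_130_16khz/
        ├── lensed/
        └── lensless_measurement/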

Dataset collection

To collect data, you first need to download the DAC/X-Codec weights via the commands in the Pre-trained checkpoints section.

Then, run the following script to save your audio in a video format:

python3 -m src.scripts.processing.convert_dataset dataset.part="test-clean" codec.codec_name="16x16_130_16khz"

Note

Choose a different part and codec_name if you use another partition or codec, and modify the paths below accordingly.

Then, upload it to HF:

# upload audio
python3 upload_dataset.py --local-dir "PATH_TO_ROOT/data/datasets/librispeech/test-clean/audio" --remote-dir "test-clean/audio" --ignore-unused-audio --ignore-reference-dir "PATH_TO_ROOT/data/datasets/librispeech/test-clean/16x16_130_16khz/lensed/"

# upload video
python3 upload_dataset.py --local-dir "PATH_TO_ROOT/data/datasets/librispeech/test-clean/16x16_130_16khz/lensed" --remote-dir "test-clean/16x16_130_16khz/lensed"

Download/copy your dataset to an SSD connected to the Raspberry Pi, e.g. PATH_TO_RPI_SSD/datasets/librispeech. Also create a new tmp_dir on the SSD. Then, run the following script:

DISPLAY=:0 TMPDIR=PATH_TO_RPI_SSD/tmp_dir python -m src.scripts.measure.collect_dataset_on_device_v3 -cn=collect_dataset_multimask input_dir=PATH_TO_RPI_SSD/datasets/librispeech/test-clean/16x16_130_16khz/lensed output_dir=PATH_TO_RPI_SSD/datasets/librispeech/test-clean/16x16_130_16khz/lensless_measurement n_files=null

If you are generating both train and test sets, make sure to use different random seeds (mask.seed=) and set mask.reference_dir to avoid collisions.

Upload the collected dataset to HF directly from your RPi:

cd scripts
python3 upload_dataset.py --local-dir "PATH_TO_PI_SSD/datasets/librispeech/test-clean/16x16_130_16khz/lensless_measurement" --remote-dir "test-clean/16x16_130_16khz/lensless_measurement"

Important

Do not forget to change the username, repo-id, and paths to the ones you need. See the script arguments for more details.

How To Use

We provide an example notebook here.

To train a model, run the following command:

python3 train.py -cn=CONFIG_NAME HYDRA_CONFIG_ARGUMENTS

Here, CONFIG_NAME is a config from src/configs and HYDRA_CONFIG_ARGUMENTS are optional arguments. For example, this command trains the Learned method from the paper using the default config with CLI modifications:

python3 train.py trainer.override=True dataloader.train.batch_size=1 \
   dataloader.inference.batch_size=1 \
   writer.run_name=32x32_librispeech_mse_ssim_raw_ssim_PSF_Unet4M_U5_Unet4M \
   codec.codec_name=32x32_120_16khz_original \
   reconstruction=32x32 optimizer.lr=1e-4 \
   loss_function.audio_l1_coef=0 loss_function.raw_codec_ssim_coef=1 \
   loss_function.raw_codec_l1_coef=0 transforms=padcrop_train \
   +loss_function.ssim_kernel=7 +loss_function.ssim_sigma=0.5 \
   +loss_function.raw_ssim_kernel=11

To run inference (evaluate the model or save predictions):

python3 inference.py HYDRA_CONFIG_ARGUMENTS

For example, to run inference for the Learned model, run:

python3 inference.py codec.codec_name=32x32_120_16khz_original \
   reconstruction=32x32 \
   inferencer.model_tag=32x32_librispeech_mse_ssim_raw_ssim_PSF_Unet4M_U5_Unet4M

Note

inference.py assumes that you moved your model checkpoint to the data/lensless_exps dir.

By default, the code saves reconstructed audio and video to data/datasets/reconstructed/{dataset_tag}/{partition}/{model_tag}.
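
To quickly inspect the saved reconstructions, a minimal sketch like the following can be used, assuming the audio is written as standard .wav files (the librispeech/test-clean/MODEL_TAG parts of the path are placeholders for your dataset_tag, partition, and model_tag):

import glob

import soundfile as sf

# Collect all reconstructed audio files for one model.
paths = sorted(glob.glob(
    "data/datasets/reconstructed/librispeech/test-clean/MODEL_TAG/**/*.wav",
    recursive=True,
))

# Load the first file and report its duration and sample rate.
audio, sr = sf.read(paths[0])
print(paths[0], audio.shape[0] / sr, "seconds at", sr, "Hz")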

To calculate metrics, use the calculate_metrics.py script; it has the same signature as inference.py. However, you need to run NeMo first to get ASR transcriptions and speaker embeddings. To do so, use:

cd scripts
python3 run_asr.py --audio-dir PATH_TO_AUDIO_FILES
python3 run_speaker.py --audio-dir PATH_TO_AUDIO_FILES

You can add --recursive to apply ASR on all audio files in all subfolders. For UTMOS metric, follow the UTMOS repository.

We also provide a GAN-based version of the scripts (train_gan.py), in which a discriminator-based loss is added to enhance the reconstruction. However, we did not see any improvements using this approach.

Citation

If you use this repo, please cite it as follows:

@article{grinberg2025lenslessmic,
  title = {LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging},
  author = {Grinberg, Petr and Bezzam, Eric and Prandoni, Paolo and Vetterli, Martin},
  journal = {arXiv preprint arXiv:2509.16418},
  year = {2025},
}

Credits

This repository is based on the PyTorch Project Template. It uses some code from the LenslessPiCam project. We also use DAC and X-Codec code for the neural audio codecs and some of the losses.

License

See the License file for details.
