Publications from our Network
Browse recent research publications from ELIZA students. Our work spans four key areas of AI research across seven German sites. Filter by research area and year to find specific topics.
-
- 14.07.2025
- Foundations of ML
- Foundations of ML: Computer Vision
- ML Systems
(Almost) Free Modality Stitching of Foundation Models
DOI: https://doi.org/10.48550/arXiv.2507.10015
Foundation multi-modal models are often designed by stitching together multiple existing pretrained uni-modal models: for example, an image classifier with a text model. This stitching process is performed by training a connector module that aims to align the representation spaces of these uni-modal models towards a multi-modal objective. However, given the complexity of training such connectors on large-scale web-based datasets, coupled with the ever-increasing number of available pretrained uni-modal models, the task of uni-modal model selection and subsequent connector module training becomes computationally demanding. To address this under-studied yet critical problem, we propose Hypernetwork Model Alignment (Hyma), a novel all-in-one solution for optimal uni-modal model selection and connector training that leverages hypernetworks.
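The core idea can be sketched in a few lines: a single hypernetwork maps an embedding of a candidate uni-modal pair to that pair's connector weights, so no per-pair connector training is needed. All names, dimensions, and the linear hypernetwork below are our illustrative assumptions, not the paper's actual architecture or API.

```python
import numpy as np

rng = np.random.default_rng(0)

D_VIS, D_TXT, D_EMB = 8, 6, 4  # toy feature and embedding dimensions

class HyperNet:
    """Maps a model-pair embedding to a flat vector of connector weights."""
    def __init__(self, emb_dim, out_dim):
        self.W = rng.normal(scale=0.1, size=(out_dim, emb_dim))

    def __call__(self, pair_emb):
        return self.W @ pair_emb  # a linear hypernetwork, for illustration only

def connector(pair_emb, hyper):
    """Reshape the generated weights into a (D_TXT, D_VIS) alignment matrix."""
    flat = hyper(pair_emb)
    return flat.reshape(D_TXT, D_VIS)

hyper = HyperNet(D_EMB, D_TXT * D_VIS)
# One embedding per candidate uni-modal pair; the same hypernetwork serves all pairs.
pair_embs = rng.normal(size=(3, D_EMB))
for emb in pair_embs:
    C = connector(emb, hyper)
    vis_feat = rng.normal(size=D_VIS)
    txt_aligned = C @ vis_feat  # vision feature mapped into the text space
    assert txt_aligned.shape == (D_TXT,)
```

The point of the construction is that adding a new candidate pair only requires a new pair embedding, not a new training run.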
-
- 24.04.2025
- Foundations of ML: Computer Vision
- Foundations of ML
A Perspective on Deep Vision Performance with Standard Image and Video Codecs
DOI:
Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that JPEG and H.264 coding significantly deteriorates accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.
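The mIoU metric cited in the abstract can be illustrated on toy data. The pixel-flipping below is a crude stand-in for the effect of lossy coding on a segmentation output, not the paper's actual codec pipeline:

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=(64, 64))  # toy 3-class label map
clean_pred = gt.copy()                  # a perfect prediction scores mIoU = 1.0

# Crude stand-in for coding artifacts: corrupt half of the predicted pixels,
# loosely mimicking how strong JPEG/H.264 compression degrades model inputs.
noisy_pred = gt.copy()
flip = rng.random(gt.shape) < 0.5
noisy_pred[flip] = rng.integers(0, 3, size=flip.sum())

print(miou(clean_pred, gt, 3))  # 1.0
print(miou(noisy_pred, gt, 3))  # well below 1.0
```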
-
- 28.03.2025
- Foundations of ML
Structured sampling strategies in Bayesian optimization: evaluation in mathematical and real-world scenarios
DOI: https://doi.org/10.1007/s10845-025-02597-2
This study presents a comprehensive evaluation of initial sampling techniques within the context of Bayesian Optimization (BO), a machine learning technique for optimizing intricate and expensive-to-evaluate functions. The findings reveal that while BO is inherently robust and effective across a wide range of optimization problems, integrating structured initial sampling methods, such as Latin Hypercube Sampling (LHS) and fractional factorial design (FFD) from the Design of Experiments (DoE) literature, can significantly alter its performance. The study highlights how LHS and FFD, followed by BO, can lead to substantial reductions in energy consumption (up to approximately 67.45% relative to average consumption) in real-world applications such as 3D printing optimization.
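Latin Hypercube Sampling, one of the structured initializations evaluated here, is simple to sketch: each dimension of the unit cube is split into n equal strata, exactly one sample is drawn per stratum, and the strata are shuffled independently per dimension. A minimal NumPy version (ours, not the authors' code):

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """n samples in [0,1)^d: per dimension, one point in each of n strata,
    with the strata order shuffled independently for each dimension."""
    samples = np.empty((n, d))
    for j in range(d):
        strata = (np.arange(n) + rng.random(n)) / n  # one point per stratum
        samples[:, j] = rng.permutation(strata)
    return samples

rng = np.random.default_rng(0)
X = latin_hypercube(8, 2, rng)
# Every stratum [i/8, (i+1)/8) holds exactly one sample in each dimension.
for j in range(2):
    assert sorted(np.floor(X[:, j] * 8).astype(int)) == list(range(8))
```

These n points would then seed the BO surrogate model before acquisition-driven sampling takes over.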
-
- 05.03.2025
- Foundations of ML
- Foundations of ML: Computer Vision
- Foundations of ML: Natural Language Processing
- ML Systems
Hyper-Align: Efficient Modality Alignment via Hypernetworks
DOI:
Modern approaches to constructing multimodal models involve learning specialized modules known as connectors that align the representations of different unimodal models (for example, a VLM combines a vision-modality model with a language-modality model). However, due to the vast scale of multimodal pre-training datasets and large individual models, aligning representations across various pre-trained model combinations becomes computationally expensive. This challenge is further compounded by the frequent release of new models. To address this, we propose a novel method for aligning N combinations of pre-trained models using a hypernetwork called “Hyper-Align”, which approximates the weights of all possible connectors within a fixed compute budget, regardless of N. Our approach is computationally efficient and retains multimodal task performance, matching or surpassing that of independently trained pairwise connectors. For instance, Hyper-Align finds the best possible unimodal pair configuration and then generates their multimodal connector weights while being approximately 8× cheaper in FLOP cost than state-of-the-art baselines.
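The amortization argument behind the fixed compute budget can be seen with back-of-envelope arithmetic: training one connector per pair costs FLOPs linear in the number of pairs N, while the hypernetwork is trained once. The cost constants below are invented purely for illustration and are not the paper's measurements:

```python
# Assumed toy costs (NOT from the paper): FLOPs to train one pairwise
# connector, and FLOPs to train the hypernetwork a single time.
C_CONNECTOR = 1e15
C_HYPER = 25e15

def pairwise_cost(n):
    """Training N independent connectors scales linearly with N."""
    return n * C_CONNECTOR

def hyper_cost(n):
    """One hypernetwork run covers all N pairs, so the cost is flat in N."""
    return C_HYPER

n = 200
print(pairwise_cost(n) / hyper_cost(n))  # 8.0 -> hypernetwork ~8x cheaper
```

With these made-up constants the break-even point is N = 25 pairs; past that, the one-time hypernetwork cost wins.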
-
- 26.12.2024
- Foundations of ML
- Foundations of ML: Computer Vision
- Trans-disciplinary Applications
Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact
DOI: arXiv:2412.19124
Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in both in-domain scenarios and the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations and the in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.
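The varying-label-proportion protocol (1%, 10%, 100%) amounts to a stratified subsample of the training labels. A hedged sketch of such a split, with hypothetical names and no claim to match the benchmark's exact procedure:

```python
import random

def label_fraction_split(labels, fraction, seed=0):
    """Return indices keeping `fraction` of samples per class (stratified),
    as one might do to evaluate SSL with 1%, 10%, or 100% of labels."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    kept = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = max(1, round(len(idxs) * fraction))  # keep at least one per class
        kept.extend(idxs[:k])
    return sorted(kept)

labels = [0] * 100 + [1] * 100
assert len(label_fraction_split(labels, 0.10)) == 20  # 10% of 200
assert len(label_fraction_split(labels, 0.01)) == 2   # 1% -> one per class
```

Stratifying per class keeps the 1% split from dropping rare classes entirely, which matters for the imbalanced label distributions common in medical datasets.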