Publications
Publications from our Network
Browse recent research publications from ELIZA students. Our work spans four key areas of AI research across seven German sites.
-
- 26.04.2025
- Foundations of ML: Computer Vision
Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes
DOI: arXiv:2411.19233
State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack "liveliness", a key component for creating 3D experiences. Recently, novel video diffusion models generate realistic videos with complex motion and enable animations of 2D images; however, they cannot natively be used to animate 3D scenes because they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation.
-
- 24.04.2025
- Foundations of ML: Computer Vision
- Foundations of ML
A Perspective on Deep Vision Performance with Standard Image and Video Codecs
Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.
-
- 28.03.2025
- Foundations of ML
Structured sampling strategies in Bayesian optimization: evaluation in mathematical and real-world scenarios
DOI: https://doi.org/10.1007/s10845-025-02597-2
This study presents a comprehensive evaluation of initial sampling techniques within the context of Bayesian Optimization (BO), a machine learning technique intended for optimization of intricate and expensive functions. The findings reveal that while BO is inherently robust and effective across a wide range of optimization problems, the integration of structured initial sampling methods, such as Latin Hypercube Sampling (LHS) and fractional factorial design (FFD) in the context of Design of Experiments (DoE), can significantly alter its performance. The study highlights how LHS and FFD, followed by BO, can lead to substantial reductions in energy consumption, up to approximately 67.45% compared to average consumption, in real-world applications like 3D printing optimization.
-
- 26.12.2024
- Foundations of ML: Computer Vision
Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact
Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in both in-domain scenarios and the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations and the in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.
-
- 24.12.2024
- Foundations of ML
- Trans-disciplinary Applications
StaR Maps: Unveiling Uncertainty in Geospatial Relations
DOI: https://doi.org/10.48550/arXiv.2412.18356
The growing complexity of intelligent transportation systems and their applications in public spaces has increased the demand for expressive and versatile knowledge representation. While various mapping efforts have achieved widespread coverage, including detailed annotation of features with semantic labels, it is essential to understand their inherent uncertainties, which are commonly underrepresented by the respective geographic information systems. Hence, it is critical to develop a representation that combines a statistical, probabilistic perspective with the relational nature of geospatial data. Further, such a representation should facilitate an honest view of the data's accuracy and provide an environment for high-level reasoning to obtain novel insights from task-dependent queries. Our work addresses this gap in two ways. First, we present Statistical Relational Maps (StaR Maps) as a representation of uncertain, semantic map data. Second, we demonstrate efficient computation of StaR Maps to scale the approach to wide urban spaces. Through experiments on real-world, crowd-sourced data, we underpin the application and utility of StaR Maps in terms of representing uncertain knowledge and reasoning for complex geospatial information.
-
- 04.12.2024
- Foundations of ML: Computer Vision
- Applications in Autonomous Systems
BIM-based AI-supported LiDAR-Camera Pose Refinement
DOI: https://doi.org/10.48550/arXiv.2412.03434
This paper introduces BIMCaP, a novel method to integrate mobile 3D sparse LiDAR data and camera measurements with pre-existing building information models (BIMs), enhancing fast and accurate indoor mapping with affordable sensors. BIMCaP refines sensor poses by leveraging a 3D BIM and employing a bundle adjustment technique to align real-world measurements with the model. Experiments using real-world open-access data show that BIMCaP achieves superior accuracy, reducing translational error by over 4 cm compared to current state-of-the-art methods. This advancement enhances the accuracy and cost-effectiveness of 3D mapping methodologies like SLAM. BIMCaP's improvements benefit various fields, including construction site management and emergency response, by providing up-to-date, aligned digital maps for better decision-making and productivity.