Gold Nanoparticles

FragNet: A Graph Neural Network for Molecular Property Prediction with Four Layers of Interpretability

FragNet is an interpretable Graph Neural Network model. It identifies the key atoms, bonds, molecular fragments, and connections between fragments that are essential for predicting a specific molecular property. It is especially valuable for understanding the significance of connections between fragments in molecules with substructures that do not rely on traditional covalent bonds. FragNet can also quantify the contribution of fragments to the property prediction, allowing the identification of fragments that may improve or degrade a property value.
Paper: https://arxiv.org/pdf/2410.12156
Code: https://github.com/pnnl/FragNet

Impact of Molecular Representations on Deep Learning Model Comparisons in Drug Response Predictions

Deep learning (DL) is key to predicting drug responses in cancer, but its effectiveness is often limited by inconsistent benchmarks and data sources. To address this, we introduce the CoMParison workflow for Cross Validation (CMP-CV), an automated framework for training models with user-defined parameters and metrics. We benchmark several drug representations (graphs, molecular descriptors, fingerprints, and SMILES) to evaluate their predictive performance. Our results show that molecular descriptors and Morgan fingerprints perform slightly better overall, though performance varies across different regions of the descriptor space. This underscores the importance of domain-specific model comparisons. Our work is part of the CANcer Distributed Learning Environment (CANDLE), advancing model comparison for more effective drug response prediction.
Paper: https://doi.org/10.26434/chemrxiv-2024-g95j5-v2
Code: Models HPC workflow
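
The idea of comparing representations under a shared cross-validation split can be sketched in a few lines. This is a minimal illustration on synthetic data, not the CMP-CV workflow itself: the function names (`k_fold_indices`, `cross_val_rmse`, `compare_representations`), the k-nearest-neighbour regressor, and the two toy "representations" are all assumptions made for the example. The point it shows is that every representation is scored on the same folds, so differences in error come from the features rather than from the split.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k training points closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_x, train_y)
    )
    return mean(y for _, y in dists[:k])


def cross_val_rmse(features, targets, k=5, seed=0):
    """Return one RMSE per fold; the seed fixes the folds across representations."""
    scores = []
    for held_out in k_fold_indices(len(targets), k, seed):
        held = set(held_out)
        train = [i for i in range(len(targets)) if i not in held]
        tx = [features[i] for i in train]
        ty = [targets[i] for i in train]
        sq_err = [
            (knn_predict(tx, ty, features[i]) - targets[i]) ** 2 for i in held_out
        ]
        scores.append(mean(sq_err) ** 0.5)
    return scores


def compare_representations(representations, targets, k=5, seed=0):
    """Mean cross-validated RMSE for each named representation."""
    return {
        name: mean(cross_val_rmse(feats, targets, k, seed))
        for name, feats in representations.items()
    }


# Synthetic toy data (hypothetical): one feature set tracks the target,
# the other is pure noise.
rng = random.Random(42)
targets = [i / 40 for i in range(40)]
representations = {
    "informative": [[y] for y in targets],
    "uninformative": [[rng.random()] for _ in targets],
}
scores = compare_representations(representations, targets, k=5)
print(scores)
```

In a real comparison the toy feature lists would be replaced by descriptor, fingerprint, graph, or SMILES encodings of the same molecules, and the per-fold errors could additionally be grouped by region of the descriptor space.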

Outlier-Based Domain of Applicability Identification for Materials Property Prediction Models

Machine learning models are widely used for predicting material properties, but their practical application is limited by uncertainty in their performance on unseen materials. Since predictions depend on the quality of training data, different regions of the material feature space are predicted with varying accuracy. Identifying these regions allows for assessing the confidence level of predictions, determining when and how to use the model based on accuracy requirements, and improving performance in error-prone domains. In this work, we propose a method to identify applicable domains within a large feature space and introduce analysis techniques to better understand these domains and their subdomains.
Paper: https://arxiv.org/pdf/2302.06454
Code: https://github.com/pnnl/doa
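
As a rough illustration of what "inside" and "outside" a domain of applicability can mean, the sketch below flags a query point as out of domain when it lies unusually far from the training data. This is a generic distance-based outlier check, not the method of the paper or the code in the repository above: the function names (`knn_distance`, `fit_threshold`, `in_domain`), the choice of mean k-nearest-neighbour distance as the outlier score, and the 95% quantile threshold are all assumptions made for the example.

```python
import math


def knn_distance(point, reference, k=3, skip_self=False):
    """Mean Euclidean distance from `point` to its k nearest points in `reference`."""
    dists = sorted(math.dist(point, r) for r in reference)
    if skip_self:
        dists = dists[1:]  # drop the zero distance from a training point to itself
    return sum(dists[:k]) / k


def fit_threshold(train, k=3, quantile=0.95):
    """Use a quantile of the training points' own k-NN distances as the cutoff."""
    scores = sorted(knn_distance(p, train, k, skip_self=True) for p in train)
    pos = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[pos]


def in_domain(point, train, threshold, k=3):
    """A point is treated as in-domain if it is no farther out than the cutoff."""
    return knn_distance(point, train, k) <= threshold


# Synthetic training set (hypothetical): a 5 x 5 grid covering the unit square.
train = [(i * 0.25, j * 0.25) for i in range(5) for j in range(5)]
threshold = fit_threshold(train, k=3, quantile=0.95)

print(in_domain((0.6, 0.4), train, threshold))  # query inside the sampled region
print(in_domain((3.0, 3.0), train, threshold))  # query far from all training data
```

Predictions for points flagged as out of domain would then be reported with lower confidence, or used to decide where additional training data is needed.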

Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction

Accurately predicting aqueous solubility is critical in pharmaceuticals, environmental science, and energy storage. Despite extensive research, achieving reliable solubility predictions remains challenging. This study evaluates deep learning methods for solubility prediction, develops a versatile model for organic molecules, and examines the impact of data, molecular representations, and model architectures on performance. Using the largest available solubility dataset, we explore molecular descriptors, SMILES, molecular graphs, and 3D atomic coordinates with four neural network architectures: fully connected networks, recurrent networks, GNNs, and SchNet. Our results show that molecular descriptors perform best, with GNNs also yielding strong results. We also analyze error patterns, identify key molecular features for prediction, and assess how data availability affects model performance through transfer learning and dataset size studies.
Paper: https://pubs.acs.org/doi/full/10.1021/acsomega.2c00642
Code: Solubility Prediction Models