Machine Learning with Noisy Labels

Author: Gustavo Carneiro

Publisher: Elsevier

Published: 2024-03-01

Total Pages: 314

ISBN-13: 0443154422

Most modern machine learning models, based on deep learning techniques, depend on carefully curated, cleanly labelled training sets to be reliably trained and deployed. However, the expensive labelling process involved in acquiring such training sets limits the number and size of datasets available for building new models, slowing progress in the field. Alternatively, many poorly curated training sets containing noisy labels are readily available for building new models, but exploiting them successfully depends on developing algorithms and models that are robust to label noise. Machine Learning with Noisy Labels: Definitions, Theory, Techniques and Solutions defines different types of label noise, introduces the theory behind the problem, presents the main techniques that enable the effective use of noisy-label training sets, and explains the most accurate methods developed in the field. The book is an ideal introduction to machine learning with noisy labels, suitable for senior undergraduates, postgraduate students, researchers, and practitioners using, or researching into, machine learning methods. It shows how to design and reproduce regression, classification, and segmentation models using large-scale noisy-label training sets; gives an understanding of the theory of, and motivation for, noisy-label learning; and shows how to classify noisy-label learning methods into a set of core techniques.
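The blurb above turns on the distinction between clean and noisy labels. As a minimal illustration (not taken from the book; the function and all names here are mine), symmetric label noise replaces each label, with some probability, by a different class chosen uniformly at random:

```python
import random

def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` where each label is replaced, with
    probability `noise_rate`, by a different class drawn uniformly at
    random (symmetric label noise)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # flip to any class except the given one
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

clean = [0, 1, 2] * 100
noisy = flip_labels(clean, noise_rate=0.2, num_classes=3)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"fraction of flipped labels: {observed_rate:.2f}")
```

A model trained naively on `noisy` sees roughly one in five wrong targets, which is the regime the book's techniques are designed to survive.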

Machine Learning Methods with Noisy, Incomplete or Small Datasets

Author: Jordi Solé-Casals

Publisher: MDPI

Published: 2021-08-17

Total Pages: 316

ISBN-13: 3036512888

In many machine learning applications, the available datasets are incomplete, noisy, or affected by artifacts. In supervised scenarios, label information may be of low quality: training sets can be unbalanced, labels can be noisy, and other problems arise. Moreover, in practice it is very common that the available data samples are not enough to derive useful supervised or unsupervised classifiers. All of these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas for solving this challenging problem, and to provide clear examples of application in real scenarios.

Learning from Imperfect Data: Noisy Labels, Truncation, and Coarsening

Author: Vasilis Kontonis (Ph.D.)

Publisher:

Published: 2023

Total Pages: 0

ISBN-13:

The datasets used in machine learning and statistics are huge and often imperfect, e.g., they contain corrupted data, examples with wrong labels, or hidden biases. Most existing approaches (i) produce unreliable results when the datasets are corrupted, (ii) are computationally inefficient, or (iii) come without any theoretical or provable performance guarantees. In this thesis, we design learning algorithms that are computationally efficient and at the same time provably reliable, even when used on imperfect datasets. We first focus on supervised learning settings with noisy labels. We present efficient and optimal learners under the semi-random noise models of Massart and Tsybakov, where the true label of each example is flipped with probability at most 50%, and an efficient approximate learner under adversarial label noise, where a small but arbitrary fraction of labels is flipped, under structured feature distributions. Apart from classification, we extend our results to noisy label-ranking. In truncated statistics, the learner does not observe a representative set of samples from the whole population, but only truncated samples, i.e., samples from a potentially small subset of the support of the population distribution. We give the first efficient algorithms for learning Gaussian distributions with unknown truncation sets and initiate the study of non-parametric truncated statistics. Closely related to truncation is data coarsening, where instead of observing the class of an example, the learner receives a set of potential classes, one of which is guaranteed to be the correct class. We initiate the theoretical study of this problem and present the first efficient algorithms for learning from coarse data.
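The Massart condition described in the abstract bounds each example's flip probability strictly below 1/2, while allowing it to vary from example to example. A small simulation sketch (the threshold classifier, the noise function, and all names are illustrative assumptions, not from the thesis):

```python
import random

def massart_labels(xs, eta_max=0.4, seed=0):
    """Label points on the real line by the threshold sign(x), then flip
    each label independently with an example-dependent probability
    eta(x) <= eta_max < 1/2, i.e., the Massart (bounded) noise condition."""
    rng = random.Random(seed)
    labels = []
    for x in xs:
        clean = 1 if x >= 0 else -1
        # the flip probability may depend on x, but never exceeds eta_max
        eta = eta_max / (1 + abs(x))
        labels.append(-clean if rng.random() < eta else clean)
    return labels

rng = random.Random(1)
xs = [rng.uniform(-3, 3) for _ in range(1000)]
ys = massart_labels(xs)
flips = sum(1 for x, y in zip(xs, ys) if y != (1 if x >= 0 else -1))
print(f"{flips / len(xs):.2f} of labels flipped")
```

Because every flip probability stays below 1/2, the clean threshold remains the Bayes-optimal classifier, which is what makes efficient learning under this model possible at all.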

Machine Learning Methods with Noisy, Incomplete Or Small Datasets

Author: Jordi Solé-Casals

Publisher:

Published: 2021

Total Pages: 316

ISBN-13: 9783036512877

In many machine learning applications, available datasets are sometimes incomplete, noisy or affected by artifacts. In supervised scenarios, it could happen that label information has low quality, which might include unbalanced training sets, noisy labels and other problems. Moreover, in practice, it is very common that available data samples are not enough to derive useful supervised or unsupervised classifiers. All these issues are commonly referred to as the low-quality data problem. This book collects novel contributions on machine learning methods for low-quality datasets, to contribute to the dissemination of new ideas to solve this challenging problem, and to provide clear examples of application in real scenarios.

Learning from Hierarchical and Noisy Labels

Author: Wenting Qi

Publisher:

Published: 2023

Total Pages: 0

ISBN-13:

One branch of machine learning is supervised learning, where labels are crucial to the learning model. Numerous algorithms have been proposed for supervised learning across different classification tasks, but far fewer works question the quality of the training labels. Training a learning model on noisy labels leads to degraded or misleading performance. On the other hand, hierarchical multi-label classification (HMC) is one of the most challenging problems in machine learning, because the classes in HMC tasks are hierarchically structured and each data instance is associated with multiple labels residing on a path of the hierarchy. Treating hierarchical tasks as flat, ignoring the hierarchical relationships between labels, can degrade a model's performance. Therefore, this thesis focuses on learning from two difficult types of labels: noisy labels and hierarchical labels.
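The path constraint described above (an instance's labels form a root-to-leaf chain in the class hierarchy) can be made concrete with a toy hierarchy; all class names here are illustrative, not from the thesis:

```python
# Toy label hierarchy: each class maps to its parent (None marks the root).
PARENT = {
    "animal": None,
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal", "sparrow": "bird",
}

def ancestor_path(label):
    """Expand a leaf label into its full root-to-leaf path: the
    multi-label target an HMC model must predict consistently."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return list(reversed(path))

print(ancestor_path("dog"))   # ['animal', 'mammal', 'dog']
```

A flat classifier would score "dog" and "cat" as equally wrong for a sparrow; the path representation is what lets an HMC model penalize mistakes by how far up the hierarchy they diverge.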

Test Collection Based Evaluation of Information Retrieval Systems

Author: Mark Sanderson

Publisher: Now Publishers Inc

Published: 2010-06-03

Total Pages: 143

ISBN-13: 1601983603

The use of test collections and evaluation measures to assess the effectiveness of information retrieval systems has its origins in work dating back to the early 1950s. In the nearly 60 years since that work started, test collections have become the de facto standard of evaluation. This monograph surveys the research conducted and explains the methods and measures devised for evaluating retrieval systems, including a detailed look at the use of statistical significance testing in retrieval experimentation. It also reviews more recent examinations of the validity of the test-collection approach and of the evaluation measures, and outlines current research trends exploiting query logs and live labs. At its core, the modern-day test collection is little different from the structures that the pioneering researchers of the 1950s and 1960s conceived. This tutorial and review shows that, despite its age, this long-standing evaluation method remains a highly valued tool for retrieval research.
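One of the standard test-collection measures the monograph covers, average precision, is compact enough to state in code. A sketch with made-up document IDs (the function name and data are mine, not the monograph's):

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k taken at
    each rank k where a relevant document appears, divided by the total
    number of judged-relevant documents in the collection."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

# A system ranks five documents; the test collection judges d1 and d4 relevant.
ap = average_precision(["d1", "d2", "d3", "d4", "d5"], {"d1", "d4"})
print(round(ap, 2))   # (1/1 + 2/4) / 2 = 0.75
```

Averaging this quantity over all queries in a collection gives mean average precision (MAP), one of the headline numbers in retrieval experiments.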

On Boosting and Noisy Labels

Author: Jeffrey D. Chan

Publisher:

Published: 2015

Total Pages: 56

ISBN-13:

Boosting is a machine learning technique widely used across many disciplines: it learns from labeled data in order to predict the labels of unlabeled data. A central property of boosting, instrumental to its popularity, is its resistance to overfitting. Previous experiments provide a margin-based explanation for this resistance. The main finding of this thesis is that boosting's resistance to overfitting can be understood in terms of how it handles noisy (mislabeled) points. Confirming evidence emerged from experiments on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, commonly used in machine learning experiments, where a majority-vote ensemble filter identified, on average, 2.5% of the points as noisy. The experiments chiefly investigated boosting's treatment of noisy points from a volume-based perspective. While the cell volume surrounding noisy points did not differ significantly from that of other points, the decision volume surrounding noisy points was two to three times smaller than that of non-noisy points. Additional findings showed that decision volume not only provides insight into boosting's resistance to overfitting in the context of noisy points, but also serves as a suitable metric for identifying which points in a dataset are likely to be mislabeled.

Advances in Data and Information Sciences

Author: Mohan L. Kolhe

Publisher: Springer Nature

Published: 2020-01-02

Total Pages: 679

ISBN-13: 9811506949

This book gathers a collection of high-quality, peer-reviewed research papers presented at the 2nd International Conference on Data and Information Sciences (ICDIS 2019), held at Raja Balwant Singh Engineering Technical Campus, Agra, India, on March 29–30, 2019. In chapters written by leading researchers, developers, and practitioners from academia and industry, it covers virtually all aspects of computational sciences and information security, including central topics like artificial intelligence, cloud computing, and big data. Highlighting the latest developments and technical solutions, it shows readers from the computer industry how to capitalize on key advances in next-generation computer and communication technology.

Artificial Neural Networks and Machine Learning – ICANN 2022

Author: Elias Pimenidis

Publisher: Springer Nature

Published: 2022-09-06

Total Pages: 784

ISBN-13: 3031159195

The four-volume set LNCS 13529, 13530, 13531, and 13532 constitutes the proceedings of the 31st International Conference on Artificial Neural Networks, ICANN 2022, held in Bristol, UK, in September 2022. The 255 full papers presented in these proceedings were carefully reviewed and selected from 561 submissions. ICANN 2022 is a dual-track conference, featuring tracks in brain-inspired computing and in machine learning and artificial neural networks, with strong cross-disciplinary interactions and applications. The chapter “Sim-to-Real Neural Learning with Domain Randomisation for Humanoid Robot Grasping” is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Machine Learning and Knowledge Discovery in Databases: Research Track

Author: Danai Koutra

Publisher: Springer Nature

Published: 2023-09-16

Total Pages: 758

ISBN-13: 3031434153

The multi-volume set LNAI 14169 through 14175 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2023, which took place in Turin, Italy, in September 2023. The 196 papers of the Research Track were selected from 829 submissions, and 58 papers were selected from the 239 submissions to the Applied Data Science Track. The volumes are organized in topical sections as follows: Part I: Active Learning; Adversarial Machine Learning; Anomaly Detection; Applications; Bayesian Methods; Causality; Clustering. Part II: Computer Vision; Deep Learning; Fairness; Federated Learning; Few-Shot Learning; Generative Models; Graph Contrastive Learning. Part III: Graph Neural Networks; Graphs; Interpretability; Knowledge Graphs; Large-Scale Learning. Part IV: Natural Language Processing; Neuro/Symbolic Learning; Optimization; Recommender Systems; Reinforcement Learning; Representation Learning. Part V: Robustness; Time Series; Transfer and Multitask Learning. Part VI: Applied Machine Learning; Computational Social Sciences; Finance; Hardware and Systems; Healthcare & Bioinformatics; Human-Computer Interaction; Recommendation and Information Retrieval. Part VII: Sustainability, Climate, and Environment; Transportation & Urban Planning; Demo.