Self-supervised and in-context learning techniques for automated optical inspection

dc.contributor.author
Figueira, Joaquín
dc.date.accessioned
2025-11-05T20:29:06Z
dc.date.available
2025-11-05T20:29:06Z
dc.date.issued
2025-11-04T18:03:09Z
dc.date.issued
2025
dc.identifier.uri
http://hdl.handle.net/10230/71773
dc.description.abstract
Master's thesis for the Erasmus Mundus joint Master in Artificial Intelligence (EMAI)
dc.description.abstract
Supervisor: Lejla Batina; Co-supervisor: Faysal Boughorbel
dc.description.abstract
Automated Optical Inspection (AOI) is a family of techniques used to find defects and anomalies in electronic devices from high-quality photographs of different regions of an integrated component and its packaging. Current methods use computer vision models and image preprocessing pipelines specific to each chip design and manufacturer. As a result, the current deep learning approach to AOI requires a long retraining process whenever new devices are introduced or significant covariate shifts occur in the input image distribution. In this work, we adapt and evaluate different pre-training techniques (DINO, iBOT, and MAE) for small vision transformers (ViT and FasterViT) to streamline the design of AOI semantic segmentation models and shorten the training time needed to adapt them to new input conditions. We pre-train the models on a custom, relatively small dataset of only 7,000 unlabeled images, showing that these pre-training strategies perform well in small-data regimes. Furthermore, we introduce a set of retrieval-based scene understanding techniques that solve the semantic segmentation of wire-bonded devices with virtually no training on labeled data. Our results demonstrate that our custom pre-trained encoders and retrieval strategies outperform comparable convolutional architectures pre-trained with full supervision on semantic segmentation, in both speed and quality, when training time is constrained. Moreover, we show that our proposed image retrieval strategies generalize to existing ViT models pre-trained on different datasets, and that the techniques can be applied to images of a single device to produce high-quality segmentation masks from a relatively small number of labeled training images. Finally, we show that the retrieval strategies outperform fine-tuned convolutional encoder-decoder models on out-of-distribution, unseen images.
dc.format
application/pdf
dc.language
eng
dc.rights
CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)
dc.rights
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.subject
Learning
dc.title
Self-supervised and in-context learning techniques for automated optical inspection
dc.type
info:eu-repo/semantics/masterThesis


Files in this item

No files are associated with this item.

This item appears in the following collection(s)