Abstract:
This tutorial aims to connect different neural network architectures conceptually, offering a deeper understanding of how the field has advanced. To achieve this, we will use standard mathematical tools as our guide. We will begin with feed-forward neural networks, examining them through the lens of projection matrices, and then move on to Recurrent Neural Networks (RNNs). Next, we will explore the Attention Mechanism and conclude with the Transformer architecture. Throughout the discussion, we will demonstrate how understanding the key building blocks of these architectures helps decipher their functionality and provides insight into their respective training and implementation challenges. The tutorial is designed for a broad audience with some prior exposure to machine learning concepts.