A-I 570: Deep Learning

Overview

A-I 570 Deep Learning is a three-credit class. Dive deep into the fascinating realm of artificial intelligence with our specialized Deep Learning course, designed to propel students into the forefront of AI technology and innovation. This course is meticulously structured to provide both foundational knowledge and advanced skills in neural network architectures and algorithm implementation. Through hands-on, practical experiences, you’ll be expertly prepared for real-world applications, making this an essential step in your educational journey and career development in AI.

Prerequisites:

Students must be in the AI_MPS or DAAN_MPS majors. SWENG_MSE students may enroll with division approval.
Required – STAT 500 or equivalent.
Recommended – DAAN 862
Students should have preliminary programming skills in Python.

Syllabus

Overview

Throughout this course, you will learn to design neural network architectures and training procedures through hands-on assignments that implement the algorithms covered in the lessons. You will also have the opportunity to prove your skills by building small projects in cutting-edge Deep Learning libraries.

Course Objectives

After successfully completing this course, you will be able to demonstrate:

Knowledge of deep learning algorithms and techniques
The ability to identify suitable learning tasks to which these techniques can be applied
Understanding of some of the current limitations of deep learning techniques
The ability to formulate deep learning problems, set up and run computational experiments, evaluate results, and reuse common deep learning libraries and packages

Course Materials

Dive into Deep Learning (Recommended)
- A digital version of this text is available on the web at https://d2l.ai.
Python for Data Analysis (Optional)
- Free E-Book: An online version of Python for Data Analysis (2017) is available at no cost to you as a Penn State student. You can access the E-Book through O’Reilly Learning. (Use your Penn State email to access the book.) (ISBN: 9781491957660)

Required Software

Open-Source Software: Anaconda with Python and Jupyter Notebooks.

(*) Instructions for accessing this software will be provided in the lessons.

Optional Software

Penn State Provided Software:

All Penn State students have free access to Microsoft Office 365. This includes access to Word, Outlook, Excel, PowerPoint, Sway, Teams, etc.

You also have a free Adobe Creative Cloud account with access to products such as Photoshop, InDesign, Spark, Acrobat, Illustrator, and more! Please visit Penn State Adobe Support for more information.

Proctored Exams

Proctored Exams – None.

Grading and Examinations

A grade is given solely on the basis of the instructor’s judgment as to the student’s scholarly attainment (see the Penn State Graduate Degree Programs Bulletin, p. 41). The following grading system applies to graduate students:

“A” (Excellent) indicates exceptional achievement.
“B” (Good) indicates substantial achievement.
“C” (Satisfactory) indicates acceptable but substandard achievement.
“D” (Poor) indicates inadequate achievement and is a failing grade for a graduate student.

Assignment Details

Students will be evaluated on their understanding of the course material by completing assignments that demonstrate their ability to apply material contained in the lessons and activities.

Students will be expected to complete Individual Assignments, a Literature Review, a Project Proposal, a Project Presentation, and Project Deliverables, as well as Discussion Forums. Final grades will be calculated as follows:

Assignment	Points	Quantity	% of Final Grade
Individual Assignments	100	3	30%
Literature Review	100	1	10%
Project Proposal	100	1	10%
Project Presentation	100	1	10%
Project Deliverables	100	1	30%
Discussion Forums	100	multiple	10%

Grades will be based on the following scale:

A = 94 – 100, A- = 90 – 93, B+ = 87 – 89, B = 84 – 86, B- = 80 – 83, C+ = 77 – 79, C = 70 – 76, D = 60 – 69, and F = below 60.

Homework Assignments (Individual)

Homework assignments will be given periodically. Due dates are noted in the course schedule. Doing the homework promptly and carefully is necessary for learning the material. Collaboration with fellow students is allowed and encouraged on homework. However, each student must turn in their own written work, which reflects their own understanding of the material. You will submit your work via Turnitin, an originality-detection service that supports you in doing your best, original work.

Practice Exercises: There are also several ungraded practice exercises throughout the course. Although these are ungraded, it is important for you to complete these guided practice exercises. If you need additional support on any of these exercises, please contact your instructor.

Group Discussions

Discussions are provided as an opportunity to connect with your peers about the lesson content, to share ideas, questions, and resources. You will be evaluated on the quality of your initial post as well as your replies and responses to the posts of your peers.

Literature Review

You will be assigned a scientific paper on a topic related to Deep Learning, and you will write a review in your own words. You are expected to work independently on the paper and may use whatever material you have at your disposal. The Literature Review is worth 10% of your grade. You will submit your paper via Turnitin.

Group Project

The project is worth 50% of your total grade (10% for the Project Proposal, 30% for the deliverables, and 10% for the presentation). This is a team project, and each team should have at least two members. Your team will select a challenging and original problem to solve in your project. At the end of the semester, you will upload the final report (10+ pages excluding the title page). You will be provided with a template and instructions on what points need to be covered. A list of different data sources will also be shared with the class. You may download data from these sources, or you are free to work on your own data set if you have one (for example, from your company or workplace). When completing group work, your team may choose any method of collaboration or communication that is most effective (Google Docs, Microsoft Teams, Zoom, etc.). If you need any assistance using these technologies, contact Penn State IT Support. You will present your group project by creating a video using Zoom.

Course Topics

Introduction to Deep Learning
Neural Networks Basics
Shallow Neural Networks
Deep Neural Networks
Optimization
Improving Deep Neural Networks
Convolutional Neural Networks (CNNs)
Advanced CNNs
Computer Vision & Advanced Computer Vision
Generative Adversarial Networks (GANs)
Recurrent Neural Networks (RNNs)
Advanced RNNs
Transformers

*subject to change

Sample Lesson

Linear Regression
Neural Networks
Block Implementation

Regression

Regression is a type of supervised learning. The term regression refers to a set of problems that aim to explain the relationship between one outcome variable (called the dependent variable) and one or more independent variables (called predictor variables). For example, house price prediction is a regression problem in which the purchase price (the dependent variable) is estimated from predictors such as the number of bedrooms, the lot size, and the location. The overall idea of regression is to examine which variables in particular are significant predictors of the outcome variable.
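
To make the idea concrete, here is a minimal sketch of simple linear regression with a single predictor, fitted by ordinary least squares. The data points are made up for illustration (they lie exactly on the line y = 2x + 1), and the function name `fit_line` is hypothetical rather than part of the course materials.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1 (illustrative, not real measurements).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the toy data contains no noise, the fitted slope and intercept recover the generating line; with real data the fit would only approximate the underlying relationship.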

Neural networks can be seen as a natural extension of linear regression.

The characteristics of the biological neuron:

Thousands of synapses
Active dendrites
Cell recognizes hundreds of unique patterns
Learn by growing new synapses

Recall the comparison we made in Lesson 1 between the biological neuron and the artificial neuron.

Implementing Blocks

From an implementation perspective, a block is a Python class that inherits from tf.keras.Model, which represents a block in Keras. A customized block class has the constructor function __init__(self) to perform the necessary initialization and the call() method to define the forward propagation of the block, that is, how to obtain the required output based on the input.

The block class must thus define the forward propagation function call() and must store any required weights and biases. It is worth noting that the block does not need to define a backpropagation function to compute the gradients. In fact, the block class inherits automatic differentiation from its parent (tf.keras.Model).

You only need to define the code for the forward propagation function and manage the weights and biases (model parameters) if required by your customized block.

The following snippet implements an MLP model with one fully-connected hidden layer with 256 units and ReLU activation, followed by a fully-connected output layer with 10 units and no activation function.

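# TensorFlow Sample Block Code

import tensorflow as tf

# Sequential chains the layers so each layer's output feeds the next one.
net = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation=tf.nn.relu),  # hidden layer: 256 units, ReLU
    tf.keras.layers.Dense(10),                          # output layer: 10 units, no activation
])

# A batch of 2 examples with 20 features each; net(X) has shape (2, 10).
X = tf.random.uniform((2, 20))
net(X)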
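
The block pattern described above (a constructor that stores parameters plus a method that defines forward propagation) can also be sketched without any framework. The classes below are a plain-Python analogue, not Keras code: the names `DenseLayer` and `MLPBlock`, the uniform initialization range, and the use of nested lists are assumptions made for illustration. Unlike tf.keras.Model, this sketch provides no automatic differentiation and needs the input size up front.

```python
import random

def relu(value):
    return max(0.0, value)

class DenseLayer:
    """Fully-connected layer: output = activation(x @ W + b)."""

    def __init__(self, in_features, out_features, activation=None, seed=0):
        rng = random.Random(seed)
        # Small random weights and zero biases (illustrative initialization).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(out_features)]
                        for _ in range(in_features)]
        self.biases = [0.0] * out_features
        self.activation = activation

    def call(self, batch):
        outputs = []
        for row in batch:
            out = [sum(x * w[j] for x, w in zip(row, self.weights)) + self.biases[j]
                   for j in range(len(self.biases))]
            if self.activation is not None:
                out = [self.activation(v) for v in out]
            outputs.append(out)
        return outputs

class MLPBlock:
    """Hidden layer (256 units, ReLU) followed by an output layer (10 units)."""

    def __init__(self, in_features):
        self.hidden = DenseLayer(in_features, 256, activation=relu, seed=1)
        self.out = DenseLayer(256, 10, seed=2)

    def call(self, batch):
        # Forward propagation: input -> hidden -> output.
        return self.out.call(self.hidden.call(batch))

rng = random.Random(42)
X = [[rng.random() for _ in range(20)] for _ in range(2)]  # 2 examples, 20 features
Y = MLPBlock(in_features=20).call(X)
print(len(Y), len(Y[0]))
```

The output has one row per input example and one column per output unit, mirroring the shapes in the TensorFlow snippet.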

Learning Outcomes

Upon successfully completing this course, you will be equipped with:

Knowledge of deep learning algorithms and techniques
The ability to identify suitable learning tasks to which these techniques can be applied
Understanding of some of the current limitations of deep learning techniques
The ability to formulate deep learning problems, set up and run computational experiments, evaluate results, and reuse common deep learning libraries and packages

During this course you will expand your knowledge of the lesson topics each week.

Lesson 1: Introduction to Deep Learning

We begin this lesson with an exploration of the fascinating world of Artificial Intelligence, focusing on how Deep Learning is revolutionizing industries. To set a strong foundation, we will cover essential math concepts like Linear Algebra, Calculus, and Probability, all of which are critical to understanding the mechanics of machine learning. Finally, we’ll dive into Python programming and set up Jupyter Notebook to prepare for hands-on coding throughout the course.

Interactive Lecture: Engage in a guided introduction to Artificial Intelligence and Deep Learning, exploring real-world applications and current trends in the field.
Machine Learning Components: Define the key components of machine learning, such as Data, Models, Objective Functions, and Optimization Algorithms.

Python Environment Setup: Follow step-by-step instructions to install Python and configure Jupyter Notebook for future coding projects.
Discussion Activity: Collaborate with peers to discuss how foundational math concepts connect to practical applications in machine learning and deep learning.

By the end of this lesson, you will have a clear understanding of how Deep Learning fits within Artificial Intelligence, the math concepts that underpin machine learning, and a working Python and Jupyter Notebook environment for the rest of the course.

Lesson 2: Neural Networks Basics

In this lesson, we’ll dive into the foundational steps involved in constructing a basic neural network. You’ll gain an understanding of key processes such as data preprocessing, defining the architecture of a neural network, specifying loss functions, training the network, and evaluating its performance. These building blocks will set the stage for implementing more advanced techniques later in the course.

Concept Walkthrough: Break down each step of building a neural network, from data preprocessing to performance evaluation, with visual aids and examples.
Data Preprocessing Task: Practice cleaning, normalizing, and splitting data into training and testing sets.

Training and Evaluation Hands-On: Train your neural network using sample data and evaluate its accuracy with performance metrics.
Peer Review Session: Share your neural network model with classmates and receive feedback on your design and implementation.

By the end of this lesson, you will have a foundational understanding of how neural networks are constructed and trained, from data preprocessing to performance evaluation. You’ll be equipped to define key concepts such as linear regression, explain the components of a regression model, and understand the role of the loss function and gradient descent in optimizing neural networks.
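
The role of the loss function and gradient descent can be illustrated with a minimal sketch: fitting y = w * x + b by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, and number of steps are illustrative assumptions, not values prescribed by the course.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(w, b, xs, ys, learning_rate):
    """One gradient descent update on the MSE loss."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Toy data from y = 3x - 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [-1.0, 2.0, 5.0, 8.0]

w, b = 0.0, 0.0
initial_loss = mse(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys, learning_rate=0.05)
final_loss = mse(w, b, xs, ys)
print(round(w, 3), round(b, 3), final_loss < initial_loss)
```

Each step moves the parameters a small distance in the direction that reduces the loss, so the loss after training is lower than at the starting point.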

Lesson 3: Shallow Neural Networks

In this lesson, we introduce Multilayer Perceptrons (MLPs), one of the foundational architectures in artificial neural networks. You’ll learn how MLPs are structured with layers of neurons and connections, and how they are trained to learn from examples. We’ll implement MLP networks from scratch, gaining hands-on experience while exploring key challenges such as optimizing architecture and improving training efficiency.

Introduction to MLPs: Learn the structure of Multilayer Perceptrons, including layers, neurons, and connections, and understand their role in neural network architectures.
Ethics in Data Collection: Explore the ethical implications of data collection, including privacy, consent, and transparency.

Regularization Techniques: Implement weight decay and dropout to improve model generalization and performance on unseen data.
Numerical Stability: Understand the importance of proper parameter initialization to avoid problems such as vanishing or exploding gradients during training.

By the end of this lesson, you’ll have a clear understanding of how MLPs work, how to analyze their behavior during training, and how to address common challenges to create better-performing models. This lesson provides practical Python implementations to reinforce your learning and build confidence in applying these techniques.
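
As a rough illustration of one regularization technique named above, the sketch below applies inverted dropout to a list of activations: each value is zeroed with probability `drop_prob`, and the survivors are scaled by 1 / (1 - drop_prob) so the expected value is unchanged. The function name and the use of Python's `random` module are assumptions for illustration; deep learning frameworks provide their own dropout layers.

```python
import random

def dropout(activations, drop_prob, rng):
    """Inverted dropout as applied during training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    # Keep each activation with probability keep_prob and rescale the survivors.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, drop_prob=0.5, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(round(zero_fraction, 2), round(mean_after, 2))
```

Roughly half of the activations are zeroed, while the mean stays close to its original value because of the rescaling.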

Lesson 4: Deep Neural Architectures

This lesson focuses on the advanced aspects of constructing deep learning models using platforms like TensorFlow and PyTorch. Building on the foundations established in previous lessons, we delve deeper into customizing layers and blocks to create powerful, flexible neural network architectures. You’ll also learn about weights and biases initialization and how these choices impact model performance.

Custom Layer and Block Design: Learn to create and stack custom layers and blocks to build complex, flexible neural network architectures.
Weights and Biases Initialization: Explore advanced initialization techniques and their impact on training stability and model performance.

Weights and Biases Initialization: Experiment with different initialization strategies and evaluate their effects on model convergence and performance.
Code-Along Workshop: Follow guided examples to construct, train, save, and deploy a deep learning model with advanced features.

By the end of this lesson, you’ll move beyond being an end-user of AI frameworks to a power developer, equipped with the skills to design and implement complex models tailored to your specific needs.
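
One widely used initialization strategy is Xavier (Glorot) uniform initialization, which draws weights from a range chosen so that their variance is 2 / (fan_in + fan_out). The lesson description does not name a specific scheme, so treat this as one illustrative option; the function name and layer sizes below are assumptions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
weights = xavier_uniform(fan_in=256, fan_out=128, rng=rng)
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
variance = sum((w - mean) ** 2 for w in flat) / len(flat)
target_variance = 2.0 / (256 + 128)
print(variance, target_variance)
```

The empirical variance of the sampled weights lands close to the target, which is the property intended to keep signals from shrinking or growing as they pass through layers.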

Lesson 5: Optimization

This lesson focuses on the vital process of machine learning optimization and its role in creating accurate and high-performing models. We will begin by clarifying the distinction between model parameters, which are determined during training, and hyperparameters, which are set before training and define the structure of your model. Understanding this distinction is essential to mastering the optimization process.

Understanding Parameters and Hyperparameters: Participate in an interactive session to identify and differentiate model parameters (e.g., weights and biases) and hyperparameters (e.g., learning rate, epochs, and optimizers).
Hyperparameters vs. Parameters: Learn the distinction and why hyperparameter tuning is essential for model performance.

Learning Rate Adjustment Workshop: Observe how varying learning rates impact model convergence and performance.
Discussion Session: Share insights and challenges encountered during hyperparameter tuning with peers to deepen your understanding.

By the end of this lesson, you’ll have a clear understanding of optimization concepts and the knowledge to apply them effectively, enabling you to elevate your models’ performance with precision.
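
The distinction between parameters and hyperparameters, and the effect of the learning rate, can be seen in a small sketch that minimizes f(w) = w**2 with gradient descent. The objective function, step count, and learning rates are illustrative assumptions.

```python
def run_gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 (gradient 2 * w) and return the final |w|."""
    w = start  # w is the parameter: it is updated during training
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return abs(w)

# learning_rate and steps are hyperparameters: they are chosen before training.
results = {lr: run_gradient_descent(lr) for lr in (0.01, 0.1, 1.1)}
for lr, final in results.items():
    print(lr, final)
```

In this sketch a very small learning rate converges slowly, a moderate one converges quickly, and one that is too large makes the iterates grow instead of shrink.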

Lesson 6: Improving Deep Neural Networks

In this lesson, we focus on strategies to improve the performance and efficiency of deep neural networks. As models become more complex, ensuring they train effectively while maintaining high accuracy requires careful optimization and fine-tuning. You’ll learn techniques to address common challenges like overfitting, vanishing gradients, and slow convergence, enabling your models to perform better on unseen data.

Batch Normalization: Accelerate training and improve stability by normalizing activations within the network.
Regularization: Learn how techniques like dropout and L2 regularization help prevent overfitting.

Optimizer Comparison: Compare the performance of SGD, Adam, and RMSprop optimizers on the same model to see their effects.
Fine-Tuning Pretrained Models: Use transfer learning to fine-tune a pretrained model for a specific dataset or task.

By the end of this lesson, you will have a clear understanding of advanced strategies to enhance the performance and effectiveness of deep neural networks. You’ll be equipped to identify and apply techniques that optimize training and improve model outcomes in machine learning applications.
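
As a minimal sketch of what batch normalization computes during training, the function below normalizes one feature across a batch to zero mean and unit variance, then applies a scale `gamma` and shift `beta`. The function name and default values are assumptions; framework layers additionally learn `gamma` and `beta` and track running statistics for inference, which this sketch omits.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(batch)
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)
print([round(v, 3) for v in normalized])
```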

Lesson 7: Convolutional Neural Networks (CNNs)

This lesson introduces Convolutional Neural Networks (CNNs), a powerful and efficient type of neural network designed to handle high-dimensional data, such as images, while avoiding the pitfalls of overfitting and high parameter counts. We’ll explore how CNNs exploit spatial structure in data, making them ideal for tasks like image recognition, object detection, and beyond.

Efficiency of CNNs: Learn why CNNs require fewer parameters and are faster to train compared to MLPs for structured data.
Reducing Overfitting: Discover how CNNs mitigate overfitting by exploiting data structure and using fewer parameters.

Hyperparameter Tuning: Experiment with different architectures and learn how hyperparameter choices impact CNN performance.
Building a CNN: Create a basic convolutional neural network to classify images in a dataset.

By the end of this lesson, you’ll understand why CNNs have become a go-to technique for analyzing structured data and how they achieve remarkable results in fields ranging from computer vision to natural language processing.
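
The parameter savings come from weight sharing: a convolutional kernel is reused at every image position, so its parameter count does not depend on the image size. The arithmetic below compares one fully-connected layer with one convolutional layer; the 28x28 grayscale input and the layer sizes are hypothetical choices for illustration.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully-connected layer."""
    return inputs * units + units

def conv2d_params(kernel_height, kernel_width, in_channels, out_channels):
    """Weights plus biases of a 2D convolutional layer (shared across positions)."""
    return kernel_height * kernel_width * in_channels * out_channels + out_channels

# Hypothetical 28x28 grayscale input; layer sizes chosen for illustration.
dense = dense_params(inputs=28 * 28, units=256)
conv = conv2d_params(kernel_height=3, kernel_width=3, in_channels=1, out_channels=32)
print(dense, conv)
```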

Lesson 8: Advanced CNNs

Building on your understanding of Convolutional Neural Networks (CNNs), this lesson delves into advanced deep neural network architectures that have revolutionized computer vision and other fields. You’ll explore landmark models such as AlexNet, VGG, Network in Network (NiN), GoogLeNet, and DenseNet. These networks demonstrate how innovative designs enhance performance and efficiency, offering solutions to complex AI challenges.

AlexNet: Understand how AlexNet introduced deeper architectures and advanced training techniques for improved image recognition.
DenseNet: Study the principles of densely connected layers to reduce redundant parameters and improve gradient flow.

Case Study: Analyze real-world applications of these architectures in fields like image recognition, medical imaging, and autonomous vehicles.
Hands-On Design: Use a programming framework to implement a simplified version of one of the deep networks discussed in the lesson.

By the end of this lesson, you will have a comprehensive understanding of advanced deep neural network architectures, including AlexNet, VGG, NiN, GoogLeNet, and DenseNet. You’ll be equipped with the knowledge to design and develop these networks, applying their innovative principles to solve complex AI challenges.

Lesson 9: Computer Vision

In this lesson, you will dive into the exciting field of computer vision, exploring advanced techniques to enhance image classification and object detection systems. You will learn to apply methods like image augmentation and fine-tuning to improve model generalization and tackle real-world challenges. Additionally, you’ll gain hands-on experience with object detection concepts such as bounding boxes, anchor boxes, and multiscale detection, and implement cutting-edge algorithms like R-CNN, SSD, and YOLO. By the end, you will be equipped with the skills to design and develop sophisticated computer vision systems for various applications.

Image Classification Improvement: Learn advanced techniques such as image augmentation and fine-tuning to improve image classifier performance.
Semantic Segmentation and Style Transfer: Apply segmentation to identify areas of interest in images and use style transfer to generate creative visual outputs.

Bounding Box Exercises: Experiment with bounding boxes, anchor boxes, and multiscale detection to understand their functionality.
Algorithm Implementation: Write code to apply R-CNN, SSD, or YOLO for object detection tasks.

By the end of this lesson, you’ll be equipped with the knowledge and skills to design and implement sophisticated computer vision systems, paving the way for impactful applications in various fields.
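
Working with bounding boxes and anchor boxes usually involves measuring how much two boxes overlap. The lesson description does not name a specific measure; the sketch below uses intersection over union (IoU), a common choice, and assumes boxes are given as (x1, y1, x2, y2) corner coordinates. The function name is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
print(overlap)
```

Identical boxes score 1, disjoint boxes score 0, and partially overlapping boxes fall in between.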

Lesson 10: Generative Adversarial Networks (GANs)

In this lesson, you will be introduced to Generative Adversarial Networks (GANs), an innovative approach to generative modeling that has revolutionized fields like image, video, and voice generation. GANs consist of two neural networks—a Generator and a Discriminator—that compete with each other to produce new, realistic content. You’ll explore the architecture of GANs, their training process, and their application in creating data that mimics real-world examples.

GAN Architecture: Learn the structure and interaction of the Generator and Discriminator networks in a GAN.
Generative Models: Understand the concept of generative modeling and how GANs produce realistic data.

Hands-On Implementation: Code a simple GAN model, implementing both Generator and Discriminator networks.
Team Project Development: Collaborate with your peers to incorporate GANs into your ongoing team project.
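
The competition between the two networks can be sketched through their loss functions. The example below assumes the standard binary cross-entropy formulation with the commonly used non-saturating generator loss; the function names and probability values are illustrative, and no actual networks are trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples labeled 1, generated samples labeled 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)

# d_real and d_fake are the discriminator's probabilities that a sample is real.
confident = (discriminator_loss(0.9, 0.1), generator_loss(0.1))
fooled = (discriminator_loss(0.5, 0.5), generator_loss(0.5))
print(confident, fooled)
```

When the Discriminator separates real from generated samples confidently, its loss is low and the Generator's loss is high; when it can no longer tell them apart, the situation reverses.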

Lesson 11: Recurrent Neural Networks (RNNs)

In this lesson we will introduce you to sequence models, an essential area of supervised learning that handles data in sequential form. Sequence models are widely used in applications such as financial time series prediction, speech recognition, DNA sequencing, music generation, and sentiment classification. At the core of these models are Recurrent Neural Networks (RNNs), which excel in processing sequential data by leveraging their “memory” to analyze input sequences effectively.

Sequence Modeling: Understand the basics of sequence models and their applications in supervised learning.
Text Preprocessing: Discover the steps for preparing textual data, including tokenization and encoding for input into sequence models.

Markov Model Analysis: Study the Markov process and its application to sequential data problems.
RNN Implementation: Build a Recurrent Neural Network in TensorFlow and Keras to tackle a sample sequential data task, such as sentiment classification.

By the end of this lesson, you’ll be well-prepared to apply RNNs to solve real-world problems involving sequential data, setting a strong foundation for advanced machine learning tasks.
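
The “memory” of an RNN is its hidden state, which is updated at every time step from the current input and the previous state. Below is a minimal single-unit sketch with a tanh activation; the weights are arbitrary illustrative values rather than trained parameters, and the function name is hypothetical.

```python
import math

def rnn_forward(inputs, w_x, w_h, bias, h0=0.0):
    """Run a single-unit RNN over a sequence and return every hidden state."""
    hidden = h0
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states

# Same weights, two sequences that differ only in the first input.
states_a = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, bias=0.0)
states_b = rnn_forward([0.0, 0.0, 0.0], w_x=0.5, w_h=0.8, bias=0.0)
print(states_a, states_b)
```

The two sequences differ only at the first step, yet their final hidden states differ as well, because the first input is carried forward through the recurrent connection.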

Lesson 12: Advanced RNNs

Building on your knowledge of standard RNNs, this lesson focuses on advanced sequence models designed to overcome the limitations of simple RNNs, such as the vanishing gradient problem. You’ll explore Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, which use specialized gates to manage long-term dependencies in sequential data. These architectures are critical for solving complex sequence learning problems across various domains.

Gated Recurrent Units (GRUs): Understand how GRUs use update and reset gates to handle long-term dependencies in sequence data.
Deep and Bidirectional RNNs: Explore the benefits of stacking multiple RNN layers and using bidirectional computations for improved performance.

Deep and Bidirectional RNNs: Implement deep and bidirectional RNNs to enhance sequential data processing.
Encoder-Decoder Application: Build an encoder-decoder model using RNNs and apply it to a sequence-to-sequence dataset.
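
The update and reset gates of a GRU can be sketched with scalars. The version below follows one common convention, in which the update gate interpolates between the previous state and a candidate state; the parameter names and values are hypothetical and chosen only to push the update gate toward its two extremes.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def gru_step(x, h_prev, params):
    """One step of a single-unit GRU; params maps names to scalar weights."""
    reset = sigmoid(params["w_xr"] * x + params["w_hr"] * h_prev + params["b_r"])
    update = sigmoid(params["w_xz"] * x + params["w_hz"] * h_prev + params["b_z"])
    # The reset gate controls how much of the old state enters the candidate.
    candidate = math.tanh(
        params["w_xh"] * x + params["w_hh"] * (reset * h_prev) + params["b_h"]
    )
    # The update gate interpolates between the old state and the candidate.
    return update * h_prev + (1.0 - update) * candidate

params = {name: 0.5 for name in ("w_xr", "w_hr", "w_xz", "w_hz", "w_xh", "w_hh")}
params.update({"b_r": 0.0, "b_h": 0.0, "b_z": 20.0})  # update gate close to 1
h_kept = gru_step(x=1.0, h_prev=0.3, params=params)
params["b_z"] = -20.0  # update gate close to 0
h_replaced = gru_step(x=1.0, h_prev=0.3, params=params)
print(h_kept, h_replaced)
```

With the update gate near 1 the previous state is carried over almost unchanged, which is how a GRU can preserve information across many time steps; with the gate near 0 the state is overwritten by the candidate.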

Lesson 13: Transformers

In this lesson we will introduce you to transformers, a groundbreaking neural network architecture that has redefined sequence-to-sequence (Seq2Seq) models. We begin with an exploration of Seq2Seq models built with recurrent neural networks, such as LSTMs, and their application in tasks like machine translation. From there, we’ll delve into the attention mechanism, a powerful tool that enables models to focus on the most relevant parts of input sequences, significantly improving Seq2Seq performance.

Seq2Seq Models: Learn the fundamentals of sequence-to-sequence models and their use in machine translation and other tasks.
Transformers: Explore the architecture and functionality of transformers, including the concept of self-attention.

Pretrained Model Hands-On: Leverage publicly available transformer models, such as BERT or GPT, to develop a high-quality machine translation system.
Team Project Integration: Apply transformers or attention mechanisms to your ongoing team project to enhance its capabilities.

By the end of this lesson, you will have a thorough understanding of sequence-to-sequence models, the role of attention mechanisms in enhancing model performance, and the transformative power of transformer architectures. You will be equipped to implement attention-based Seq2Seq models and utilize pre-trained transformer models to build high-quality machine translation systems, preparing you to excel in advanced sequence modeling tasks.
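
To show how attention lets a model focus on the most relevant parts of an input sequence, here is a minimal sketch of scaled dot-product attention for a single query. The function names and the toy query, key, and value vectors are assumptions for illustration; real transformer layers add learned projections and multiple heads.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    output = [sum(w * value[i] for w, value in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([4.0, 0.0], keys, values)
print(weights, output)
```

Keys that align with the query receive larger weights, so their values dominate the output, while the key orthogonal to the query contributes very little.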

Lesson 14: Project Presentation and Evaluation

Congratulations on reaching the final week of the course! This lesson focuses on the culmination of your efforts through project delivery and oral presentations. Each team will prepare a professional presentation to showcase the results of their project, recording it on Zoom and sharing it with the class for review and feedback. This is an opportunity to demonstrate your work, reflect on your progress, and engage with peers to discuss the outcomes of your projects.

Professional Presentation Preparation: Develop and deliver a polished project presentation using PowerPoint and Zoom.
Reflection and Growth: Evaluate your team’s performance and reflect on the challenges and successes of the project.

Project Submission: Submit your final project deliverables, ensuring all results are reproducible and clearly documented.
Peer Evaluation: Review and evaluate another team’s presentation, providing three points of strength and three areas for improvement, along with any questions or insights.

Unlocking Your Potential

Career Impact
Real World Example

Earning a Master’s in Artificial Intelligence, particularly with specialized coursework like A-I 570: Deep Learning, opens the door to a wide range of impactful and high-demand career opportunities. Deep learning is at the forefront of advancements in AI, powering innovations across industries such as healthcare, finance, technology, transportation, and entertainment. With expertise in building and training neural networks, mastering natural language processing, and applying computer vision techniques, graduates are well-prepared to tackle complex challenges and drive meaningful change in both established organizations and cutting-edge startups.

The career impact of this degree extends beyond technical roles, as professionals with deep learning expertise often influence strategic decision-making and innovation. Graduates can pursue roles such as Machine Learning Engineer, Data Scientist, AI Research Scientist, or Deep Learning Specialist, where they design systems that automate processes, predict trends, and solve critical problems. As AI continues to evolve and expand, professionals equipped with deep learning knowledge are not only poised for rewarding careers but also positioned to lead the transformation of industries worldwide.

One real-world example of a deep learning project addressing climate change challenges can be seen in the field of extreme weather prediction. A government agency wanted to implement a system to analyze vast amounts of climate data to predict hurricanes, droughts, and heatwaves more accurately. However, the challenge lay in the complex nature of climate data, which spans across time and space, and in the lack of tools capable of integrating this data effectively. The initial attempts used traditional statistical models, which failed to capture the intricate patterns and interactions in the data. This resulted in inaccurate predictions, leading to insufficient preparedness for extreme weather events and significant economic and societal costs.

To overcome these limitations, a team of deep learning researchers and data scientists was brought in to develop a robust solution. They used advanced time series forecasting and spatiotemporal data analysis techniques to train neural networks capable of understanding the complex relationships in the data. The team leveraged convolutional and recurrent neural networks to process spatial and temporal aspects of the climate data and incorporated visualization tools to make predictions accessible to policymakers. By applying these models, they created a system that could accurately predict weather patterns, giving communities and governments weeks of advance warning.

This implementation not only improved disaster preparedness but also provided insights into long-term climate trends, enabling better resource allocation and planning. The success of this project highlighted the transformative potential of deep learning in addressing pressing global issues like climate change.