About Me
I just finished my Ph.D. at the University of Alberta with Richard Sutton, during which I developed simple and practical algorithms from first principles for long-lived artificial decision-making systems.
In particular, I developed algorithms within the reinforcement-learning framework for continuing (non-episodic) problems, in which the agent-environment interaction goes on ad infinitum, with the goal of maximizing the average reward obtained per step. The algorithms are simple to implement and, empirically, easy to use.
I love space! And I want to use my AI expertise in space sciences and technology. I envision a future where artificial systems will have human-like intelligence and adaptability, making space exploration significantly easier and safer for our species. To this end, I am currently a postdoctoral fellow at the National Research Council of Canada (NRC), where I do RL research to improve space science and technology 🚀 🛰️
You can find my resume here (last updated: May 2024).
Some updates
- (Sep 2024) Started RL research for space applications at NRC!
- (Mar 2024) Defended my Ph.D. dissertation! :D
- (Feb 2024) Presented the Amii AI seminar to an audience of 100+ people [link to recording]
- (Nov 2023) Khurram Javed and I won the natHACKS hackathon! [Demo]
- (Aug 2023) Oral spotlight talk at the Science of Intelligence Institute’s summer school in Berlin
- (May 2023) Started working with AlbertaSat, which is UofA’s student group that designs, builds, and operates nanosatellites!
- (Apr 2023) Oral presentation of my internship work on RL in recommender systems at a WWW’23 workshop in Austin
- (Jun 2022) Started an internship at Google Brain
- (Apr 2022) Passed my Ph.D. candidacy exam!
- (Mar 2022) Paper accepted at RLDM 2022
- (Dec 2021) Presented a lecture on ‘The Essentials of RL’ at the 3rd Nepal AI Winter School
- (Sep 2021) Paper accepted at NeurIPS 2021
- (Jul 2021) Co-hosted an ICML Social on Continuing Problems in RL
- (May 2021) Paper accepted at ICML 2021
- (May 2021) Presented two posters at NERL 2021 (one submitted, one invited)
- (Apr 2021) Paper accepted in the Journal of AI Research (JAIR)
- (Jan 2021) Started TA-ing for Rich Sutton’s CMPUT609 RL-2 course
- (Dec 2020) Helped organize the Policy Optimization in RL tutorial at NeurIPS 2020. We made some cool interactive notebooks; links on the website!
- (Oct 2020) Presented our work on ‘Personalized Brain State Targeting via Reinforcement Learning’ at the 3rd Neuromatch conference (more Q/A at the 9:58:41 mark)
Ph.D. Dissertation
This dissertation develops simple and practical learning algorithms from first principles for long-lived agents. Formally, the algorithms are developed within the reinforcement learning framework for continuing (non-episodic) problems, in which the agent-environment interaction goes on ad infinitum, with the goal of maximizing the average reward obtained per step.
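For concreteness, the average-reward objective can be written as follows (this is the standard formulation of the criterion, stated here for the reader's convenience rather than quoted from the dissertation):

$$
r(\pi) \doteq \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{E}\!\left[ R_t \mid S_0,\, A_{0:t-1} \sim \pi \right],
$$

that is, the long-run reward obtained per step when following policy $\pi$.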
There are three main contributions:
- Foundational one-step tabular learning algorithms for average-reward prediction and control.
- Multi-step algorithms for average-reward prediction, some of which are proved to converge with linear function approximation.
- Reward centering to improve discounted-reward algorithms.
All of the above contributions are grounded in theory. My experiments show that the performance of the proposed algorithms is robust to the choice of their parameters—making them easy to use.
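For flavor, here is a minimal sketch of the kind of one-step tabular update these algorithms use, in the style of differential TD-learning. The variable names, step sizes, and toy example are illustrative rather than taken verbatim from the dissertation:

```python
import numpy as np

def differential_td_update(V, r_bar, s, r, s_next, alpha=0.1, eta=1.0):
    """One illustrative step of average-reward (differential) TD(0) prediction.

    V         : array of differential value estimates, indexed by state
    r_bar     : current estimate of the average reward per step
    s, s_next : indices of the current and next state
    r         : reward observed on the transition from s to s_next
    alpha     : step size for the value estimates
    eta       : relative step size for the average-reward estimate
    """
    # TD error in the average-reward setting: no discounting, and the
    # average-reward estimate is subtracted from the observed reward.
    delta = r - r_bar + V[s_next] - V[s]
    V[s] += alpha * delta            # update the differential value of s
    r_bar += eta * alpha * delta     # update the average-reward estimate
    return V, r_bar

# Toy usage on a deterministic two-state cycle (purely illustrative):
V, r_bar = np.zeros(2), 0.0
for s, r, s_next in [(0, 1.0, 1), (1, 0.0, 0)] * 100:
    V, r_bar = differential_td_update(V, r_bar, s, r, s_next)
print(r_bar)  # approaches the true reward rate of 0.5
```

The same running average-reward estimate is also the key quantity in reward centering, where it is subtracted from the rewards before applying a standard discounted-reward algorithm.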
Defended in March 2024. [Dissertation PDF, Defense Slides, Defense Seminar video]
Publications and Pre-prints
During my Ph.D., I focused on algorithms that can learn continually throughout an agent’s lifetime. In particular, I designed algorithms for non-episodic problems such that an agent can learn to achieve its goals from a single stream of experience (without resets or timeouts).
- Some more interesting work on the way… ;)

- Reward Centering [PDF]
  Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
  In the Reinforcement Learning Conference (RLC), 2024.

- Investigating Action-space Generalization in RL for Recommender Systems [PDF]
  Abhishek Naik, Bo Chang, Alexandros Karatzoglou, Martin Mladenov, Ed H. Chi, Minmin Chen
  Oral presentation at the Decision Making for RecSys workshop, WWW 2023.

- Multi-Step Average-Reward Prediction via Differential TD(lambda) [PDF]
  Abhishek Naik, Richard S. Sutton
  In the Conference on Reinforcement Learning and Decision Making (RLDM), 2022.

- Average-Reward Learning and Planning with Options [PDF]
  Yi Wan, Abhishek Naik, Richard S. Sutton
  In Advances in Neural Information Processing Systems (NeurIPS), 2021.

- Towards Reinforcement Learning in the Continuing Setting [PDF]
  Abhishek Naik, Zaheer Abbas, Adam White, Richard S. Sutton
  In the Never-Ending Reinforcement Learning (NERL) Workshop, ICLR 2021.

- Learning and Planning in Average-Reward Markov Decision Processes [PDF]
  Yi Wan*, Abhishek Naik*, Richard S. Sutton
  In the International Conference on Machine Learning (ICML), 2021.

- Discounted Reinforcement Learning is Not an Optimization Problem [PDF]
  Abhishek Naik, Roshan Shariff, Niko Yasui, Richard S. Sutton
  In the Optimization Foundations of Reinforcement Learning Workshop, NeurIPS 2019.

- MADRaS: Multi Agent DRiving Simulator [PDF]
  Anirban Santara, Sohan Rudra, Sree Aditya Buridi, Meha Kaushik, Abhishek Naik, Bharat Kaul, Balaraman Ravindran
  In the Journal of Artificial Intelligence Research (JAIR), 2021.

- RAIL: Risk-Averse Imitation Learning [PDF]
  Anirban Santara*, Abhishek Naik*, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul
  In the International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2018.

- Identifying User Survival Types via Clustering of Censored Social Network Data [PDF]
  S Chandra Mouli, Abhishek Naik, Bruno Ribeiro, Jennifer Neville
  Technical Report, arXiv:1703.03401, 2017.
Talks
- Reinforcement Learning in Continuing Problems Using Average Reward

- An Experimentalist’s Venture into RL Theory: Two Successes and a Failure
  Amii AI Seminar, Feb 2024
  [Video, Slides]

- Unifying Perspectives on Intelligence: What RL adds to the Common Model of the Agent
  Science of Intelligence Institute’s Summer School, Aug 2023
  [Slides]

- Essentials of Reinforcement Learning
  3rd Nepal Winter School in AI, Dec 2021
  [Slides]

- Towards Reinforcement Learning in the Continuing Setting
  NERL Workshop at ICLR 2021, May 2021
  [Slides]

- Personalized Brain State Targeting via Reinforcement Learning
  The 3rd Neuromatch Conference, Oct 2020
  [Video (more Q/A here), Slides]

- Learning and Planning in Average-Reward MDPs
  Tea Time Talks, RLAI lab and Amii, Aug 2020
  [Video, Slides]

- On Intelligence: A Glimpse of the Diversity in Natural Intelligence

- Figuring Out How the Mind Works: At the Exciting Intersection of RL, Psychology, and Neuroscience
  Cognitive Psychology Seminar, Dept. of Psychology, University of Alberta, Mar 2020
  [Video, Slides]

- Discounting — Does It Make Sense?
  Tea Time Talks, RLAI lab and Amii, Aug 2019
  [Video, Slides]
Work Experience
National Research Council of Canada
Postdoctoral Fellow; Sep 2024 – ongoing; Ottawa, Canada
- At NRC, I am doing reinforcement-learning research with a particular focus on applications in the space industry. For instance, we are trying to make satellite communications orders of magnitude faster using RL. I am also mentoring graduate students through collaborations with universities.
AlbertaSat
Software, Automation, and Testing Team Member; April 2023 – ongoing; Edmonton, Canada
- AlbertaSat is the University of Alberta’s student group that designs, builds, and operates nanosatellites. My role is to simulate various operational and safety scenarios of our upcoming satellite to ensure it can robustly achieve all of its mission objectives. I am also doing some exploratory RL/AI work for them on the side!
Google Research, Brain Team
Research Scientist Intern; June 2022 – Sep 2022; Toronto, Canada
With Bo Chang and Alexandros Karatzoglou.
- Investigated methods for action-space generalization in RL for large-scale recommender systems like YouTube.
Huawei Research
Research Internship; May 2019 – Sep 2019; Edmonton, Canada
With Hengshuai Yao.
- Worked on establishing an appropriate problem formulation for control in continuing tasks with function approximation.
- Surveyed the literature on the average reward problem formulation for MDPs, and its connection with reinforcement learning.
- Some of the work started here was presented at the NeurIPS 2019 Workshop on Optimization Foundations of Reinforcement Learning (OPTRL 2019).
Intel Labs
Research Internship; May 2017 – Jul 2017; Bengaluru, India
With Bharat Kaul.
- Started work on a multi-agent version of the TORCS driving simulator (MADRaS) compatible with OpenAI Gym.
- Proposed and implemented a novel risk-averse imitation learning framework, achieving up to 89% improvement over the state-of-the-art in terms of tail-end risk on several physics-based control tasks.
- This project was presented at AAMAS 2018.
Purdue University
Research Internship; May 2016 – Jul 2016; Indiana, USA
With Bruno Ribeiro.
- Engineered temporal features for a binary probabilistic classifier that categorizes new users by their expected lifespan, based on their initial activity.
- Created and curated one of the richest social-media datasets and released it for public use via a technical paper.
Amazon Development Centre
Technical Internship; May 2015 – Jul 2015; Chennai, India
With Sravan Bodapati and Venkatraman Kalyanapasupathy
- Built a classifier to determine the start-reading-location of books.
- Now in production, this feature helps Kindle users start reading a book quicker after downloading it, without having to flip through pages like acknowledgements or copyright notices. If you use a Kindle, you have used this feature :)
Master’s Thesis
This thesis was a part of my integrated Bachelor’s + Master’s program in the Dept. of Computer Science and Engineering at the Indian Institute of Technology Madras in Chennai, India, supervised by Professor Balaraman Ravindran. Defended in May 2018.
My goal was to make self-driving cars a reality in my country, India. Towards this end, I modeled autonomous driving as a multi-agent learning problem in a safety-critical setting and:
- proposed a risk-averse imitation learning algorithm with lower tail-end risk than the then state-of-the-art,
- trialled a curriculum-based learning approach for multi-agent RoboSoccer, and
- extended the TORCS simulator to release MADRaS, the first open-source driving simulator that supports multi-agent training (100+ stars on GitHub).
Teaching Experience
- Reinforcement Learning II (CMPUT609) (x4)
  Jan - Apr 2024, 2023, 2021, 2020; Dept. of Computing Science, University of Alberta
  As a teaching assistant for Professor Rich Sutton’s class of ~30 graduate students, I presented some lectures, helped create the assignments and lecture materials, and spent a surprisingly large amount of time grading.

- Reinforcement Learning I (CMPUT397)
  Sep 2020 - Dec 2020; Dept. of Computing Science, University of Alberta
  Helped Professor Martha White teach a class of ~150 undergraduate students.

- Reinforcement Learning (CS6700)
  Jan 2018 - May 2018; Dept. of CSE, IIT Madras
  As the Head Teaching Assistant of this course offered by Professor Balaraman Ravindran, I created and evaluated tutorials, programming assignments, and exams for a class of about 90 undergraduates and graduates.

- Principles of Machine Learning (CS4011)
  Aug 2017 - Nov 2017; Dept. of CSE, IIT Madras
  As one of the teaching assistants of this course offered by Professor Balaraman Ravindran and Professor Mitesh Khapra, I created and evaluated tutorials, programming assignments, and quizzes for a class of about 90 undergraduates.

- Reinforcement Learning Specialization on Coursera [Link]
  Jan 2019 - Oct 2019; University of Alberta
  As one of the ‘Subject Matter Experts’, I developed programming assignments, multiple-choice quizzes, and slides for the four courses that form the RL Specialization, released in late 2019. There have been more than 80k enrolments so far!
Community Service
- Co-organizer, ICML 2021 Social on Continuing (Non-episodic) Problems in RL
  July 2021
  Had insightful discussions with a bunch of people about the state of research in continuing problems and where we should go from here.

- Co-organizer, NeurIPS 2020 Tutorial on Policy Optimization in RL
  Dec 2020
  Alan Chan, Shivam Garg, Dhawal Gupta, and I created a set of notebooks to highlight some aspects of policy-gradient methods, such as the effects of a baseline. Thanks to Sham Kakade, Martha White, and Nicolas Le Roux for giving us the opportunity!

- Organizer, Tea Time Talks 2020, Amii and RLAI lab
  June 2020 – Aug 2020
  Organized and moderated the talks of 40+ speakers over the course of 12 weeks (in a virtual format for the first time). Full playlist here.

- Executive Member, Computer Science Graduate Students’ Association, University of Alberta
  Apr 2019 – Apr 2020
  Along with representing the interests of graduate students to the department, I helped organize activities that support their well-being – physically and emotionally, academically and personally – to make the University of Alberta a home away from home, especially for international students.

- Volunteer, Centre for Autism Services Alberta
  Jan 2019 - Mar 2020
  As part of the Centre’s Community and Therapeutic program, I helped organize recreational activities for individuals aged 5 to 20 with autism spectrum disorder. The aim was to create a fun and supportive atmosphere for them to interact with each other and have a good time.
Interests and Hobbies
Ice-hockey
One of the fastest sports in the world, with an exhausting 60 minutes of action (yes, even while watching). The wizardry these athletes pull off while on skates is a delight to watch (shoutout to Connor McDavid! #LetsGoOilers). I started out learning ice-skating in order to play ice-hockey by early 2020, then learned to play, and now I’m playing hockey in a league!
Formula 1
There’s hardly anything as spectacular as this confluence of science and engineering, which gives the world these lean, mean, and beautiful machines, with some of the fittest athletes on the planet battling fearlessly at speeds in excess of 300 km/h on 20+ challenging tracks all over the world. Current favorite track: Spa-Francorchamps. Team: Forza Ferrari forever!
Books
If I had to pick one of the few things I could do all my life, it would be reading (well, sports comes first). With three fat bookshelves overflowing with books back home, and many more in my handy Kindle, there are actually times when I am happy to see long queues, since they present another opportunity to dive into my latest book. Some of my favorite authors are Adrian Tchaikovsky, Ted Chiang, Andy Weir, and Michael Crichton.
I also read non-fiction, mostly about intelligence.
I have had the pleasure of leading the Making Minds reading group for 3+ years at the University of Alberta.
Check out my Goodreads page!
Space
I’ve found space fascinating since I was a kid. Over the past few years, my go-to sci-fi subgenre has been first contact and intergalactic travel. My interest in space has also had a massive resurgence thanks to Kerbal Space Program and Everyday Astronaut. Instead of core AI, I might want to start (scratch that, I have now started) a career in Space x AI!
Photography and Traveling
I love visiting and documenting quaint, spectacular places, and digging into the local cuisine. Until I figure out where to showcase some of my favorite pictures, here is my old Flickr account. I also enjoy trekking and hiking into the wilderness. After skydiving, bungee jumping, scuba diving, and parasailing, I’m looking forward to hang gliding and cliff jumping!
Contact Me
Recent Posts
How to use KSP's hyper-realistic physics engine as a simulator for RL
November 26, 2023
Linear Algebra yields a hilariously fast method to compute Fibonacci numbers!
May 12, 2023
Examples in nature that can live forever, and whether we humans really aspire to that
July 17, 2021