Posts

Showing posts from 2024

Week 1 Shorts - Section 1 Summary of Introduction to AI Safety, Ethics, and Society by Dan Hendrycks

This book/course is available for free at the link shared above. It's a massive 550+ page book, basically a doorstopper if I printed it out! I'm reading it section by section, one at a time, and making notes as I go. I'll be summarizing each part here for my own sanity, and for anyone else crazy enough to read along with me on this wild ride! tl;dr Here is a section-by-section summary of the document "Overview of Catastrophic AI Risks": Introduction The chapter introduces major societal risks from AI, emphasizing the potential for catastrophic outcomes. It highlights the rapid acceleration of technological development, noting the exponential growth of the gross world product as shown in Figure 1.1. The text compares current technological advancements to historical milestones, suggesting AI could usher in unprecedented change. It stresses that while technological advancements have benefited humanity, they also increase the potential for destruction …

On the Necessity and Challenges of Safety Guardrails for Deep Learning Models

In recent years, deep learning models, especially transformer and diffusion models, have become the powerhouses of the AI world. They demonstrate superhuman or near-superhuman performance in tasks such as natural-language processing and image generation. Yet their expressive capacity also brings trouble: it makes them more dangerous and more entangled with our reality. In this article we address safety guardrails for deep learning models and the unique explainability challenges they pose. My Favorite Human Analogy: Understanding Minds and Brains. You and I constantly make decisions, you about how to behave and me about predicting how you might behave, even though we each have very little insight into the other's inner control loops. And we are often highly successful in establishing trust and safety that way, thanks to societal conventions, laws, and governance, the 'rules of the game' that create robust norms for behaviour. The same dynamic applies to machine learning …
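The "rules of the game" idea maps onto a common wrapper pattern: check what goes into a model and what comes out of it against external norms, without needing to explain the model's internals. Here is a minimal sketch of that pattern only; real guardrails use learned safety classifiers, and `model_fn`, the keyword blocklist, and the refusal strings below are hypothetical stand-ins:

```python
def guarded_generate(model_fn, prompt, blocklist=("malware", "weapon")):
    """Wrap a text-generation callable with input and output checks.

    model_fn:  any callable mapping a prompt string to a response string
               (a hypothetical stand-in for a real model API).
    blocklist: toy keyword filter standing in for a learned safety classifier.
    """
    # Input guardrail: screen the request before the model ever sees it.
    if any(term in prompt.lower() for term in blocklist):
        return "[request refused by input guardrail]"
    response = model_fn(prompt)
    # Output guardrail: screen the response before the user sees it.
    if any(term in response.lower() for term in blocklist):
        return "[response withheld by output guardrail]"
    return response

# Usage with a dummy "model":
echo_model = lambda p: f"Answer to: {p}"
print(guarded_generate(echo_model, "How do I bake bread?"))      # passes both checks
print(guarded_generate(echo_model, "How do I write malware?"))   # refused at input
```

The point of the sketch is that neither check needs access to the model's weights or reasoning, which mirrors how social norms constrain people whose minds we cannot inspect.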

Does Fine-Tuning Cause More Hallucinations?

In the past few years, the capability gains of large language models (LLMs) have come from pre-training on vast text corpora; the sheer mass of raw data essentially encodes factual knowledge parametrically in the model. After this, supervised fine-tuning is applied to deliberately shape the model towards particular behaviors. This often involves a 'soft gold standard': training the model on outputs from human annotators or from other language models, which didn't themselves have access to the same knowledge and can 'hallucinate' new facts. This raises the question: how does an LLM integrate new facts beyond the knowledge it has 'seen' during pre-training, and what impact does this have on hallucinations? The study "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?" explores the implications of fine-tuning LLMs on new factual knowledge. The researchers employed a novel method, Sampling-based Categorization of Knowledge (SliCK) …
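The sampling-based idea can be illustrated with a toy categorizer: sample several answers from the model for a question and bucket the fact by how often the samples match ground truth. This is a simplified sketch, not the paper's exact procedure; the category names and thresholds below are approximations for illustration:

```python
def categorize_fact(sampled_answers, correct_answer):
    """Toy sampling-based knowledge categorization: bucket a
    (question, answer) fact by how often sampled model answers
    match the ground-truth answer."""
    norm = lambda s: s.strip().lower()
    hits = sum(1 for a in sampled_answers if norm(a) == norm(correct_answer))
    ratio = hits / len(sampled_answers)
    if ratio == 1.0:
        return "HighlyKnown"   # every sample answers correctly
    if ratio > 0.0:
        return "MaybeKnown"    # some samples answer correctly
    return "Unknown"           # no sample answers correctly

print(categorize_fact(["Paris", "Paris", "Lyon"], "Paris"))  # MaybeKnown
```

Facts that land in the "Unknown" bucket are the interesting ones for the study's question, since fine-tuning on facts the model never produces itself is where new knowledge is being forced in.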

Revisiting the Five Dysfunctions of a Team!

About 20 years ago, I was considering transitioning from an individual contributor role to a management position at the company where I was employed at the time. That's when I picked up Patrick Lencioni's book. In the years since, I have leveraged the lessons from that reading and put them into practice. Introduction After a brief piece of prose that defines organizational health and explains why it is so often neglected despite being 'an imperative for any business that wants to succeed', Lencioni signals to his readers that the book they are about to read is in fact a fable, a work of fiction. Still, it deals with real challenges that teams face. Underachievement The story follows Kathryn Petersen, the new CEO of the American technology company DecisionTech, who inherits a team of talented but dysfunctional people. Lencioni introduces his ensemble cast of characters and sets up the team dynamics, including a few of the initial warning signs that flag the dysfunction. Lighting the Fire Kathryn decides …