AI Scientists and the Humans Who Love Them with Dr. Ian Cero
Episode Description:
In this episode of the Never The Same Podcast, host Tony Pisani engages in a thought-provoking discussion with Dr. Ian Cero about the transformative potential of AI in scientific research, particularly in the vital field of suicide prevention. Together, they explore the revolutionary ways AI can generate new scientific ideas, design and conduct experiments, and even draft research papers.
Dr. Cero, a clinical psychologist and early career researcher, shares his candid journey from initial skepticism to recognizing AI as an indispensable collaborator in his work. The conversation dives deep into the ethical, practical, and philosophical implications of this shift: How might AI redefine the role of human researchers? What opportunities and challenges does this bring? And what should early career scientists do to adapt and thrive in an AI-driven research landscape?
From high school debate champion to clinical psychologist and statistician, Dr. Cero draws on his diverse background to highlight how skills like listening and precision are more crucial than ever in this era of rapid technological change.
Guest:
- Dr. Ian Cero, clinical psychologist and faculty member at the University of Rochester’s Center for the Study and Prevention of Suicide, merges statistical expertise and psychology to tackle complex problems, from social networks to suicide risk.
Host:
- Dr. Tony Pisani, professor, clinician, and founder of SafeSide Prevention, leading its mission to build safer, more connected military, health, education, and workplace communities.
Referenced Resources
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
- Wingman-Connect Program increases social integration for Air Force personnel at elevated suicide risk: Social network analysis of a cluster RCT
- Attention Is All You Need
- Opportunities and challenges of diffusion models for generative AI
- The coal gas story. United Kingdom suicide rates, 1960-71
Transcript
Tony: Welcome to Never the Same, a podcast exploring how people and ideas have changed over time, and where they might be going in the future, especially as this relates to suicide prevention. As Heraclitus said, "No man ever steps in the same river twice. For it's never the same river, and he's not the same man."
Well, speaking of Never the Same, today we're diving into a topic that could fundamentally change the landscape of scientific research, including suicide prevention. And that is AI scientists conducting research on their own. I'm going to be joined by Dr. Ian Cero, a faculty member at the University of Rochester and the Center for the Study and Prevention of Suicide.
He's a colleague of mine who brings a unique blend of clinical psychology and advanced statistical expertise, and as I learned in this interview, philosophy too, into the field. Just a note before we begin, I'm a professor at the University of Rochester. This podcast is separate from my work there, but part of the same mission to contribute to science and suicide prevention and promote well-being.
So without any further ado, I introduce you to my conversation with Ian Cero about AI scientists. Well, Dr. Ian Cero, thank you so much for being here. I've been looking forward to talking with you in general and also about the topic that we'll have today. In the intro that I just shared, we learned that you're a clinical psychologist and also have advanced statistical expertise.
But what we didn't hear is that you also were a high school debate champion in the state of Minnesota. Is this true?
Ian: Yeah. First off, thanks for having me. Second, how'd you, all right. You're digging up the deep stuff. Yeah. Technically, in 2006, I was the Minnesota state debate champion.
Tony: How do you become a debate champion of a state?
Ian: Of a state? In the strict, like legal sense, how do you do it? There's a state tournament, and you win that.
Tony: You win that.
Ian: Yeah. But, if you're asking like the more developmental process, yeah, there's a high school debate team. And when I was a freshman, as far as I can remember, I was literally the worst person on the team.
The way that you can tell that is that it's a partner activity. So you need to have two people debating against two other people. And sometimes, freshmen or sophomores, there won't be enough people, and so there'll be a leftover person. And that person's allowed to show up.
They're allowed to go to rounds, but they're not allowed to advance. And so I think there were 13 of us, if I remember the math at the very beginning of the year, and for several tournaments, I actually didn't even have a partner, right? Everyone was spoken for. But, I stuck around for a while, and I really liked it, and with a little bit of luck and a little bit of practice, I ended up making it to the end.
Tony: So what from that experience do you bring forward into your work as a clinical psychologist or as a research scientist?
Ian: Sure. So I had a really hard time persuading people, when I was applying to graduate school in clinical psychology, that this would actually be like a good, worthwhile thing to happen because the reaction that you get from psychologists who are the people who are interviewing you is, you seem like you're going to be like a really combative kind of person. And it seemed that way because it was true.
But I did have this other advantage that I think took longer to explain to people, which is that being in a competitive debate kind of context—and I imagine that law is somewhat similar, although I'm not a lawyer—it really matters to be able to listen to someone and hear with precision what they're talking about. And you need to be able to do that quickly and you need to be able to do it accurately.
One of the things that you learn when there's like a win or a loss or, some kind of public refutation on the line, is it really hones your attention to what is this person's exact objection to what I'm saying, or to something else that's happening in the world that they think they can solve. That's a skill that I think is really central to both delivering psychotherapy, but also to science, right? What's the exact mechanism that is promoting or mitigating suicide risk?
I think a lot of the conversations that we have with people will lack that kind of precision. People will say things like, they'll list categories of risk, or they'll list like flavors or to use a trendy word these days, vibes of risk. And that's helpful, but there's a real benefit to going another step or two in terms of what exactly are we talking about?
Tony: Yeah.
Ian: And I think it's difficult to find another activity, like other than debate, that hones that so quickly because it's so punishing when you fail at it and so success-inducing when you get good at it, that it gets shaped up fast.
Tony: That is really interesting. And actually, it's part of why I'm excited to talk with you about the article that we're going to discuss today on the topic of AI, or artificial intelligence. There was an article published that I shared with you where a group has basically programmed what they call an AI scientist. And the headline is The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. I'm going to ask you to describe it in a second, but I think that your ability to listen with precision, as I think you said, is really needed in where we are with AI right now.
I think people have a lot of impressions, and they have some visceral reactions—whether they're optimistic, frightened about it, or just pessimistic, or some combination. But I'm hoping that as we look at this and its implications, I think that this could potentially, pretty fundamentally, change what it means to be a scientist and what we might be able to discover that can help people. But I think we do need some precision, and I'd really, I think, benefit from your level of precision, which also is formed by a real understanding of psychologists, what it means to be a psychotherapist and an early career scientist.
Ian: Sure.
Tony: Can you just talk a moment before we jump into this about what your research is about right now?
Ian: Sure. So over the last, say, five or six years, my research has really centered on the relationship between the social network that a person is embedded in and their suicide risk. So, that involves the pattern of relationships. Who's connected with whom? Are you connected with a bunch of other people who are also going through a hard time, and does that spin up a kind of cycle of difficulty? Or can we modify, somehow, the collection of relationships around you so that frankly very little about you needs to change, but you are able to capture more support from the relationships that you already have?
There are two sides of that. One is a heavily quantitative, theoretical, and abstract side. So one of the papers that I published recently was on whether you can just add random connections to a social network and disrupt or prevent suicide clusters from forming.
And so there are a lot of assumptions that go into that.
Tony: When you say add random connections to a network, first of all, I think we probably should acknowledge our mutual mentor in this, Dr. Peter Wyman, who I think probably introduced both of us to some of these ways of applying social network. I know you had worked with those kinds of data before, but when you say you're working on, or you published a paper about adding random connections to a social network, can you bring that down to the ground for me?
Ian: Yeah. So if you imagine a social network as a collection of people and the relationships between them, it's usually pretty easy to represent that as a bunch of dots—people—connected by different lines that represent their relationships. And you could make that more complicated if you wanted.
If those lines have arrows, we might use that to represent an unrequited relationship, where one person thinks the other is a friend, but the other doesn’t consider them to be. Or you could also imagine hierarchical relationships being expressed like that, like parent and child. If there are no arrows, you might think of that as a symmetrical or reciprocal relationship, like siblings or friends who understand each other to be friends.
And if we assign, say, different values or experiences or risk factors to the dots, you might imagine them as having different colors or different sizes. And, you know, we can express all this mathematically, but for people at home, it’s sometimes easier to think of it visually. So imagine that you’ve got one of these kind of spider web diagrams. What you’ll tend to see in the real world—and again, this is work done by Peter Wyman and a number of other people—is that people who are expressing some sort of concern about suicide will tend to be connected to one another, right? And the way that this would look on one of those diagrams would be, you’ll see pockets or clusters of suicide risk.
And so, for brevity, let’s just call that a suicide cluster. Those patterns, we think, might be forming because I borrow a lot of my own risk from you, right? If you imagine, let’s say, engaging in an unhealthy level of drinking puts me at greater risk for something like depression.
Tony: Maybe we both do that.
Ian: Right. And maybe we do it together. And maybe there are even some times where I’m feeling like maybe I don’t totally want to do it, but you give me some encouragement because you’re having an especially tough day. Or maybe the reverse happens the next day. Maybe you’re feeling like, “Ah, maybe I want to make some changes and go to the gym instead,” or, “I got to work on something a little more productive.”
And then I say, “Oh man, but Tony, I really could use your company.” And that’s an example of how we might keep each other similar to one another. Totally incidentally, totally unconsciously. But you could imagine maybe you’re influenced two steps away, right? So you know somebody that I don’t who also has some risk factors. Maybe again, it’s drinking just because that’s the one that we’ve been going with.
I don’t know that person. I don’t interact with them, but they’re turning up the dial on your drinking in that same way. And then you turn up the dial on my drinking, and that’s a way that you can be affected by someone that you don’t even know.
Tony: And that can happen in both directions, positive directions as well as negative ones.
Ian: Right. So we also know happiness follows that pattern. Happy people tend to make one another happier, and there are easy ways that you can think about it. It’s pleasant to be around happy people.
Tony: Yeah. So then in your paper, how did you test whether adding connections helps?
Ian: Sure. So I should clarify for people that these are some simulated data, which will be relevant later.
Tony: Yeah, that’s what I was wondering about.
Ian: So one thing that’s difficult about looking at real-world data is, I don’t know what the true answer is. So if I grab a social network, and I ask people, “Who are your friends? Who do you hang out with? Oh, by the way, and how much do you drink? And how much are you thinking about suicide? And how depressed do you feel?” and I notice a cluster formed—all of the people who are thinking or concerned about suicide are interacting with one another—there could be three reasons that happened, and this is annoying, right?
It would be really convenient if there were only one. It could be influence, sometimes called contagion, which is what we were just talking about. Your behavior sort of influences mine. It could also be that we both were already heavy drinkers and we like each other because of that, right? So our risk developed independently, and then we form a relationship based on our similarity.
That’s called homophily.
Tony: Homophily?
Ian: Homophily.
Tony: Like things go together.
Ian: Yeah. From the Greek: homo- (same) and philia (friendship); friends are friends with each other because they’re the same.
Tony: Birds of a feather flock together.
Ian: Exactly, right. And you see this, not just in humans; you see this all across a number of different species. There are lots of reasons to expect it in all different kinds of networks. And then the last variable is the least exciting one but also is probably at play a lot of the time. We might know each other and be similar to one another for reasons that are totally independent of our relationship.
So let’s imagine that you and I work at a factory together. We know each other because we’re coworkers, and we are similar because we both are the kinds of people who would get hired at the factory. And then if the factory shuts down, we both experience the damage of job loss at the same time.
Tony: But it doesn’t have to do with being friends.
Ian: Right. But you’ll see a cluster of concern or distress or maybe even suicide risk. So we don’t know, is it you who influenced me? Were we similar and found each other because of that? Or was there something else?
Tony: Something else. So you simulated data. Could you explain that part of it? Because I think as we begin to dive into this AI scientist discovery, that’s the kind of data it’s using right now. So what is simulated data?
Ian: Right. So we’ve gone through the reasons why your real-world data are tough to use. We don’t know what the true answer is. So sometimes it’s a good strategy to simulate data, which would be like asking a computer to come up with some random numbers—a random starting point—and then say, “Alright, we have some random people. These are fake people.” They might just be represented by simple numbers like age and gender and then maybe suicide risk.
And arrange those people in a particular pattern with one another following some rules. So in our simulation, those rules were: you try to mimic your friends. If a majority of your friends are a different color than you, or a different risk value than you, you flip your color or your risk value to be like the other people. If a majority of them are already like you, you stay that way. And if it’s 50/50, you flip a coin.
What you’ll see over time in relatively small, human-sized networks is that simple model behaves relatively realistically, right? You’ll see these kinds of waves of risk spreading throughout a network. You’ll see people who are at risk tend to be associated with one another. And you’ll also see clusters form that look like real-world clusters where the people who are suicidal just stay that way—they get stuck.
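For readers who want to see the rule Ian describes in code, here is a minimal sketch of a majority-mimicry update on a small random network, assuming Python with the networkx library. The network size, number of steps, and other parameter names are illustrative and are not taken from the published simulation.

```python
import random
import networkx as nx

# Minimal sketch of the majority-mimicry rule described above.
# All names and parameters are illustrative, not the published model.

def simulate_risk_spread(n_people=50, edge_prob=0.1, n_steps=500, seed=42):
    rng = random.Random(seed)
    # People are nodes; relationships are the edges of a random graph.
    network = nx.erdos_renyi_graph(n_people, edge_prob, seed=seed)
    # Each person starts with a binary risk value (0 = lower, 1 = elevated).
    risk = {person: rng.choice([0, 1]) for person in network.nodes}

    for _ in range(n_steps):
        person = rng.choice(list(network.nodes))
        friends = list(network.neighbors(person))
        if not friends:
            continue
        elevated = sum(risk[f] for f in friends)
        if elevated * 2 > len(friends):
            risk[person] = 1          # majority of friends are elevated: match them
        elif elevated * 2 < len(friends):
            risk[person] = 0          # majority are not elevated: match them
        else:
            risk[person] = rng.choice([0, 1])  # exact 50/50 split: flip a coin
    return network, risk

network, final_risk = simulate_risk_spread()
print("People at elevated risk at the end:", sum(final_risk.values()))
```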
Tony: So now in your—this is great because you’ve had hands-on relationships with using these kinds of simulations, and you’ve seen that they can act like what happens in the real world. Is this idea of using simulated data something that’s considered to be a valid approach to science? Is it something that other people do too? I assume you’re not the first person to do it.
Ian: I am certainly not.
Tony: So can you—is that like—why do people in general feel like, “Okay, we can learn things from data that we specified things about?”
Ian: Right. So take the example that we’ve been working with. We’ve created a sort of artificial world. It’s a really simple world, and that’s, you know, a disadvantage, right? It might not mirror the real world, but it’s also an advantage in that I know everything that’s happening, and I know all of the relationships between all of the variables and how they’re causing one another.
So if I perform some kind of intervention on that artificial world and it leads to a change, I can generally be pretty confident about what caused that change.
Tony: And it might not be definitive, like, “Okay, now we know this for sure,” but it would then be like maybe a signal to do some other kind of work with that.
Ian: Or yeah, you shouldn’t stop at a simulation, but you probably want to start with one. And it can tell you, “Okay, if my assumptions are right...” And there are a lot of assumptions, right? We’re assuming that people can be described in relatively simple ways. We’re assuming that people affiliate with one another in the way that I’ve described, which is sort of like, I just sort of mimic the majority.
To the extent that those assumptions are true, then your findings will also be true, right? There’s a sort of airtight mathematical relationship between things that I assumed and things that happened in my simple artificial world. And that’s the value. The disadvantage is, in order for something to be simple enough to simulate in a computer, it usually needs to be an oversimplification. And that means that the real scary part is whether my results mirror something in the real world.
Tony: And then that you could go on and look at. Okay.
Ian: Right. So you would test that after you come up with your initial hypothesis.
Tony: Okay. So then let’s shift from that. We may come back to it; we’ll meander our way. But let’s now focus on AI.
Ian: Okay.
Tony: A lot of people think about AI as being like a chatbot. Where you ask a question or you make a well-engineered prompt, and it will give you back some information or recombine some information. And that’s probably the main way that right now people interact with AI.
So, this is a little different from that. So, I wonder if you could walk us through what this group did in creating an AI scientist, because I want to find out—is this going to make scientists obsolete? And I think I also want to explore with you, what does this mean? You’re starting your research career.
Ian: Yeah, it’s ominous.
Tony: What’s it going to mean for people starting their research careers or people who are further along in it too?
Ian: So you want me to give away my secret sauce? How I’m going to handle the upcoming AI apocalypse?
Tony: Yes, that’s exactly.
Ian: I’ll let you all in on my prepper plan.
Tony: Yeah. Yeah. So help us understand what did this—and this is just one example of a lot of things going on in AI, but I think it’ll be helpful for us to just focus on this one, and maybe we’ll get to some other topics too. So tell us about the AI scientist that this group—it’s an interesting group, from across different countries, England, Canada, there’s a company involved, right?
Ian: Yeah. There’s private sector and academia.
Tony: And academic collaboration, which is a really interesting thing too. So walk me through it.
Ian: Alright. So it’s interesting that you say that most people’s experience of large language models like ChatGPT is as a chatbot right now—something that you can interact with—because I think that if you actually just hold that idea in your head, you can get most of this paper.
You could maybe even imagine doing this paper yourself, doing the experiments in this paper by hand, slowly, just typing away with a chatbot.
Tony: Okay.
Ian: And so they open with this question, this quest of theirs: could we essentially replace a scientist from end to end, from start to finish? And can we do that as a kind of proof of concept? I don't think anyone who was working on this paper thought that this would be the last paper on the topic. I think they wanted to kick it off as one of the first.
And so they say, look, when you think about it, being a scientist involves a couple of different processes. So you've got reading some old literature, looking at that literature and deciding what are some questions that are inspired by it, honing in on questions that are more and less interesting, more and less worthwhile, generating some hypotheses that might answer those questions, coming up with experiments that can confirm or deny your different answers or hypotheses.
And then you’ve got to look at those results and analyze them and describe those results in a paper. That's essentially what we do right now. And then, the last step would be, is that paper any good, is your finding good? And so it would need to pass the peer review process.
There have been a bunch of parts of this that have been done in other papers. And what they try to do here is essentially get an AI—or actually a collection of different AIs—to perform all of those parts of the process, one by one, including the peer review.
Tony: Okay. Let’s take a—
Ian: Yeah.
Tony: For those who are watching, we can pull up a—
Ian: Oh, yeah.
Tony: A figure that will help show what you just said. And for those who are listening in audio only, we will have a link to this in the episode notes. But this diagram shows the steps that you just described so eloquently, that the scientific process involves idea generation and then creating experiments.
So it's idea generation, seeing if what you're doing is novel, determining how novel—novel meaning new. Am I about to test something new? Figuring out a way of testing it, getting the data to do that.
Ian: Right, right.
Tony: Testing it, getting results from it, and then writing a paper.
Ian: Yeah.
Tony: And even peer reviewing the paper.
Ian: Yeah. Yeah. One thing worth keeping in mind is there are actually two agents—two robots, if you like—in this study. One that's generating the scientific results and another that's evaluating them. It's both a scientist and reviewer.
Tony: So it's not like me peer-reviewing my own paper.
Ian: No, although it would be a little bit like your twin or clone reviewing your paper. So the likelihood that it's “Wow, self, that's a really good idea,” is a little higher. But they do some creative things to help reassure us that the review process sort of mirrors what it would be like in the real world.
So where should we start? Should we go feature by feature, how they pulled it off? Because I really think that you could do this just typing with a machine; you just couldn’t do it as quickly as they did.
Tony: Yeah. Yeah. Yeah. Why don't we start with idea generation? Because I have some questions about that. And I also shared that we were going to be talking with one of our colleagues, Anna Defayette, and she also had some questions when I told her I’d be talking with you about it. So why don’t we start with the idea generation part, which for me as a researcher is like the hardest one. Basically taking what’s known—or, which is not always 100% known—
Ian: Right, because you need something to be open. There’s no science left to do if you know everything.
Tony: Right. So there’s some knowns, there’s some unknowns, and then putting that together into, “Oh, here’s an idea worth testing.” That’s really hard. That takes years—or took me at least years—to come to the way of thinking that gets you to that idea that you’re ready to test. So what does it do for coming up with an idea?
Ian: So it starts with a template, which is a set of instructions that it gets for generating a novel idea. And they actually publish their code. You can look at what these instructions are. In fact, if you wanted to, you could download this machine and do some work to get it up and running.
It should, in principle, be able to replicate the results of this paper, maybe even generate some of your own.
Tony: So what’s that template like?
Ian: So they—I don’t have it in front of me.
Tony: I don’t know if you know every detail of it, but—
Ian: So it involves instructions that comprise a description of the task and a description of what the collection of ideas it generates might be like. So you could imagine if you were doing that with a machine, if you were just typing it out, you could say something like, “Alright, I’m interested in this area of research. Here are some background papers that I’m going to upload to you. What are some next directions that you might go in?” And then it could give you a bulleted list of, say, four or five.
Tony: So let me stop right there because actually it brings me to one of the questions, which is that the paper says that this gives us—and here’s the quote—"endless, affordable creativity and innovation."
Ian: Sure.
Tony: Endless, affordable creativity. I want some of that.
Ian: Yeah. High praise, high praise.
Tony: So, the question that our mutual colleague had was, can a robot be truly creative?
Ian: Sure. So these are some philosophical questions. One thing that we didn’t get into earlier is that after I was done debating, I had more questions than answers. And so, my undergraduate work was in philosophy.
Tony: This is like a goldmine.
Ian: I wish. And so I had this experience—
Tony: Psychology, statistics. I’m talking to the right guy.
Ian: I’ve meandered. The thing about being a philosophy student in 2008 was, you’re really annoying at parties. You bring up all these questions like, what does it mean?
Tony: Is that different now in 2024?
Ian: No, I’m just annoying in a more relevant way, in a way that’s harder for people to dismiss. And you’re learning about all these things like, what does it mean to be conscious? What’s free will? What makes something art? All of these sort of foundational questions—or your audience might be familiar with the trolley problem.
Tony: I’m not. Other people probably are.
Ian: I think it was originally introduced by Philippa Foot, but the idea is to—
Tony: Philip the foot?
Ian: Philippa, Philippa. Like the feminine version—
Tony: Like Vinny the neck or something.
Ian: No. And she says, imagine that there’s a runaway trolley running down on a train track. It can’t stop, and there are five people tied to the track.
Tony: Oh, I think I have heard this before.
Ian: It’s fairly popular. And you have the choice to pull a lever and divert the trolley onto a track where there’s only one person tied down. So the question is, is it morally justified to kill one to save those five in this scenario?
And there are researchers who have replicated this across a bunch of different countries, and basically 80% of people say, “Yeah, that makes sense. It’s unfortunate, but it makes sense.” Now let’s do a little twist. Imagine that there is, again, a runaway trolley running down the track, and it’s heading toward five people who are tied to the track. They will surely die if the trolley collides with them. Only this time, you’re standing on a footbridge over the track, and there’s a—usually it’s described as a fat man, just like a really big, rotund person. But sometimes the more modern version is a workman with a heavy backpack full of tools.
The question is, is it morally justified to push him onto the track in front of the trolley? He will surely die, but his tools will derail the train. So like the second one is—they’re both gruesome.
Tony: Philosophers are pretty, yeah, yeah.
Ian: The second one is a lot worse.
Tony: Tough.
Ian: And I promise this will relate to AI in a second.
Tony: Oh, I’m sure it will.
Ian: Okay. So most people, again cross-nationally, independent of things like religion, age, gender, tend to think that’s not okay. They tend to say, “Alright, the first thing, you kind of had me. The second one, I’m really against that. I don’t like that.” And what’s interesting is the math is the same in both of them, right? That we’ve got this 5-to-1 trade-off. What’s different is that you’re using someone as a means rather than an end. Like the person is being sacrificed.
Tony: The sort of innocent person. They’re not already in a bad situation.
Ian: Right. You’re using him as a tool, and his death is sort of instrumental to achieve your end, as opposed to the first case where the person who dies because you diverted the trolley onto another track—in that case, it’s just an unfortunate consequence.
Tony: It’s a lever that’s the means.
Ian: Right, right. So to connect this to AI, the reaction of creativity, and the notion of creativity—the reaction that I got from lots of people when I would bring this up early in my twenties was, “What’s wrong with you?” And once they got over that, once they calmed down from the, “You ruined my night,” they would say, “Who cares? This doesn’t—like I get that this is a dilemma; I get that it’s unfortunate, but we never see people tied to trolley tracks. Why does it matter to address this question or figure out what’s right or wrong in different circumstances of this kind of flavor?”
And all of that’s changing now when we’re in a situation where increasingly machines are making decisions on their own. So if you have a self-driving car whose brakes don’t work, it needs to make a decision whether to hit an old lady or a young child—or maybe two old ladies and one young, healthy child, right? Suddenly those decisions need to be made in advance, and you need to figure them out relatively well.
All of these foundational questions in philosophy that we used to be able to shrug off because humans are really good at handling them on the fly, suddenly we need prospective, clear, and interpretable answers to them. And creativity is one of those cases. What does it mean to be creative? Does it mean to do something that just hasn’t been done before? If that’s the case, it seems like machines are capable of doing that in a strict sense. Like if I say, “ChatGPT, can you come up with a combination of words that’s never been uttered before?”
Tony: It can do that.
Ian: I don’t have any reason to think that it couldn’t do that.
If you mean something more profound by creativity, like to be able to do something new that’s never been done before, at least hasn’t been done very often, and is insight-generating, well, they increasingly seem to be able to do that, right?
Tony: Tell me what you mean by insight-generating.
Ian: It’s hard to measure, but you could imagine coming up with a test where we get a large number of people who ask a machine some important questions. Maybe they even ask it questions about literature. Or maybe they ask it to come up with poems. And we don’t tell them that the machine has generated the poems—this would be like a Turing test kind of approach. And if a large number of people say, “This is a really creative poem,” it seems reasonable to say that the machine has done something creative.
Tony: Okay. So, let me think about that as it relates to the day-to-day life of a researcher.
Ian: Yeah.
Tony: I’m pretty sure that I don’t think I’ve really ever been creative. Because I think there are probably people who come up with like truly new things, but I don’t think—I’m definitely not one of them. I think I can take different ideas from different domains or different observations and combine them in some way that, you know, could be useful or could be tested.
Ian: Yeah, you can remix.
Tony: I can remix. I think I’m a remixer.
Ian: Sure. For what it’s worth, I think I am too.
Tony: Okay, okay. Is that, like, the first kind of creativity? Or is that creativity by the way that you were just describing it?
Ian: I think certainly there are creative remixes.
Tony: Okay.
Ian: So I think—
Tony: Is that what the AI scientist is doing? Is this AI scientist doing something like what I’m doing when I’m taking different ideas and putting them together in a different way to come up with something hopefully useful towards suicide prevention—in my case?
Ian: Short answer, yes. The way that these machines work is by—maybe we start with the simple first approximation. These are pattern-matching machines.
We give them large amounts of text, and those pieces of text get broken up into different tokens, which are just like clusters of characters, usually three or four letters long. But you can imagine them as just words. And it learns from very large amounts of text—and I mean truly large, like the size of the internet and more.
It learns which combinations of words tend to follow other combinations. It learns that when it sees a question that involves the word “when,” often, though not always, the responses to those questions will involve some reference to time or a date, right? When it hears “how” or “why,” the stuff that follows is usually a little bit more open-ended but tends to follow a sort of causal pattern—“because A happened and then that caused B.”
And they just do a lot of that. So if you feed them a new prompt—a prompt that would be similar to the kind of prompt that a scientist might consider in their own mind, or maybe would be having a conversation with someone else at lunch—they’ll do the same kind of thing that a scientist would do.
If I said, “Here are three or four papers. What kinds of questions emerge for you after having read them? And do any of them seem like productive research avenues?” A lot of what you might be doing internally is just predicting what kinds of words would make sense after a query like that.
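As a loose illustration of "learning which combinations of words tend to follow other combinations," here is a toy next-word model in Python that counts which word follows which in a tiny corpus and samples continuations. Real large language models use neural networks over subword tokens at vastly larger scale; the corpus, function names, and sampling here are purely illustrative.

```python
import random
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word tends to follow
# each word in a tiny corpus, then sample continuations from those counts.
# Real language models use neural networks over subword tokens, not raw counts.

corpus = (
    "when did the study start . the study started in march . "
    "why did risk increase . risk increased because connections were lost ."
).split()

follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def continue_text(first_word, length=6, seed=0):
    rng = random.Random(seed)
    words = [first_word]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        choices, weights = zip(*options.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_text("when"))   # in this corpus, tends to continue toward a date
print(continue_text("why"))    # tends to continue toward a "because ..." pattern
```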
Tony: Yeah, yeah. And just to venture, you and I have started to do that, and even together. So we sat down and started to say, threw some of our ideas out there and then see what kinds of ideas this generates. So we’ve been doing that.
Ian: So we gave it a prompt, and we said, “Can you come up with anything else?”
Tony: Yeah. And?
Ian: And it did. And frankly, if I remember right, it came up with something like five or six ideas. They were all clearly related to our topic. And that makes sense because it’s supposed to predict what would generally be a coherent response.
And I think three of its ideas we threw out pretty quickly. We thought, those are generic and they’re related, but we don’t like them. But there were two—
Tony: One or two that was, “I wouldn’t have thought about that, but now let’s go with that.”
Ian: And we asked it. We said, “Can you tell us more about number two? What would it look like?”
Tony: Yeah. One of the things I love about it—and it’s a little bit of this endless, affordable creativity and innovation—is that it never gets tired.
Ian: Almost. I get rate-limited a lot. It gets tired only in a trivial sense. In principle, it never gets tired.
Tony: Yeah, yeah. And I find that so liberating, that I don’t have to limit the number of iterations or directions that I go just because I might be boring another person. Now, I still love talking with other people too.
Ian: I plan on keeping most of my human relations.
Tony: Most of them. I mean, for the most part, that doesn’t replace talking with people. Although, I think that’s a concern—a legitimate concern—that people have.
Ian: Yeah. Well, you can imagine how valuable this would be for something like a therapist, right? If it can learn the pattern in a right way for how to emotionally support someone—you know, I get tired after not very many patients, and I—don’t tell my employer—but I’m also kind of expensive, at least relative to one of these machines.
You can imagine much more availability of higher-quality and lower-cost mental health care, which is a real crisis in this country, if you get it right.
Tony: Yeah. And I’ve done some counseling with an AI. It was quite helpful.
Ian: I could buy that. Yeah.
Tony: At the same time, I think people will feel a little squeamish or squirmish?
Ian: I’m on that list.
Tony: And I’m on that list too. That’s the thing. I’ll tell you from just my experience so far—and this is very early days—although I found it helpful, it didn’t satisfy some part of me that really appreciates that kind of support coming from somebody that I know as a person.
Now, if you had blindfolded me and made it sound like a person—if I’d thought it was a person—I don’t know; as it was, I knew it wasn’t. But this is just the beginning, and you can certainly see where it’s going. And I think we’ll veer, in a moment, into what that really means for somebody, especially earlier in their career.
If I have, let’s say, 20 or 25 more years to my career, you probably have 45 or 50. And the amount of progress that will happen in 50 years—it’ll be more than the next 20. And so we have different—we’re in different situations, and I think it’ll be interesting to talk about that.
But let’s look at the next phase though. So after idea generation, it was the experimenting.
Ian: Okay. So, a machine’s come up with an idea or a collection of ideas, and it’s been instructed to iterate on those ideas. In the same way that you and I asked the machine, “What are your sort of five ideas related to our issue?” And then we said, “Tell us more about number two.” And so it elaborates number two. These machines do that too. And then they select from a list of improved ideas, which they think is the best.
Tony: Can I just pause for a second?
Ian: Yeah.
Tony: But when I’m doing it, like I’m adding some stuff to the mix as a person. Do they—does this AI scientist—
Ian: How does it go?
Tony: Go.
Ian: Yeah. The experimenters are using something called an API, which is short for application programming interface. But it’s basically a way to talk to a machine with code. So you and I normally type into the chat box.
You can also—you can just do this—send commands to it from a programming language like Python, R, or C++. You could come up with your instructions in advance. You could say, “I want you to follow five steps. And I want step one to be generate some ideas. I want step two to be, run some code.”
And then you can have that conversation with the machine in an automated way. And that’s what they’re doing here.
Tony: So it would be like, instead of me saying, “Okay, what would be a good idea?” I’d just write a program that says, “What would be a good idea?”
Ian: Yeah, you write a program that asks the question.
Tony: That asks the question for me. Right.
Ian: Yeah. It has the text in it. It has a command that says, “Hey, send this text to the machine as though we were having a chat. Oh, by the way, and the text that I want you to send is, can you come up with some new ideas related to this topic?”
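To make the scripted-chat idea concrete, here is a minimal sketch of sending prompts to a language model from Python, assuming the OpenAI client library as one example of such an API. The model name, prompt text, and step list are illustrative; they are not the AI Scientist's actual hard-coded instructions.

```python
# Minimal sketch of scripting a "chat" with a language model through an API.
# Assumes the OpenAI Python client as one example; the model name, prompts,
# and step list are illustrative, not the AI Scientist's actual instructions.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

steps = [
    "Here are summaries of a few papers on social networks and suicide risk: ... "
    "Can you come up with some new research ideas related to this topic?",
    "Tell me more about idea number two. What would an experiment look like?",
]

messages = []
for step in steps:
    messages.append({"role": "user", "content": step})
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(reply[:200])
```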
Tony: And is there any—is there any—could there be large language model variation in how that gets phrased or handed off, or are those more hard-coded at this stage?
Ian: As far as I know, the instructions that they use for each step are all hard-coded.
Tony: Okay. But it could be, right? You could have—
Ian: You could imagine—
Tony: Two large language models, two of these AIs talking to each other.
Ian: Right. Where there’s a sort of a prompt generator that says—
Tony: Like you are a scientist prompting an AI scientist.
Ian: Right.
Tony: You want to walk them through these number of steps. And then you tell the other one, “You are a scientist coming up with ideas,” and you can have them.
Ian: Yeah. You could.
Tony: They didn’t do that in this particular case.
Ian: They didn’t do that here because, A, I think that would be more work. B, I think even though it would be insight-generating, there would be many more failures, right?
Once you have two machines interacting with each other, the likelihood that a semicolon gets misplaced, and then the whole thing crashes for reasons that are scientifically uninteresting—I think that the risk of that goes up. But yeah, you could in principle do that.
And that might be a straightforward next step to a study like this. But in this case, as far as I know, all of their instructions to the machine are hard-coded.
Tony: So they got to that point, came up with, “Okay, this is an idea we’re going to test.” Now what—we both work in suicide prevention.
Ian: Sure.
Tony: And I want to talk, in a bit, about how will—how would this apply? But what is the topic area that’s being covered here?
Ian: Yeah, they have three. And if I remember right, it’s diffusion modeling, ways to optimize language transformers, and then a silly word called grokking.
Tony: Yeah, man.
Ian: Do you want—me, yeah, do you want like an overview, or—
Tony: I’ve got to hear about grokking.
Ian: Sure.
Tony: But in general, these are all within computer science.
Ian: Machine learning, yeah.
Tony: Right. So the—right now, the topic area—
Ian: Right.
Tony: You haven’t applied this to medicine or—
Ian: No.
Tony: Or any—but—
Ian: No, and this is important.
Tony: All within— I know. That’s why I’m—so this is all within computer science.
Ian: Right.
Tony: But gimme the grokking—you gotta give them.
Ian: Sure. Okay. Grokking is a phenomenon where a machine learning model, like a neural network, will go from sort of good success on its training data, and it’ll flatline for a while as it learns more and more.
It doesn’t appear to be getting much better for a while. And then all of a sudden there will be a spike in its progress.
Tony: And it groks it.
Ian: Yeah.
Tony: And there’s even an AI called Grok.
Ian: Yes. It’s a term, I think, from a Robert Heinlein novel, Stranger in a Strange Land, that’s about a human who’s raised by Martians, and the Martians have this concept called grokking, which means to understand something really deeply in a sort of fundamental way.
You can think of it like—for your audience over 30—in the original Matrix, when Neo is getting shot at and he just wills the bullets to stop because he has this sudden epiphany about what the Matrix is and how it works—that’s grokking.
Tony: Okay.
Ian: He’s good before, but suddenly he gets something.
Tony: And in this case, it’s how do you get one of these models to—
Ian: To exhibit.
Tony: To all of a sudden get a lot better.
Ian: Correct. And we’ve only demonstrated it in small datasets.
Tony: And all of these things are within computer science.
Ian: Yes, yeah.
Tony: And why is that important right now?
Ian: I think it’s critical for interpreting these results, because of the problems that it’s pursuing. I’m not a computer scientist myself; I’m a shrink who spends a lot of time in spreadsheets. So I have a little bit of awareness of what’s going on, but my impression—
Tony: And you can code.
Ian: Yeah, and I can code. So I’ve done work that’s not AI, but involves some of these—like a bunch of work with APIs, for example, I’ve done that. And in computer science, my impression is that the problems and the solutions to those problems—what they would look like—are much better defined than in a large number of other areas, namely social science and medicine.
We know what the problem is that we’re trying to solve, and maybe we don’t know what the solution is, but we would know that we had it if we found it. We know what the success criteria look like, and that’s something that a machine should be much better at than if you come into social science where we’re like, “Hey man, suicide prevention. What’s up with that?” That’s going to be something where a machine performs—
Tony: There are so many unknowns and unknown unknowns. It’s what I think people call a wicked problem.
Ian: Yeah.
Tony: It’s, by definition, what a wicked problem is, right? And in the current state, we’re miles from that.
Kilometers for our international audience that we have here. So we’re far from that. So that’s an important limitation here.
Ian: Right.
Tony: That we’re talking about that—you would expect the machine to be better at this kind of research than maybe the kind of research that we do.
Ian: It’s starting out on a—for now, we don’t have any reason to think it couldn’t get good at social science, but it probably would take a little bit more than this, right? So this is a proof-of-concept model that is starting out on a finely groomed track.
Tony: Do you have any intuitions about what more it would take?
Ian: I can tell you what I think would go wrong, and maybe we can work backward from what the success criteria might be. At the end of this paper, they say, “We’ve developed a machine that we think is basically as good as an early machine learning researcher.” It can come up with some new ideas, it can write down some of those ideas, its math is mostly solid, but not quite there, and it can produce mid-quality papers.
And I have the impression that in psychology, and then in the social sciences more broadly, we’re facing a number of issues that would make it hard for a machine to learn what success would look like. For example, a large number of our papers don’t replicate when they’re put to relatively strict tests. It’s unclear—if you look in social psychology, for example—how would social psychology know if it were done? If we had answered all of the questions that we wanted to—or at least the subset of them that we can answer—it’s unclear what that would look like.
And that might be hard in physics, but I think you could imagine—physicists talk about a theory of everything, and you get the impression that many physicists would probably agree if we had found it. And I don’t know what that would be in psychology.
Tony: Let me just ask a follow-up on that because I don’t understand yet. But like in our field—the thing that I devote my whole career to, and I know you too—is wanting to reduce suicide. Wouldn’t we know if we started to see that?
Ian: Yeah. Suicide might be a good place for this machine to start for that reason. It’s high stakes, so you would have to be careful. But part of the reason that I’m here doing suicide research is because I’ve been interested in mental health broadly for a long time. The reason I selected suicide was at least in part because it’s a better-defined problem.
It’s often unclear—like a little bit unclear—what’s the point where a drug overdose transitions into a suicide attempt? Where someone just knows that they’ve taken too much of an opioid or much more than their usual dose? They might even be worried that it’s a little bit lethal, but they just sort of don’t reach for the Narcan, right? Is that a suicide attempt? It’s unclear. But for a significant percentage of suicide attempts, we have someone reporting, “Yeah, I was trying to die.” Or maybe there’s evidence like a suicide note after someone’s deceased, right? So we have a clearer sense of what success would be. It would be that number going down alongside high approval from the people who are receiving our services.
Tony: Yeah, finish.
Ian: But that would be a lot harder in well-being research.
Tony: Oh, have you achieved it or not? Yeah, I see. I see that. So the outcome part—this is where I need your precision here—because what led to the outcome is really hard with suicide.
Ian: Yeah.
Tony: Right? Because it’s such a multi-determined thing, and we’re not very good at knowing those things. But that’s different from what the outcome is—that you would know if you were reducing it.
Ian: And you could see that at a population level, right? And we’ve seen it before. Like we know that in the UK when coal gas gets replaced and the carbon monoxide access in the population goes down, you can see a clear line that the suicide rate drops precipitously and that it drops in a way that corresponds with the availability of carbon monoxide in your house.
Tony: That’s not an experiment.
Ian: But it’s clear.
Tony: It’s very compelling—the timing and relationship to one another.
Ian: Yeah.
Tony: I see. That’s interesting. So you actually think that, yeah, that’s something—something like this would be more amenable to something that has a countable outcome like suicide.
Ian: Yeah. And compare that with depression, right? So you said a moment ago, if I heard you, you said something along the lines of—and I agree with you—the suicide outcome is relatively clear, but the things that lead to suicide are often very unclear.
Depression is pretty similar without the clarity of outcome, right?
Tony: I see. Okay.
Ian: So you can imagine suicide is a tricky problem to solve, but you can imagine harder ones. Like just subtyping depression into different flavors or categories is something that we all have a sense would matter or be valid, right? There do seem to be, when you’re working with patients, different kinds of flavors of depression. But doing that quantitatively, as far as I know, has so far evaded us; despite the fact that we all share an intuition that there are different depressions, we can’t find them.
Tony: So coming back here, to The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery—it comes up with the data.
Ian: Yeah.
Tony: Now this gets back to our conversation about simulated data.
Ian: Yeah.
Tony: How does it do that?
Ian: So in one case, in the case that I think they review in the paper, they pick out one to highlight. It uses data from a benchmark dataset. So this is a dataset that computer scientists had put together, because if you can do well on this dataset, we might expect that you’ve come up with a real success across a broad range of related topics, right?
Imagine that we made it a really high priority to recognize cats and discriminate them from dogs. A benchmark dataset might have a large collection of pictures of cats and pictures of dogs in a bunch of different positions and arrangements. And each one of them is labeled “cat,” “dog,” “cat,” “dog.”
And we know for a bunch of existing models how good they are at discriminating cats from dogs. This machine, in at least the article that they highlight, was using some benchmark datasets for a process called diffusion, which has to do with image generation.
Tony: Okay.
Ian: And so it can compare its performance on its new idea, which involved creating branches of a neural network. That was the idea that it came up with. It can compare the performance of its idea to existing success with those datasets. And in other cases, it did simulate some data. If you wanted an example of where simulated data might be appropriate, I imagine some of the papers on grokking involve that.
Demonstrating grokking in real data is relatively difficult to do for all the reasons that we described before, right? Is a machine suddenly coming to learn some hidden pattern, or did it just get better at a noisy dataset? In the examples of grokking that you usually see in the literature, you have a very clean, artificially generated dataset that might be something like a sine wave or an algebraic operation where we say to the machine, “Okay, you’ve got some inputs, A, B, and C, and then some outputs, D, and we want you to learn the relationship between the inputs and the outputs.”
And so maybe it’s A plus B divided by C will give you D. So it’s a very simple algebraic relationship. And we generate those data and we say, “Figure it out.” What will happen over time is the machine will just memorize the dataset that it’s been given, and it’ll make close approximation guesses.
And then over time, grokking occurs when it figures out the pattern. Okay. So you could imagine going to the AI scientist and saying, “Okay, you have this proposal for making grokking happen more or faster or better. Or maybe you have a proposal for something that you can do to disrupt that process, which will help explain why it happens.”
Because it’s an open question—why machines behave in this way that sort of seems human-like, where you have that “ah-ha” moment in a math class—or maybe you don’t, but you hear about it; people say that happens to them. What the machine would do then is send instructions to another helper called Aider, which is a way of generating code programmatically.
It’s a program that people use to generate code suggestions while they’re programming. And the AI scientist in this paper has access to that. So it can say, “Alright, I’ve got this idea. I’m trying to generate some code. Can you help me generate the code for this experiment that I’ve outlined?”
It gets the code, the code then gets executed, and then the results come back. And it interprets those results with a prompt like, “Here are your results. What do you think these mean in light of the idea that you had earlier?”
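As a rough sketch of the clean, artificially generated dataset Ian describes for studying grokking, here is a small Python example that builds (A, B, C) to D pairs from a simple algebraic rule and splits them for training and testing. The rule, sizes, and split are illustrative; published grokking studies often use modular arithmetic tables instead.

```python
import random

# Sketch of a clean, artificially generated dataset of the kind described above
# for studying grokking: inputs (a, b, c) and an output d that follows a simple
# algebraic rule. The rule, sizes, and train/test split are illustrative.

def make_dataset(n_examples=1000, seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        a, b = rng.randint(1, 20), rng.randint(1, 20)
        c = rng.randint(1, 10)
        d = (a + b) / c  # the hidden relationship a model would have to learn
        examples.append(((a, b, c), d))
    return examples

data = make_dataset()
split = len(data) // 2
train, test = data[:split], data[split:]
# Early on, a model can do well on `train` just by memorizing it; grokking is
# the later, sudden jump in accuracy on `test` once it learns the rule itself.
print(len(train), "training examples;", len(test), "held-out examples")
```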
Tony: And then it will say, “Here’s what I think they mean.” And then, I think the part about writing it up as a paper seems like, in some ways, the easiest part.
Ian: Trivial, yeah.
Tony: Because you’re just saying, “This is what introductions are like, this is what the method section is like, this is what results are like,” and they have to put that together.
Ian: Yeah. And it does make mistakes, but the parts it gets wrong are formatting issues. Like the table that it generates sometimes doesn’t fit on the page, right? And those are things that, frankly, I’ve done.
Tony: Yeah.
Ian: So it makes a lot of mistakes there, but they’re not interesting mistakes.
Tony: So you said before—in fact, the paper says exactly—“This AI could generate hundreds of interesting, medium-quality papers.”
Ian: Yeah.
Tony: I’ve written a few medium-quality papers.
Ian: I aspire to write thousands of medium-quality papers.
Tony: But—
Ian: We can only dream.
Tony: So what—I mean, what benefit is there to more medium-quality papers?
Ian: Sure. If you imagine the collection of papers that this machine could produce as all being perfectly median, right? Like the exact middle of quality that could ever exist.
Tony: Okay.
Ian: That could be valuable. It could fill in a lot of gaps in the theories that we have, especially in fields like suicide, where there aren’t that many people working, right? We have a number of questions that might be important to answer. Those questions could inform larger trials, and they might make those trials better. But we don’t have the human power. So you could imagine even if 100% of the papers that are produced are all medium quality, there could be value there. There could be some bad things that happen, but there could be value.
Ian: If you then instead say, “Not literally every paper is going to be medium. They're just, on average, medium. Some are going to be a little bit worse, but some of them might be substantially better.”
Tony: Did that happen here?
Ian: In this paper? No. But there are examples of it that have happened recently. If I remember right, Google had a reinforcement learning algorithm that came up with a new approach for making matrix multiplication faster. And this is important because basically everything that's happening underneath the hood of a machine learning model is just some flavor of matrix algebra.
Tony: If you can make that faster...
Ian: If you can make that faster, you can make everything that you do on a machine faster.
Tony: Okay.
Ian: And we've come up with some shortcuts in the past. It's also possible to prove mathematically that only a certain number of shortcuts will ever exist, right? You won't—you can't cut it in half, but you can cut it, you know, into fours. The machine came up with something that no one had thought of before that does appear to work. So it has been done, and that's incredibly valuable.
Tony: Okay. And I guess there's a greater—the more you do, the more, if there is some curve around medium...
Ian: Right.
Tony: The more you do, the more you may get some of those outliers that are better. And some that are worse.
Ian: To use a venture capital term, it might find a unicorn, right? It might give us the next quantum mechanics, would be what you're betting on.
Tony: And in my own experience working with these AIs for idea generation, it also wouldn’t get tired. Because, yeah, there’s expense in running cycles—it takes computer time, it takes electricity, it takes a lot of things—but compared to the handful of research studies that people can really get done in a year...
Ian: Oh, yeah.
Tony: It could keep going.
Ian: Yeah. This thing did dozens of studies in a week.
Tony: That's pretty powerful.
Ian: It is. And you could imagine many of them running in parallel. And assuming that they're all producing something that is, A, novel—so we didn't know it before—and of medium quality, that could be a really big deal.
Tony: Yeah, yeah. And there's probably this—we're talking about suicide because that's something that's our area—but there are lots of problems in medicine and all different areas that there aren't enough hours in the day to make progress. Meanwhile, people are dying and struggling. So being able to add to that body of research could be really useful.
Ian: And imagine if you have a child with a rare disease, where there just isn’t the money for a pharmaceutical company to be doing research on a new drug. It might be the case that this could be a cheaper way of doing that, right?
You could imagine these areas like cardiovascular disease, right? Everybody’s going to get it if you wait long enough. Cancer, everybody’s going to get it if you wait long enough. So there’s a lot of financial incentive to do that work. The rarer the disease, the less money is in it to take the high-risk gamble for a pharmaceutical company to solve the problem.
Tony: But this is a lower risk.
Ian: This, yeah.
Tony: And I guess I want to talk a little bit now about the simulated data thing. Because when I saw commentary on this paper on the web, that was one of the big objections.
Ian: Alright.
Tony: But from my perspective, that also seems like something the computer could ask humans to do. You could work out some things in a simulation, and then—like you said before with your data, where you did the social network simulation—there are assumptions you would like to test in the real world, to see whether they replicate the assumptions you made in your paper.
Maybe a good role for us would be to go get the real-world data—like actually get tissue samples once something has been shown through simulations—and then ask, “What would you like us to do with this?” Now, I think this transitions us a little into what this means for the future of scientific research, and maybe specifically for your own research.
Where does your mind go? We’re not historians of science, and we’re not going to be able to say where science is going. But, you know, that’s never stopped us from offering an opinion. And I think you really do have to come to terms with: what am I going to do with this?
Ian: I think you should.
Tony: That actually brings us back to this question about simulated datasets. Because when I was looking at some comments about this paper online, one of the big objections was, “This is simulated datasets.” And there’s probably a lot of different things we could say about that. But one of the things that gave me the idea is that maybe one of the roles that humans can play in this loop is collecting real-world data.
Ian: Sure.
Tony: Like, if the AI scientist came up with a good idea, we could collect that data and test its assumptions after they’ve been tested by simulation. Which I think brings us to the question of our role—where is this going? Neither of us can say for sure, and we’ll probably look back at this in a very short time and laugh at where we thought it was going. But I’m interested in what you’re thinking now, as an early career scientist, about what this means. And do you think people should be thinking about that? What are your thoughts?
Ian: Yeah. I definitely don’t think that the right move is to bury your head in the sand on this one. I think the proposal that humans—sort of human scientists—take instructions from the machine and then become the sort of data collectors strikes me as an especially dystopian world for a scientist.
Most people go into science to spend time thinking and dreaming, and that’s the fun part of the job. And then, you eventually discover, as you have shared with me, that a lot of being a lead scientist is being a good administrator. It is an important job, a job that we can’t let go of, but it involves much less paper writing and thinking and daydreaming than what you wanted it to be.
Tony: Leadership.
Ian: Yeah. It’s more leadership and maybe—dare I say—a little bit less science. It’s part of the scientific process, but you don’t necessarily feel like a scientist when you’re running meetings or when you’re sending emails. So let’s take that insight and say that there are parts of the scientific process that are all very important, but that are more and less pleasant and interesting. What we would be describing here would be handing over to machines the part of science that we like doing the most. And what we would be saying is, “Because the machine can’t do any of the tedious, boring stuff, we’ll do the tedious and boring stuff.”
And so at least from the perspective of the scientist, that version of the future sounds pretty unpleasant.
Tony: And also, it’s probably also only a matter of time. Once these AIs have bodies and are robots, they can do it perfectly well themselves. So we would also make ourselves eventually obsolete that way anyway.
Ian: Yeah.
Tony: So then how do you think about it for yourself? We can’t say where all of science is going or how every person should think about it. I agree with you that people should think about this though. There are two kinds of objections that kind of bug me. One is when people argue on the basis of what AIs are unable to do right now.
Where really it’s just a matter of time. They can’t yet—they don’t have agency yet, they don’t have moral judgment yet. But those things are all coming.
Ian: Yeah.
Tony: The other one is where we underestimate how much it’s going to change us.
Ian: Yeah.
Tony: This idea of, “Well, we’ve always been the same through every technological change. Even though electricity changed society a lot, and the internet changed society a lot, we, fundamentally, as human beings, are broadly doing most of the same jobs. We’re still doing the same kinds of activities. The same things drive us.” So people underestimate it, either on the basis of “all it is is a chatbot,” or because we haven’t changed that much in response to previous technologies.
Ian: Which is a dubious assumption.
Tony: Which is—we’ve changed a lot. But I mean, from my perspective, this is like an order of magnitude bigger for potentially really changing how we interact with each other, our work, and society.
Ian: And as a psychologist, my recommendation is to take that seriously rather than to ignore it. If you think about much smaller-scale changes, we almost never say, “Close your mind to change,” or, “Don’t even think about it; it’s not that big of a deal.” For much smaller-scale changes—changing jobs or moving across town—we say, “Look, yeah, there are going to be some consequences, and maybe some of them you won’t like, and maybe there’ll be some new opportunities. But what’s important is that you grow with what’s coming, because it’s going to come.”
Ian: And so you can choose maybe not how good or bad the future is, but you can choose how you approach it. And I think approaching it is better than avoiding it. I think, in terms of what that future will look like—maybe I can respond to your two objections.
The first objection, if I remember right, was that there are a number of things that the machines can’t do very well. So, for example, large language models (LLMs) have this notorious pathology where they just have a hard time comparing—
Tony: Yeah, yeah. One, they hallucinate things that didn’t happen—
Ian: Yeah.
Tony: But it’s not clear to me that they’re any better or worse at that than us. I misremember citations all the time.
Ian: But they can’t do things that a human would generally not get wrong. Like comparing two numbers. If you say, “Is three bigger than two?” You can rephrase that question in a number of different ways, and a human will almost never get them wrong, except when they’re very tired or they’re asked very quickly. LLMs, for whatever set of reasons, have a hard time comparing two numbers.
Tony: And so people say, “Oh, look at that. They’re not even—they can’t even compare two numbers.” Like, “I’m not going to worry about this.”
Ian: “My five-year-old could beat that.”
Tony: “My five-year-old could beat that.”
Ian: But what they’re ignoring is that machine has an intimate knowledge of quantum mechanics. So it’s like, yeah, okay, it colors outside the lines sometimes, but it has more physics in its mind than any physicist alive right now. Whether it can apply that in a thoughtful and helpful way, and whether it makes certain different types of mistakes, is of course up for debate, and we should consider it. But it’s not a reason to think—it’s not a reason to bet against the machine, right?
And so that’s issue number one, right? Yeah, they do make mistakes. And it’s worth considering the fact that a lot of what we need in science, especially science where people’s lives are on the line, is trust, right? We need someone that we can talk to and consult and who we can debate with about what the right choices are to make in a clinical trial. And a machine that can’t compare two numbers, even if it’s really good at quantum mechanics, might just not meet the minimum level of trust that we care about right now, despite its vast knowledge in some domains.
That’s—I don’t want to diminish the importance of that point. But it is also the case that they’re pretty smart.
Your second objection, if I remember right, was like, humanity has always stayed the same. One, I don’t—
Tony: Not my objection.
Ian: But the objection that you’re the interlocutor for.
Tony: Yeah.
Ian: I don’t think that’s true, right? If you look back a mere 20 seconds ago, the United States was dominated by manufacturing jobs. And so, if you asked what the day-to-day experience of the average, typically male American would be, it would be working with your hands and being relatively mobile on your feet throughout the day, and having some grip strength because you’re picking up things and putting them down.
And we transitioned into being a knowledge-focused economy relatively quickly, where we worked with our minds. And we thought—we knowledge workers, you and me—we thought that we were relatively insulated from the automation that took over before, because who could automate thinking? Until we did, right?
So even within the lifetimes of many people alive today, the internet didn’t exist, and then it did, and then it became ubiquitous. And then social media didn’t exist, and then it became ubiquitous. And smartphones—and that’s changed a lot of the way that we live our lives, including how we remain healthy, right? Just the presence of social media and the ubiquity of smartphones, that are relatively effective computers in your pocket all the time, has really changed the attention economy, which is something that we didn’t even used to have to think about because you couldn’t capture people’s attention at will.
Tony: And this is even faster.
Ian: And this is even faster than that, because all those changes happened in the course of history 20 seconds ago. This is two seconds ago.
Tony: And we might—this might be a kind of serious rabbit hole, but we’re also not talking in this conversation—maybe we’ll have another one where we do—about neuroengineering. One other way we may change more fundamentally is that we may change our biology, with implanted chipsets and things, where we can potentially gain superhuman abilities that are actually melded into our own biology.
And AI is part of that, because it’s part of what enables us to learn how to read and mimic brain signals.
Ian: Yeah.
Tony: So anyway, there’s a lot of change happening, and you’re really—I’m with you on this—don’t ignore this.
Ian: Yeah.
Tony: Because I do think a lot of people are.
Ian: Yeah.
Tony: I mean it’s very common for me that people say, “They’re not going to replace x,” or just—
Ian: Therapy is one that people feel relatively safe about. Science is one that, until recently, I felt pretty safe about. I don’t think that sense of safety is proportional to the progress that we’ve actually observed in a short time.
Tony: Is there anything about this that you’ve changed your mind on?
Ian: Yeah. So I think initially my plan was to not use the machine. And my reason was because I wanted to preserve the sort of uniqueness of my ideas.
Tony: In your researching?
Ian: In my own research. You could imagine that something that could happen would be—I talked to the machine about all of my ideas, it gives me feedback, and let’s say that feedback is even good and persuasive. These machines basically find the most probable collection of ideas. And so it might drag me to what’s most likely, or most common. Or it might drag me to the mean of the paper quality distribution, right?
I was worried that maybe that would happen. And probably to some degree it is. But after having worked a little bit, I’ve written now a grant with substantial help from the AI. I have now used it to do a lot of writing tasks. So I ask, when I’m just stuck on maybe two sentences, “Can you give me two or three different versions of this sentence?” And oftentimes it’ll come up with better ones, or at least different ones that I’m too tired to think of late at night.
Tony: And it unsticks you.
Ian: And I think the quality of products that I’ve produced is higher. I’ve used it for idea generation with you. I’ve used it to learn math concepts that I was aware existed but were hard to understand. What I needed was what you’ve described before, which is a very patient teacher and tutor.
I remember with Claude one time a couple of months ago, I was trying to learn something related to Kalman filters and random walks. So a random walk is just a random up-and-down process of a point moving in time. And a Kalman filter is a way of taking recent past behavior and predicting, within a range, where that point’s going to end up.
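A minimal sketch of those two ideas in Python, purely for illustration (this is not Ian’s project code): simulate a random walk, observe it through noise, and run a one-dimensional Kalman filter that keeps a running estimate and uncertainty for where the point is—note that it has to carry both the step-to-step process noise and the observation noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
q = 0.5   # process noise variance (how much the walk wanders each step)
r = 2.0   # observation noise variance (how fuzzy each measurement is)

# Simulate a random walk and noisy observations of it.
x = np.cumsum(rng.normal(0, np.sqrt(q), T))   # true positions
z = x + rng.normal(0, np.sqrt(r), T)          # what we actually observe

# 1-D Kalman filter: keep a running mean (mu) and variance (P) for the
# current position, updating both each time a new observation arrives.
mu, P = 0.0, 10.0   # vague initial guess
estimates = []
for obs in z:
    # Predict: a random walk stays put on average, but uncertainty grows.
    P = P + q
    # Update: blend prediction and observation, weighted by the Kalman gain.
    K = P / (P + r)
    mu = mu + K * (obs - mu)
    P = (1 - K) * P
    estimates.append(mu)

print("mean abs error, raw observations:", np.mean(np.abs(z - x)))
print("mean abs error, Kalman estimates:", np.mean(np.abs(np.array(estimates) - x)))
```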
And I have a master’s degree in statistics. I’m relatively familiar with probability. And I wanted to use it for a project that I was going to work on, which—maybe we’ll be back and talk about it if it’s a success, and maybe we’ll edit this if it never happens—but that was going to be a piece of a project, and I needed to know how that worked.
And so I had derived, by hand, which I am trained to do, the conditional probability of where the next point is supposed to be given where it just was, with some additional noise. Like, maybe you’re not quite certain exactly where it was, but you can be within a ballpark. And I kept being told by the machine, “Listen, bro, you’re leaving out this important random variable—you also have to condition on the noise.” And I was like, “No, you don’t understand. I know that you’re in the early stage as an LLM, and I’m grateful to have you here as a thought partner, but I think—”
Tony: “I’m the superior of the two thought partners.”
Ian: And we went back and forth—I kid you not—I burned a Friday morning on this, when I maybe should have been prepping for another meeting. Because I was just so committed to getting this machine back on track so I could have the part of the discussion that I was planning on having that we ended up not getting to. At the end of that hour, I kept saying, “No, you don’t understand.” And it kept responding, “No, you don’t understand, sir.”
And it was right. At the end. There was this fundamental feature of the relationship between random processes moving in space and what we can say about the future based on what we’ve seen in the past that I was forgetting and leaving out.
Tony: And you really learned through that.
Ian: I did. And that changed the way that I was thinking about that project. And I think if I had been relying on my old assumption, the simulations that I was about to conduct that would confirm or deny for me whether this proof of concept was worth moving forward on, I would have gotten a bunch of null results. A bunch of weak results suggesting that this isn’t a promising idea, that the Kalman filter isn’t finding where the point should be—you should just give up.
And I think I wouldn’t have been able to tell the difference. I think it’s likely—who can say for sure—I think it’s likely that this machine saved that project in a way that it didn’t intend or know about.
Tony: And potentially, like, a good idea that could have gone—
Ian: Right.
Tony: —gone to waste and not helped people, which is a few steps removed from it. But that’s really interesting.
Ian: Because it was like, “Hey, you missed a minus, you missed a...”
Tony: So you’ve—so that’s a pretty big change in your own, ’cause you felt, you’re saying not too long ago, you were thinking—
Ian: Eighteen months ago, yeah.
Tony: Eighteen months.
Ian: Eighteen months ago, I was pretty resistant. I thought, “This is gonna be like the image generation stuff. I don’t think that if you are a social scientist today, generative models for images should really be threatening you,” right? They’re a useful tool. They’re super cool. If I were in media, I would care more. But I am not threatened by DALL-E. I was never trying to make images. And so the fact that a machine can do it—
Tony: That’s a diffusion model that generates images. D-A-L-L-E.
Ian: I thought text generation would be like that. That it would be, “Look, I’m not trying to write copy. I’m not a journalist. I don’t need to generate a bunch of text quickly. What I need is thoughtful text generated over a long period of time that tells me something about the world.” So I thought it was more dismissible, and I think I was wrong.
Tony: So what does this mean then for you?
Ian: I’ve thought a lot about what the future is going to look like. And I don’t know.
Ian: I don’t know if this change will happen really quickly because these machines get super good. I could also imagine change happening relatively slowly over the course of my career because uptake is so low. It’s hard for me to imagine NIH saying, “Oh yeah, we’re totally on board.” This relatively conservative institution that prides itself on consistency and thoughtfulness, and like making sure that we’re putting our money in the most promising projects—I’m not sure that they’re the kind of organization that’s going to be especially excited about the next 10x scientists. They could be, but you could see the government saying like...
Tony: NIH is the National Institutes of Health, where we get a lot of funding for our work; it funds an incredible amount of health research.
Ian: Yeah.
Tony: I feel more optimistic about NIH being innovation-friendly than that.
Ian: But I don’t see them like—in the way that you could imagine a tech startup saying, “We are going to start a newspaper where there are no journalists—it’s just the machine.” And they might say, “We’re coming guardrails-off because we think even if it makes mistakes, it’s worth it.” I think NIH could be much slower, much more cautious.
Tony: And there are mechanisms that promote more high-risk, high-reward approaches. And there’s also other—I think it’s good for people to know that there are other federal agencies that support very avant-garde research, like the National Science Foundation, which supports a lot of innovative work.
Ian: Or the AFSP.
Tony: The American Foundation for Suicide Prevention—they have a blue sky kind of mechanism.
Ian: Right.
Tony: But—so maybe just—so you’re thinking, “How’s this going to change me over the long term?” So, knowing what this has achieved, and knowing that this will only get better—not likely to get worse, right? It’ll at least get better; we don’t know how much better, or how fast.
Ian: Right.
Tony: What does this change for you, if anything, in the next six months?
Ian: Sure. For me, my intention is to probably use the machine more. I already have conversations with it for idea generation purposes. I already use it for refining existing writing.
I think using it for generating mechanisms for testing my hypotheses—rather than just, “Hey, can you describe this area of the literature that I remember from a couple of years ago but I’m not up on? Can you remind me what the major findings are there?”—I already do that. I think for me, it’s adding in more components of machine assistance: not just idea generation but idea testing, and proposals for what would make a scientific design stronger.
One idea that I think is underrated in this paper is I think peer review can just go away now. I think our current peer review system is totally broken. I think it isn’t a good filter for quality. I don’t have the impression when I’m an author submitting a paper that my papers are improved after peer review. I don’t have the impression that when I am a reviewer, my opinion is respected by the authors. It often seems like a war of attrition, where whoever submits or rejects the most times just wins.
And this could make the review process more transparent and consistent. It could make it cheaper, and it could make the products that come out of humans better.
Tony: That’s really interesting. In fact, the paper—this is a quote from the paper: “A fully AI-driven scientific ecosystem, including not only AI-driven researchers but also reviewers, area chairs, and entire conferences.”
It’s funny. I’ve had a different experience with peer review, which does improve the papers that I’ve had. But—
Ian: You’ll have to introduce me to your anonymous reviewers.
Tony: But I totally agree that the system is broken. It’s very difficult to get people to review. It takes a very long time. It’s very difficult to give the time to do it that really would help.
Ian: Which is probably why it’s so dicey sometimes.
Tony: And I can tell you as a reviewer for the National Institutes of Health, which is a role I take really seriously, the last time I did reviews, I felt like I had my hands tied behind my back, because you’re disallowed from using AI in your review process.
Ian: Yeah. And I want to talk about that.
Tony: And it was really hard. And I felt, “Wow, this is actually making it worse.”
Ian: Yeah.
Tony: I understand that there’s a lot of reasons, and I don’t think—I don’t know if it’ll be like that forever. Some of them have to do with intellectual property, right? Because when you—if I were to upload a review, then that could—
Ian: Yeah.
Tony: Depending on your pricing plan and your terms, it could become part of the training data. But I also think there’s something else to it right now. I think it’s just, “Whoa, we’re not ready for this.”
Ian: Right.
Tony: But I could really see—it seems like, at minimum, there should be a few AIs in every study section.
Ian: Oh, yeah.
Tony: What did you want to talk about with respect to that?
Ian: Can I clarify what my memory of the NIH guidance on the use of AIs is from, I think, maybe 12 months ago? Maybe you can correct me if I remember it differently. I remember reading it carefully, and I remember thinking that all or a majority of their objections put a heavy emphasis on the intellectual property of the person submitting the grant—that there’s a sort of privacy and terms-of-service violation if you upload a grant to an AI, because then the company that owns Claude (Anthropic) or the company that owns ChatGPT (OpenAI) will have a copy of that grant, and you’ve given someone else’s material away.
Ian: The way that I read that was, “This seems like an objection that’s really behind the ball.” And NIH, if you’re out there, I love you, please fund me. I look up to you and I admire you, but I think you made a mistake, by making a near-term objection that’s going to go away in about five minutes. And here’s how.
Tony: Is away already.
Ian: If you have an open-source AI model, which you can download onto your computer and run locally, I am pretty sure that none of the reasons that they listed for why you can’t use an AI apply anymore. And that is an option.
Tony: It’s already true for the enterprise versions of those—
Ian: Oh, do their terms of service say we’re not going to hang on to this anymore? Okay.
Tony: So it’s already true. But I don’t think—I think that was just a stopgap—probably a stopgap reasoning because we’re just not ready. And I wish they had said that.
Ian: Yeah. Which, yeah, I think that would have been fine. We were talking earlier about how like the personal reaction that a lot of people have is to say, “This is just—this is going to go away in the wind,” or, “I’m going to bury my head in the sand. This isn’t a big deal. Nothing’s going to change.” This is what it could look like if an institution does that. Where they say, “Look, we’re pretty sure that since 2006 or 2007, privacy has been the kind of thing that we’re allowed to successfully object to,” related to social media and like the concerns of 10 years ago.
But in this new world, privacy matters, but it’s not the problem. The problem is handing over some agency to a machine that could be more competent than you, but it could also be less competent. And I feel like it’s okay for them to be worried about that, and I would like for them to say it.
Tony: And I can tell you, I believe that there is work going on at NIH—it’s not—
Ian: So you think internally there’s, okay.
Tony: Oh, yes.
Ian: I’m not invited to this meeting, so I don’t know.
Tony: There are people who are way smarter than both of us combined working on this at NIH. But I think it still was unfortunate that the initial statements couldn’t have been a little bit more transparent about the tension. Because it’s a real tension. Are we ready for the science that gets funded by our country for machines to have an input into that? Maybe we’re just not ready for that.
Ian: Yeah.
Tony: And I think that seems like a very legitimate thing to say. We know it would be very useful in most circumstances, but there are also some dangers, and there are things we’re not ready for. So we are disallowing it right now. And that might be where they do get to.
And again, a lot of people are working on AI safety that I don’t know about or understand.
Ian: People—like you said, people smarter than us.
Tony: Yeah.
Ian: At least smarter than me. I’m not so sure yet, but at least smarter than me.
Tony: So the next six months—it sounds like you’re wanting to move a little further along that diagram we looked at before, from idea generation into maybe more of these kinds of experimentation. So, understanding this as you do, and having taken in this paper—what about somebody who is in the midst of starting a research career and doesn’t really know where to start? Because I think just saying, “They’re coming,” doesn’t necessarily help. Then it’s just paralyzing or frightening, and not really where you encouraged us to go, which is approach-focused coping: dealing with change by approaching it, not avoiding it.
Based on your own life, people that you’re working with, peers that you’re working with, I’m just curious if you have any ideas about where a person who doesn’t know, who doesn’t have the level of computational knowledge that you have, statistical—where would they start?
Ian: Sure. So I used to teach an R programming class. So most of the—I don’t teach anymore, but most of the teaching that I used to do was focused on statistics. And there’s a really common statistics program that a lot of people will learn on, which is SPSS. It looks a lot like Excel, and there’s a lot of pointing and clicking.
Because it’s what most psychologists and many other social scientists learn on, they become comfortable with it, and they kind of know how everything works. So even if they’re not statistics experts, they feel safe in that ecosystem. It has some limitations: A, it costs money; and B, it’s hard to reproduce your analyses, because there’s no easy way to save a record of pointing and clicking.
So one of the courses that I’ve taught more recently is how to start doing statistics with code. And there’s a programming language for that called R. It’s free, it’s easier to reproduce your analysis, so there’s more transparency—you can upload your code. There are lots of reasons to use it. Tony: Okay.
Ian: And what people would say is, “Yeah, but I’m just like not very good at it. I agree with you that it’s better, but to do the same thing, I have to spend two or three times as much effort. And actually, there are a lot of things that I don’t know how to do, and maybe I could learn, but I don’t have a way of knowing whether it would be worth it for this project.”
The answer that I gave them is also what I did for myself with R and then with ChatGPT and what I use for any kind of new technology: with each new project or task, I set a time limit. And I say, “I’m going to try to do the first 30 minutes of this in R, or in the new program.”
And if, at the end of that 30 minutes, I feel like I’m really not making a lot of progress, I’m going to stop. If at the end of that 30 minutes I feel like, “Yeah, I’m behind where I would be if I were using the more familiar program, but I do feel like I can keep doing this for a little while longer,” I set the timer longer. And it’ll probably take two or three projects in R, or in ChatGPT or Claude, before you can go all the way from start to finish.
So there’s a learning curve and there’s some loss. You’re going to pay a cost in terms of efficiency for trying to do something new at first. But this is a way of managing your own fear, right? If you’re worried that, “Look, this might be a wild goose chase,” it’ll maybe still be a goose chase, but you can limit how wild it gets.
And I think what that would look like for early career scientists looking into ChatGPT or Claude or Gemini from Google would be: just go to the machine and start typing something about your project and get a sense of what its reactions are like, and set a timer. “I’m going to do this for five prompts; if I don’t think it’s given me some good answers after five questions, I’m going to set it down until the next project where I’ll try again. If it does seem like we’re getting somewhere, even if it’s slowly, maybe I’ll extend the timer a little bit longer.” And that’s a way of balancing the cost-benefit tradeoff as well as managing your own kind of anxiety about getting taken for a ride or losing out on the efficiency that you already had.
Tony: Yeah, that’s a really good framework. Helpful.
I’ll tell you one way that I’ve done a lot of learning—because the rules, even at our university, are not clear about how much people should be using these. Eventually, I think our university will be getting an enterprise agreement for ChatGPT. My son’s university, the University of Michigan, already provides that to all students.
Ian: Oh, and so that starts to manage the privacy concerns too. Okay.
Tony: But—and just that, “Okay, you now have the endorsement of the university.” Because none of us want to do something wrong.
Ian: Agreed.
Tony: But we need to also continue learning. So one of the ways I think is just using it in my personal life.
Ian: Yeah.
Tony: I found that to be a great way of learning. So I spent more time than I really needed to—for example, I fed one of the LLMs all of our service records for one of our cars. And I wanted it to—because we were wondering when we last replaced the brakes.
Ian: Oh, sure.
Tony: Now, I probably could have just—I had them all scanned, and I could scroll through them.
Ian: So you had a bunch of PDFs already.
Tony: A bunch of PDFs, yeah. Thankfully, I had also gone paperless years ago. So I fed it a bunch of PDFs and asked it to create a table of all the services, their dates, and to bold those that involved brakes. And it didn’t work at first. But I decided this was a pretty low-stakes way to learn. If I really needed the answer, I could just look for it myself, and I knew I wasn’t violating anything at work—I wasn’t going to get in trouble for it or something. And yeah, I learned some things through that.
I learned how to talk to it and just the full range of things you can ask it.
Ian: Yeah. Which is anything you can write down.
Tony: Yeah. Yeah. And it made me smarter, because I realized that I didn’t really understand what I was trying to say. I think this is what people who do a lot of computer programming say happens to you. When you have to explain something to a machine, it clarifies your own thinking.
Ian: One of the things I like about programming.
Tony: I think that could be another way to begin moving toward this. And I guess maybe a third way would be for us to have conversations about that.
Ian: Yeah.
Tony: Even though, in some ways, they’ll be scary. Because it may be that something that you’re doing right now today, you realize, “Oh, maybe I wouldn’t have done it this way before.” And that—that’s a little threatening to feel. But I think ultimately, like you’ve so helpfully said, with each new technology—or any change—if we can move towards it...
Ian: Or at least be open to it.
Tony: At least be open to it. And as a psychologist, as you said, if we talk about like a job change as being something important that we should talk about, how about the entire society changing? Let’s make sure that we stay open and talking about this, and be on the lookout, I think, for some of the weaker arguments or dismissing...
Ian: Yeah.
Tony: ...that we’ve talked about here today.
Ian: Yeah.
Tony: So, yeah.
Ian: Yeah, I think that’s right. I think things tend to go better from a debate perspective when you try and think of the best versions of the arguments that you’re working against, right? So maybe that’s someone that you’re debating against. Maybe it’s a view of the world that you’re worried about. You tend to make better decisions when you base your decisions on what the best counterarguments would be, even if they’re not the counterarguments that you’ve received. Some people call that steel manning. And I think spending more—
Tony: Steel manning as opposed to straw manning.
Ian: As opposed to straw manning, yeah. So the straw man is an artificially weak version of an argument. A steel man would be like, “If you make an argument to me and I say, ‘I disagree,’ but a slightly better version of that I would really have to deal with would be this little refinement.” And if you try to base your decisions on what the steel man position would be, you tend to make steel man positions yourself.
There’s a lot of straw manning being done to the computer science community. And if we spent more time steel manning them, we might come up with better psychological reactions to these technological changes.
Tony: That makes a lot of sense. And I think that’s probably true in both directions. We’re underestimating the safety issues, ethical issues, the dangers involved, because we’re maybe putting up straw men and then dismissing the dangers.
Ian: Right.
Tony: And maybe some of the possibilities for really—yeah, for helping people, for advancing our work, and potentially for us to flourish even more.
Ian: Yeah.
Tony: Well, this kind of brings us full circle, doesn’t it? So I really appreciate your time and the precision that you bring to these kinds of discussions. And I’m sure we’ll continue learning more about the role of AI in science in general and for us in prevention science. Yeah, I’m just really grateful that you spent this time with me.
Ian: Thank you. Thank you for having me. It’s been great.
Tony: Thanks.