In this Digital CxO Leadership Insights series video, Mike Vizard speaks with Matteo Manica, an IBM Research scientist, about how an open source Generative Toolkit for Scientific Discovery (GT4SD) will accelerate the development of artificial intelligence (AI) models.
Mike Vizard: Hello, and welcome to the latest edition of the digital CXO video cast. I’m your host, Mike Vizard. Today we’re with Matteo Manica. He’s an IBM Research Scientist, and we’re going to be talking about something called generative toolkits and AI.
Matteo, welcome to the show.
Matteo Manica: Hi Michael. Hi.
Mike Vizard: I’m not sure that everybody knows what a generative toolkit is, so maybe you could explain that to us a little bit, and where it fits in the whole AI lexicon.
Matteo Manica: Sure. Sure. Thanks, Michael. So the Generative Toolkit for Scientific Discovery is an open source library that we’ve been building and researching over the last year. And it’s a library that you can use to accelerate hypothesis generation in scientific discovery. And this is done by leveraging an analytic technology called generative models that you can use to basically create novel models and designs, or novel ideas in general, just using these generative models.
Mike Vizard: And this is an open source toolkit, so anybody can access this thing, and how big is the community?
Matteo Manica: Exactly. It’s completely open. We just open sourced it last week, so the community is _____ phase, but we already have more than 2,000 downloads as we’re speaking right now, and the goal is to build a community of researchers that could collaborate on this toolkit and help us create these new technology stacks for accelerating discovery.
Mike Vizard: Is this toolkit aimed at specific use cases, like say life sciences or can it be more broadly applied? Because you know, there’s a lot of science in the world. So just how far can it go?
Matteo Manica: Yeah. No, exactly. The toolkit was really developed from day zero in the most generic way, so we really had in mind to build a general-purpose toolkit for accelerating science in different forms. Obviously we started from what we know best. And at IBM Research in the last, I would say, three to four years, we have had a lot of publications on generative models for small molecules, where most applications were in drug discovery and materials science.
So examples of items that are already included in the toolkit and available to use are items that can design new molecules that can target specific proteins, specific genetic _____, or can generate molecules that satisfy certain properties such as binding energies or distance from an existing molecule.
Mike Vizard: I think part of the issue is maybe we underestimate what it takes to build an AI model these days, and some are more complex than others, so do you think that through toolkits like this the whole process is going to be sped up? And in some ways are researchers kind of just reinventing the same wheel over and over again, and now we’re getting better at identifying patterns?
Matteo Manica: You are really hitting the right points. We decided to create this library, especially to ease and lower the barrier – the access barrier to these generative models. As you said, there are a lot of super nice libraries that you can use to build your own machine learning models nowadays, but some models are more complex than others. And generative models, especially in the context of materials, are not so easy to train because they might require experience on the subject or knowing which architectures are working better than others for the problem.
The _____ _____, the toolkit comes into play to lower the production barrier and ease the training and usage of these models in different applications. So as I said, we have some initial models that we made available in the toolkit, including some models that have not been developed at IBM Research, so we really had this idea of having a sort of container that gives you access to a wide variety of generative models, and eases the process of training them, using them for inference, and essentially accelerating your own discovery by applying them.
Mike Vizard: Do you think that maybe we have an unrealistic expectation of AI these days? Because I mean, everybody’s talking about it, but it takes quite an effort to kind of build an AI model, train it, and then there’s drift. It’s not simple by any stretch. And maybe you’re suggesting that I don’t have to be a rocket scientist to get the toolkit working, but it does require a level of expertise. So what’s a reasonable expectation for C-level executives when it comes to AI these days?
Matteo Manica: That’s a very good question that would require a very complex and elaborate answer, but I will do my best to try to do that. So there are reasonable expectations, depending especially on the field. And I would say that the application of generative models, or AI more in general, to material science or simple discovery is not an exception.
I think that what we could realistically do, and what we worked to do with the toolkit, was to basically provide the current state of the art of research in AI applications to material design, and make this technology consumable by researchers and the professionals that are actually working with these models on a daily basis. Because what we have learned over the years, and what we’re seeing in a lot of different fields of AI, is that there’s no _____ solution, and there’s no model that can be used in a completely safe way, assuming that there are no biases and that it could work in any situation. So tools or libraries like the toolkit, I think, are fundamental because they come into play and let you interact with the models in a more seamless fashion, and they basically empower the users to access these complex tools that are very powerful, but always need a sort of weak supervision by a human that knows how to put them to use for these particular applications.
Mike Vizard: Is this kind of like the AI version of the notion of fail fast that we’ve seen in digital business, where the sooner I get into something and the sooner I understand how it may work or may not work, the better off I am getting out of it, because a lot of times I may be going down a rabbit hole and it just doesn’t pan out?
Matteo Manica: Exactly. That’s obviously another pretty important aspect. Having access to a lot of different models that are being developed by different researchers around the world will also allow you to fail fast, in case you are chasing a technology that does not fit the purpose that you have in your research or in your industrial application. It really depends on which type of user of the toolkit you are.
Mike Vizard: One of the things we hear a lot about is the last mile with AI and the integration with the applications themselves, and sometimes it’s difficult to get the AI model and the inference engine lined up with the application as it’s being rolled out; they’re often on different release patterns and in different tools. So do you have any thoughts about how to bring, say, something like DevOps processes and MLOps together? It seems to me this is the next big challenge.
Matteo Manica: Yes, and I completely agree. There are tools that are helping to solve or mitigate this issue as of now. And I would say that GT4SD is also, in its own way and for a specific application domain, be it material science or scientific discovery, trying to bridge this gap. So implementing a model in the framework allows you to use it in a consistent and coherent way with all the other models, and it allows you to consume these models specifically in an _____.
The main idea of having an integrated library is really to have a consistent way to interface with an API that then allows you to call the models in a seamless fashion. And I think that the digital _____ and all the tools that have come out in the recent past are actually going in this direction: trying to basically satisfy this urge of taking a trained model from a proof of concept that works, and that is a very nice piece of research, and bringing it to production as fast as possible.
Mike Vizard: All right, so last question. What’s your best advice to folks about how to put this all together? How should I put the teams together? The business leaders together? Because I can’t just put them all in a room and lock the door and hope for the best. Right?
Matteo Manica: No. Sure. It’s impossible. So I can speak maybe mostly for the technical leaders and not business leaders, so I’m really biased in coming from a specific part of the spectrum, but I think that the key here – and it’s also what we’re bringing at IBM Research – is to really go all out and fully open up the technology we build. And I think that creating communities of users that are onboarded early in the technology that we are constructing together is the key to success. This is why with digital _____ _____ we decided to basically open up the research we do in IBM completely, and we are making this effort of building a very open community, with the idea that in this way we can be a sort of catalyst to accelerate the discovery and the development of these new technologies that can really accelerate the material science industry. But this doesn’t apply only to material science or to the chemical industry. Since the beginning, the toolkit has really been something that we built with the idea of creating a library that could cover different application domains, and this is also what we would like to see: a community that brings together a lot of users and stakeholders with different levels of experience, coming from different application domains, who are not solely developers or researchers, but also professionals that want to use generative models to accelerate their business.
And I think that, instead of locking the business people and the technical people together in the same room, probably the key is really to open the door and let them all go together into the yard, to discuss openly and share different development experiences and different pieces of technology.
Mike Vizard: All right. Set them all free and something interesting will happen.
Matteo Manica: Exactly.
Mike Vizard: Matteo, thanks for being on the show.
Matteo Manica: Thanks. Thanks a lot for the invitation.
Mike Vizard: All right, and thank you all for watching the latest episode of our video cast. You can find this episode and others on digitalcxo.com. We invite you to check them all out, and thank you for spending some time with us today.