Capital One’s Data Insights Chief on Responsible ML in Finance

Dave Kang talks about partnerships, use cases and best practices for deploying machine learning

Ben Wodecki

December 15, 2022

Applying AI in finance requires businesses to ensure not only responsible development and deployment, but also the security of user data and information.

According to a recent survey from Capital One and Forrester, businesses face difficulties over transparency, traceability and explainability of data flows when they try to deploy and scale ML.

There is a danger of algorithmic systems showing bias – say, refusing loans to applicants from certain zip codes – due to being trained on historical data that contains prejudices perpetuated by humans.

AI Business spoke with Dave Kang, senior vice president and head of data insights at Capital One, to talk about the report’s findings and using ML responsibly in finance.

Kang takes a deep dive into the responses found in the survey and reflects on the work his team undertakes to provide best practices for digital transformation and ML deployments.

The following is an edited transcript of that conversation. The full chat is available in the latest episode of the AI Business Podcast, wherever you get your podcasts.

AI Business: Can you outline Capital One’s technology transformation and how it is enabling machine learning today?

Dave Kang: The data insights team consists of a cross-functional group of data scientists, software developers, machine learning engineers and product specialists. And we have been creating an internal machine learning platform that provides Capital One's associates with governed access to ML algorithms, components and infrastructure for reuse. Our homegrown and open source machine learning algorithms can detect anomalies and trends, and run root cause analysis, among other capabilities. This platform that we're building is allowing us to democratize ML across the enterprise.

We've been undergoing a data transformation that's been a decade in the making. That includes everything from getting better on metadata and data lineage to getting the infrastructure set up, so that the stage is properly set for us to take the greatest possible advantage of machine learning capabilities.

AI Business: How long have the Data Insights team’s work and the wider digital transformation been going on at Capital One?

Kang: This has been a transformation that's been a decade in the making. And it's one of the reasons that I joined Capital One back in 2011. And before that, there was a recognition on the part of our founder that going native on cloud, going open source, and undergoing a technology transformation from the bottom of the stack all the way to the top was going to be an investment that would have a real payoff.

The data insights team is an organic offshoot of that decade-long transformation. Our team is relatively young, … formally constituted into a data insights team for about a year now. But it’s the culmination of grassroots efforts on the part of some very talented machine learning engineers who were noticing that as we were implementing AI and ML algorithms in Capital One, there were a lot of commonalities to operationalizing those ML capabilities.

And they said, ‘Well, if we're going to build this, let's build it so it can be reused and contributed to and built on, which are the fundamental tenets of a platform.’ The formation of the data insights team is the culmination of a very organic and grassroots effort.

AI Business: Is this a worldwide team? Or are these solely U.S.-focused efforts?

Kang: Capital One is a global company. Although our operations are primarily focused in the U.S., we also have operations in Canada and the U.K. And we have staff in Asia as well as South America. What we do serves Capital One as an enterprise overall, and it is transnational.

AI Business: Your team is working on some use cases around third-party fraud. Can you talk a bit about this?

Kang: I view fraud as one of the places where our machine learning capabilities are being put to the test. This is not something where you've got a static dataset and you run an ML algorithm, and then an analyst picks up something ‘interesting’ and implements a series of defenses.

You have thousands of transactions happening every minute, so there is a real demand for real-time streaming use cases, and a large quantity of data needs to be processed. Rules then need to be created in real time to protect our customers, our financial institutions and our merchants, and these all need to be implemented in a very automated way.

Our use case for credit card fraud defense, where we are using homegrown as well as open source machine learning algorithms hosted by our shared platform, detects anomalies and automatically creates defenses for fraud.
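
To make the pattern concrete, here is a minimal Python sketch of streaming anomaly detection that creates a defensive rule automatically. It is an illustration under stated assumptions, not Capital One's system: the `FraudMonitor` class, the z-score test, the threshold of 4.0 and the `hold_for_review` action are all hypothetical, and a production system would draw on many more signals than the transaction amount.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RunningStats:
    """Online mean and variance (Welford's algorithm) for one card's amounts."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


@dataclass
class FraudMonitor:
    z_threshold: float = 4.0  # hypothetical cut-off, not a recommended value
    min_history: int = 5      # observations required before scoring a card
    stats: dict = field(default_factory=dict)
    rules: list = field(default_factory=list)

    def observe(self, card_id: str, amount: float) -> bool:
        """Score one transaction; return True if it was flagged as anomalous."""
        s = self.stats.setdefault(card_id, RunningStats())
        std = s.std()
        if s.count >= self.min_history and std > 0:
            limit = s.mean + self.z_threshold * std
            if amount > limit:
                # Automatically create a defense instead of waiting for an analyst.
                self.rules.append(
                    {"card_id": card_id, "action": "hold_for_review", "limit": round(limit, 2)}
                )
                return True  # flagged amounts are kept out of the baseline
        s.update(amount)
        return False


monitor = FraudMonitor()
for amount in [20.0, 22.0, 19.0, 21.0, 23.0, 20.0]:
    monitor.observe("card-123", amount)
flagged = monitor.observe("card-123", 5000.0)
```

Each transaction is scored against a per-card baseline that updates in constant time, which is what makes the approach suitable for a stream rather than a static dataset.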

AI Business: What kinds of data are you using to build this? Is it historical data?

Kang: Every single machine learning model is only as good as the data that you train it on. We have been in the credit card business for a long time and we're able to go back a significant amount of time to build training datasets. But a lot of what is informing our ML models is what is being signaled to us by customers themselves.

We have digital as well as voice channels for customers to reach us and tell us whether or not they actually conducted a transaction on their credit card. And our agents, as well as our automated systems, are taking those flags and signals and informing our ML algorithms. There is a lot of song and dance that happens here where we are processing both historical and real-time data to inform our fraud defenses.

AI Business: Aside from fraud, what other ML use cases can you share?

Kang: Machine learning is happening everywhere you look. Even when you go onto our website, the homepage experience that I, Dave, experience whenever I go to Capital One will be slightly different from the homepage experience AI Business experiences. That is because we are using ML algorithms to understand the profile of this person who is coming to our website. Have they visited before? Are they an existing customer? What do we think is the right thing to put in front of them – be it a capability that we have that you might not be using as an existing card member or a product that we have to offer you? There is a tremendous amount (of options) in the space of commerce.

Also, we're using ML solutions to power things like virtual card numbers. This is a solution that detects a payment checkout page anytime you're browsing the web. It is a plugin on your Google Chrome browser, for instance, and allows you to enter a virtual card number that's specific to the merchant that looks and acts just like a real card but cannot be traced back to your actual … physical credit card.

AI Business: Are you working on any customer-facing offerings that would encompass ML?

Kang: One thing is our Eno digital assistant, which is an AI-powered chatbot. It uses natural language processing to assist customers with account management. It can check your balances, pay a bill for you, activate a card, update your personal information, among other things. Rather than having to pick up a phone and talk to an agent, you can chat with our chatbot and get a lot of things done.

AI Business: Chatbots have been around in finance for a long time. But there was a recent survey that found humans prefer interacting with other humans over automated systems. How do you go about easing their concerns when building such solutions?

Kang: It has been interesting. I have been seeing articles about AI-generated artwork, and that has the art community up in arms about whether or not the artwork can be counted as legitimate. I do not think we have any pretense of making people think that interacting with a chatbot can be just as personal and just as effective an experience. It's not meant to capture those use cases where somebody has a very specific and individualized problem.

But we have implemented Eno in such a way that if there are routine things, we can take that burden off of our call center agents and make it easier for customers to get the answers they need more quickly. To the extent that they have inquiries that are more complicated, we make sure it is providing an off-boarding point for folks to then initiate that conversation with an agent. We really believe in just making sure that we're getting the right answer into the customer’s hands as quickly as we can.

AI Business: Are there best practices for democratizing machine learning across the enterprise?

Kang: We wanted to make sure that whatever we had as takeaways were not just specific to the journey that we have been on here in Capital One. So we recently commissioned a study with Forrester about operationalizing machine learning. We asked 150 data management decision-makers what some of the biggest challenges are around making ML work for them.

Seventy-three percent of respondents cited transparency, traceability and explainability of data flows as a key issue. In some ways, this is not surprising: data is hard. Having clean data with good metadata you can trust is also hard. But it is interesting that that was the thing that came up.

When we were starting to ask questions about ML, I was expecting to see some stuff about getting the algorithms trained and making sure that there is no overfit. Those sorts of things were cited as problems, but the bigger issue was far more foundational.

The things that we have built in terms of implementing traceability, transparency and explainability, and making sure that we have a well-structured data infrastructure to support the very intensive needs of AI and ML, are breaking down data silos. Overcoming those obstacles to get to the point that you can actually deploy and operationalize ML is paying off.

AI Business: In that study, having diverse datasets ranked lower among respondents despite being a major talking point for practitioners at the recent AI Summit in Austin. How big an issue do you consider the need for diverse datasets to be?

Kang: What we have uncovered through this study and our own conversations with colleagues is that we are still learning to crawl before walking or even running when it comes to AI and ML. Even just getting the data staged and engineered to test in a sandbox environment is a challenge.

I know from speaking to other practitioners that a good deal of data scientists’ time is not spent doing data science, it is actually spent doing data engineering. In our study, we were trying to ask questions about different models and different challenges. The answers we got back were more akin to, ‘Well, I haven't even figured out my data environment and gotten that thing cleared up yet.’

AI Business: The need for diverse datasets is important in the finance sector since historical, unbalanced data could introduce biases. How important is it to get this right?

Kang: We are very much guided by a mission to build and deploy AI and machine learning in a responsible, well-managed way that puts our people first. Leveraging these incredible capabilities to make sure that we are looking out for our customers’ well-being, helping them become more financially empowered and better manage their spending in a highly regulated environment is of utmost importance to us. That means (implementing) structured processes, protocols, risk communication, model governance, peer reviews, and unbiased, closely monitored processes across our ML work.

There is great potential to deploy ML and AI in a financial service setting. The stakes are very high. There are tons of transactions happening every second and there is a lot of rich data that can be taken into account that can power very exciting outcomes from ML and AI deployments.

There is this adage from Spider-Man (that rings true): ‘With great power comes great responsibility.’ If a financial institution is not feeling comfortable from a governance standpoint, and they are not bringing the regulators along, I would shy away from going too far because there are many instances we have observed in perhaps more mundane settings where the use of ML has just gone the wrong way. There is a lot to watch out for.

AI Business: Is putting safeguards in place and being responsible the biggest challenge for financial services firms in terms of deploying ML?

Kang: In addition to keeping a close eye on the regulatory environment and ensuring that AI and ML are being deployed with fairness, a lot of what I have thought about in the financial services sector has gone back to the general importance of standardizing tools, standardizing processes and standardizing platforms.

That means not only ensuring that we have clean, explainable approaches, but also enabling our data scientists and engineers to more easily identify and access data and build upon very solid foundations to deploy their ML models.

Common platforms can help store model training and execution information like parameters and outcomes in repeatable and searchable ways so that models can be more easily audited and reproduced. And also, so people have the opportunity to stand on the shoulders of others, and really build community and better improve the performance of their actual machine learning capabilities.
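
As a rough illustration of what storing parameters and outcomes in a repeatable, searchable way can look like, the Python sketch below logs each training run to a small SQLite table. It is a sketch under stated assumptions rather than a description of Capital One's platform: the `RunRegistry` class, its method names and the table layout are hypothetical.

```python
import json
import sqlite3
import time


class RunRegistry:
    """Hypothetical registry that records each training run for later audit."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the example self-contained; use a file path to persist.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "id INTEGER PRIMARY KEY, model TEXT, created REAL, "
            "params TEXT, metrics TEXT)"
        )

    def log_run(self, model, params, metrics):
        """Store one run's parameters and outcomes; return its run id."""
        cur = self.db.execute(
            "INSERT INTO runs (model, created, params, metrics) VALUES (?, ?, ?, ?)",
            (
                model,
                time.time(),
                json.dumps(params, sort_keys=True),
                json.dumps(metrics, sort_keys=True),
            ),
        )
        self.db.commit()
        return cur.lastrowid

    def get_run(self, run_id):
        """Fetch the exact parameters and outcomes of a run, e.g. to reproduce it."""
        row = self.db.execute(
            "SELECT model, params, metrics FROM runs WHERE id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        return {
            "model": row[0],
            "params": json.loads(row[1]),
            "metrics": json.loads(row[2]),
        }

    def find_runs(self, model, metric, minimum):
        """Return ids of runs for `model` whose `metric` is at least `minimum`."""
        rows = self.db.execute(
            "SELECT id, metrics FROM runs WHERE model = ? ORDER BY id", (model,)
        ).fetchall()
        return [
            run_id
            for run_id, metrics in rows
            if json.loads(metrics).get(metric, float("-inf")) >= minimum
        ]


registry = RunRegistry()
first = registry.log_run("fraud-detector", {"learning_rate": 0.1}, {"auc": 0.91})
second = registry.log_run("fraud-detector", {"learning_rate": 0.01}, {"auc": 0.88})
good_runs = registry.find_runs("fraud-detector", "auc", 0.9)
```

Because every run is written through one shared interface, an auditor can look up what produced a given model and a colleague can rerun it with the same parameters.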

AI Business: In terms of governance, a lot of regulations coming out, such as the EU’s AI Act, are quite broad. But for a heavily regulated sector like financial services, do you feel it is better to have an industry-specific piece of governance?

Kang: On Capital One’s part, we are trying to go above and beyond what the letter of the law says and to deploy responsible AI at scale across our enterprise. We have been focused on making sure that we are implementing AI responsibly. And that means building responsible AI into the tools we use and working to develop a framework for embedding responsible AI models into all machine learning development across our company.

We have a model risk office that ensures our algorithms are reviewed and well-governed before we actually put them into implementation. And we have human-centered processes built into every step of implementing ML and AI. When we build products, systems and solutions, we are doing so with the customer at the center of our design-thinking. We integrate this model risk office with our data scientists, product developers and many others to inform how we build and select models.

AI Business: There is a skills shortage in AI and data. How do companies keep up with their responsible AI work if they're understaffed in terms of trying to build an ML team, for example?

Kang: One of the things that came through from our Forrester study is that a majority of participants are looking to get outside help. Following that, we say seek external partners to help you on your AI and machine learning journey. But make sure you are finding partners that have been through the thick and thin of it, and actually done that themselves.

At the same time, we find it to be very important if you are going to participate in a community that is on the cutting edge of practitioners to be contributing to that community as well. We have open sourced some ML solutions, such as our data profiler, which can monitor Big Data and detect private customer information so that it can be protected, and provides a pre-trained deep learning model that efficiently identifies sensitive information and generates statistics with an infrastructure to build data labelers. It is about the community aspect of things and seeking the right partners.

AI Business: The Forrester report found that some respondents had not yet struck partnerships. What would you say to those that haven't partnered yet, but are kind of considering it?

Kang: I will intersect that with one of the other points that came through in the study itself, which is, ‘let's not make machine learning or AI a shinier object than it already is.’ We’re starting to get to a point where boards of directors, investors and management teams want to ensure that ML and AI are driving demonstrated business value, and not just have them for the sake of it.

That means we should be centering our search for partners around the business use cases that we have, and the business impact that we want to have. Think about exactly the impact that you want to have, be it from a financial standpoint, from a customer experience standpoint, from an efficiency standpoint, and seek out a partner that has bona fides along those lines, not necessarily one that just has general capabilities that they can say they have demonstrated in ML.

About the Author(s)

Ben Wodecki

Assistant Editor
