Unveiling the Challenges: Why Large Language Models Struggle with Complex Financial Data

Explore the challenges of Large Language Models (LLMs) in finance. Uncover innovations, limitations, and the path to a collaborative future.

Ankur A. Patel and Ishita Jaiswal

Jan 23, 2024

LLMs in Finance: Large Language Models (LLMs) have entered various industries, including finance, with promises of enhancing customer service, facilitating research, and providing advanced financial analysis.
Challenges in Financial Applications: Despite their broad capabilities, existing LLMs struggle with specialized tasks in finance, such as analyzing financial documents, leading to inaccuracies and the need for customized solutions by financial institutions.
Positive Examples: Early successes in finance include automated financial reporting, enhanced customer interaction through chatbots, and accelerated research processes, showcasing the adaptability and potential impact of LLMs.
Challenges and Limitations: Patronus AI's study on GPT-4-Turbo revealed challenges in handling financial data, including low accuracy rates, high refusal rates, and occasional "hallucinations," emphasizing the importance of a human-in-the-loop approach.
Innovations in Finance-Specific LLMs: JP Morgan's DocLLM, designed for visually complex financial documents, stands out for its integration of layout information, addressing challenges in categorization, precision, and automated handling of diverse document types within the financial sector.

This post is sponsored by Multimodal. Multimodal builds custom GenAI agents to automate your most complex workflows. Here’s the truth: for basic automation tasks, you’re better off just using existing off-the-shelf solutions – they’re cheaper and honestly good enough. But if your workflows require human-level understanding and reasoning, they just don’t work. There’s no one-size-fits-all solution for automating complex knowledge work.

That’s why Multimodal builds AI agents directly on your internal data and customizes them specifically for your exact workflows. Multimodal also deploys their agents directly on your cloud, integrating with the rest of your tech stack. Their goal is simple: eliminate complexity, so you can focus on driving the business forward.

Over the past two years, large language models have gained significant traction. LLMs have entered nearly every industry, promising greater efficiency, higher revenues, and faster growth. One after another, OpenAI, Meta, Google, and several smaller AI companies have released advanced LLMs that excel at general-purpose and multimodal tasks. But these LLMs fall short on highly specialized tasks.

In particular, large language models struggle to analyze financial documents, healthcare records, and other complex, unstructured data. Such analysis is often riddled with inaccuracies and hallucinations. As a result, leading financial institutions and consulting firms have started developing their own customized LLMs or heavily fine-tuning existing ones.

In this article, we will dive deeper into why large language models are inadequate at best for hyper-specialized applications such as finance and what alternatives exist.

The Promise of LLMs in Finance

Large Language Models (LLMs) have emerged as powerful tools with the potential to revolutionize various industries, and finance is no exception. The integration of LLMs in finance holds the promise of enhancing customer service, streamlining research processes, and facilitating in-depth financial analysis.

Potential Applications in Finance

1. Customer Service Enhancement: LLMs are envisioned to play a crucial role in improving customer service within the financial sector. These models can be deployed to automate responses to customer queries, providing quick and accurate information regarding account details, transaction history, and general financial inquiries.

2. Research Facilitation: The ability of LLMs to comprehend and generate human-like text enables them to assist financial professionals in conducting research. These models can sift through vast amounts of data, extracting relevant information from diverse sources, and summarizing it in a comprehensible manner. This capability holds the potential to significantly accelerate research processes in finance.

3. Advanced Financial Analysis: LLMs are expected to contribute to advanced financial analysis by processing complex financial data and generating insights. These models can assist in tasks such as trend analysis, risk assessment, and forecasting, providing valuable support to financial analysts and decision-makers.

Overview of Financial Tasks LLMs are Expected to Perform

To fully grasp the potential impact of LLMs in finance, it's essential to understand the breadth of financial tasks these models are anticipated to perform. These tasks include:

1. Data Extraction from Documents: LLMs are expected to extract relevant information from various financial documents, such as SEC filings, paystubs, and receipts. This feature is particularly important for automating data entry processes and ensuring accuracy in financial record-keeping.

2. Natural Language Understanding in Queries: The natural language processing capabilities of LLMs enable them to understand and respond to queries in a manner that mimics human interaction. This is especially valuable in scenarios where financial professionals seek specific information or insights from the models.

3. Contextual Financial Narratives: LLMs can process and generate contextual financial narratives, aiding in the interpretation of complex financial data. This is crucial for producing reports, summaries, and analyses that are not only accurate but also easily understandable for a broader audience.

Positive Examples: Initial Successes and Applications in the Finance Industry

The finance industry has witnessed initial successes in integrating LLMs into various workflows. These positive examples showcase the adaptability and potential impact of these models:

1. Automated Financial Reporting: Some financial institutions have successfully implemented LLMs for automating the generation of financial reports. This includes summarizing key financial metrics, identifying trends, and providing insights derived from large datasets.

2. Enhanced Customer Interaction: Early adopters have employed LLMs to enhance customer interaction by creating chatbots capable of understanding and responding to customer inquiries. This not only improves customer satisfaction but also allows financial institutions to handle a higher volume of queries efficiently.

3. Accelerated Research Processes: In research-intensive domains within finance, LLMs have demonstrated their ability to expedite data analysis and information synthesis. This acceleration in research processes is a testament to the efficiency gains achievable through LLM integration.

As we explore the promises of LLMs in finance, it becomes evident that these models have the potential to reshape how financial tasks are approached and executed. However, it's crucial to delve deeper into the challenges faced by LLMs when dealing with intricate financial documents, such as SEC filings, to fully comprehend the limitations that currently hinder their seamless integration into the finance industry.

Challenges of Using LLMs in Finance: Patronus AI's Study

Patronus AI conducted a comprehensive study assessing the performance of GPT-4-Turbo in handling financial data, particularly in the context of Securities and Exchange Commission (SEC) filings. The findings shed light on the challenges faced by large language models (LLMs) when dealing with complex financial documents.

Unveiling the Numbers: Even the most advanced configuration of GPT-4-Turbo faced significant hurdles, scoring only 79% accuracy when tested on a set of questions derived from SEC filings. Notably, this evaluation included scenarios where the model had access to almost the entire filing, and yet, the performance fell short of expectations.

Unacceptable Rates: Anand Kannappan, co-founder of Patronus AI, expressed concern over the performance rate, deeming it "absolutely unacceptable." The study revealed instances where the LLMs not only failed to provide accurate answers but also exhibited a reluctance to respond or, in some cases, generated inaccurate information.

Challenges in Automation: The study underscored the challenges in automating financial processes using LLMs, especially in regulated industries like finance. The need for high accuracy in extracting important financial information and providing reliable results poses a considerable hurdle for these models.

Refusal Rates and Hallucinations: One notable aspect was the high refusal rate observed in LLMs, even when the answer was within the context. Additionally, the phenomenon of "hallucinations" emerged, where the models would generate information not present in the SEC filings, raising concerns about the reliability of their outputs.

Human in the Loop: Despite the setbacks, the co-founders of Patronus AI acknowledged the potential of LLMs in the finance industry. However, they emphasized the necessity of a human-in-the-loop approach to guide and support the workflow, particularly in scenarios where precision is paramount.

The Manual Evaluation Challenge: Rebecca Qian, co-founder of Patronus AI, highlighted the current manual nature of evaluation, referring to it as "testing by inspection." The study aimed to set a "minimum performance standard" for language AI in the financial sector through a rigorous evaluation process.

In light of these findings, the study calls attention to the limitations of existing LLMs in handling complex financial data and emphasizes the imperative of continuous improvement for the successful integration of AI in the finance industry.
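The human-in-the-loop approach that the Patronus AI co-founders call for can be pictured as a simple routing rule: accept an answer automatically only when the model actually answered and some confidence signal is high, and send everything else to an analyst. The sketch below is illustrative only. The `ModelAnswer` fields, the `confidence` score, and the 0.9 threshold are assumptions made for the example, not part of the study.

```python
from dataclasses import dataclass

@dataclass
class ModelAnswer:
    question: str
    answer: str         # an empty string models a refusal
    confidence: float   # hypothetical score in [0, 1], e.g. from a verifier

def route(result, threshold=0.9):
    """Return 'auto' to accept the answer, or 'human' to queue it for review."""
    if not result.answer.strip():
        return "human"   # refusal: the question is still unanswered
    if result.confidence < threshold:
        return "human"   # low confidence: an analyst should double-check
    return "auto"

# Made-up examples, not data from the study
queue = [
    ModelAnswer("What was FY2022 capex?", "$1.2 billion", 0.97),
    ModelAnswer("What was FY2022 net income?", "", 0.0),
    ModelAnswer("Did gross margin improve?", "Yes", 0.55),
]
for item in queue:
    print(route(item), "-", item.question)
```

In practice the threshold would be tuned against the cost of a wrong answer, which in regulated settings usually argues for sending more items to review rather than fewer.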

Understanding the Limitations

Large Language Models (LLMs) have showcased remarkable capabilities across various domains, but their integration into complex financial processes has brought forth inherent limitations that demand careful consideration.

A. Nondeterministic Nature of LLMs: One fundamental challenge lies in the nondeterministic nature of LLMs. Unlike deterministic systems that produce the same output for a given input, LLMs may exhibit variability in their responses to the same input. This unpredictability introduces challenges when aiming for consistent and reliable outcomes, especially in applications where precision is paramount.

B. The Challenge of Extracting Precise Information from Financial Documents: The complexity of financial documents, such as Securities and Exchange Commission (SEC) filings, poses a significant hurdle for LLMs.

These documents often contain intricate details, numerical data, and nuanced language that demand a high level of accuracy for meaningful interpretation. Extracting precise information requires not only language comprehension but also an understanding of financial intricacies, making it a multifaceted challenge for LLMs.

C. Importance of Accuracy in Finance – Implications of Even Minor Errors: In the realm of finance, accuracy holds immense importance. Even minor errors in the interpretation of financial data can lead to significant consequences. Inaccurate insights from LLMs could potentially impact investment decisions, financial analyses, and regulatory compliance.

The study conducted by Patronus AI highlighted the unacceptable nature of low accuracy rates, emphasizing the need for precise and reliable outcomes in the financial domain.

As financial institutions seek to automate processes with LLMs, the identified limitations become crucial considerations. The Patronus AI study of GPT-4-Turbo and other general-purpose LLMs underscores the challenges in achieving automation without compromising accuracy. The nondeterministic nature of LLMs and their propensity for inaccuracies necessitate a cautious approach in deploying them for tasks that demand a high degree of precision.
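The nondeterminism described above comes largely from how a model chooses each next token: it samples from a probability distribution rather than always taking the top candidate. The toy sketch below contrasts greedy selection with temperature sampling. The candidate tokens and their scores are invented for illustration, and real decoding pipelines involve more than this.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(tokens, logits, temperature, rng):
    """Greedy pick when temperature is 0, otherwise sample from the softmax."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates for "Net revenue in FY2022 was $..."
tokens = ["4.2B", "4.3B", "4.1B"]
logits = [2.0, 1.6, 1.1]

rng = random.Random()  # unseeded, so sampled results differ between runs
greedy = {pick_token(tokens, logits, 0, rng) for _ in range(50)}
sampled = {pick_token(tokens, logits, 1.0, rng) for _ in range(50)}
print("greedy outputs: ", greedy)
print("sampled outputs:", sampled)
```

Greedy selection always returns the same figure, while sampling can return a different figure for the identical question, which is exactly the kind of variability that is hard to accept in financial reporting.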

Acknowledging these limitations is not a dismissal of the potential of LLMs in finance but rather a call for continued research, development, and refinement. Striking a balance between the power of language models and the exacting demands of financial processes remains a key objective for researchers and practitioners alike.

Testing LLMs in the Financial Sector

To gauge the performance of Large Language Models (LLMs) in the intricate domain of finance, Patronus AI introduced FinanceBench – an evaluation tool designed to subject LLMs to a battery of tests based on over 10,000 questions and answers drawn from Securities and Exchange Commission (SEC) filings of major publicly traded companies.

This comprehensive dataset, including correct answers and their locations within filings, serves as a litmus test for the language models' efficacy in handling real-world financial queries.

Analysis of Different Language Models: Patronus AI conducted a meticulous examination of four prominent language models: OpenAI's GPT-4 and GPT-4-Turbo, Meta's Llama 2, and Anthropic's Claude 2. Each model underwent testing using a subset of 150 questions from FinanceBench, allowing for a comparative analysis of their capabilities in extracting accurate financial information.

GPT-4 and GPT-4-Turbo: These models, representing OpenAI's cutting-edge technology, faced scrutiny in various configurations, including an "Oracle" mode where the exact relevant source text was provided, and a "closed book" test where no access to SEC source documents was granted.

Llama 2: Meta's general-purpose model, Llama 2, was evaluated for its performance, especially in handling extensive underlying documents.

Claude 2: Developed by Anthropic, Claude 2 is also a general-purpose model and was evaluated on the same finance-specific questions.
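The test configurations mentioned above differ mainly in how much source material the model sees alongside the question. A minimal sketch of that idea follows. The function name, mode labels, prompt layout, and sample filing text are all assumptions for illustration and are not taken from FinanceBench itself.

```python
def build_prompt(question, mode, filing_text="", evidence=""):
    """Assemble the text sent to a model under one evaluation configuration."""
    if mode == "closed_book":
        context = ""             # no SEC source document at all
    elif mode == "oracle":
        context = evidence       # only the passage that contains the answer
    elif mode == "long_context":
        context = filing_text    # (almost) the entire filing
    else:
        raise ValueError(f"unknown mode: {mode}")
    parts = [f"Context:\n{context}"] if context else []
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)

# Made-up filing text for demonstration
evidence = "Capital expenditures were $1.2 billion in fiscal 2022."
filing = ("Item 7. Management's discussion. " + evidence
          + " Item 8. Financial statements.")

for mode in ("closed_book", "oracle", "long_context"):
    print(f"--- {mode} ---")
    print(build_prompt("What was capex in fiscal 2022?", mode, filing, evidence))
```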

Evaluation Criteria and Surprising Results

The evaluation criteria encompassed accuracy, the ability to handle long-context scenarios, and the models' propensity to provide correct answers without access to source documents. GPT-4-Turbo struggled in the "closed book" test, where it had no access to the filings. More surprisingly, even when it had access to the relevant source text, its accuracy still fell short, demonstrating how difficult it is to extract accurate information without human input.

The refusal rate among models, even when answers were within context, raised concerns about the reliability of LLMs in providing consistent responses. The study's findings highlighted that, even in scenarios where models performed well, the finance sector leaves very little margin for error. The implications of inaccuracies in regulated industries further underscore the need for continuous improvement.
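Accuracy and refusal rate, as discussed above, are simple proportions over graded answers. The sketch below shows one way to compute them, assuming each answer has already been labeled by a human grader as "correct", "incorrect", or "refused". The labels and the sample grades are hypothetical and are not figures from the study.

```python
from collections import Counter

def summarize(labels):
    """Compute the share of each outcome label in a list of graded answers."""
    if not labels:
        raise ValueError("no graded answers to summarize")
    counts = Counter(labels)
    total = len(labels)
    return {name: counts[name] / total
            for name in ("correct", "incorrect", "refused")}

# Hypothetical grades for 10 questions
grades = ["correct"] * 7 + ["incorrect"] * 2 + ["refused"]
report = summarize(grades)
for name, share in report.items():
    print(f"{name:>9}: {share:.0%}")
```

Reporting refusals separately from incorrect answers matters because the two failures call for different responses: a refusal can be routed to a person, while a confident wrong answer may go unnoticed.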

Challenges in Incorporating LLMs into Financial Products

The integration of Large Language Models (LLMs) into financial workflows poses multifaceted challenges. One of the central challenges lies in the nondeterministic nature of LLMs, as highlighted by Patronus AI's study. The inability to guarantee consistent output for the same input raises concerns about the reliability of LLMs, particularly in scenarios where precision is paramount, such as financial data analysis.

Moreover, the study pointed out that LLMs, even in their best-performing configurations, exhibited a high refusal rate and occasional "hallucinations" – generating incorrect information not present in SEC filings. This unpredictability necessitates a deeper understanding of the limitations of LLMs and a cautious approach to their implementation in financial products.
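One coarse guard against the hallucinations described above is to check that every number in a model's answer actually appears in the source filing. The sketch below is a deliberately simple filter under that assumption. It will also flag legitimately derived figures, such as a growth rate the model calculated, so it suits a review queue better than an automatic reject. The sample text is invented.

```python
import re

# Digits with optional thousands separators and an optional decimal part
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def numbers_in(text):
    """Extract numeric tokens, ignoring thousands separators."""
    return {m.replace(",", "") for m in NUMBER.findall(text)}

def ungrounded_numbers(answer, source):
    """Return numbers that appear in the answer but nowhere in the source."""
    return numbers_in(answer) - numbers_in(source)

source = "Total revenue was $4,200 million in fiscal 2022, up from $3,900 million."
answer = "Revenue rose to $4,200 million in 2022, a 12% increase."
print(ungrounded_numbers(answer, source))
```

Here the "12" is flagged because it does not appear anywhere in the source text, which is the cue for a reviewer to verify or discard it.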

Finance-Specific Large Language Models

JP Morgan's DocLLM

JP Morgan Chase & Co. has recently unveiled an innovative Large Language Model (LLM) called DocLLM, specifically engineered to transform the understanding of visually complex documents within the financial sector. DocLLM is a unique transformer-based model that integrates both textual and spatial layout information from documents, enabling it to capture nuanced semantics within enterprise records.

Features and Capabilities of DocLLM

At the core of DocLLM's capabilities is its ability to go beyond traditional language models. Unlike conventional models, DocLLM integrates layout information through bounding boxes derived from Optical Character Recognition (OCR), treating spatial data as a separate modality. This approach is a key innovation, allowing the model to compute dependencies between text and layout in a "disentangled" manner.

DocLLM extends the self-attention mechanism found in standard transformers with additional cross-attention scores focused on spatial relationships. This extension enables the model to represent alignments between content, position, and size of document fields at various abstraction levels.
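One way to picture this "disentangled" attention is as a sum of separate terms: the usual text-to-text score plus weighted terms that involve the layout vectors. The sketch below assumes that general form for a single pair of tokens. The weights `lam_ts`, `lam_st`, and `lam_ss`, the tiny vectors, and the omission of scaling and softmax are simplifications for illustration, not DocLLM's exact formulation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def disentangled_score(q_text, k_text, q_box, k_box,
                       lam_ts=1.0, lam_st=1.0, lam_ss=1.0):
    """Unnormalized attention score between a query token and a key token.

    Text and layout each get their own query/key vectors; the score adds
    the text-to-text term to three weighted terms that involve layout.
    """
    return (dot(q_text, k_text)               # text   -> text
            + lam_ts * dot(q_text, k_box)     # text   -> layout
            + lam_st * dot(q_box, k_text)     # layout -> text
            + lam_ss * dot(q_box, k_box))     # layout -> layout

# Toy two-dimensional vectors (hypothetical values)
q_text, k_text = [1.0, 0.0], [0.5, 0.5]
q_box, k_box = [0.0, 2.0], [1.0, 1.0]

with_layout = disentangled_score(q_text, k_text, q_box, k_box)
text_only = disentangled_score(q_text, k_text, q_box, k_box, 0.0, 0.0, 0.0)
print(with_layout, text_only)
```

Setting all three weights to zero recovers ordinary text-only attention, which makes it easy to see how much the layout terms contribute to a given pair of tokens.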

To handle the diverse nature of business documents, DocLLM utilizes a text-infilling pre-training objective, enhancing its capability to handle disjointed text segments and irregular document arrangements.
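A text-infilling objective can be sketched as follows: hide some blocks of text, leave a placeholder where each one was, and ask the model to produce the hidden blocks at the end. The example below only builds such a training pair from a list of text blocks. The `[MASK_n]` placeholder format, the 30% mask ratio, and the sample invoice are assumptions for illustration and do not reproduce DocLLM's actual tokens or settings.

```python
import random

def make_infilling_example(blocks, mask_ratio=0.3, rng=None):
    """Hide some blocks of a non-empty list and move them to the end as targets."""
    rng = rng or random.Random()
    n_masked = max(1, round(len(blocks) * mask_ratio))
    masked = sorted(rng.sample(range(len(blocks)), n_masked))
    visible, targets = [], []
    for i, block in enumerate(blocks):
        if i in masked:
            slot = f"[MASK_{len(targets)}]"   # hypothetical placeholder token
            visible.append(slot)
            targets.append(f"{slot} {block}")
        else:
            visible.append(block)
    return " ".join(visible), " ".join(targets)

# Made-up text blocks, as OCR might return them from an invoice
blocks = ["Invoice #1042", "Bill to: Acme Corp", "Total due:", "$1,250.00"]
model_input, target = make_infilling_example(blocks, rng=random.Random(0))
print("input: ", model_input)
print("target:", target)
```

Because the model sees the text on both sides of each gap, this setup is a natural fit for documents whose reading order is irregular, such as forms and invoices.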

The pre-trained model is then fine-tuned using data from 16 datasets covering tasks like information extraction, question answering, and classification. Notably, DocLLM has demonstrated state-of-the-art results on 14 out of 16 test datasets, showcasing its proficiency in understanding and processing various document types.

The Role of DocLLM in Financial Data Processing

DocLLM is poised to play a transformative role in financial data processing, offering a range of capabilities:

Addressing Specific Financial Queries: Leveraging visual question-answering capabilities, DocLLM can tackle specific financial queries sourced from SEC filings, loan documents, and other financial sources.
Categorization of Financial Documents: DocLLM excels at categorizing financial documents, such as checks, account applications, and wire transfers, based on spatial patterns. This enhances organization and facilitates more efficient analysis.
Enhanced Precision in Extracting and Analyzing Complex Documents: The model stands out in extracting and analyzing visually complex financial documents like bank statements, insurance claims, and invoices. Understanding the spatial layout is crucial for precision.
Automated Handling of Diverse Document Types: DocLLM automates the handling of handwritten, scanned, or lower-quality documents—common occurrences in finance. Its infilling approach increases robustness.
Systematic Examination of Tables and Forms: The model systematically examines tables and forms within earnings reports, financial filings, and related documents, contributing to a more comprehensive understanding of financial data.

In essence, DocLLM has the potential to streamline document-related processes, enhance data analysis, and contribute to informed decision-making in financial institutions like JP Morgan Chase & Co.

Other Notable Financial LLMs

While DocLLM stands out for its innovative approach to visually complex financial documents, several other noteworthy Large Language Models have been developed or adapted for specific financial challenges:

BERT for Finance: Applied to sentiment analysis of financial news, stock price prediction, and understanding financial statements, BERT captures complex relationships and dependencies in financial text.

FinBERT: Specialized for financial sentiment analysis, FinBERT understands and predicts market sentiment, providing valuable insights for traders and investors.

XLNet for Financial Forecasting: By combining auto-regressive and auto-encoding methods, XLNet has been applied to financial forecasting tasks, capturing bidirectional dependencies in time-series data.

ERNIE for Financial Documents: ERNIE has been adapted to understand and extract information from financial documents, with a focus on financial language and terminology.

These models collectively contribute to the automation and enhancement of various financial processes, addressing specific challenges within the financial domain. DocLLM, with its focus on visually complex documents, stands as a pioneering solution reshaping how financial institutions process and analyze a diverse array of documents.

In addition, tools like BloombergGPT, McKinsey’s Lilli, and the custom AI chatbots built by Deloitte and PwC further illustrate the need for specialized, hyper-focused solutions for industries like finance, auditing, and consulting.

According to the official McKinsey blog: “So how does Lilli work? A user can type in a question and Lilli can scan our entire landscape of knowledge, identify between five to seven of the most relevant pieces of content, summarize key points, include links, and even identify experts in the appropriate fields. The platform includes two modes: one for searching McKinsey's reserves of knowledge, and a second option for external sources.”

Wrapping Up

The challenges examined here, from nondeterministic responses to difficulty extracting precise information, underscore the complexities LLMs face in the finance domain. Despite these limitations, there is a shared vision of these models reshaping how financial tasks are approached and executed.

Looking ahead, the path to innovation lies in fostering a collaborative future where AI and human expertise harmoniously coexist. This symbiotic relationship acknowledges that while LLMs bring unprecedented capabilities, human insight remains indispensable. The call for a human-in-the-loop approach resonates strongly, emphasizing the need for human guidance and support in scenarios demanding precision, such as financial data analysis.

The evolving landscape of finance necessitates a strategic balance between the transformative power of LLMs and the nuanced understanding offered by human professionals. This collaborative vision serves as a compass for navigating the dynamic interplay between technological advancements and the intricate demands of the financial sector, charting a course toward a future where AI augments human capabilities to unlock unprecedented efficiency and insights into financial processes.
