Claim Denial Management

Oct 26, 2023

Because healthcare is so tightly intertwined with insurance, claim denials have become a costly problem for individuals and institutions alike. In the U.S., $262 billion of the $3 trillion in total claims was denied in 2016, an average of nearly $5 million per hospital. Hospitals, burdened with massive insurance plans and aiming for minimal coverage, struggle to find the resources and time needed to appeal denials. Many are unsure how to create an efficient prevention program without overwhelming their teams.

The project aims to create an AI model that assesses the risks of claim denials. The current system consists of two main components: the Autoencoder and the Denial Brain.

1. Autoencoder


  • PDF Parsing and Computer Vision Module: Converts PDF to structured text.
  • Free Text Coder: Converts text information about procedure or diagnosis into CPT/HCPCS/ICD-10 codes.


  • NLP Parsing: Extracts text from PDF files and identifies keywords using regular expressions.
  • Code Extraction and Validation: Uses pattern matching and the Free Text Coder to convert values into codes, and validates against a code dictionary.
  • RegEx Extractor: Tries to extract codes from text using regular expressions.
  • Code Validator: Verifies obtained codes against a dictionary.
  • TFIDF Extractor: Extracts codes based on text similarity, returning the nearest code in the vocabulary.
  • Computer Vision: Converts the PDF to an image and feeds it to a model that yields text blocks. Locates keywords in the detected boxes using letter-wise matching and Levenshtein distance.
  • Merge Outputs: Combines NLP and Computer Vision outputs to create the final Autoencoder output.
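
The extraction steps above (RegEx Extractor, Code Validator, TFIDF Extractor) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the toy `CODE_DICTIONARY`, the simplified patterns, and the function names `regex_extract`, `tfidf_nearest_code`, and `extract_codes` are all assumptions made for the example.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy dictionary; a real one would hold the full code sets.
CODE_DICTIONARY = {
    "99213": "office outpatient visit established patient",
    "71046": "chest x-ray two views",
    "E11.9": "type 2 diabetes mellitus without complications",
    "J18.9": "pneumonia unspecified organism",
}

# Simplified patterns (assumptions): five-digit CPT codes and ICD-10-like codes.
CPT_PATTERN = re.compile(r"\b\d{5}\b")
ICD10_PATTERN = re.compile(r"\b[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def regex_extract(text):
    """Collect candidate codes with regular expressions, then keep only
    the ones that exist in the dictionary (the validation step)."""
    candidates = CPT_PATTERN.findall(text) + ICD10_PATTERN.findall(text)
    return [code for code in candidates if code in CODE_DICTIONARY]


def tfidf_nearest_code(text):
    """Return the code whose description is most similar to the text,
    or None when the text shares no vocabulary with any description."""
    codes = list(CODE_DICTIONARY)
    vectorizer = TfidfVectorizer()
    code_matrix = vectorizer.fit_transform([CODE_DICTIONARY[c] for c in codes])
    scores = cosine_similarity(vectorizer.transform([text]), code_matrix)[0]
    best = int(scores.argmax())
    return codes[best] if scores[best] > 0 else None


def extract_codes(text):
    """Prefer validated regex hits; fall back to TF-IDF similarity."""
    validated = regex_extract(text)
    if validated:
        return validated
    nearest = tfidf_nearest_code(text)
    return [nearest] if nearest is not None else []


explicit = extract_codes("Patient seen for E11.9, billed 99213 and 12345.")
free_text = extract_codes("two view x-ray of the chest")
```

In this sketch the unknown code `12345` is dropped by validation, and the free-text description, which contains no explicit code, is resolved through the similarity fallback.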

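Keyword location with Levenshtein distance, as mentioned in the Computer Vision step, could look something like the sketch below. It assumes the text-detection model returns `(text, box)` pairs; that output shape, the `find_keyword` helper, and the `max_distance` threshold are hypothetical choices for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def find_keyword(blocks, keyword, max_distance=2):
    """Return the (text, box) pairs whose text is within max_distance
    edits of the keyword, ignoring case."""
    keyword = keyword.lower()
    return [
        (text, box)
        for text, box in blocks
        if levenshtein(text.lower(), keyword) <= max_distance
    ]


# Hypothetical OCR output: recognised text plus (x0, y0, x1, y1) boxes.
ocr_blocks = [
    ("Diagn0sis", (10, 20, 110, 40)),   # OCR confused "o" with "0"
    ("Procedure", (10, 60, 110, 80)),
    ("Amount", (10, 100, 80, 120)),
]
matches = find_keyword(ocr_blocks, "Diagnosis")
```

A small edit-distance tolerance lets the lookup survive typical OCR confusions such as `0` for `o`, at the cost of occasional false matches on short keywords.
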
2. Denial Brain

Data Preparation:

  • LabelEncoder is used for categorical variables.
  • The target is set to the amount paid for the claim.
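
A minimal sketch of this preparation step, assuming a pandas DataFrame of claims. The column names (`payer`, `procedure_code`, `bill_amount`, `paid_amount`) and the sample rows are invented for illustration and do not come from the project's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical claims table; real column names and values will differ.
claims = pd.DataFrame({
    "payer": ["PayerA", "PayerB", "PayerA", "PayerC"],
    "procedure_code": ["99213", "71046", "99213", "93000"],
    "bill_amount": [150.0, 320.0, 150.0, 90.0],
    "paid_amount": [120.0, 0.0, 110.0, 75.0],
})

# Fit one encoder per categorical column and keep it, so that integer
# codes can be mapped back to the original labels later.
encoders = {}
for column in ["payer", "procedure_code"]:
    encoder = LabelEncoder()
    claims[column] = encoder.fit_transform(claims[column])
    encoders[column] = encoder

# The regression target is the amount paid for the claim.
X = claims.drop(columns="paid_amount")
y = claims["paid_amount"]
```

Note that the scikit-learn documentation describes `LabelEncoder` as intended for target labels; `OrdinalEncoder` is the equivalent designed for feature columns and handles several columns at once.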

Data Processing

  • The dataset contained more than 7 million samples, split into training and test sets (80%/20%).
  • The most important features were the bill amount and procedure.


  • Used a Random Forest as a regression algorithm to predict the paid amount.
  • Hyperparameters: n_estimators=10, max_depth=None, max_features='auto', min_samples_split as needed.
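
The split and model setup can be sketched as below on synthetic data. The features, the way the target is generated, and the random seeds are assumptions for illustration only. One deliberate difference from the listed hyperparameters: `max_features='auto'` was removed in scikit-learn 1.3, and for regressors it meant "use all features", so the sketch passes `max_features=1.0` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims data (illustrative only).
rng = np.random.default_rng(0)
n_samples = 1000
bill_amount = rng.uniform(50, 5000, n_samples)
procedure = rng.integers(0, 20, n_samples)   # label-encoded procedure
payer = rng.integers(0, 5, n_samples)        # label-encoded payer
paid_amount = 0.6 * bill_amount + 10 * procedure + rng.normal(0, 25, n_samples)

feature_names = ["bill_amount", "procedure", "payer"]
X = np.column_stack([bill_amount, procedure, payer])
y = paid_amount

# 80% training / 20% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(
    n_estimators=10,
    max_depth=None,
    max_features=1.0,   # all features; replaces the removed 'auto' value
    random_state=0,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
importances = dict(zip(feature_names, model.feature_importances_))
```

Inspecting `importances` is one way to arrive at a statement like "the most important features were the bill amount and procedure", although impurity-based importances can favour high-cardinality features.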

Performance Evaluation: Used the mean squared error (MSE) score to estimate prediction performance.
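
For reference, MSE is the average of the squared differences between actual and predicted paid amounts. The short sketch below computes it by hand on made-up numbers; the values are illustrative and are not results from the project.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up paid amounts in dollars.
actual_paid = [120.0, 0.0, 110.0, 75.0]
predicted_paid = [100.0, 10.0, 110.0, 80.0]

mse = mean_squared_error(actual_paid, predicted_paid)  # (400 + 100 + 0 + 25) / 4
rmse = mse ** 0.5  # same units as the paid amount
```

Because MSE is expressed in squared dollars, its square root (RMSE) is often easier to interpret alongside it.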

Programming Languages

  • Python 3
  • SQL (specifically AWS Redshift dialect)

AI Stack:

  • Prediction Modelling:
      • Random Forest
      • Extreme Gradient Boosting Machine
      • Generalized Linear Models
  • Computer Vision:
      • Image Segmentation
      • Text Detection
      • Optical Character Recognition (OCR)
  • Natural Language Processing:
      • Regular Expressions
      • TF-IDF Models

Containerization: Docker

Frameworks and Libraries:

  • Data Manipulation and Analysis:
      • NumPy
      • pandas
  • Visualization:
      • matplotlib
      • seaborn
  • Machine Learning and AI:
      • scikit-learn
      • xgboost
      • TensorFlow
      • Keras
  • Database Connectivity:
      • psycopg2
      • pypyodbc
  • Web Frameworks:
      • Flask
      • Werkzeug
  • Utilities:
      • easydict
      • tqdm
      • Cython
  • Computer Vision and Image Processing:
      • opencv-python
      • pdf2image
      • python-dfbox
      • scipy

We developed a solution to reduce claim denial rates and streamline payment processes. Our AI model automates claim processing, cutting human capital costs and speeding up payment plans, which shortens accounts receivable (A/R) times. As the first nationwide solution for managing data claims, it suits hospitals and large physician practices. With a deep understanding of claim denial management, we’re ready to adapt our model to other countries’ healthcare systems.




Ukraine-based IT company specializing in the development of software solutions based on science-driven information technologies #AI #ML #IoT #NLP #Healthcare #DevOps