Intelligent RCA: Root Cause Analysis for Oil and Gas Industries’ Machinery by Utilizing Machine Learning Model and Large Language Model (LLM)

Authors

  • Muhammad Khairunnizam Bin Amir, Department of Computing, Universiti Teknologi PETRONAS, Perak, Malaysia
  • Halimaton Hakimi, Department of Computing, Universiti Teknologi PETRONAS, Malaysia
  • Nurliali Karim, Department of Computing, Universiti Teknologi PETRONAS, Malaysia
  • Nour Alsharif, Department of Mathematics, Faculty of Engineering and Natural Sciences, Hitit University, Corum, Turkey

Keywords:

Machine Learning, Principal Component Analysis, Random Forest, XGBoost, Isolation Forest

Abstract

Root Cause Analysis (RCA) in oil and gas operations remains a critical yet complex task because of high-dimensional sensor data, nonlinear fault dynamics, and operational safety constraints. Conventional RCA approaches are expert-driven and subjective, and they are increasingly inadequate for Industry 4.0 environments built around large-scale condition monitoring systems. This study proposes an intelligent hybrid RCA framework that integrates Principal Component Analysis (PCA), ensemble learning (XGBoost and Random Forest), anomaly detection (Isolation Forest), and a fine-tuned Large Language Model (LLaMA) to enable automated, interpretable, and scalable fault diagnosis. The framework was evaluated on a vibration dataset comprising 33,600 samples with 25,000 time-series features. PCA reduced the dimensionality to 250 components while retaining 48% of the cumulative variance. In the experiments, XGBoost achieved 95% classification accuracy, compared with 33% for Random Forest. Isolation Forest detected normal conditions reliably but had limited recall on anomalies. The LLM module translated model outputs into structured RCA narratives covering causation, fault progression, and preventive strategies. The proposed architecture advances explainable industrial AI by bridging predictive modeling with semantic reasoning, offering a solution for intelligent maintenance in high-risk energy infrastructures.
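
The stages named in the abstract (dimensionality reduction, classification, anomaly detection, and hand-off to a language model) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn and NumPy are installed, uses small synthetic sinusoidal signals in place of the vibration dataset, shows only the Random Forest classifier (XGBoost would be fitted on the same reduced features), and only formats a text prompt rather than calling a fine-tuned LLaMA model. The function names `make_synthetic_vibration`, `run_pipeline`, and `build_rca_prompt`, as well as every parameter value, are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_synthetic_vibration(n_samples=600, n_features=200, n_classes=3, seed=0):
    """Toy stand-in for vibration windows: one noisy sinusoid per sample,
    with a frequency that depends on the (hypothetical) fault class."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_features)
    y = rng.integers(0, n_classes, size=n_samples)
    freqs = 5.0 * (y + 1)  # assumed class-specific frequency
    signal = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :])
    X = signal + 0.3 * rng.normal(size=(n_samples, n_features))
    return X, y


def run_pipeline(n_components=10, seed=0):
    """PCA -> Random Forest classifier -> Isolation Forest on reduced features."""
    X, y = make_synthetic_vibration(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Fit PCA on the training split only, then project both splits.
    pca = PCA(n_components=n_components, random_state=seed)
    Z_train = pca.fit_transform(X_train)
    Z_test = pca.transform(X_test)

    # Supervised fault classification on the reduced features.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(Z_train, y_train)

    # Unsupervised anomaly flags: +1 means inlier, -1 means outlier.
    iso = IsolationForest(random_state=seed).fit(Z_train)
    flags = iso.predict(Z_test)

    return {
        "reduced_shape": Z_test.shape,
        "retained_variance": float(pca.explained_variance_ratio_.sum()),
        "accuracy": float(clf.score(Z_test, y_test)),
        "anomaly_flags": flags.tolist(),
    }


def build_rca_prompt(result):
    """Format model outputs as text that could be passed to a language model."""
    n_flagged = sum(1 for f in result["anomaly_flags"] if f == -1)
    return (
        "You are assisting with root cause analysis of rotating machinery.\n"
        f"Classifier accuracy on held-out data: {result['accuracy']:.2f}\n"
        f"Variance retained after PCA: {result['retained_variance']:.2f}\n"
        f"Windows flagged as anomalous: {n_flagged} of {len(result['anomaly_flags'])}\n"
        "Describe likely causes, fault progression, and preventive actions."
    )


if __name__ == "__main__":
    print(build_rca_prompt(run_pipeline()))
```

The value `pca.explained_variance_ratio_.sum()` is the quantity the abstract reports as retained cumulative variance (48% at 250 components on the authors' data). The figures this toy example prints describe only the synthetic signals and are not comparable to the reported results.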

Published

2026-04-30

Section

Articles