Explainable Language Models for the Identification of Factors Influencing Crash Severity Levels in Imbalanced Datasets

Shadi Jaradat, Richi Nayak, Mohammed Elhenawy

Research output: Contribution to conference › Conference paper › peer-review

2 Citations (Scopus)

Abstract

Traditionally, various statistical analyses were employed to model crash severity using tabular data. However, textual crash narratives, which contain rich information, were underutilized due to the complexity of handling text. Recently, transfer learning and language models have gained substantial popularity in natural language processing. In contrast to traditional word embedding methods like TF-IDF and Word2Vec, Large Language Models (LLMs) are context-dependent and outperform conventional techniques when finetuned on the target domain. A major limitation of LLMs is their lack of explainability: they function like a black box. Furthermore, datasets in certain domains are highly imbalanced. This study presents a framework to address these challenges. Firstly, we generated a balanced dataset using BERT language models. Secondly, we compared the performance of the traditional TF-IDF embedding with the finetuned BERT/Bi-LSTM in classifying crash severity. The results demonstrated that the finetuned BERT/Bi-LSTM model outperformed XGB, with accuracy scores of 98% and 96%, respectively. The finetuned classifier was then passed to the SHAP explainability method to identify factors impacting crash severity. The analysis utilized crash data obtained from the Missouri State Highway Patrol. Results indicated that combining a pre-trained language model (PLM) with SHAP revealed factors associated with crash severity levels, and that the framework can be deployed for other text classification problems.
Original language: English
Pages: 1-5
DOIs
Publication status: Published - 13 Apr 2024
Externally published: Yes
Event: 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI) - Mt Pleasant, MI, USA
Duration: 13 Apr 2024 - 14 Apr 2024

