Abstract
Crash severity has traditionally been modeled with statistical analyses of tabular data, while textual crash narratives, despite the rich information they contain, have been underused because of the complexity of handling free text. Recently, transfer learning and language models have gained substantial popularity in natural language processing. Unlike traditional word embedding methods such as TF-IDF and Word2Vec, Large Language Models (LLMs) produce context-dependent representations and outperform conventional techniques when fine-tuned on the target domain. A major limitation of LLMs, however, is their lack of explainability: they function as black boxes. In addition, datasets in certain domains are highly imbalanced. This study presents a framework that addresses these challenges. First, we generated a balanced dataset using BERT-based language models. Second, we compared a traditional TF-IDF representation with a fine-tuned BERT/Bi-LSTM model for classifying crash severity. The fine-tuned BERT/Bi-LSTM model outperformed the TF-IDF-based XGBoost classifier, with accuracy scores of 98% and 96%, respectively. The fine-tuned classifier was then passed to the SHAP (SHapley Additive exPlanations) framework to identify factors influencing crash severity. The analysis used crash data obtained from the Missouri State Highway Patrol. The results show that combining a pretrained language model (PLM) with SHAP reveals the factors associated with each crash severity level, and that the framework can be deployed for other text classification problems.
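The abstract describes the pipeline only at a high level. As a rough illustration of the two modelling routes it mentions, the Python sketch below trains a TF-IDF/XGBoost baseline and then wraps an already fine-tuned BERT classifier in a Hugging Face pipeline so that SHAP can attribute predictions to words in each crash narrative. This is not the authors' code: the CSV file, the "narrative"/"severity" column names, the checkpoint path, and the "Severe" label name are illustrative assumptions.

```python
# Minimal sketch, assuming a CSV of crash narratives with severity labels and a locally
# saved fine-tuned BERT checkpoint; all names below are hypothetical.
import pandas as pd
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from transformers import pipeline
from xgboost import XGBClassifier

crashes = pd.read_csv("crash_narratives.csv")  # assumed file: one narrative and one severity label per row
labels = LabelEncoder().fit_transform(crashes["severity"])
X_train, X_test, y_train, y_test = train_test_split(
    crashes["narrative"], labels, test_size=0.2, random_state=42, stratify=labels
)

# Baseline: sparse TF-IDF features fed to an XGBoost classifier.
vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
xgb_clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
xgb_clf.fit(vectorizer.fit_transform(X_train), y_train)
print("TF-IDF/XGBoost accuracy:",
      accuracy_score(y_test, xgb_clf.predict(vectorizer.transform(X_test))))

# Explainability: recent SHAP releases can wrap a Hugging Face text-classification
# pipeline directly and attribute each prediction to tokens in the narrative.
bert_clf = pipeline("text-classification", model="path/to/finetuned-bert", top_k=None)
explainer = shap.Explainer(bert_clf)                 # SHAP builds a Text masker for the pipeline
shap_values = explainer(X_test.iloc[:50].tolist())   # token-level attributions for a sample of narratives
shap.plots.bar(shap_values[:, :, "Severe"].mean(0))  # mean token impact on the (assumed) "Severe" class
```

Aggregating token-level SHAP attributions over many narratives is what lets a black-box transformer expose the factors it associates with each severity level, which is the role the abstract assigns to SHAP in the proposed framework.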
| Original language | English |
| --- | --- |
| Pages | 1-5 |
| DOIs | |
| Publication status | Published - 13 Apr 2024 |
| Externally published | Yes |
| Event | 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI) - Mt Pleasant, MI, USA. Duration: 13 Apr 2024 → 14 Apr 2024 |
Conference

| Conference | 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI) |
| --- | --- |
| Period | 13/04/24 → 14/04/24 |