Article citation information:

Ta Duc, Q., Nguyen Xuan, K., Nguyen Tuan, N., Nguyen Thanh, V., Le Dinh, M.,
Trinh Duy, H. Evaluation of SVR and random forest models for accurate prediction of electronic fuel injector behavior in common-rail systems. Scientific Journal of Silesian University of Technology. Series Transport. 2025, 129, 225-235. ISSN: 0209-3324. DOI: https://doi.org/10.20858/sjsutst.2025.129.13

Quyet TA DUC[1], Khoa NGUYEN XUAN[2], Nghia NGUYEN TUAN[3],
Vinh NGUYEN THANH[4], Manh LE DINH[5], Hung TRINH DUY[6]

EVALUATION OF SVR AND RANDOM FOREST MODELS FOR ACCURATE PREDICTION OF ELECTRONIC FUEL INJECTOR BEHAVIOR IN COMMON-RAIL SYSTEMS

Summary. In this study, machine learning algorithms were applied to predict the main injection quantity in a high-pressure common rail system on a diesel engine test bench. The input parameters included engine load, fuel pressure, injection speed, and pulse time. Two models were selected for comparison: Support Vector Regression (SVR) and Random Forest (RF). The results showed that on the training dataset, the RF model outperformed SVR, with RMSE and MAE values of 0.027362 and 0.017628 respectively, significantly lower than those of SVR (RMSE = 0.051563, MAE = 0.027733). Additionally, RF achieved a higher coefficient of determination R² (0.995759 vs. 0.984939), indicating better learning of the relationships among variables. However, on the test dataset, SVR demonstrated superior predictive accuracy, achieving RMSE = 0.050097, MAE = 0.027673, and R² = 0.983550, while RF showed higher RMSE (0.060355), greater MAE (0.040485), and lower R² (0.976123). These results indicate that SVR has better generalization capability and is less prone to overfitting than RF. To assess the contribution of each input parameter, SHAP (SHapley Additive exPlanations) analysis was employed. The results revealed that injection speed, pulse duration, and fuel pressure had the most significant impact on the injection quantity. Meanwhile, engine load had a relatively lower influence but still played an important role under certain operating conditions. These analyses not only provide an intuitive understanding of model sensitivity but also help identify key factors to prioritize in control strategies. This study lays a foundation for the development of optimized control systems aimed at accurately and effectively reducing engine emissions in the future.

Keywords: common rail, SVR, RF, injection speed, injection pressure

1. INTRODUCTION

One of the most critical factors directly influencing diesel engine performance and emissions is the fuel injection process. In this regard, the high-pressure common rail injection system, with its advanced injector design and precise control over injection timing, pressure, and fuel quantity, plays a pivotal role. Consequently, in-depth research on injector operation within the common rail system is not only vital for enhancing fuel efficiency but also a key approach to minimizing harmful emissions.

Federico Perini and colleagues [1] developed a simplified phenomenological model to simulate the fuel injection rate of the Bosch CRI2.2 injector within a common rail system. This model enables accurate predictions of the injection process without requiring detailed structural data of the injector. After being calibrated with experimental data, the model is used as an input for computational fluid dynamics (CFD) simulations in engines. The results demonstrate the model's capability to precisely predict injection rates and equivalence ratio distributions while maintaining reliability across various fuel flow rates and injection pressures. Tantan Zhang [2] introduced a real-time method for evaluating large fuel injection quantities (greater than 6 mg per injection) in common rail systems by integrating an additional pressure sensor into the fuel pipe. The pressure data obtained from the sensor is analyzed using an artificial neural network (ANN) combined with a physical model to accurately estimate the injected fuel mass. This approach achieves an error margin of just 1.4 mg, indicating strong potential for practical applications in fuel injection monitoring and precise control. Jianhui Zhao and colleagues [3] developed a deep learning model based on a General Regression Neural Network (GRNN) to accurately predict the fuel injection quantity in common rail systems. To enhance model performance, the research team applied the Particle Swarm Optimization (PSO) algorithm to fine-tune network parameters, thereby improving prediction accuracy. The results showed an average error of only 1.10%, with a coefficient of determination (R²) reaching 0.997. Tests conducted under 144 different operating conditions further revealed a maximum deviation reduction of up to 62.02% in the main injection quantity, highlighting the model's outstanding capability in accurately controlling multiple injection cycles. 
Synthesizing the above studies, it is evident that machine learning methods have become increasingly prevalent in forecasting and identifying the optimal operating regions of injectors and engines [4-7]. While many previous studies have focused on simulating and predicting injector behavior in common rail systems, most have relied primarily on CFD simulations or employed a single machine learning model. Although CFD offers high accuracy, it often involves high computational costs and long processing times, which limits its practicality in real-time applications. On the other hand, studies using machine learning often implement only one algorithm, lacking comprehensive comparisons and evaluations among different methods. Based on this context, the present study proposes and compares two widely used machine learning models, Support Vector Regression (SVR) and Random Forest (RF), for predicting the behavior of electronic fuel injectors. Evaluating the performance of these models not only facilitates the selection of the most suitable algorithm for a specific application but also offers a more comprehensive perspective on the potential of machine learning in analyzing and optimizing fuel injection systems. This constitutes the novelty and key contribution of the study compared to previous works.

2. METHODOLOGY

Support Vector Regressor (SVR)

SVR aims to find a regression function of the form:

f(x) = ω^T φ(x) + b (1)

Where: ω is the weight vector, φ(x) is the feature mapping function that transforms the input data x into a higher-dimensional feature space, and b is the bias term.
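
The text does not spell out how ω and b in Equation (1) are determined. As a sketch of the standard ε-insensitive SVR formulation (the symbols C, ε, ξ_i and ξ_i^* are introduced here for illustration and do not appear in the original), the parameters are commonly obtained by solving:

```latex
\begin{aligned}
\min_{\omega,\, b,\, \xi,\, \xi^{*}} \quad
  & \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad
  & y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
  & \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0, \quad \xi_i^{*} \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```

In this form, C balances the flatness of f(x) against deviations larger than ε, and the slack variables ξ_i, ξ_i^* absorb those deviations. The kernel and the values of C and ε used in the study are not stated in the text.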

Random Forest (RF)

Random Forest (RF) is an ensemble machine learning algorithm that combines the outputs of multiple individual models to generate a single, more accurate prediction. In this study, the RF model is built upon decision trees, where each tree independently splits nodes and makes predictions. RF employs the bagging technique, in which each decision tree is trained on a randomly sampled subset of the original dataset, allowing for model diversity and improved generalization. In regression tasks, the final prediction is computed by averaging the outputs of all individual trees. By aggregating the results from multiple base learners, the RF model enhances prediction accuracy and effectively reduces the risk of overfitting [8].

Given the dataset’s significant feature imbalance, an 80:20 train-test split is applied. Techniques such as cross-validation, learning curves, and residual analysis are utilized for model evaluation. This study employs a pipeline model with GridSearchCV to systematically experiment with both linear and nonlinear regression models, covering the entire process from data preprocessing to model training. The dataset is normalized using the Standard Scaler method [9]. The goal of the predictive model is to identify a function that minimizes the difference between predicted and actual values. This difference is quantified by a loss function, and the machine learning model aims to optimize this loss function. The following section presents common error metrics used for model evaluation [10].
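
The workflow described above (80:20 split, standardisation, pipeline, grid search) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' script: the data are a synthetic stand-in, and the hyperparameter grids, fold count, and random seeds are hypothetical choices that the text does not report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical): three inputs, one nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = X[:, 0] * X[:, 1] + 0.5 * np.sin(3.0 * X[:, 2]) + rng.normal(0.0, 0.01, 200)

# 80:20 train-test split, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One pipeline per model: standardisation followed by the regressor.
# The grids below are illustrative assumptions, not the study's settings.
candidates = {
    "SVR": (
        Pipeline([("scaler", StandardScaler()), ("model", SVR(kernel="rbf"))]),
        {"model__C": [1.0, 10.0], "model__epsilon": [0.01, 0.1]},
    ),
    "RF": (
        Pipeline([("scaler", StandardScaler()),
                  ("model", RandomForestRegressor(random_state=0))]),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
}

test_r2 = {}
for name, (pipe, grid) in candidates.items():
    # Cross-validated grid search on the training portion only.
    search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    # Pipeline.score delegates to the regressor, which reports R².
    test_r2[name] = search.best_estimator_.score(X_test, y_test)
    print(name, search.best_params_, round(test_r2[name], 4))
```

Placing the scaler inside the pipeline means it is refitted on each cross-validation training fold, so no information from the validation folds leaks into the preprocessing step.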

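Mean Squared Error (MSE) calculates the mean square of the model error:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 (2)

Root Mean Squared Error (RMSE) calculates the average deviation between predicted values and actual values:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} (3)

Mean Absolute Error (MAE) calculates the average of the absolute errors between the actual and predicted values:

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| (4)

R² (Coefficient of Determination) is a statistical indicator that shows the proportion of variance in the target to be predicted that is explained by the input variables in the model:

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} (5)

Mean Absolute Percentage Error (MAPE):

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100 (6)

Where: y_i is the actual value, \hat{y}_i is the predicted value, \bar{y} is the average of the actual values, and n is the number of samples.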
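
As a minimal sketch, the five metrics named above can be computed with NumPy using their standard definitions. The function name `regression_metrics` and the three-point example are hypothetical and serve only to make the arithmetic concrete.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, R2 and MAPE (in percent) for two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    # 1 minus residual sum of squares over total sum of squares
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # MAPE is undefined when any actual value is zero
    mape = np.mean(np.abs(err / y_true)) * 100.0
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2, "MAPE": mape}

# Hypothetical values for illustration only
example = regression_metrics([1.0, 2.0, 4.0], [1.0, 2.5, 3.5])
print(example)
```

Because MAPE divides by the actual value, it becomes unstable for operating points where the measured quantity is close to zero; RMSE and MAE do not share this limitation.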

Shapley Additive Explanations (SHAP) interpret a model by quantifying the contribution of each feature to its prediction. The SHAP value of a feature represents how much that feature adds to or subtracts from the model's output compared to the average prediction. When the input value of a feature changes, its corresponding SHAP value also changes, reflecting the variation in that feature’s influence on the model’s prediction. This change can be linear or nonlinear, depending on the nature of the model and interactions with other features [11]. The standard deviation and mean error [12] are calculated through Equations (7), (8) and (9).

(7)

(8)

(9)

Where: S_t represents the standard deviation; \hat{t} denotes the sample mean; t_i is the mean error value of the individual parameter; n refers to the sample size.
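
To make the idea of a feature "adding to or subtracting from" a prediction concrete, the sketch below computes exact Shapley values by brute force for a toy three-input function. It is an illustration of the underlying definition, not the SHAP library and not the study's model. The helper `exact_shapley`, the toy function, and the choice of replacing "absent" features with a fixed baseline are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value,
    which is one common (but not the only) way to model absence.
    """
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = np.array(baseline, dtype=float)
        for k in subset:
            z[k] = x[k]
        return predict(z)

    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi

# Toy model (hypothetical): one additive term and one interaction term.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi, phi.sum(), toy_model(np.array(x)) - toy_model(np.array(baseline)))
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is shared equally between the two features that form it. The cost grows exponentially with the number of features, which is why practical tools rely on approximations.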

3. RESULTS AND DISCUSSION

Table 1 presents the fuel flow rate prediction performance of two machine learning methods, Support Vector Regression (SVR) and Random Forest (RF), using evaluation metrics including RMSE, MAE, R², and standard deviation (STD), measured on both the training and testing datasets. On the training set, the RF model demonstrates superior performance, achieving lower RMSE and MAE values of 0.027362 and 0.017628, respectively, compared to SVR’s RMSE = 0.051563 and MAE = 0.027733. In addition, RF obtains a higher coefficient of determination (R² = 0.995759) than SVR (R² = 0.984939), indicating that RF fits the relationship between variables more closely during training. However, on the testing set, SVR yields more accurate predictions, with RMSE = 0.050097, MAE = 0.027673, and R² = 0.983550, while RF results in higher error values (RMSE = 0.060355, MAE = 0.040485) and a lower R² of 0.976123. These results suggest that SVR generalizes better and is less prone to overfitting than RF. Regarding standard deviation, both models show similar levels of output variability on both training and testing sets, indicating the stability of both the data and the models. RF is better suited for scenarios where high accuracy on training data is essential, whereas SVR is a more appropriate choice when prioritizing model stability and generalizability to new, unseen data.

Tab. 1

Prediction Results of the Models

Method	Phase	RMSE	MAE	R2	STD
SVR	Train	0.051563	0.027733	0.984939	0.412498
SVR	Test	0.050097	0.027673	0.983550	0.396126
RF	Train	0.027362	0.017628	0.995759	0.412343
RF	Test	0.060355	0.040485	0.976123	0.380328
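
The overfitting argument can be made explicit by computing how much each error metric changes between the training and testing phases. The short sketch below uses only the values in Table 1; the helper name `relative_change` is hypothetical.

```python
# Values copied from Table 1.
table1 = {
    "SVR": {
        "train": {"RMSE": 0.051563, "MAE": 0.027733, "R2": 0.984939},
        "test": {"RMSE": 0.050097, "MAE": 0.027673, "R2": 0.983550},
    },
    "RF": {
        "train": {"RMSE": 0.027362, "MAE": 0.017628, "R2": 0.995759},
        "test": {"RMSE": 0.060355, "MAE": 0.040485, "R2": 0.976123},
    },
}

def relative_change(model, metric):
    """Change of a metric from the training to the testing phase, in percent."""
    train = table1[model]["train"][metric]
    test = table1[model]["test"][metric]
    return (test - train) / train * 100.0

gaps = {
    model: {metric: relative_change(model, metric) for metric in ("RMSE", "MAE")}
    for model in table1
}
for model, gap in gaps.items():
    print(f"{model}: RMSE {gap['RMSE']:+.1f} %, MAE {gap['MAE']:+.1f} %")
```

With these numbers, the SVR errors stay almost unchanged from training to testing (RMSE about 3 % lower), whereas the RF errors more than double (RMSE roughly 120 % higher), which is the pattern expected of a model that has fitted the training data too closely.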

Figures 1 and 2 illustrate the learning curves of the two algorithms, SVR and RF, demonstrating how each model's performance evolves with increasing training sample size. For the SVR algorithm, both the training score (red curve) and the cross-validation score (green curve) show a rapid decline in error as the number of training samples increases. Specifically, the Mean Squared Error (MSE) of the training score starts at around 0.3 with a small dataset and quickly drops below 0.002 when the sample size reaches approximately 50, eventually stabilizing at around 0.001. Similarly, the cross-validation score decreases sharply from over 0.3 to below 0.003 once the training sample size exceeds 100. The gap between the training and cross-validation curves remains very narrow (less than 0.002), indicating that the model learns well and generalizes effectively, with no signs of overfitting or underfitting. Moreover, the shaded region around the cross-validation score curve is narrow (±0.001), suggesting high model stability across different validation sets. These characteristics confirm that the SVR model not only fits the training data well but also performs reliably on unseen data, making it a robust choice for regression tasks in this context.

In contrast, for the Random Forest (RF) algorithm, although the training score achieves a very low MSE (below 0.005) even with a small number of training samples, the cross-validation score remains significantly higher and decreases only gradually as the sample size increases. Specifically, the MSE of the cross-validation starts around 0.22–0.25 and only drops to approximately 0.01 when the number of training samples reaches 300. The gap between the training and validation scores ranges from 0.01 to 0.02, indicating a tendency toward overfitting; the model fits the training data very well but performs less effectively on unseen data. Additionally, the shaded area around the cross-validation score curve in the RF plot is noticeably wider (up to ±0.01), reflecting a higher variance in performance across different validation sets and hence reduced model stability.

Nevertheless, the steady downward trend of the cross-validation score with increasing training data suggests that RF has potential for improvement, particularly if more training data is provided or appropriate hyperparameter tuning is performed. This highlights the possibility of enhancing both the accuracy and stability of the model for practical applications.

Based on the above results, it can be concluded that the SVR algorithm demonstrates superior performance on the current dataset, with lower MSE and fast convergence between training and validation scores, indicating strong generalization capability. In contrast, while Random Forest (RF) shows promising potential, it currently suffers from overfitting and requires either more training data or further hyperparameter tuning to achieve the same level of stability and effectiveness as SVR.

Fig. 1. Learning curve of SVR model

Fig. 2. Learning curve of RF model
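
Learning curves of the kind discussed above can be produced with scikit-learn's `learning_curve`. The sketch below is an assumption-laden illustration rather than the study's procedure: it uses synthetic stand-in data, an SVR with hypothetical hyperparameters, five folds, and five training-set sizes, none of which are reported in the text.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (hypothetical), not the test-bench measurements.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = X[:, 0] * X[:, 1] + 0.5 * np.sin(3.0 * X[:, 2]) + rng.normal(0.0, 0.01, 300)

# Hyperparameters chosen for illustration only.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))

sizes, train_scores, cv_scores = learning_curve(
    model,
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="neg_mean_squared_error",
    shuffle=True,
    random_state=1,
)

# The scorer returns negated MSE, so flip the sign to obtain errors.
train_mse = -train_scores.mean(axis=1)
cv_mse = -cv_scores.mean(axis=1)
cv_band = cv_scores.std(axis=1)  # half-width of the shaded region
gap = cv_mse - train_mse         # train/validation gap at each size

for n, tr, cv, band in zip(sizes, train_mse, cv_mse, cv_band):
    print(f"n={n:3d}  train MSE={tr:.5f}  CV MSE={cv:.5f} ± {band:.5f}")
```

A small and shrinking `gap` together with a narrow `cv_band` corresponds to the behaviour described for SVR, while a persistent gap and a wide band correspond to the behaviour described for RF.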

Figure 3 illustrates a comparison between the actual values (denoted by red stars) and the predicted results from two regression algorithms: SVR (Support Vector Regression – represented by blue triangles) and RF (Random Forest – represented by yellow squares) across 91 data samples in the test set. It can be observed that both algorithms tend to produce predictions closely aligned with the actual values. However, noticeable deviations still occur, particularly in samples with abnormally high or low "Flow" values, such as samples 19, 34, 55, and 61. These outliers highlight specific cases where the models struggle to maintain accuracy, especially under extreme conditions. In most data points, both SVR and RF yield predictions that are close to the actual values; however, differences between the two models can be observed in terms of fluctuation and adherence to the true data trend. Specifically, SVR tends to produce smoother and more continuous predictions, particularly in regions with gradual variations, thanks to its ability to flexibly adapt within the function space. In contrast, while RF is powerful in handling nonlinear relationships, it sometimes generates abrupt or repetitive outputs due to the nature of decision trees, leading to larger errors at boundary or transition points. Overall, SVR demonstrates slightly higher stability and accuracy, especially in areas with minor fluctuations in flow values. Meanwhile, RF may perform well at certain extreme points but shows less consistency across transitional segments. This observation aligns with the earlier Learning Curve analysis, where SVR exhibited balanced performance between training and validation sets, while RF showed signs of overfitting. Therefore, SVR may be the more suitable choice in regression problems that demand high accuracy and stability on the test dataset.

Fig. 3. Comparison of predicted values on the testing data between models
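
Samples with unusually large deviations, such as those singled out in the discussion of Figure 3, can be located by ranking the residuals on the test set. The following sketch assumes nothing about the study's data: the function name `largest_residuals` and the five example values are hypothetical.

```python
import numpy as np

def largest_residuals(y_true, y_pred, k=4):
    """Return (index, residual) pairs for the k samples with the largest |error|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Sort by absolute residual, largest first, and keep the first k indices.
    order = np.argsort(-np.abs(residuals))[:k]
    return [(int(i), float(residuals[i])) for i in order]

# Hypothetical values for illustration only (not the study's measurements)
y_true = [0.10, 0.50, 1.20, 0.80, 0.30]
y_pred = [0.12, 0.49, 0.90, 0.85, 0.30]
worst = largest_residuals(y_true, y_pred, k=2)
print(worst)
```

Applying the same ranking to both models' predictions shows whether they fail on the same operating points or on different ones.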

Figure 4 presents a comparison of the prediction performance of the two machine learning models, Support Vector Regressor (SVR) and Random Forest (RF), in a regression task. The left plot shows the results for SVR, while the right plot displays those for RF. Both plots illustrate the relationship between the actual values (y-axis) and the predicted values (x-axis), with the red dashed line representing the ideal prediction line, where predicted values perfectly match the actual values (i.e., y = x). The SVR model demonstrates highly promising results, with a coefficient of determination of R² = 0.984, indicating that the model explains 98.4% of the variance in the test data. In addition, its error metrics are very low, with an RMSE of 0.050 and an MAE of 0.028, reflecting high prediction accuracy and model stability. Meanwhile, the Random Forest model also performs well, achieving an R² of 0.977, RMSE of 0.060, and MAE of 0.039. Although these metrics are slightly less favorable than those of SVR, RF still delivers effective predictions, as evidenced by the data points being closely clustered around the y = x line. From these results, it is clear that both models perform well, but SVR achieves slightly higher accuracy with lower prediction errors, making it better suited to the characteristics of this particular dataset. However, the final choice of model should also consider other factors such as training time, scalability with larger datasets, and interpretability requirements depending on the specific application context.

Fig. 4. Prediction results of the models on the testing data

Figure 5 illustrates the contribution of input features to the prediction outcomes of the Support Vector Regressor (SVR) model. Among these, Injection_speed has the most significant impact, as evidenced by high values (in red) greatly increasing the predicted output, while low values (in blue) tend to decrease it. The next two influential features, Pressure and Pulse_time, also demonstrate substantial effects: Pressure typically contributes to higher predictions when its value is high, whereas Pulse_time exhibits both positive and negative influences, indicating a more complex relationship with the output. In contrast, the features related to speed at different channels (Speed_CH_1: Low-speed mode, representing light load and low engine speed conditions; Speed_CH_2: Medium-speed mode, reflecting typical operating conditions in real-world usage; Speed_CH_3: High-speed mode, corresponding to situations where the engine operates at high power or under heavy load; and Speed_CH_4: Medium-speed mode combined with a pilot injection strategy) show noticeably smaller impacts, with SHAP values mostly close to zero. Overall, this plot reveals that the SVR model primarily relies on the top three features for making predictions. This insight is crucial for model interpretability, feature selection, and improving forecasting performance.

Fig. 5. SHAP value chart

Figure 6 illustrates the mean absolute SHAP values (mean(|SHAP value|)) for each feature, reflecting the average impact of each feature on the output of the SVR model. Among them, the three most important features are Injection_speed, Pressure, and Pulse_time, with mean SHAP values of approximately 0.205, 0.200, and 0.195, respectively. These values indicate that they contribute the most to adjusting the model's prediction results. In contrast, the operating-mode features Speed_CH4, Speed_CH1, Speed_CH2, and Speed_CH3 (pilot injection, low load, medium load, and high load, respectively) exhibit significantly lower mean SHAP values of about 0.045, 0.040, 0.038, and 0.025, respectively. This indicates that the speed features in these channels have a relatively minor influence on the model. These quantitative values clarify the relative importance of the features and support feature selection and dimensionality reduction. As a result, the model can be simplified, leading to improved processing performance.

Fig. 6. The influence of input features on the model output
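
The ranking in Figure 6 is an aggregation step: the absolute SHAP values of each feature are averaged over all samples. A minimal sketch of that step is given below. The feature names follow the text, but the SHAP matrix is hypothetical and chosen only to keep the arithmetic easy to check; it is not the study's data.

```python
import numpy as np

feature_names = ["Injection_speed", "Pressure", "Pulse_time"]

# Hypothetical SHAP values (rows: samples, columns: features); not the study's data.
shap_values = np.array([
    [0.30, -0.10, 0.05],
    [-0.20, 0.20, -0.15],
    [0.10, -0.15, 0.10],
])

# Average the magnitudes so positive and negative contributions do not cancel.
mean_abs = np.abs(shap_values).mean(axis=0)
signed_mean = shap_values.mean(axis=0)

ranking = sorted(zip(feature_names, mean_abs), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Comparing `mean_abs` with `signed_mean` shows why the absolute value is used: a feature that pushes predictions strongly in both directions can have a signed mean near zero while still being highly influential.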

4. CONCLUSION

In this study, two machine learning models, Support Vector Regression (SVR) and Random Forest (RF), were developed to predict fuel injection quantity and assess the influence of input parameters such as injection speed, pressure, pulse duration, and operating modes on injector performance in the diesel engine's common rail system. The results show that both models achieved high predictive performance. In particular, the SVR model demonstrated superior accuracy, with a coefficient of determination R² = 0.984, explaining 98.4% of the variance in the test dataset. Low error metrics, including RMSE = 0.050 and MAE = 0.028, further confirm the model’s stable and accurate predictive capability. The Random Forest model also delivered promising results, with R² = 0.977, RMSE = 0.060, and MAE = 0.039. Additionally, the SVR model achieved an RMSE of 5.86 MPa in terms of pressure units, indicating better generalization across various operating conditions. The analysis of feature importance in the SVR model revealed that the three most influential factors are injection speed, fuel pressure, and pulse time. Among these, injection speed plays the most critical role, followed by pressure and pulse duration, highlighting the significance of these parameters in controlling the fuel injection system. These findings not only clarify the role of each technical parameter in injector control but also provide valuable support for calibration and optimization of engine performance. Overall, this study demonstrates the effectiveness of machine learning models in accurately predicting key technical parameters and opens up new directions for developing intelligent control systems aimed at enhancing engine performance and reducing emissions in diesel engines. In the future, integrating real-time sensor data with deep learning techniques is expected to further improve the accuracy and adaptability of modern control systems.

Acknowledgement

This research was funded by Hanoi University of Industry under Project 09-2025-RD/HĐ-ĐHCN.

References

1. Baker P., P. Croucher, F. Perini, S. Busch, R.D. Reitz. 2020. “A phenomenological rate of injection model for predicting fuel injection with application to mixture formation in light-duty diesel engines”. Proceedings of the Institution of Mechanical Engineers, Part D 234(7): 1826-1839. DOI: 10.1177/0954407019898062.

2. Tantan Zhang. 2022. “An estimation method of the fuel mass injected in large injections in Common-Rail diesel engines based on system identification using artificial neural network”. Fuel 310 (Part B): 122404. ISSN: 0016-2361. DOI: 10.1016/j.fuel.2021.122404.

3. Xiangdong Lu, Jianhui Zhao, Vladimir Markov, Tianyu Wu. 2024. “Study on precise fuel injection under multiple injections of high pressure common rail system based on deep learning”. Energy 307: 132784. ISSN: 0360-5442. DOI: 10.1016/j.energy.2024.132784.

4. Yosri M., R. Palulli, M. Talei, et al. 2023. “Numerical investigation of a large bore, direct injection, spark ignition, hydrogen-fuelled engine”. International Journal Of Hydrogen Energy 48(46): 17689-17702.

5. Finesso Roberto, Ezio Spessa. 2015. “A control-oriented approach to estimate the injected fuel mass on the basis of the measured in-cylinder pressure in multiple injection diesel engines”. Energy Conversion and Management 105: 54-70. ISSN: 0196-8904. DOI: 10.1016/j.enconman.2015.07.053.

6. Mengzhao Chang, Minuk Jeong, Sungwook Park, Hyung Ik Kim, Jeong Hwan Park, Suhan Park. 2023. “Study on predictions of spray target position of gasoline direct injection injectors with multi-hole using physical model and machine learning”. Fuel Processing Technology 247: 107774. ISSN: 0378-3820. DOI: 10.1016/j.fuproc.2023.107774.

7. Junjian Tian, Yu Liu, Haobo Bi, Fengyu Li, Lin Bao, Kai Han, Wenliang Zhou, Zhanshi Ni, Qizhao Lin. 2022. “Experimental study on the spray characteristics of octanol diesel and prediction of spray tip penetration by ANN model”. Energy 239 (Part A): 121920. ISSN: 0360-5442. DOI: 10.1016/j.energy.2021.121920.

8. Breiman L. 2001. “Random Forests”. Machine Learning 45: 5-32. DOI: 10.1023/A:1010933404324.

9. Manjurul Ahsan Md, M.A. Mahmud, Pritom Saha, Kishor Datta Gupta, Zahed Siddique. 2021. “Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance”. Technologies 9: 52.

10. Yingjie Tian, Duo Su, Stanislao Lauria, Xiaohui Liu. 2022. “Recent advances on loss functions in deep learning for computer vision”. Neurocomputing 497: 129-158. ISSN: 0925-2312. DOI: 10.1016/j.neucom.2022.04.127.

11. Hilloulin Benoît, Van Quan Tran. 2022. “Using machine learning techniques for predicting autogenous shrinkage of concrete incorporating superabsorbent polymers and supplementary cementitious materials”. Journal of Building Engineering 49: 104086. ISSN: 2352-7102. DOI: 10.1016/j.jobe.2022.104086.

12. Kumar Shashikant, Rakesh Kumar, Baboo Rai, Pijush Samui. 2024. “Prediction of compressive strength of high-volume fly ash self-compacting concrete with silica fume using machine learning techniques”. Construction and Building Materials 438: 136933. ISSN: 0950-0618. DOI: 10.1016/j.conbuildmat.2024.136933.

Received 11.07.2025; accepted in revised form 04.10.2025

Scientific Journal of Silesian University of Technology. Series Transport is licensed under a Creative Commons Attribution 4.0 International License

[1] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: taquyet20028091@gmail.com. ORCID: https://orcid.org/0009-0002-4913-2744

[2] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: khoanx@haui.edu.vn. ORCID: https://orcid.org/0000-0003-2869-465X

[3] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: nghiant@haui.edu.vn. ORCID: https://orcid.org/0000-0003-2754-1889

[4] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: vinhnt@haui.edu.vn. ORCID: https://orcid.org/0000-0001-5633-5203

[5] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: manhld@haui.edu.vn. ORCID: https://orcid.org/0000-0002-7231-4944

[6] Automotive Practice, Inspection Center, School of Mechanical & Automotive Engineering, Hanoi University of Industry. No. 298 Cau Dien Street, Tay Tuu Ward, Hanoi. Email: hungtd@haui.edu.vn. ORCID: https://orcid.org/0000-0002-3323-0836