For citation:
Petrakova V. S. Algorithm for searching for outliers in non-stationary time series of field measurements. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2025, vol. 25, iss. 4, pp. 566-577. DOI: 10.18500/1816-9791-2025-25-4-566-577, EDN: TMDBOY
Algorithm for searching for outliers in non-stationary time series of field measurements
The paper seeks an efficient algorithm for detecting outliers in non-stationary one-dimensional time series of field measurements. Here, non-stationarity means that the data contain a variable trend and are heteroscedastic, that is, the variance differs between subsequences of the series. If these features are ignored, outliers caused by breakdowns or inaccuracies of the recording equipment can be classified as regular values, which makes most existing outlier detection methods for time series ineffective. The paper describes real data with these properties: observations of temperature and pollutant concentration in the atmospheric boundary layer in Krasnoyarsk. A brief overview of existing methods is given, and their advantages and disadvantages when applied to the available data are shown. The author proposes an approach to detecting outliers in series of this type. The method corrects and combines existing approaches and consists of two stages: localization of points suspected of being outliers, and regression on the localized section with an adaptive threshold for cutting off points. The proposed algorithm was tested on the available data and compared with existing approaches.
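The two-stage idea in the abstract (localize suspicious points, then regress on the localized section with an adaptive cut-off) can be illustrated with a minimal sketch. This is not the author's algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the rolling median/MAD rule used for localization, the local straight-line fit, the threshold of `k` robust residual scales, the function names, the default parameters, and the synthetic series.

```python
import random
from statistics import median


def robust_scale(values):
    """Median absolute deviation, rescaled to approximate a Gaussian std."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return max(1.4826 * mad, 1e-12)  # guard against a zero scale


def localize_candidates(y, window=25, z_max=3.5):
    """Stage 1 (assumed rule): flag points far from the rolling median."""
    half = window // 2
    flags = []
    for i, value in enumerate(y):
        seg = y[max(0, i - half): i + half + 1]
        flags.append(abs(value - median(seg)) / robust_scale(seg) > z_max)
    return flags


def fit_line(ts, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def confirm_outliers(y, candidates, window=25, k=4.0):
    """Stage 2 (assumed rule): local regression with an adaptive threshold.

    The line is fitted only to non-candidate neighbours, and the cut-off
    is k times the robust spread of their residuals, so it widens where
    the local variance is larger.
    """
    half = window // 2
    outliers = [False] * len(y)
    for i, is_candidate in enumerate(candidates):
        if not is_candidate:
            continue
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        idx = [j for j in range(lo, hi) if not candidates[j]]
        if len(idx) < 3:
            continue  # too few clean neighbours to fit a line
        slope, intercept = fit_line(idx, [y[j] for j in idx])
        resid = [y[j] - (slope * j + intercept) for j in idx]
        threshold = k * robust_scale(resid)
        outliers[i] = abs(y[i] - (slope * i + intercept)) > threshold
    return outliers


# Synthetic series (hypothetical data): linear trend plus noise whose
# standard deviation grows over time, with two injected spikes.
random.seed(0)
series = [0.05 * t + random.gauss(0, 0.1 * (1 + t / 200)) for t in range(200)]
series[50] += 8.0
series[150] += 8.0

candidates = localize_candidates(series)
outliers = confirm_outliers(series, candidates)
```

Fitting the line only to non-candidate neighbours keeps a suspected spike from pulling the local trend toward itself, and scaling the cut-off by the local residual spread is one simple way to account for heteroscedasticity.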