APPLICATION OF MUTUAL INFORMATION ASSESSMENT METHODS FOR FEATURE SELECTION IN REGRESSION MODELS PREDICTING PERCUTANEOUS PENETRATION OF PESTICIDES

ISSN 2223-6775 Ukrainian Journal of Occupational Health, Vol. 19, Supplement, 2023

https://doi.org/10.33573/ujoh2023.Suppl.347

Lytvynenko V.1, 2, Demchenko V.1, Dontsova D.1, Lurie I.2, 4, Olszewski S.1, 3, Zaets E.1, Zakharchenko Ye.2

1State Institution "Kundiiev Institute of Occupational Health of the National Academy of Medical Sciences of Ukraine", Kyiv, Ukraine
2Kherson National Technical University, Ukraine
3Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
4Ben-Gurion University of the Negev, Beer Sheva, Israel

Full article (PDF), in Ukrainian


Introduction. Regression models that predict the percutaneous (transdermal) penetration of pesticides from molecular descriptors are important for assessing the toxicological impact of these chemicals on human health and the environment. Some pesticides, such as DDT and aldrin, persist in the environment, and penetration through the skin is one of the primary routes of exposure. This study aims to deepen the understanding of the factors that influence percutaneous penetration and emphasizes the need for alternative methods that predict this process without extensive in vivo or in vitro testing.

The aim of the research is to develop regression models that predict percutaneous (transdermal) penetration from molecular descriptors, using feature selection methods to improve model accuracy and to reduce the need for costly and ethically questionable experimental procedures.

Materials and methods of the research. The study analyzed data on 70 pesticide formulations tested for percutaneous (transdermal) penetration; the permeability coefficient (Kp) was calculated from molecular weight and lipophilicity. Molecular descriptors characterizing, for example, aromatic rings, molecular flexibility, and polarity were computed with the rcdk and rcdklibs packages in the R environment. Mutual-information-based feature selection methods, including CMIM (conditional mutual information maximization), DISR (double input symmetrical relevance), and NJMIM (normalized joint mutual information maximization), were then applied to identify the most informative descriptors.
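The abstract does not state which equation was used to compute Kp from molecular weight and lipophilicity. One widely used relation of exactly this kind is the Potts–Guy equation, quoted here in a commonly used form with Kp in cm/h: log10 Kp = −2.72 + 0.71 log Kow − 0.0061 MW. The minimal sketch below assumes that relation; the function names and the example inputs are hypothetical and are not data from the study.

```python
def potts_guy_log_kp(log_kow: float, mol_weight: float) -> float:
    """Return log10 Kp (Kp in cm/h) from the Potts-Guy relation.

    Assumed form: log10 Kp = -2.72 + 0.71 * log Kow - 0.0061 * MW,
    where log Kow is the octanol/water partition coefficient (lipophilicity)
    and MW is the molecular weight in g/mol.
    """
    return -2.72 + 0.71 * log_kow - 0.0061 * mol_weight


def potts_guy_kp(log_kow: float, mol_weight: float) -> float:
    """Return Kp itself (cm/h) by undoing the base-10 logarithm."""
    return 10 ** potts_guy_log_kp(log_kow, mol_weight)


# Hypothetical compound, not taken from the study: log Kow = 3.0, MW = 300 g/mol.
log_kp = potts_guy_log_kp(3.0, 300.0)
print(f"log10 Kp = {log_kp:.2f}, Kp = {potts_guy_kp(3.0, 300.0):.4f} cm/h")
# prints: log10 Kp = -2.42, Kp = 0.0038 cm/h
```

In this relation higher lipophilicity raises the predicted Kp and a larger molecular weight lowers it, which is why these two properties alone suffice to compute the coefficient.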

Results. Machine learning regression models, including Random Forest, support vector regression (SVR), and a Multilayer Perceptron, demonstrated a significant improvement in predicting percutaneous (transdermal) penetration when combined with feature selection. Models that included descriptors of aromatic rings, planarity, and flexibility showed the highest correlation with percutaneous penetration. Among the feature selection methods, CMIM and NJMIM gave the best results, especially with subsets of 20–30 descriptors, and the most reliable predictions were obtained with the Random Forest and IBk (k-nearest neighbours) models.
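CMIM, NJMIM and DISR all build the descriptor subset greedily: they start from the single descriptor with the highest mutual information with the target, then repeatedly add the candidate that scores best against the descriptors already chosen, until a preset subset size (for example, the 20–30 descriptors mentioned above) is reached. CMIM scores a candidate by its smallest conditional mutual information given any selected descriptor, NJMIM by its smallest normalized joint mutual information I(X,Z;Y)/H(X,Z,Y), and DISR by the sum of those normalized terms. The abstract does not describe the implementation used in the study, so the following is only a minimal pure-Python sketch on discrete data. It assumes the continuous target (e.g. log Kp) is first binned with simple equal-width bins, and all function names (`greedy_select`, `discretize`, and the helpers) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def entropy(*cols: Sequence) -> float:
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_info(x: Sequence, y: Sequence) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x: Sequence, y: Sequence, z: Sequence) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def symmetric_relevance(x: Sequence, z: Sequence, y: Sequence) -> float:
    """I(X,Z;Y) / H(X,Z,Y): the normalized joint term used by DISR and NJMIM."""
    h_xzy = entropy(x, z, y)
    joint_mi = entropy(x, z) + entropy(y) - h_xzy
    return joint_mi / h_xzy if h_xzy > 0 else 0.0


def discretize(values: Sequence[float], bins: int = 3) -> List[int]:
    """Equal-width binning of a continuous target such as log Kp (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant target
    return [min(int((v - lo) / width), bins - 1) for v in values]


def greedy_select(features: Dict[str, Sequence], y: Sequence, k: int,
                  criterion: str = "cmim") -> List[str]:
    """Forward selection of k features with the CMIM, NJMIM or DISR criterion."""
    if criterion not in ("cmim", "njmim", "disr"):
        raise ValueError(f"unknown criterion: {criterion}")
    remaining = list(features)  # dict order keeps tie-breaking deterministic
    # All three criteria start with the single most informative feature.
    first = max(remaining, key=lambda f: mutual_info(features[f], y))
    selected = [first]
    remaining.remove(first)

    def score(name: str) -> float:
        x = features[name]
        if criterion == "cmim":   # worst-case information added given any selected feature
            return min(cond_mutual_info(x, y, features[s]) for s in selected)
        if criterion == "njmim":  # worst-case normalized joint relevance
            return min(symmetric_relevance(x, features[s], y) for s in selected)
        # disr: summed normalized joint relevance over the selected features
        return sum(symmetric_relevance(x, features[s], y) for s in selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with `a = [0, 0, 1, 1, 0, 0, 1, 1]`, `c = [0, 1, 0, 1, 0, 1, 0, 1]`, `b` an exact copy of `a`, and `y = [2 * ai + ci for ai, ci in zip(a, c)]`, `greedy_select({"a": a, "b": b, "c": c}, y, 2, criterion)` returns `["a", "c"]` for all three criteria: once `a` is chosen, the redundant copy `b` adds no information, while `c` does. Plug-in entropy estimates are biased on samples as small as 70 formulations, so the binning choice matters; in practice an established implementation is preferable (the R package praznik, for instance, provides CMIM, DISR and NJMIM).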

Conclusions. The study confirms the utility of molecular descriptors combined with modern feature selection methods for predicting the percutaneous (transdermal) penetration of pesticides. These methods offer a more efficient and ethical alternative to traditional experimental approaches and allow more accurate assessment of risks to human health. The results contribute to the development of predictive models in toxicology and environmental safety.

Key words: percutaneous (transdermal) penetration of pesticides, molecular descriptors, feature selection methods, mutual information, regression models, toxicology, machine learning, Random Forest, environmental safety

