Applying data mining techniques for forecasting geochemical anomalies: Fid-Bau Portal

Hezarkhani, Bita

Forecasting anomalous values can play an important role in the early stages of exploration. It is therefore essential to find the most accurate approach for separating anomalous values from background and then to use the results to predict the class of any arbitrary sample. In this study, a structural multivariate method that combines the Mahalanobis distance and U-statistics algorithms is first used to separate geochemical anomalies from background. Three data mining methods, K-nearest neighbor (K-NN), decision tree, and naïve Bayes classifier, are then applied to the separation results together with the remaining parameters (x and y coordinates and Cu and Mo grades) to produce practical equations for estimating anomalous values. The combined separation method was applied to 377 surface samples collected from the Parkam porphyry system, and the data mining methods were then used to predict whether each unknown point is anomalous. To evaluate the resulting classifiers, the training samples were reused as test samples. The decision tree proved more powerful than the other two methods, with far fewer wrongly estimated samples: its resubstitution error is only 0.0212 (8 of 377 samples). By comparison, K-NN and naïve Bayes wrongly estimated 13 and 46 samples, for error rates of 0.0345 and 0.122, respectively.
Moreover, the estimated samples (the joint points of Cu and Mo) delineated by the decision tree method are closely associated with the defined zone of potassic alteration in the study area. According to these results, the combination of the decision tree method and the introduced anomaly separation approach can be applied as a reliable and efficient technique for making worthwhile predictions.
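
The resubstitution error rates quoted in the abstract follow directly from the reported counts of wrongly estimated samples. The short Python sketch below reproduces that arithmetic only; the names `N_SAMPLES`, `MISCLASSIFIED`, and `resubstitution_error` are illustrative assumptions and are not taken from the study, and no classifier is actually trained here.

```python
# Counts of wrongly estimated training samples, as reported in the abstract.
# The variable and function names are hypothetical, chosen for this sketch.
N_SAMPLES = 377
MISCLASSIFIED = {"decision tree": 8, "K-NN": 13, "naive Bayes": 46}


def resubstitution_error(n_wrong: int, n_total: int) -> float:
    """Return the fraction of training samples that the model misclassifies
    when the training set itself is reused as the test set."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_wrong <= n_total:
        raise ValueError("n_wrong must lie between 0 and n_total")
    return n_wrong / n_total


# Error rate per method, computed from the reported counts.
errors = {
    name: resubstitution_error(n_wrong, N_SAMPLES)
    for name, n_wrong in MISCLASSIFIED.items()
}

for name, err in errors.items():
    print(f"{name}: {err:.4f}")
```

Because the same samples are used for both training and testing, a resubstitution error is generally an optimistic estimate of how a classifier will perform on unseen points; a held-out set or cross-validation would give a more conservative figure.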