Data Mining in the Analysis of Tree Harvester Performance Based on Automatically Collected Data

Krzysztof Polowy / Marta Molińska-Glura

Data recorded automatically by harvesters are a promising and potentially very useful source of information for scientific analyses. Most researchers have used StanForD files for this purpose, but these are troublesome to obtain and require some pre-processing. This study used a new source of similar data: JDLink, a cloud-based service run by the machine manufacturer that stores sensor data in real time. The sheer volume of such data makes it hard to comprehend and handle efficiently, and data mining techniques help to find trends and patterns in such databases. Records from two mid-sized harvesters working in north-eastern Poland were analyzed using classical regression (linear and logarithmic), cluster analysis (dendrograms and k-means) and Principal Component Analysis (PCA). Linear regression showed that average tree size was the variable with the greatest effect on fuel consumption per cubic meter and on productivity, whereas fuel consumption per hour also depended on other variables, such as the distance driven in low gear and the share of time under high engine load. The results of clustering and PCA were harder to interpret. Dendrograms identified the most dissimilar variables: total volume harvested per day, total fuel consumption per day and the share of work time at high engine revolutions per minute (RPM). K-means clustering allowed us to identify periods when specific clusters of variables were more prominent. Although the principal components explained almost 90% of the variance, the PCA results did not lead to consistent conclusions for the two machines and therefore need to be scrutinized in follow-up studies. Productivity (around 10 m³/h on average) and fuel consumption (13.21 L/h and 1.335 L/m³ on average) were similar to the results reported by other authors under comparable conditions. New measures obtained in this study include, for example, the distance driven in low gear (around 7 km per day) and the proportion of time the engine ran at low, medium or high load (34%, 39% and 7%, respectively).
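The regression step described above can be illustrated with a minimal standard-library sketch. It is not the authors' code: it assumes daily records already aggregated into hypothetical fields (`fuel_l`, `volume_m3`, `stems`), takes average tree size to be volume per stem (an assumption), derives fuel consumption per cubic meter, and fits ordinary least squares to a linear model y = a + b·x and a logarithmic model y = a + b·ln(x). The numbers in the usage block are made up and are not data from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One machine-day of aggregated data; the field names are hypothetical."""
    fuel_l: float     # total fuel consumed during the day, litres
    volume_m3: float  # total volume harvested during the day, m3
    stems: int        # number of trees processed during the day


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # A constant y is fitted exactly by a flat line, so report R^2 = 1 then.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


def fit_log(xs, ys):
    """Logarithmic model y = a + b*ln(x), fitted as OLS on ln(x); requires x > 0."""
    return fit_ols([math.log(x) for x in xs], ys)


def fuel_vs_tree_size(records):
    """Regress fuel use per m3 on average tree size (m3 per stem) across days."""
    tree_size = [r.volume_m3 / r.stems for r in records]     # m3 per tree
    fuel_per_m3 = [r.fuel_l / r.volume_m3 for r in records]  # L per m3
    return {
        "linear": fit_ols(tree_size, fuel_per_m3),
        "logarithmic": fit_log(tree_size, fuel_per_m3),
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the study.
    days = [
        DailyRecord(fuel_l=130.0, volume_m3=80.0, stems=640),
        DailyRecord(fuel_l=125.0, volume_m3=95.0, stems=500),
        DailyRecord(fuel_l=120.0, volume_m3=110.0, stems=420),
        DailyRecord(fuel_l=118.0, volume_m3=120.0, stems=360),
    ]
    for model, (a, b, r2) in fuel_vs_tree_size(days).items():
        print(f"{model}: a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
```

Comparing R² for the two models on the same days is one simple way to judge whether the relationship with tree size looks closer to linear or logarithmic; a statistics package would add standard errors and significance tests.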
The study assumed that the data would be used without supplementation from external sources and with as little processing as possible, which limited the analytical methods to unsupervised learning. Extending the database in follow-up studies will facilitate the application of supervised learning techniques for modeling and prediction.
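The k-means clustering and PCA mentioned in the abstract can be sketched with the standard library as well. This is one possible setup, not the authors' pipeline: it treats each row as one machine-day with hypothetical columns, z-scores the columns because the variables come in different units, runs Lloyd's k-means on the rows, and estimates only the first principal component by power iteration on the correlation matrix instead of a full eigendecomposition. How the study arranged variables and observations is not specified here, and dendrograms (hierarchical clustering) are not covered.

```python
import math
import random


def standardize(rows):
    """Z-score each column (sample SD) so variables in different units are comparable."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = []
    for j in range(m):
        sd = math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        sds.append(sd if sd > 0 else 1.0)  # constant columns become all zeros
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are already the cluster means
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def first_principal_component(z, iters=500, seed=0):
    """Power iteration for the leading eigenvector of the covariance of z.

    Assumes the columns of z are centered (e.g. the output of standardize), in
    which case the covariance is the correlation matrix.
    Returns (loadings, share_of_total_variance).
    """
    n, m = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("covariance matrix is zero or start vector is degenerate")
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))
    return v, eigenvalue / total_variance


if __name__ == "__main__":
    # Made-up machine-days: [volume m3/day, fuel L/day, share of time at high RPM].
    rows = [[80, 130, 0.10], [95, 125, 0.08], [110, 120, 0.06],
            [60, 140, 0.12], [120, 118, 0.05], [70, 135, 0.11]]
    z = standardize(rows)
    centroids, labels = kmeans(z, k=2)
    loadings, ratio = first_principal_component(z)
    print("cluster labels:", labels)
    print("PC1 loadings:", [round(x, 3) for x in loadings], "explained:", round(ratio, 3))
```

The sign of the PC1 loadings is arbitrary, and k-means results can depend on the random start, so in practice one would repeat the clustering with several seeds and compare the solutions.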