Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

Cánovas García, Fulgencio; Alonso Sarria, Francisco; Gomariz Castillo, Francisco; Oñate Valdivieso, Fernando

Files

2017_CanovasGarcia_etal_AcceptedVersion.pdf (7.98 MB)

Date

2017

Authors

Cánovas García, Fulgencio ; Alonso Sarria, Francisco ; Gomariz Castillo, Francisco ; Oñate Valdivieso, Fernando

Publisher

Pergamon-Elsevier Science Ltd

Department

Geography

DOI

https://doi.org/10.1016/j.cageo.2017.02.012

Type

info:eu-repo/semantics/article

Description

Abstract

Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimate of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimate is unbiased and may be used instead of validation based on an external data set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not statistically independent of each other; however, bootstrapping splits them into in-bag and out-of-bag sets as if they were. We believe that putting whole patches, rather than individual pixels or objects, in one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm that splits training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy, and its predictive capability is no lower than that of the original. When its results are validated with an external data set, the accuracy does not differ from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel- and object-based); in all three cases, the modification we propose produces a less biased accuracy estimate.
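The patch-level split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `patch_ids` list and the toy data (20 patches of 10 pixels each) are all hypothetical, and only the resampling step is shown, not the fitting of trees. It assumes each training sample carries the identifier of the patch it came from, and it draws the bootstrap sample over patches so that every patch falls entirely in-bag or entirely out-of-bag.

```python
import random
from collections import defaultdict


def patch_bootstrap(patch_ids, rng):
    """Draw one bootstrap sample of whole patches (hypothetical helper).

    patch_ids[i] is the training-patch identifier of sample i.
    Returns (in_bag, out_of_bag) as lists of sample indices; in-bag
    indices repeat when their patch is drawn more than once.
    """
    members = defaultdict(list)
    for idx, pid in enumerate(patch_ids):
        members[pid].append(idx)
    patches = sorted(members)
    # Sample patches with replacement, one draw per patch.
    drawn = [rng.choice(patches) for _ in patches]
    drawn_set = set(drawn)
    in_bag = [i for pid in drawn for i in members[pid]]
    out_of_bag = [i for pid in patches if pid not in drawn_set
                  for i in members[pid]]
    return in_bag, out_of_bag


def pixel_bootstrap(n_samples, rng):
    """Standard bootstrap over individual samples, for comparison."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    seen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in seen]
    return in_bag, out_of_bag


def split_patches(patch_ids, in_bag, out_of_bag):
    """Patches that contribute samples to both sides of the split."""
    in_patches = {patch_ids[i] for i in in_bag}
    oob_patches = {patch_ids[i] for i in out_of_bag}
    return in_patches & oob_patches


rng = random.Random(0)
# Toy data (assumed): 20 patches of 10 pixels each.
patch_ids = [p for p in range(20) for _ in range(10)]

in_bag, oob = patch_bootstrap(patch_ids, rng)
px_in_bag, px_oob = pixel_bootstrap(len(patch_ids), rng)

print("patches split by patch-level bootstrap:",
      len(split_patches(patch_ids, in_bag, oob)))
print("patches split by pixel-level bootstrap:",
      len(split_patches(patch_ids, px_in_bag, px_oob)))
```

In a forest, a split like this would be drawn once per tree, with the tree grown on the in-bag samples and the out-of-bag samples used for the accuracy estimate. With the pixel-level bootstrap, most patches typically end up on both sides of the split, which is the dependence problem the abstract describes.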

Subject

Classification, Random Forest, Object-based image analysis, Bagging, Statistical independence

Citation

Computers & Geosciences, 103. 2017

URI

http://hdl.handle.net/10201/138091

Collections

Articles

This item is subject to a Creative Commons license. http://creativecommons.org/licenses/by-nc-nd/4.0/
