Publication:
Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

dc.contributor.author: Cánovas García, Fulgencio
dc.contributor.author: Alonso Sarria, Francisco
dc.contributor.author: Gomariz Castillo, Francisco
dc.contributor.author: Oñate Valdivieso, Fernando
dc.contributor.department: Geografía
dc.date.accessioned: 2024-01-30T08:55:33Z
dc.date.available: 2024-01-30T08:55:33Z
dc.date.issued: 2017
dc.description: ©2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.description.abstract: Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimate of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimate is unbiased and may be used instead of validation based on an external dataset or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not statistically independent of each other; however, bootstrapping splits them into in-bag and out-of-bag sets as if they were independent. We believe that placing whole patches, rather than individual pixels or objects, in one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm that splits training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external dataset, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel and object based); in the three cases reported, the modification we propose produces a less biased accuracy estimate.
dc.format: application/pdf
dc.format.extent: 28
dc.identifier.citation: Computers & Geosciences, 103. 2017
dc.identifier.doi: https://doi.org/10.1016/j.cageo.2017.02.012
dc.identifier.uri: http://hdl.handle.net/10201/138091
dc.language: eng
dc.publisher: Pergamon-Elsevier Science Ltd
dc.relation: No funding external to the University
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0098300416303909
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Classification
dc.subject: Random Forest
dc.subject: Object-based image analysis
dc.subject: Bagging
dc.subject: Statistical independence
dc.subject.other: CDU::9 - Geography and history
dc.title: Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery
dc.type: info:eu-repo/semantics/article
dspace.entity.type: Publication
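
The patch-level split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `patch_bootstrap`, the `patch_ids` input, and the choice to draw as many patches as there are distinct patches are all assumptions made for illustration. It only shows the idea of bootstrapping whole training patches, so that every pixel or object of a patch lands on the same side of the in-bag/out-of-bag split.

```python
import random
from collections import defaultdict


def patch_bootstrap(patch_ids, rng):
    """Draw one bootstrap sample at the patch level (illustrative only).

    patch_ids[i] is the hypothetical training-patch label of sample i.
    Returns (in_bag, out_of_bag) as lists of sample indices.
    """
    # Group sample indices by the patch they belong to.
    members = defaultdict(list)
    for idx, pid in enumerate(patch_ids):
        members[pid].append(idx)

    patches = sorted(members)
    # Assumption: draw as many patches as exist, with replacement.
    drawn = [rng.choice(patches) for _ in patches]

    # A patch drawn k times contributes all of its samples k times.
    in_bag = [idx for pid in drawn for idx in members[pid]]

    # Samples are out-of-bag only if their whole patch was never drawn.
    drawn_set = set(drawn)
    out_of_bag = [
        idx for pid in patches if pid not in drawn_set for idx in members[pid]
    ]
    return in_bag, out_of_bag


# Ten samples spread over four hypothetical training patches.
example_patch_ids = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
in_bag, out_of_bag = patch_bootstrap(example_patch_ids, random.Random(0))
```

In a forest built this way, each tree would be trained on its own `in_bag` indices and evaluated on its own `out_of_bag` indices, so no tree is tested on samples that share a patch with its training data.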
Files
Original bundle
Name: 2017_CanovasGarcia_etal_AcceptedVersion.pdf
Size: 7.98 MB
Format: Adobe Portable Document Format
Description: Accepted version
License bundle
Name: license.txt
Size: 2.26 KB
Format: Item-specific license agreed upon to submission