Walk trees efficiently by excluding irrelevant subfolders #3

@amynbe


Java was lacking a good library for wildcard matching, so thank you for writing this one, especially with the performance and reliability mindset that is immediately evident in your README.

When we hear "globbing library", however, we usually think of command-line globbing of file system paths, and in this area I think a gap still needs to be filled with respect to folder filtering.

Since I/O is expensive, if I have a deep and bushy file tree, even just asking the OS to list the paths recursively can take several minutes.

The top answer here shows how to skip whole subtrees with Files.walkFileTree.
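For reference, here is a minimal sketch of that subtree-skipping technique, assuming the simplest possible rule: prune by directory name. The names walkExcluding and excludedDirNames are hypothetical, chosen for illustration only. Returning SKIP_SUBTREE from preVisitDirectory means the directory's entries are never listed, which is where the I/O saving comes from.

```kotlin
import java.io.IOException
import java.nio.file.FileVisitResult
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.SimpleFileVisitor
import java.nio.file.attribute.BasicFileAttributes

// Hypothetical helper: collects every regular file under `start`, but never
// descends into a directory whose name is in `excludedDirNames`.
fun walkExcluding(start: Path, excludedDirNames: Set<String>): List<Path> {
    val found = mutableListOf<Path>()
    Files.walkFileTree(start, object : SimpleFileVisitor<Path>() {
        override fun preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult =
            // SKIP_SUBTREE: the directory is never opened, so no I/O is spent below it.
            if (dir != start && dir.fileName.toString() in excludedDirNames) FileVisitResult.SKIP_SUBTREE
            else FileVisitResult.CONTINUE

        override fun visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult {
            found.add(file)
            return FileVisitResult.CONTINUE
        }

        // Unreadable entries are ignored rather than aborting the whole walk.
        override fun visitFileFailed(file: Path, exc: IOException): FileVisitResult =
            FileVisitResult.CONTINUE
    })
    return found
}
```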

In Kotlin, I used this approach to address a (ridiculously restricted) scenario in a quick script I was writing:

import java.io.IOException
import java.nio.file.*
import java.nio.file.attribute.BasicFileAttributes
import kotlin.io.path.ExperimentalPathApi
import kotlin.io.path.Path
import kotlin.io.path.name

@ExperimentalPathApi
class GlobVisitor(
    private val matcher: PathMatcher,
    private val matches: MutableList<Path>,
    glob: String
) : SimpleFileVisitor<Path>() {
    // The glob split into one pattern per path segment, computed once
    // instead of on every directory visit.
    // "glob:J:/*/*/src/..." becomes ["J:", "*", "*", "src", ...].
    private val segments = glob.substringAfter("glob:").split("/")

    // A file is collected only if it matches the full glob.
    override fun visitFile(file: Path, attrs: BasicFileAttributes)
            : FileVisitResult {
        if (matcher.matches(file)) matches.add(file)
        return FileVisitResult.CONTINUE
    }

    // Ignore unreadable entries instead of aborting the walk
    // (SKIP_SUBTREE behaves like CONTINUE outside preVisitDirectory anyway).
    override fun visitFileFailed(file: Path, e: IOException)
            : FileVisitResult = FileVisitResult.CONTINUE

    // Prune the walk: enter a directory only if its name matches the glob
    // segment at the same depth, where "*" matches any name.
    override fun preVisitDirectory(
        dir: Path,
        attrs: BasicFileAttributes
    ): FileVisitResult {
        // nameCount is 0 for the root "J:/", 1 for "J:/a", and so on,
        // which lines up with segments[0] == "J:".
        val currentDirMatcher = segments.getOrNull(dir.nameCount)

        return if (dir.nameCount == 0 ||
            currentDirMatcher == "*" ||
            dir.name == currentDirMatcher
        ) FileVisitResult.CONTINUE
        else FileVisitResult.SKIP_SUBTREE
    }

    companion object {
        fun walkFileTree(start: String, glob: String): MutableList<Path> {
            val matches = mutableListOf<Path>()
            val matcher = FileSystems.getDefault().getPathMatcher(glob)
            Files.walkFileTree(
                Path(start),
                setOf(FileVisitOption.FOLLOW_LINKS),
                Int.MAX_VALUE,
                GlobVisitor(matcher, matches, glob)
            )
            return matches
        }
    }
}

It is then called like this:

    val matches = GlobVisitor.walkFileTree(
        "J:/",
        "glob:J:/*/*/src/server/main/java/*.properties"
    )

Here, preVisitDirectory does the folder skipping: for example, it skips J:/*/*/blah entirely because "blah" != "src".
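To make the pruning rule easy to check without touching the disk, here is the same decision extracted as a pure function. The name shouldDescend is hypothetical and not part of the script above; depth stands in for dir.nameCount and dirName for dir.name.

```kotlin
// Hypothetical pure-function version of the check in GlobVisitor.preVisitDirectory.
fun shouldDescend(glob: String, depth: Int, dirName: String): Boolean {
    // "glob:J:/*/*/src/..." -> ["J:", "*", "*", "src", ...]
    val segments = glob.substringAfter("glob:").split("/")
    // null when the directory is deeper than the glob has segments
    val segment = segments.getOrNull(depth)
    // Always enter the root; otherwise the name must equal the segment, unless it is "*".
    return depth == 0 || segment == "*" || dirName == segment
}
```

With the glob from the call above, a directory at depth 3 is entered only if it is named "src", so J:/*/*/blah is rejected.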

Again, my example is ridiculously restricted: it requires an absolute path, doesn't support the recursive wildcard **, etc.
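As a rough idea of how both restrictions might be lifted, here is a sketch assuming a "/"-separated pattern relative to the start directory, with no "/" inside {...} groups. prunedGlob is a hypothetical helper, not an existing API of glob-library-java or the JDK. Each directory is checked against the single glob segment at its depth with its own PathMatcher, so segment wildcards such as src* also work; once a segment containing ** is reached, pruning stops and the full glob alone decides.

```kotlin
import java.io.IOException
import java.nio.file.FileVisitResult
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.SimpleFileVisitor
import java.nio.file.attribute.BasicFileAttributes

// Hypothetical helper. `pattern` is a glob relative to `start`,
// e.g. "*/*/src/**/*.properties". Segments are split on "/", so
// "/" inside {...} groups is assumed not to occur.
fun prunedGlob(start: Path, pattern: String): List<Path> {
    val fs = start.fileSystem
    val segments = pattern.split("/")
    // Index of the first segment containing "**" (it may cross directories), or -1.
    val globstar = segments.indexOfFirst { "**" in it }
    // One matcher per directory segment; the last segment is the file-name pattern.
    val dirMatchers = segments.dropLast(1).map { fs.getPathMatcher("glob:$it") }
    val fullMatcher = fs.getPathMatcher("glob:$pattern")
    val found = mutableListOf<Path>()

    Files.walkFileTree(start, object : SimpleFileVisitor<Path>() {
        override fun preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult {
            if (dir == start) return FileVisitResult.CONTINUE
            // Segment index for this directory: 0 for a direct child of `start`.
            val i = start.relativize(dir).nameCount - 1
            return when {
                // At or past "**": any subfolder may still contain matches.
                globstar in 0..i -> FileVisitResult.CONTINUE
                // Deeper than the pattern allows: nothing below can match.
                i >= dirMatchers.size -> FileVisitResult.SKIP_SUBTREE
                dirMatchers[i].matches(dir.fileName) -> FileVisitResult.CONTINUE
                else -> FileVisitResult.SKIP_SUBTREE
            }
        }

        override fun visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult {
            // The full glob is matched against the path relative to `start`.
            if (fullMatcher.matches(start.relativize(file))) found.add(file)
            return FileVisitResult.CONTINUE
        }

        override fun visitFileFailed(file: Path, exc: IOException): FileVisitResult =
            FileVisitResult.CONTINUE
    })
    return found
}
```

For example, prunedGlob(start, "*/*/src/**/*.properties") would not descend into start/a/b/blah, but would walk everything below start/a/b/src.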

But I think this kind of feature would fit nicely in your library glob-library-java.

The only other Java library I know of that has this is DirectoryScanner from Apache Ant: https://ant.apache.org/manual/api/org/apache/tools/ant/DirectoryScanner.html

It is very powerful, but it requires the 2.2 MB Ant core library, which seems like overkill for this one feature.
