test: add file discovery tests for walk_nix_files#2668
Conversation
aa27ccf to
9db4576
Compare
|
Ready for your review @mightyiam, couldn't push to sinrohit-desco's fork directly, so the refactor from #2566 is folded into this branch as a single commit rather than kept separate; the description above has the attribution and details. |
|
Sorry about the CI issues. I will look into those. In the meantime, could you convert this to testing existing behavior, instead, please? |
9db4576 to
f70898a
Compare
|
Ready for your review @mightyiam |
| // Tests for which files get linted given different CLI argument combinations, | ||
| // covering the current file-discovery behavior (Walker / build_ignore_set in | ||
| // bin/src/dirs.rs). Each test writes files into a tempdir, runs `statix check | ||
| // -o errfmt`, and inspects which paths appear in the output. | ||
| // | ||
| // We use LINTABLE content in every file we *want* to see linted, so that | ||
| // statix will emit an errfmt line for it. Files that are discovered but | ||
| // contain no lints would be invisible to us — using a lintable expression | ||
| // sidesteps that ambiguity. | ||
| // | ||
| // errfmt line format: `<path>>line:col:severity:code:message` |
There was a problem hiding this comment.
I appreciate the effort, but I don't see value in such comments.
There was a problem hiding this comment.
I forgot to remove it; I was trying to document the learning along.
| fn statix() -> Command { | ||
| Command::new(env!("CARGO_BIN_EXE_statix")) | ||
| } |
There was a problem hiding this comment.
If this is used only once, perhaps it should be inlined?
| /// Triggers the `bool_simplification` lint, so the file always appears in | ||
| /// `-o errfmt` output if statix visits it. | ||
| const LINTABLE: &str = "!(a == b)\n"; |
There was a problem hiding this comment.
Maybe instead of a comment, a name like CODE_THAT_TRIGGERS?
| /// `-o errfmt` output if statix visits it. | ||
| const LINTABLE: &str = "!(a == b)\n"; | ||
|
|
||
| /// Parse errfmt output into the set of file paths that produced findings. |
There was a problem hiding this comment.
I would suggest that function names be expanded to explain what they do sufficiently, instead of such comments.
| fn write(path: &Path, content: &str) { | ||
| std::fs::write(path, content).unwrap(); | ||
| } |
There was a problem hiding this comment.
I don't think we need this function.
f70898a to
0ab0b8a
Compare
|
Addressed the comment, naming, and inlining feedback. Still holding off on the errfmt JSON switch pending #2674. |
| insta.workspace = true | ||
| paste.workspace = true | ||
| strip-ansi-escapes.workspace = true | ||
| tempfile = "3" |
There was a problem hiding this comment.
Is the divergence from the general dependency declaration pattern intentional?
There was a problem hiding this comment.
Not intentional, and turned out to be more than just a style mismatch tempfile is already a regular dependency of this crate (tempfile.workspace = true under [dependencies]), so tests already had access to it without anything extra. The dev-dependency line was fully redundant.
| const CODE_THAT_TRIGGERS_A_LINT: &str = "!(a == b)\n"; | ||
|
|
||
| fn file_paths_from_errfmt_output(stdout: &[u8]) -> HashSet<PathBuf> { | ||
| String::from_utf8_lossy(stdout) |
There was a problem hiding this comment.
Would from_utf8 with expect be a tiny bit safer?
There was a problem hiding this comment.
Agreed, switched to str::from_utf8(stdout).expect(...). If statix ever emitted non-UTF-8 bytes, the lossy version would silently substitute replacement characters and we'd get a confusing assertion failure downstream, this fails loudly right where the actual problem is instead
| fn file_paths_from_errfmt_output(stdout: &[u8]) -> HashSet<PathBuf> { | ||
| String::from_utf8_lossy(stdout) | ||
| .lines() | ||
| .filter_map(|line| line.split('>').next()) |
There was a problem hiding this comment.
Do we expect lines without >?
There was a problem hiding this comment.
Checked this against real output, lint hits, parse errors, multiple files, multiple findings in one file. Every line statix emits under -o errfmt contains exactly one >. Worth noting the old code wasn't actually filtering on that anyway: str::split always yields at least the original string even when the delimiter isn't present, so split('>').next() never excluded anything regardless of whether > showed up. Switched to split_once('>').expect(...), so a line without > now fails loudly instead of silently being treated as a bare path.
| String::from_utf8_lossy(stdout) | ||
| .lines() | ||
| .filter_map(|line| line.split('>').next()) | ||
| .filter(|s| !s.is_empty()) |
There was a problem hiding this comment.
So we actually expect arbitrary empty lines? Otherwise, if we expect a single empty line, perhaps at the end, maybe handle that specifically, such as with whitespace trimming?
There was a problem hiding this comment.
Checked this too, across clean files, an empty directory, syntax errors, and multiple files, there were never any blank lines, not even a trailing one after the last entry (.lines() doesn't yield a trailing empty string for `\n-terminated input, and zero findings means completely empty stdout, not even one newline). So the filter was guarding against something that doesn't actually happen. Removed it.
| .arg("errfmt") | ||
| .args(args) | ||
| .output() | ||
| .expect("failed to run statix"); |
There was a problem hiding this comment.
| .expect("failed to run statix"); | |
| .expect("should succeed"); |
I think that this would be more idiomatic
| .arg("-o") | ||
| .arg("errfmt") |
There was a problem hiding this comment.
Since these two belong together, perhaps a bit nicer would be to group them together in one args call?
| let dir = tempfile::tempdir().unwrap(); | ||
| let a = dir.path().join("a.nix"); | ||
| let b = dir.path().join("b.nix"); | ||
| std::fs::write(&a, CODE_THAT_TRIGGERS_A_LINT).unwrap(); | ||
| std::fs::write(&b, CODE_THAT_TRIGGERS_A_LINT).unwrap(); | ||
|
|
||
| let linted = linted_file_paths([&a]); | ||
|
|
||
| assert!( | ||
| linted.contains(&a), | ||
| "explicitly passed file should be linted" | ||
| ); | ||
| assert!(!linted.contains(&b), "file not passed should not be linted"); |
There was a problem hiding this comment.
This makes me want more declarative testing.
In a test, I would expect to see no procedural details such as creating directories or writing files.
Instead, I imagine a function call that declares which files should exist and with what arguments the subject is invoked.
Then, some assertions on its return value.
Does that make sense?
There was a problem hiding this comment.
Makes sense, and I think this lands well. Ended up with one helper, linted_file_paths(files, extra_args, target): files is a list of (relative path, content) pairs to create in a fresh tempdir, extra_args is the CLI flags, and target is either a relative path or None for the tempdir root itself. It handles tempdir setup, nested directories, running statix, and translating the output paths back to plain relative strings. Every test is now one call plus one assert_eq!, no std::fs or path-joining visible in the test bodies at all.
One side effect I liked: a few assertions are now exact-set comparisons instead of partial containment checks, e.g. directory_target_skips_non_nix_files now asserts the complete expected set rather than checking presence/absence separately, which is strictly stronger. Happy to adjust the shape if you had something different in mind.
|
Just make sure to push at some point. |
0ab0b8a to
633d8d8
Compare
mightyiam
left a comment
There was a problem hiding this comment.
I like how this is shaping up 👍
| fn linted_file_paths( | ||
| files: &[(&str, &str)], | ||
| extra_args: &[&str], | ||
| target: Option<&str>, |
There was a problem hiding this comment.
Since &str can be empty, does this being an Option add value?
There was a problem hiding this comment.
Good point, the Option doesn't add value here. Will change to &str with "" meaning "use the whole directory."
| /// findings. | ||
| fn linted_file_paths( | ||
| files: &[(&str, &str)], | ||
| extra_args: &[&str], |
There was a problem hiding this comment.
Does having separate extra_args and target have benefit over perhaps having just one args?
There was a problem hiding this comment.
The reason they're currently separate is that target needs to be resolved against the tempdir to an absolute path before being passed to statix, while extra_args are flags passed verbatim. If merged into one args, the helper would need some way to tell which args are paths to resolve against the tempdir and which aren't e.g. "--ignore" and "generated.nix" are both non-flag-looking strings in different roles. Happy to rethink the shape if you have a design in mind though.
There was a problem hiding this comment.
Perhaps instead of resolving it against the tempdir, set the working directory of the process to the tempdir?
There was a problem hiding this comment.
That's much cleaner. Running statix from inside the tempdir means all relative paths in args work naturally, no resolution logic needed, and the output paths come back already relative to the tempdir. Will fold extra_args and target into a single args: &[&str] using this approach.
| Some("a.nix"), | ||
| ); | ||
|
|
||
| assert_eq!(linted, HashSet::from(["a.nix".to_owned()])); |
There was a problem hiding this comment.
Can you find an elegant way to avoid extra syntax here including function calls such as HashSet::from and to_owned, to make it more concise?
There was a problem hiding this comment.
What about a small helper in the file, something like fn strs<const N: usize>(s: [&str; N]) -> HashSet<String> so assertions read as assert_eq!(linted, strs(["a.nix"])) and assert_eq!(linted, strs(["kept.nix", "ignored.nix"]))? Open to other ideas if you have something cleaner in mind.
There was a problem hiding this comment.
Well, the assert_eq uses equality. That's the PartialEq trait. Perhaps a newtype and a trait implementation would do the trick?
There was a problem hiding this comment.
Makes sense, a newtype with PartialEq<[&str; N]> and Debug so assertions read as assert_eq!(linted, ["a.nix"]) with no extra syntax. Will implement that.
| .collect() | ||
| } | ||
|
|
||
| // ── Single explicit file ────────────────────────────────────────────────────── |
There was a problem hiding this comment.
Could modules be used to categorize integration tests, instead? I do mean modules in the same file. I never looked into that.
There was a problem hiding this comment.
Yes, this works well in Rust #[test] functions inside an inline mod are discovered and run normally by cargo test, and the output shows the module path (gitignore::files_are_excluded etc.), which reads more clearly than the current long function names. Will restructure using modules and shorten the test names since the module provides the context.
633d8d8 to
57823ce
Compare
| impl fmt::Debug for LintedPaths { | ||
| fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
| let mut paths: Vec<_> = self.0.iter().collect(); | ||
| paths.sort(); | ||
| f.debug_set().entries(paths).finish() | ||
| } | ||
| } |
There was a problem hiding this comment.
Does this provide more value than deriving Debug?
There was a problem hiding this comment.
The benefit is sorted output in failure messages. HashSet iteration is non-deterministic, so the debug output would vary between runs with #[derive(Debug)]. That said it's a minor benefit since test failures are rare, and the extra code probably isn't worth it. Will switch to #[derive(Debug)].
| let linted = linted_file_paths( | ||
| &[ | ||
| ("a.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ("b.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ], | ||
| &["."], | ||
| ); | ||
|
|
||
| assert_eq!(linted, ["a.nix", "b.nix"]); |
There was a problem hiding this comment.
I just tried to imagine some other ways this could be represented and here the best I could think of:
| let linted = linted_file_paths( | |
| &[ | |
| ("a.nix", CODE_THAT_TRIGGERS_A_LINT), | |
| ("b.nix", CODE_THAT_TRIGGERS_A_LINT), | |
| ], | |
| &["."], | |
| ); | |
| assert_eq!(linted, ["a.nix", "b.nix"]); | |
| let report = Fixture::with_files(&[ | |
| ("a.nix", CODE_THAT_TRIGGERS_A_LINT), | |
| ("b.nix", CODE_THAT_TRIGGERS_A_LINT), | |
| ]) | |
| .run_with_args(&["."]); | |
| assert_eq!(report.paths, ["a.nix", "b.nix"]); |
What I like about it is that I think it can be understood without reading anything else.
What do you think?
There was a problem hiding this comment.
Yes, that's noticeably cleaner, the builder chain reads as a sentence and report.paths makes the assertion's subject explicit without needing to know what linted_file_paths returns. Implementing it.
57823ce to
96c62f4
Compare
| if let Some(parent) = path.parent() { | ||
| std::fs::create_dir_all(parent).unwrap(); | ||
| } |
There was a problem hiding this comment.
🤔 would that work?
| if let Some(parent) = path.parent() { | |
| std::fs::create_dir_all(parent).unwrap(); | |
| } | |
| std::fs::create_dir_all(path.parent().unwrap()).unwrap(); |
| } | ||
| } | ||
|
|
||
| fn run_with_args(self, args: &[&str]) -> Report { |
There was a problem hiding this comment.
I think this should return Result. The unwrap should be in each test.
There was a problem hiding this comment.
Agreed, will return Result from run_with_args and use ? throughout, with .unwrap() in each test.
| let (path, _) = line | ||
| .split_once('>') | ||
| .expect("every errfmt line should contain '>'"); | ||
| path.strip_prefix("./").unwrap_or(path).to_owned() |
There was a problem hiding this comment.
🤔 unwrap_or makes me think that only some cases will have a ./ prefix. But is that true?
There was a problem hiding this comment.
Yes, only some cases have a ./ prefix, empirically verified. When statix is run with a directory target (.), it outputs paths as ./a.nix. When run with an explicit file path (a.nix), it outputs a.nix without the prefix. The strip_prefix normalizes both forms so assertions can use bare relative paths consistently.
There was a problem hiding this comment.
🤔 okay, I see. Well, should these assertions enjoy from such normalization? Shouldn't assertions portray the reality of output? Yes, these assertions don't specifically test the formatting of the output, that's not what they're about. But, still, is it of benefit to normalize them? Would it be simpler and perfectly acceptable for the paths in the assertions to portray the reality of output?
There was a problem hiding this comment.
Agreed. Removing the normalization assertions will show the raw output, with ./ prefix for directory targets and no prefix for explicit file paths. Simpler and more honest.
| } | ||
|
|
||
| struct Report { | ||
| paths: LintedPaths, |
There was a problem hiding this comment.
Some of the linted paths are not in the report. This seems more like ReportPaths maybe?
| use super::*; | ||
|
|
||
| #[test] | ||
| fn is_linted() { |
There was a problem hiding this comment.
| fn is_linted() { | |
| fn only_it_is_linted() { |
| // Lintable content in non-.nix files, to confirm extension | ||
| // filtering rather than parser failure. |
There was a problem hiding this comment.
| // Lintable content in non-.nix files, to confirm extension | |
| // filtering rather than parser failure. |
| } | ||
| } | ||
|
|
||
| mod gitignore { |
There was a problem hiding this comment.
| mod gitignore { | |
| mod gitignored_files { |
| use super::*; | ||
|
|
||
| #[test] | ||
| fn files_are_excluded() { |
There was a problem hiding this comment.
| fn files_are_excluded() { | |
| fn are_excluded() { |
| fn files_are_excluded() { | ||
| let report = Fixture::with_files(&[ | ||
| (".gitignore", "ignored.nix\n"), | ||
| ("kept.nix", CODE_THAT_TRIGGERS_A_LINT), |
There was a problem hiding this comment.
| ("kept.nix", CODE_THAT_TRIGGERS_A_LINT), | |
| ("linted.nix", CODE_THAT_TRIGGERS_A_LINT), |
| } | ||
|
|
||
| #[test] | ||
| fn directories_are_excluded() { |
There was a problem hiding this comment.
| fn directories_are_excluded() { | |
| fn in_subdirs_are_excluded() { |
96c62f4 to
cc6587a
Compare
| } | ||
| } | ||
|
|
||
| fn run_with_args(self, args: &[&str]) -> Result<ReportPaths, Box<dyn std::error::Error>> { |
There was a problem hiding this comment.
| fn run_with_args(self, args: &[&str]) -> Result<ReportPaths, Box<dyn std::error::Error>> { | |
| fn run_with_args(self, args: &[&str]) -> Result<Report, Box<dyn std::error::Error>> { |
| const CODE_THAT_TRIGGERS_A_LINT: &str = "!(a == b)\n"; | ||
|
|
||
| #[derive(Debug)] | ||
| struct LintedPaths(HashSet<String>); |
There was a problem hiding this comment.
| struct LintedPaths(HashSet<String>); | |
| struct Paths(HashSet<String>); |
| struct ReportPaths { | ||
| paths: LintedPaths, | ||
| } |
There was a problem hiding this comment.
| struct ReportPaths { | |
| paths: LintedPaths, | |
| } | |
| struct Report { | |
| paths: Paths, | |
| } |
| ("a.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ("b.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ]) | ||
| .run_with_args(&[]) |
There was a problem hiding this comment.
There is already a test specifically for the default. Other than that test I think a directory path should be passed, and it should be different than the default. Does that make sense to you?
There was a problem hiding this comment.
Makes sense. I think that means putting the fixture files inside a named subdirectory and passing that name as the explicit target e.g. run_with_args(&["src"]) with files at "src/a.nix" and "src/b.nix". The paths in assertions would update to "./src/a.nix" etc. Does that match what you had in mind?
There was a problem hiding this comment.
Yes.
src is arbitrary and may as well be dir?
There was a problem hiding this comment.
Note that I do not mean that ("a.nix", CODE_THAT_TRIGGERS_A_LINT) would end up in dir. It would have to be explicit, such as ("dir/a.nix", CODE_THAT_TRIGGERS_A_LINT).
cc6587a to
c68112c
Compare
| let paths = std::str::from_utf8(&output.stdout)? | ||
| .lines() | ||
| .map(|line| { | ||
| let (path, _) = line | ||
| .split_once('>') | ||
| .expect("every errfmt line should contain '>'"); | ||
| path.to_owned() | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
| let paths = std::str::from_utf8(&output.stdout)? | |
| .lines() | |
| .map(|line| { | |
| let (path, _) = line | |
| .split_once('>') | |
| .expect("every errfmt line should contain '>'"); | |
| path.to_owned() | |
| }) | |
| .collect(); | |
| let paths = std::str::from_utf8(&output.stdout)? | |
| .lines() | |
| .map(|line| { | |
| let (path, _) = line.split_once('>')?; | |
| Ok(path.to_owned()) | |
| }) | |
| .collect::<Result<_>>(); |
| #[test] | ||
| fn does_not_exclude_non_matching_files() { | ||
| let report = Fixture::with_files(&[ | ||
| ("a.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ("b.nix", CODE_THAT_TRIGGERS_A_LINT), | ||
| ]) | ||
| .run_with_args(&["--ignore", "b.nix"]) | ||
| .unwrap(); | ||
|
|
||
| assert_eq!(report.paths, ["./a.nix"]); | ||
| } |
There was a problem hiding this comment.
Does the test immediately above this one demonstrate this already?
There was a problem hiding this comment.
You're right, excludes_matching_file already shows linted.nix appearing, which directly demonstrates that non-matching files aren't excluded. Removing does_not_exclude_non_matching_files
There was a problem hiding this comment.
I think you meant to remove this test.
There was a problem hiding this comment.
Removed does_not_exclude_non_matching_files, applied the path parsing suggestion (needed .ok_or(...) to bridge Option → Result, and two type params on collect), and rebased.
|
Hrm... also a rebase wouldn't hurt, I suppose. |
1ae77ac to
7d95d7f
Compare
7d95d7f to
731c034
Compare
|
Alright! Now we have file discovery tests. |
|
Thank you, @Lymah123 🙏 |
Adds integration tests covering which files
walk_nix_filessurfaces for each combination of CLI arguments: explicit files, directory targets (including recursion and extension filtering),.gitignorehandling (files and directories),--unrestricted, and--ignore.These tests target the current
Walker/build_ignore_setimplementation inbin/src/dirs.rs, independent of the ignore::WalkBuilder refactor in #2566 per feedback, this tests existing behavior rather than building on that refactor.One thing worth noting for whenever #2566 comes up for evaluation: the current implementation uses
ignore::gitignore::GitignoreBuilderdirectly, which has no concept of being inside a git repository, it reads whatever.gitignorefile sits at the target path regardless.ignore::WalkBuilder(used in that refactor) only honors.gitignorewhen the target is inside a recognized git repository (require_gitdefaults totruein theignorecrate). So.gitignoreworks today even outside a git repo, and stops working under the refactor as currently written.gitignored_files_are_excludedandgitignored_directories_are_excludedhere lock in the current behavior, so they'd catch that difference directly if run against the refactored code.