#1176 - Drop CasDumpWriter (#1627)

Merged
reckart merged 4 commits into main from removal/1176-Drop-CasDumpWriter
May 24, 2026

reckart (Member) commented on May 24, 2026

What's in the PR

  • Replace CasDumpWriter with CasToComparableTextWriter in CasFilter_ImplBaseTest, rewrite line-index assertions to be format-agnostic, drop stale imports
  • Replace CasDumpWriter with CasToComparableTextWriter in ApplyChangesBackmapperTest and regenerate output.txt reference
  • Replace CasDumpWriter with CasToComparableTextWriter in HtmlReaderTest
  • Replace CasDumpWriter with CasToComparableTextWriter in PdfReaderTest, regenerate test.dump reference, drop V2_PRETTY_PRINT setup
  • Default CasToComparableTextWriter to excluding documentUri/collectionId/documentBaseUri features so reference fixtures stay machine-independent
  • Switch IOTestRunner to use CasToComparableTextWriter instead of CasDumpWriter
  • Regenerate .dump reference fixtures in CSV format (bnc, html, negra ×4, tiger ×2, xces ×2)
  • Drop V2_PRETTY_PRINT @BeforeAll setup from HtmlReaderTest, HtmlDocumentReaderTest, TigerXmlReaderTest, TigerXmlWriterTest (no longer needed without CasDumpWriter)
  • Drop uima.v2_pretty_print_format surefire system property from dkpro-core-parent-common
  • Delete obsolete CasDumpWriter and clean up TeiWriterTest stale comment
  • Update NOTICE.txt to drop CasDumpWriter attribution
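
The default exclusion of `documentUri`/`collectionId`/`documentBaseUri` can be pictured with a small, dependency-free sketch. This is not the `CasToComparableTextWriter` code: the class `FeatureExclusionSketch`, the method `renderRow`, and the `name=value` row layout are all hypothetical, and only the three feature names come from the description above. The point is simply that a row rendered without path-like features is identical on every machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: a toy row renderer, NOT the CasToComparableTextWriter implementation.
final class FeatureExclusionSketch {
    // Feature names from the PR description; the constant itself is hypothetical.
    static final Set<String> DEFAULT_EXCLUDED =
            Set.of("documentUri", "collectionId", "documentBaseUri");

    // Renders one feature structure as a comma-separated row, skipping excluded features.
    // The "name=value" layout is an assumption made for this sketch.
    static String renderRow(Map<String, String> features, Set<String> excluded) {
        return features.entrySet().stream()
                .filter(e -> !excluded.contains(e.getKey()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("documentId", "test.html");
        meta.put("documentUri", "file:/home/alice/work/test.html"); // machine-specific
        meta.put("language", "en");
        // prints "documentId=test.html,language=en"
        System.out.println(renderRow(meta, DEFAULT_EXCLUDED));
    }
}
```

Passing an empty exclusion set to `renderRow` brings the URI feature back, which loosely mirrors the "custom exclusion overrides" mentioned later in this PR.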

How to test manually

  • No specific test procedure

Automatic testing

  • PR includes unit tests

Documentation

  • PR updates documentation

reckart added this to the 3.0.0 milestone on May 24, 2026
reckart self-assigned this on May 24, 2026
reckart added this to Kanban on May 24, 2026
github-project-automation (bot) moved this to In progress in Kanban on May 24, 2026
reckart force-pushed the removal/1176-Drop-CasDumpWriter branch from 52fcfbd to fca45cd on May 24, 2026 16:51
reckart requested a review from Copilot on May 24, 2026 17:43

Copilot AI left a comment


Pull request overview

This PR removes the custom CasDumpWriter from dkpro-core-testing-asl and migrates tests/fixtures to use CasToComparableTextWriter (CSV/HTML comparable-text output), including updating defaults to keep reference outputs machine-independent.

Changes:

  • Replaced CasDumpWriter usages in tests and IOTestRunner with CasToComparableTextWriter.
  • Updated CasToComparableTextWriter defaults to exclude DocumentMetaData URI-related features to avoid machine-specific fixture diffs.
  • Regenerated multiple .dump / output reference fixtures to the comparable-text CSV format and removed now-unneeded uima.v2_pretty_print_format wiring.

Reviewed changes

Copilot reviewed 23 out of 26 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `dkpro-core-textnormalizer-asl/src/test/java/org/dkpro/core/textnormalizer/casfilter/CasFilter_ImplBaseTest.java` | Switches test dumping from CasDumpWriter to CasToComparableTextWriter and loosens assertions to be format-agnostic. |
| `dkpro-core-testing-asl/src/main/java/org/dkpro/core/testing/IOTestRunner.java` | Uses CasToComparableTextWriter for dump generation in one-way tests. |
| `dkpro-core-testing-asl/src/main/java/org/dkpro/core/testing/dumper/CasToComparableTextWriter.java` | Introduces default feature exclusions for URI-like metadata fields to stabilize fixtures. |
| `dkpro-core-testing-asl/src/main/java/org/dkpro/core/testing/dumper/CasDumpWriter.java` | Deletes obsolete writer implementation. |
| `dkpro-core-testing-asl/NOTICE.txt` | Removes CasDumpWriter attribution and keeps comparable-text attribution. |
| `dkpro-core-parent-common/pom.xml` | Removes global uima.v2_pretty_print_format surefire property. |
| `dkpro-core-io-xces-asl/src/test/resources/xces-complex.xml.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-xces-asl/src/test/resources/xces-basic.xml.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-tiger-asl/src/test/resources/tiger-sample.xml.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-tiger-asl/src/test/java/org/dkpro/core/io/tiger/TigerXmlWriterTest.java` | Removes V2 pretty-print setup that was only needed for CasDumpWriter. |
| `dkpro-core-io-tiger-asl/src/test/java/org/dkpro/core/io/tiger/TigerXmlReaderTest.java` | Removes V2 pretty-print setup that was only needed for CasDumpWriter. |
| `dkpro-core-io-tei-asl/src/test/java/org/dkpro/core/io/tei/TeiWriterTest.java` | Drops stale commented-out CasDumpWriter usage. |
| `dkpro-core-io-pdf-asl/src/test/resources/reference/test.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-pdf-asl/src/test/java/org/dkpro/core/io/pdf/PdfReaderTest.java` | Uses CasToComparableTextWriter and compares against regenerated reference dump. |
| `dkpro-core-io-negra-asl/src/test/resources/tueba-sample.export.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-negra-asl/src/test/resources/sentence.export.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-negra-asl/src/test/resources/format4-with-coref-sample.export.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-html-asl/src/test/resources/html/test.html.dump` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-io-html-asl/src/test/java/org/dkpro/core/io/html/HtmlReaderTest.java` | Uses CasToComparableTextWriter and removes V2 pretty-print setup. |
| `dkpro-core-io-html-asl/src/test/java/org/dkpro/core/io/html/HtmlDocumentReaderTest.java` | Removes V2 pretty-print setup that was only needed for CasDumpWriter. |
| `dkpro-core-io-conll-asl/src/test/resources/conll/2002/germeval2014_test.conll.out` | Regenerated fixture in comparable-text CSV format. |
| `dkpro-core-castransformation-asl/src/test/resources/output.txt` | Regenerated expected dump output in comparable-text CSV format. |
| `dkpro-core-castransformation-asl/src/test/java/org/dkpro/core/castransformation/ApplyChangesBackmapperTest.java` | Uses CasToComparableTextWriter and compares against regenerated output.txt. |
Comments suppressed due to low confidence (1)

dkpro-core-textnormalizer-asl/src/test/java/org/dkpro/core/textnormalizer/casfilter/CasFilter_ImplBaseTest.java:90

  • This uses FileUtils.readFileToString(File) without specifying an explicit charset. Since the same test class also reads using a fixed UTF-8 charset, it would be more robust/consistent to always read the dump using UTF-8 (or StandardCharsets.UTF_8) to avoid platform-default differences.
                .createAggregateBuilderDescription(filter, writer);

        SimplePipeline.runPipeline(reader, annotator, aggregator);
        assertTrue(FileUtils.readFileToString(tmpFile).isEmpty());
    }
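
A minimal sketch of the charset point raised in the suppressed comment, using only the JDK so it runs without Commons IO. The class `Utf8ReadSketch` and its methods are hypothetical names for illustration; the test in this PR uses `FileUtils`, for which Commons IO also provides an overload that takes a `Charset`. The idea is just that the encoding is named explicitly on both the write and the read side, so the result does not depend on the platform default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: reads a dump file with an explicit charset.
final class Utf8ReadSketch {
    // Reads the whole file as UTF-8, independent of the platform default charset.
    static String readDump(Path file) {
        try {
            return Files.readString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file as UTF-8 and reads it back the same way.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("dump", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                return readDump(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII characters survive because both sides agree on UTF-8.
        System.out.println(roundTrip("Gr\u00fc\u00dfe").equals("Gr\u00fc\u00dfe"));
        // An empty dump stays empty, as the assertion in the test above expects.
        System.out.println(roundTrip("").isEmpty());
    }
}
```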

reckart force-pushed the removal/1176-Drop-CasDumpWriter branch from fca45cd to fdaa583 on May 24, 2026 18:19
reckart force-pushed the removal/1176-Drop-CasDumpWriter branch from fdaa583 to afa3346 on May 24, 2026 18:43
reckart requested a review from Copilot on May 24, 2026 18:43

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 2 comments.

- Make CasToComparableTextWriter iterate all CAS views with per-view section headers so multi-view CASes (e.g. backmapper output) are dumped completely instead of only the current view
- Regenerate ApplyChangesBackmapperTest output.txt reference to cover both _InitialView (backmapped) and TargetView sections
- Tighten CasFilter_ImplBaseTest assertions to check the fully-qualified Sentence type header and exact anchor prefix instead of loose substring matches
- Refresh RTFReaderTest Javadoc to reference CasToComparableTextWriter instead of CASDumpWriter
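
The per-view dumping described in these commits might look roughly like the sketch below. It is plain Java with no UIMA dependency, so a "view" is modelled as a named list of already-rendered rows. The class `MultiViewDumpSketch`, the header text `=== View: ... ===`, and the row strings are all assumptions for illustration; the real header format is not shown on this page. Omitting the header when there is only one view follows the "single-view header omission" wording in a later commit on this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models views as name -> rows, NOT the actual writer.
final class MultiViewDumpSketch {
    // Hypothetical header format; the real writer's header text is an unknown here.
    static String header(String viewName) {
        return "=== View: " + viewName + " ===";
    }

    // Renders the rows of a single view, one row per line.
    static String renderView(List<String> rows) {
        return String.join("\n", rows);
    }

    // Dumps every view in iteration order; headers appear only with more than one view.
    static String dump(Map<String, List<String>> views) {
        StringBuilder out = new StringBuilder();
        boolean withHeaders = views.size() > 1;
        for (Map.Entry<String, List<String>> view : views.entrySet()) {
            if (out.length() > 0) {
                out.append("\n");
            }
            if (withHeaders) {
                out.append(header(view.getKey())).append("\n");
            }
            out.append(renderView(view.getValue()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = new LinkedHashMap<>();
        views.put("_InitialView", List.of("Sentence,0,4"));
        views.put("TargetView", List.of("Sentence,0,6"));
        System.out.println(dump(views));
    }
}
```

Keeping the single-view output free of headers means fixtures for ordinary one-view readers would not need a header line, while multi-view output such as the backmapper's still shows every view.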

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 29 changed files in this pull request and generated 2 comments.

Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 30 changed files in this pull request and generated 1 comment.

- Throw `AnalysisEngineProcessException` when a view pattern is missing the include/exclude prefix
- Extract per-view rendering into a private `renderView` helper
- Add tests covering single-view header omission, multi-view headers, default URI feature exclusions, view include/exclude patterns, and custom exclusion overrides
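
A hedged sketch of the view include/exclude check from the commits above. The actual prefix characters and matching rules are not visible on this page, so `+` for include, `-` for exclude, regex matching, and "last matching pattern wins" are all assumptions. `IllegalArgumentException` stands in for `AnalysisEngineProcessException` only to keep the example free of UIMA dependencies, and `ViewPatternSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: assumed "+regex" / "-regex" syntax, NOT the writer's real parser.
final class ViewPatternSketch {
    // Decides whether a view is dumped. With no patterns, every view is selected.
    // Otherwise the last pattern that matches the view name decides.
    static boolean isSelected(String viewName, List<String> patterns) {
        if (patterns.isEmpty()) {
            return true;
        }
        boolean selected = false;
        for (String p : patterns) {
            if (p.isEmpty() || (p.charAt(0) != '+' && p.charAt(0) != '-')) {
                // The PR throws AnalysisEngineProcessException at this point;
                // IllegalArgumentException is a dependency-free stand-in.
                throw new IllegalArgumentException(
                        "View pattern must start with '+' or '-': [" + p + "]");
            }
            if (Pattern.matches(p.substring(1), viewName)) {
                selected = p.charAt(0) == '+';
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("+.*", "-_InitialView");
        System.out.println(isSelected("TargetView", patterns));   // prints "true"
        System.out.println(isSelected("_InitialView", patterns)); // prints "false"
    }
}
```

Failing fast on a missing prefix avoids silently treating a mistyped pattern as either an include or an exclude, which is presumably the motivation for the exception added in this PR.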
reckart force-pushed the removal/1176-Drop-CasDumpWriter branch from a291e6f to ae903af on May 24, 2026 20:11
reckart merged commit a6ef9e1 into main on May 24, 2026
5 checks passed
reckart deleted the removal/1176-Drop-CasDumpWriter branch on May 24, 2026 20:30
github-project-automation (bot) moved this from In progress to Done in Kanban on May 24, 2026