
Merging 14 Ontologies (huge merge) #403


@OliverHex

Hello,

I am trying to merge 14 ontologies at once with Boomer: DERMO, DO, HUGO, ICDO, IDO, IEDB, MESH, MFOMD, MPATH, NCIT, OBI, OGMS, ORPHANET and SCDO.

This is how I proceed:

  • I compute the 91 LogMap alignments, one for every pair of ontologies (n(n-1)/2 = 91 with n = 14).
  • I convert these alignments and merge them into a single ptable (Boomer format).
  • I join all these ontologies into a single "union" OWL file (622K classes, ~2.5 GB).
  • I launch Boomer on the union OWL file and the single ptable (54K entries, ~7 MB).
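
A minimal sketch of the first two steps, for illustration only. It assumes each per-pair ptable is a tab-separated text file whose first two columns are the mapped term identifiers; that layout is an assumption here, and `ontology_pairs` and `merge_ptables` are hypothetical helper names, not part of Boomer or LogMap.

```python
from itertools import combinations

ONTOLOGIES = [
    "DERMO", "DO", "HUGO", "ICDO", "IDO", "IEDB", "MESH",
    "MFOMD", "MPATH", "NCIT", "OBI", "OGMS", "ORPHANET", "SCDO",
]


def ontology_pairs(names):
    """Return every unordered pair of ontologies: n(n-1)/2 pairs."""
    return list(combinations(names, 2))


def merge_ptables(tables):
    """Concatenate per-pair ptables, keeping the first row seen for each
    (left term, right term) key.

    `tables` is an iterable of iterables of tab-separated lines.
    Assumption: the first two columns identify the mapped terms.
    """
    merged = {}
    for rows in tables:
        for line in rows:
            line = line.rstrip("\n")
            cols = line.split("\t")
            if len(cols) < 2:
                continue  # skip blank or malformed lines
            merged.setdefault((cols[0], cols[1]), line)
    return list(merged.values())


print(len(ontology_pairs(ONTOLOGIES)))
```

With the 14 ontologies listed above, `ontology_pairs` yields the 91 pairs mentioned in the first step. Keeping the first row per key is only one possible policy for duplicate mappings.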

I have run various tests and it seems that when the ptable is too large, the problem becomes intractable.

After removing MESH and NCIT (i.e. merging only 12 ontologies), the resulting union ontology has only 81K classes (242 MB) and the ptable contains only 7K entries. In this case, Boomer finishes with a result in 30 min (on an i7 at 1.90 GHz with 32 GB RAM).

But I also need the MESH and NCIT ontologies to be included in my merge result.

Overall, I am wondering whether this is the correct way to proceed.

Here are some questions:

  1. Should I continue with this strategy?
    -> Should I keep trying to merge everything at once, so that Boomer has complete decision power in selecting the best mappings (without introducing any bias)?

  2. Or should I change my merging strategy?
    -> Should I split the problem into smaller sub-problems,
    -> then organize them in some order (according to some criteria), which could introduce some bias,
    -> and launch Boomer following this order?

    For example, I could try this:
    - I convert the 91 alignments into 91 ptables (instead of converting and merging them into a single ptable).
    - For each of the 91 ptables:
    ----> I launch Boomer with this ptable and the union OWL file.
    ----> I add all the equivalence axioms generated by Boomer for this ptable to the union OWL file.

    So far, this seems to run much faster.
    But the problem is that the arbitrary order of the for-loop introduces a bias, since each equivalence axiom added at one step influences Boomer's results in the next steps.
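
The order dependence of this loop can be shown with a toy sketch. Everything here is hypothetical: `toy_resolve` is a deliberately simple stand-in for a Boomer run (it accepts a candidate equivalence only if neither term already takes part in an accepted one), and the term identifiers are made up. It is not how Boomer decides; it only shows that feeding earlier results into later steps can make the outcome depend on the loop order.

```python
def incremental_merge(ptables, resolve):
    """Process ptables one by one, feeding accepted equivalences forward.

    `resolve(table, accepted)` stands in for one Boomer run: it receives
    the candidate mappings and the equivalences accepted so far, and
    returns the newly accepted equivalences.
    """
    accepted = set()
    for table in ptables:
        accepted |= resolve(table, frozenset(accepted))
    return accepted


def toy_resolve(table, accepted):
    """Hypothetical resolver: accept a pair only if both terms are unused."""
    used = {term for pair in accepted for term in pair}
    new = set()
    for left, right in table:
        if left not in used and right not in used:
            new.add((left, right))
            used.update((left, right))
    return new


# Two single-entry "ptables" that compete for the same (made-up) term.
t1 = [("DO:1", "MESH:9")]
t2 = [("MESH:9", "NCIT:5")]

print(incremental_merge([t1, t2], toy_resolve))
print(incremental_merge([t2, t1], toy_resolve))
```

The two calls keep different equivalences, because whichever ptable is processed first claims the shared term.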

Any suggestions?

Oliver

PS: I couldn't attach the Boomer input union ontology (~140 MB compressed), since the maximum attachment size is 25 MB. However, the input ptable is here: ptable-91-mappings.zip.
