
Remove pinning fields in built-in Ruby types. #54

@wks

If an object calls rb_gc_mark on a field, it pins the child that field refers to, so a copying GC cannot move that child. Such objects are potential pinning parents (PPPs) and must be handled specially. Reducing the number of PPPs reduces the overhead they impose on copying GC.
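
To make the cost of pinning concrete, here is a minimal toy model in C. It is only a sketch: `toy_obj` and the `toy_*` functions are hypothetical names invented for illustration and are not part of CRuby. The pinning mark stands in for rb_gc_mark; the non-pinning mark corresponds to what, as far as I know, CRuby exposes as rb_gc_mark_movable (with rb_gc_location to look up the new address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object header; NOT Ruby's RVALUE, just enough state to show pinning. */
typedef struct toy_obj {
    bool marked;              /* reached during marking */
    bool pinned;              /* must stay at its current address */
    struct toy_obj *forward;  /* new location after a move, or NULL */
} toy_obj;

/* Models a pinning mark (like rb_gc_mark): child stays alive and in place. */
static void toy_mark_pinning(toy_obj *child) {
    assert(child != NULL);
    child->marked = true;
    child->pinned = true;
}

/* Models a non-pinning mark: child stays alive but may still be moved. */
static void toy_mark_movable(toy_obj *child) {
    assert(child != NULL);
    child->marked = true;
}

/* Tries to copy `src` into `dst`; refuses for dead or pinned objects. */
static bool toy_try_move(toy_obj *src, toy_obj *dst) {
    if (!src->marked || src->pinned) return false;
    *dst = *src;
    dst->forward = NULL;
    src->forward = dst;       /* leave a forwarding pointer behind */
    return true;
}

/* Returns the current address of `obj`, following a forwarding pointer. */
static toy_obj *toy_location(toy_obj *obj) {
    return obj->forward ? obj->forward : obj;
}
```

In this model a parent that marks with `toy_mark_movable` must later rewrite its field with `toy_location`, which is exactly the extra work a PPP avoids by pinning, at the price of leaving holes the copying GC cannot reclaim.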

Related higher-level issues are:

This issue keeps a list of built-in types that are PPPs, and why they pin their children.

Some objects can pin their children

  • T_DATA: Some third-party libraries were written before Ruby introduced moving GC, so their mark functions may still call rb_gc_mark on their fields.
  • T_IMEMO
    • imemo_ifunc:
      • gc_mark_maybe(RANY(obj)->as.imemo.ifunc.data) (type: VALUE)
      • ifunc represents a "block written in C",
        and data is the "extra argument" passed to the block in addition to the yielded data.
        • I guess that because the ifunc is written in C,
          the data can be anything (as long as the C function recognizes it),
          even though it is supposed to be a VALUE that holds a Ruby value.
          It could be a compromise due to frequent misuse.
    • imemo_memo:
      • gc_mark_maybe(RANY(obj)->as.imemo.memo.u3.value)
      • It looks like a generic "memo" type. The u3 field is an untagged union that can be anything.
    • imemo_iseq (no longer a PPP since ruby/ruby#7156, "Make all of the references of iseq movable"):
      • Union aux members
        • rb_gc_mark(iseq->aux.loader.obj)
        • rb_gc_mark(compile_data->catch_table_ary)
        • rb_hook_list_mark(iseq->aux.exec.local_hooks) which calls rb_gc_mark(hook->data) for each hook.
        • rb_iseq_mark_insn_storage(compile_data->insn.storage_head) which calls rb_gc_mark(op)
        • The fields above are reached through the variants of a union (iseq->aux).
          Other union variants do not hold a reference at the same offset, so without the tag the marking has to be conservative.
          • It should be possible to test the union tag to know precisely which case it is.
            • Actually, rb_iseq_mark already tests the union tags!
      • MJIT:
        • mjit_mark_cc_entries(body)
    • imemo_tmpbuf:
      • fully conservative.
        • Calls rb_gc_mark_locations on all offsets.
        • It is used to implement ALLOCV. I think it has to be a PPP because of its conservative nature.
    • imemo_ast:
      • rb_gc_mark(ast->node_buffer->mark_hash)
      • rb_gc_mark(ast->body.compile_option)
      • rb_gc_mark(ast->body.script_lines)
      • rb_ast_update_references only calls update_ast_value on each NODE, but not the three fields above.
    • imemo_parser_strterm:
      • rb_gc_mark(heredoc->lastline)
      • It is part of a union, but rb_strterm_mark already tests the tag.
  • T_HASH: If Hash#compare_by_identity has been called, the hash is marked with pin_key_mark_value, which pins its keys.
    • compare_by_identity: Sets self to consider only identity in comparing keys;
      two keys are considered the same only if they are the same object; returns self.
      • Cannot be undone. Good candidate for using a remembered set.
    • Can be made non-PPP by introducing address-based hashing.
  • Any object that has a gen_ivtbl (no longer a PPP since ruby@de72448)
    • What's that?
      • gen_ivtbl = generic instance variable table
        • useful for adding custom variables to anything other than T_OBJECT
      • gc_mark_children -> (if EXIVAR) rb_mark_generic_ivar -> gen_ivtbl_mark -> rb_gc_mark
      • generic_iv_tbl_: (in variable.c) a global st_table mapping obj to its gen_ivtbl.
    • Seems unnecessary.
      • I patched the code to let it move, and it seems to work.
      • wks@282148b
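
The observation under imemo_iseq, that testing the union tag lets the GC find references precisely instead of conservatively, can be sketched in C as follows. This is a toy model only: `toy_iseq`, its tags, and its field layout are hypothetical and deliberately simplified, not the real rb_iseq_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag saying which union variant is currently live. */
typedef enum { AUX_LOADER, AUX_COMPILE, AUX_EXEC } toy_aux_tag;

typedef struct {
    toy_aux_tag tag;
    union {
        struct { void *obj; int index; } loader;             /* ref at offset 0 */
        struct { int node_id; void *catch_table; } compile;  /* ref elsewhere   */
        struct { void *local_hooks; } exec;
    } aux;
} toy_iseq;

/* Callback invoked with the address of every reference slot. */
typedef void (*toy_visit_fn)(void **slot, void *ctx);

/* Visits exactly the slot that is live for the current tag, so a moving
 * GC can rewrite it instead of pinning whatever the bits might point to. */
static void toy_iseq_each_ref(toy_iseq *iseq, toy_visit_fn visit, void *ctx) {
    switch (iseq->tag) {
      case AUX_LOADER:  visit(&iseq->aux.loader.obj, ctx); break;
      case AUX_COMPILE: visit(&iseq->aux.compile.catch_table, ctx); break;
      case AUX_EXEC:    visit(&iseq->aux.exec.local_hooks, ctx); break;
      default:          assert(0 && "unknown aux tag");
    }
}

/* Example visitor: redirects a slot from an old address to a new one. */
typedef struct { void *from; void *to; } toy_reloc;

static void toy_relocate(void **slot, void *ctx) {
    toy_reloc *r = (toy_reloc *)ctx;
    if (*slot == r->from) *slot = r->to;
}
```

Because the visitor receives the slot address rather than the value, the same traversal can serve both marking and reference updating, which is what makes the parent non-pinning in this model.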

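The T_HASH entry suggests address-based hashing as a way to stop identity hashes from pinning their keys. One common reading of that technique is sketched below in C; it is an assumption about what the note has in mind, not an existing CRuby or MMTk implementation, and `toy_hobj` and its functions are hypothetical names. The idea is that an object's identity hash is its address until the object moves, at which point the old address is saved and reused as the hash from then on.

```c
#include <assert.h>
#include <stdint.h>

/* Per-object hash state, as used by address-based hashing schemes. */
typedef enum { UNHASHED, HASHED, HASHED_AND_MOVED } toy_hash_state;

typedef struct {
    toy_hash_state state;
    uintptr_t saved_hash;  /* valid only in HASHED_AND_MOVED */
    int payload;           /* stands in for the object's real contents */
} toy_hobj;

/* Identity hash: the current address, or the address at first hashing
 * if the object has been moved since. */
static uintptr_t toy_identity_hash(toy_hobj *o) {
    if (o->state == HASHED_AND_MOVED) return o->saved_hash;
    o->state = HASHED;     /* the address has now been observed */
    return (uintptr_t)o;
}

/* Copies `from` to `to`, preserving an already observed hash value. */
static void toy_move(toy_hobj *from, toy_hobj *to) {
    assert(from != to);
    *to = *from;
    if (from->state == HASHED) {
        to->saved_hash = (uintptr_t)from;
        to->state = HASHED_AND_MOVED;
    }
    /* UNHASHED objects need nothing extra: nobody has seen their hash.
     * HASHED_AND_MOVED objects carry saved_hash along in the copy. */
}
```

With a stable hash like this, a compare_by_identity hash would not need to be rebuilt or to pin its keys when they move, although key references in the table would still have to be updated.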