hard limit on memory causes ray job to die #5

@asm582

Description

Currently, if even one worker runs out of memory (OOM), LSF kills the entire Ray cluster. With the help of bluanch, find a mechanism to manage remote tasks.

Labels: help wanted
