Craig Ulmer

FAODEL 1.1906.1 Released

2019-07-08 faodel code

One of the things that's been missing from FAODEL is a tool to help manage resources and launch services. After the EMPIRE release, we put a lot of work into closing this gap by building a new command-line (CLI) tool that gathers these tasks in one place. The faodel tool can start/stop services, set/remove DirMan resource info, and put/get Kelpie objects from resource pools. We've received approval from DOE to release this as version 1.1906.1 (Excelsior!) at https://github.com/faodel/faodel. Here's the changelog:

Release Improvements

  • New faodel-cli tool for managing services and resources
    • Gets build/configure info (replaces faodel-info)
    • Start/stop services (dirman, kelpie)
    • Define/query/remove dirman resources
    • Put/get/list kelpie objects
    • New example/kelpie-cli script shows how to use it
  • Support for ARM platform
  • NNTI adds On-Demand Paging capability
  • NNTI adds Cereal as alternative for serialization
  • NNTI has better detection and selection of IB devices
  • Fixes
    • SBL could segfault due to Boost if the application exited without calling finish
    • FAODEL couldn't be included in a larger project's cmake
    • LDO had a race condition in destructor

Significant User-Visible Changes

  • faodel-info and whookie tools replaced by faodel cli tool
  • Dirman's DirInfo "children" renamed to "members"
  • Faodel now has a package in the Spack develop branch

Known Issues

  • FAODEL's libfabric transport is still experimental. It does not fully implement Atomics or Long Sends. While Kelpie does not require these operations, other OpBox-based applications may break without this support.
  • On Cray machines with the Aries interconnect, FAODEL can be overwhelmed by a sustained stream of sends larger than the MTU. To avoid this problem, the sender should limit itself to bursts of 32 long sends at a time.
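
The Aries workaround above can be sketched as a small sender-side throttle. This is only an illustration and not part of FAODEL: SendInBursts, send_one, and drain are hypothetical names, and the sketch assumes that the application has some way to wait for a burst's sends to complete before issuing the next burst.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not a FAODEL API): issues `total` long sends in
// bursts of at most `burst_limit`, calling `drain` after each burst so the
// caller can wait for outstanding sends before continuing.
// Returns the number of bursts that were issued.
std::size_t SendInBursts(std::size_t total,
                         std::size_t burst_limit,
                         const std::function<void(std::size_t)> &send_one,
                         const std::function<void()> &drain) {
  assert(burst_limit > 0);
  std::size_t bursts = 0;
  for (std::size_t start = 0; start < total; start += burst_limit) {
    const std::size_t end = std::min(total, start + burst_limit);
    for (std::size_t i = start; i < end; ++i) {
      send_one(i);  // placeholder for the application's real long-send call
    }
    drain();  // placeholder: block until this burst's sends have completed
    ++bursts;
  }
  return bursts;
}
```

A caller would pass 32 as burst_limit to match the guidance above, supply its own send call as send_one, and supply whatever completion wait its transport offers as drain.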