Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs

(zml.ai)

61 points | by steeve 5 days ago

5 comments

imcritic 23 minutes ago
Is it capable of exposing metrics in Prometheus format?
- steeve 11 minutes ago
  consider it done
rdyro 8 hours ago
Looks cool!
nvtop can actually support TPUs too via https://github.com/rdyro/libtpuinfo/ https://github.com/Syllo/nvtop/blob/76890233d759199f50ad3bdb...
serialx 3 hours ago
Look into all-smi (https://github.com/lablup/all-smi). It supports every GPU imaginable, including Apple Silicon, as well as many AI accelerator cards.
mrflop 5 days ago
Renaming fopen64 to intercept library calls feels like a brittle hack masquerading as "sandboxing." Why not just upstream this hardware support to nvtop instead of fragmenting the ecosystem?
- steeve 5 days ago
  Sadly, sandboxing is something that can't be upstreamed. This way, sandboxing is kept in zml instead of patching Mesa.
  As for nvtop, it's a great program, but it was missing a few features we needed (such as sandboxing).
  - pstuart 6 hours ago
    It looks cool, and I was excited to get monitoring for the NPU on my Ryzen AI 395+. Unfortunately, the NPU does not show up. NPU support in Linux really seems to be an afterthought.
    - steeve 6 hours ago
      Weird, because we tried it. It doesn’t show anything?
      We use amdsmi to get metrics. I’ll investigate.
- marwanet 6 hours ago
  If this logic were pushed into nvtop, wouldn't the codebase become unmaintainable? Each vendor's interception method is going to be different.
  - nareyko 6 hours ago
    [dead]
152334H 4 hours ago
"NPU" seems to refer to Trainium only?