How we made Python's packaging library 3x faster

(iscinumpy.dev)

64 points | by rbanffy 3 days ago

3 comments

  • djoldman 5 hours ago
    > _canonicalize_table = str.maketrans(
    >     "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    >     "abcdefghijklmnopqrstuvwxyz--",
    > )
    > ...
    > value = name.translate(_canonicalize_table)
    > while "--" in value:
    >     value = value.replace("--", "-")

    translate can be wildly fast compared to some commonly used regexes or replacements.
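For readers who want to try the comparison themselves, here is a minimal sketch that puts the quoted translate-based approach next to the regex normalization given in PEP 503. The function names `canonicalize_translate` and `canonicalize_regex` are hypothetical, chosen for illustration; the sketch only checks that the two agree on a few ASCII names and makes no timing claim.

```python
import re

# Table copied from the quoted snippet: lowercase ASCII letters,
# and map "_" and "." to "-".
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)


def canonicalize_translate(name: str) -> str:
    """Translate-based normalization, following the quoted snippet."""
    value = name.translate(_canonicalize_table)
    # Collapse any run of dashes left behind by adjacent separators.
    while "--" in value:
        value = value.replace("--", "-")
    return value


def canonicalize_regex(name: str) -> str:
    """Regex-based normalization as written in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical sample names, ASCII only.
samples = ["Django", "eggtools._spam", "Foo__Bar..baz", "zope.interface"]
for sample in samples:
    assert canonicalize_translate(sample) == canonicalize_regex(sample)
```

Wrapping each function in `timeit.timeit` on your own inputs is probably the simplest way to see how large the difference is on a given Python version.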

    • est 4 hours ago
      I'm curious: why not .lower().translate(str.maketrans('_.', '--'))?
      • fwip 4 hours ago
        .lower() has to handle Unicode, right? I imagine the giant tables slow it down a bit.
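A small sketch of the behavioral difference behind this exchange, leaving speed aside. The helper names `via_table` and `via_lower` are hypothetical. Note that `str.translate` takes a single table, so the two-step variant needs `str.maketrans` as well.

```python
# ASCII-only table, as in the quoted snippet.
ascii_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)
# Separator-only table for the lower()-then-translate variant.
separator_table = str.maketrans("_.", "--")


def via_table(name: str) -> str:
    """Lowercase ASCII letters and map separators in one translate call."""
    return name.translate(ascii_table)


def via_lower(name: str) -> str:
    """Unicode-aware lower(), then map separators."""
    return name.lower().translate(separator_table)


# On ASCII input the two variants agree.
assert via_table("Foo_Bar.Baz") == via_lower("Foo_Bar.Baz") == "foo-bar-baz"

# On non-ASCII input they can differ, because lower() applies Unicode
# case mappings while the table only knows A-Z.
kelvin = "\u212a"  # KELVIN SIGN, which lowercases to ASCII "k"
assert via_lower(kelvin) == "k"
assert via_table(kelvin) == kelvin
assert via_lower("\u00c9") == "\u00e9"  # "É" becomes "é"
assert via_table("\u00c9") == "\u00c9"  # left untouched by the ASCII table
```

So the two are only interchangeable when names are known to be ASCII; whether the Unicode handling accounts for a measurable slowdown would need a benchmark.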
    • teaearlgraycold 5 hours ago
      I would expect, however, that a regex replacement would be much faster than your N^2 while loop.
      • notpushkin 4 hours ago
        It would be, if it was a common situation.

        This loop handles cases like `eggtools._spam` → `eggtools-spam`, which is probably rare (I guess it’s for packages that export namespaced modules, and you probably don’t want to export _private modules; sorry in advance for non-pythonic terminology). Having more than two separator characters in a row is even more unusual.
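To make the loop's cost concrete, here is a rough sketch that counts how many passes it takes. The helper `collapse_dashes` is hypothetical and simply wraps the quoted loop with a counter. Because `replace("--", "-")` substitutes non-overlapping pairs, each pass roughly halves the longest run of dashes, so the pass count grows with the logarithm of the run length, not with the length of the name.

```python
def collapse_dashes(value: str) -> tuple[str, int]:
    """Collapse runs of "-" as the quoted loop does, counting the passes."""
    passes = 0
    while "--" in value:
        value = value.replace("--", "-")
        passes += 1
    return value, passes


# The case from the comment: "eggtools._spam" becomes "eggtools--spam"
# after translate, and one pass cleans it up.
assert collapse_dashes("eggtools--spam") == ("eggtools-spam", 1)

# Four dashes shrink to two, then to one.
assert collapse_dashes("a----b") == ("a-b", 2)

# Even an artificial run of 1024 dashes needs only 10 passes.
assert collapse_dashes("a" + "-" * 1024 + "b") == ("a-b", 10)
```

For typical names with no repeated separators, the `"--" in value` check fails immediately and the loop body never runs.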

  • zahlman 3 days ago