Reimagining the mouse pointer for the AI era

(deepmind.google)

52 points | by devhouse 2 hours ago

29 comments

  • arjie 24 minutes ago
    Oh interesting, this is very cool. At first I thought it was just focus-follows-mouse but it's more interesting. You have certain keywords trigger "add to prompt". Ignoring the voice functionality (which is admittedly crucial currently because other inputs currently take over focus), I've often wanted to just have a continuous conversation with the LLM as I 'point and click' (or tab over and select) at various things. Might be neat to have text input focus continue to go to the LLM where I'm typing text etc.

    Sometimes I go to a different page to take a screenshot and other times I'm browsing for a file, and other times I'm highlighting some log lines. Cursor did this well, with selecting text in the terminal auto-focusing the Cursor agent textbox so you could talk to the agent and then select some text and you didn't have to re-select the original agent textbox again. The agent is a top-level function in that system not "just another app I have to switch to" to take my context with.

    I have some small amount of bias because I've always felt input-constrained on computers. I have to move my hands to go places and that's exasperating. I've tried head tracking, had a vim pedal for a while, and used tiling WMs, and things like this to aid but while my vim-fu is pretty good and I function inside things very well with it, my cross-application interface isn't.

    In the end, perhaps we all have our home offices with our Apple Vision Pros and we talk to them like this to manoeuvre faster through our machines and get our ideas into them.

    Cool research. I wonder what we'll end up with.

  • chromacity 9 minutes ago
    My reaction to the first demo (recipe) is that it was slower than typing the same thing on your keyboard.

    The second demo seems to be a wash: there's no time saved in saying "move this" versus "move crab".

    The third demo doesn't seem to warrant the use of a pointer at all, since there is only one way to interpret the prompt.

    None of this means that this approach will not be successful, but there's a reason why so many attempts to revolutionize user interfaces ended up going nowhere. For example, talking to your computer was always supposed to be the future, but in practice, it's slower and more finicky than typing.

    In fact, the only new UI paradigm of the past 28+ years appears to have been touchscreens and swipe gestures on phones. But they are a matter of necessity. No one wants to finger-paint on a desktop screen.

  • why_at 40 minutes ago
    My first impression coming away from this is skepticism.

    Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.

    Most of their examples seem like they could have been done with a right click drop down menu so they don't really need to "re-invent the mouse pointer".

    So is this thing talking to Google's servers all the time for the AI integration? So it won't work if you're not connected to the internet? Privacy concerns are obvious; now Google wants to have an AI watching literally everything you do on your computer?

    Does it cost the user anything for the LLM use? If it's free will it stay free forever? That's quite a lot to give away if they're expecting people to use it to change a single word like in one of their examples. I guess they're expecting to make the money back by gathering data about literally everything you do on your computer.

    There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.

    • AirMax98 20 minutes ago
      Right — it does seem cool but the voice is patching over a major gap. If I'm talking already, why wouldn't I just describe what I'm looking at and have the AI grab it for me?
    • nolist_policy 34 minutes ago
      The "Edit an Image" Demo at the bottom is pretty fun. Maybe this is just Google flexing their LLM inference capacity.
  • dandaka 8 minutes ago
    The next generation of OS should have constant video and audio recognition by an on-device LLM. This will provide valuable context for a lot of scenarios. So instead of the frequent copy-pasting we are used to, we can let agents access the context of our whole workflows across different apps.

    But Google is a very ill-positioned candidate for such an OS. I would rather trust Apple and local-first on-device models.

  • kjellsbells 49 minutes ago
    I sense a privacy problem brewing.

    It reminds me of Microsoft Recall in the sense that some portion of the screen is going to be continuously transmitted outside of the user's control.

    What happens when someone browses something very private (planning a surprise engagement, looking at medical data, planning a protest)? All that data gets slurped to Google and becomes subject to a warrant or discovery, or goes into building your advertising fingerprint.

    Maybe the idea is that the data is sent to AI only when you right click, but that seems like a very thin firewall that a product manager will breach in the interests of delivering "predictive AI" via some kind of precomputed results.

  • juancn 21 minutes ago
    Please don't.

    I like text selection exactly how it is. I want precise controls.

    It's fine for a touch interface like a phone, but on a computer I expect precision. As much as I can get.

  • maheenaslam 14 minutes ago
    The concept is good, but accuracy in a cluttered environment can be a concern, and misinterpreting context can also be a problem.
  • nolist_policy 53 minutes ago
    Wiggle at CAPTCHAs, wiggle at Termux, wiggle at Emacs, wiggle at the Godot Editor, wiggle at my remote desktop.

    (Not going to happen)

  • jpatten 51 minutes ago
    Reminds me of Put That There https://m.youtube.com/watch?v=RyBEUyEtxQo
  • loaderchips 1 hour ago
    It's beautiful how the human mind can take something very obvious but overlooked and make it into this fantastic innovation. Fab stuff.
  • tintor 1 hour ago
    Of course, it isn't a Google demo if you can't use it to book a table at a restaurant. (shown at the bottom of the page)
  • jaccola 57 minutes ago
    This seems like one of those things that is usable infrequently enough to be forgotten/poorly developed/never used. (Even before accounting for the actual failure rate of the LLM, which will be non-zero).

    Perhaps a text box and file upload isn’t the perfect interface for every use case but it is versatile which is a huge barrier to overcome.

  • AbuAssar 1 hour ago
    so Google will be monitoring whatever is on the screen continuously, or only when the user says the magic words (this, that, here, there)?
    • EdgeExplorer 1 hour ago
      Indeed. "AI-enabled pointer" is misdirection. This isn't an AI-enabled pointer; it's sending screen to AI, which yes, includes pointer position. The AI doesn't live in the pointer. The AI lives, apparently, so thoroughly in the system that it can see and do anything, and the pointer is just a way of giving it context.
    • OtomotO 54 minutes ago
      Google Recall. Hey, it's all about the marketing.
  • hmokiguess 31 minutes ago
    Don't build these things, instead build protocols and expose system level APIs for application developers to build things.
  • xiphias2 14 minutes ago
    Google needs to beat OpenAI and Anthropic in coding models because that's where the big money is going. I love using the Gemini Pro model for quick questions, but that's not where I'm spending the real money.

    They have so many great software engineers but are unable to use them to speed up coding AI research. Hopefully with Sergey's focus it will get better.

    This cursor thing is just another experiment nobody cares about.

  • iridione 1 hour ago
    Interesting! I wonder how UI will evolve in the long-term? If there are browser-use/computer-use and clicky-clones automating pointer actions, do we really need complex UI anymore? If yes, when?
    • Ancapistani 50 minutes ago
      I've been playing with writing a visionOS app that allows an AI agent to be aware of what you're looking at at any given time.

      At some point I fully expect eye tracking (or attention tracking) to be common enough to be a first-class input method.

  • strgrd 1 hour ago
    No thanks
  • SirFatty 1 hour ago
    It only took Google and their AI offering to come up with Graffiti.
  • mcookly 1 hour ago
    I wonder what sort of monstrous power would be unleashed if Google used Plan9 as a foundation.
    • bitwize 45 minutes ago
      They'd half-finish it then bury it, like they did with Fuchsia which is heavily Plan-9-inspired.
  • Joker_vD 34 minutes ago
    Just seven hours ago there was a plea on HN [0] to please not do this. Seriously, what are they smoking at Google right now?

    [0] https://news.ycombinator.com/item?id=48107027

  • jinkuan 56 minutes ago
    being able to make precise edits would be huge for AI
  • mvdtnz 1 hour ago
    Both of the text based demos would have been simpler and faster with traditional mouse and keyboard interactions. What is the AI adding?
    • hyperhello 54 minutes ago
      They’re going to take your abilities to do anything and spread it across many places so you have to run around to do them, same as all the moneyed technology.
    • wartywhoa23 19 minutes ago
      Hype-flavored surveillance!
    • dfxm12 44 minutes ago
      It tracks what's on the screen and sends it back to Alphabet. If you're watching a video about BBQ, enjoy a bunch of ads for Omaha Steaks and Big Green Egg in your Gmail.

      On a less serious note, the audience for this is people who want to optimize for what seems like the least amount of effort.

    • slopinthebag 1 hour ago
      It feels like everything modern is like this. No value added, just the appearance of it.
  • simondw 48 minutes ago
    Maybe I'm misunderstanding, but what is new about the pointer itself? Seems to be functionally the same as selecting + tooltips / context menus.
    • kwertyoowiyop 43 minutes ago
      Shush, how is anyone going to get promoted with that kind of talk!?
    • DaiPlusPlus 32 minutes ago
      > but what is new about the pointer itself?

      I'm hoping for a const-reference joke.

  • pmarreck 31 minutes ago
    There's already a product that does this lol

    Aaaaand now I can't remember the name of it

  • LocalH 57 minutes ago
    do not want
  • OtomotO 55 minutes ago
    Like a dream come true...

    Nightmares are dreams as well and this is a nightmare like Windows Recall.

    Technically wonderful though.

  • themafia 1 hour ago
    > We’ve been exploring new AI-powered capabilities to help the pointer not only understand what it’s pointing at, but also why it matters to the user.

    We couldn't quite track you well enough before. So we're fixing that under the guise of "AI powered capabilities."

  • brgsk 36 minutes ago
    what the hell is going on at google
  • SirMaster 48 minutes ago
    Thanks, I hate it