Launch HN: Sitefire (YC W26) – Automating actions to improve AI visibility

Hi HN! We're Vincent and Jochen from sitefire (https://sitefire.ai). Our platform makes it easy for brands to improve their visibility in AI search.

We’ve been working together for years and have backgrounds in RL/optimization at Stanford and software engineering. We came to this idea after speaking with marketing teams who were seeing declining traffic due to Google’s AI Overviews and didn’t know what to do.

This space can feel esoteric. Many case studies, few actual studies. There's a constant battle against myths (e.g. "you need an llms.txt" vs. "you don't need an llms.txt") and "GEO hacks". We try to be more data-driven. And we try to be bolder: building a system that not only monitors, but actually improves traffic from AI search.

While Google performs a single search, AI search engines expand the user prompt into 3-10 fan-out queries. The sourced pages are ranked by an undisclosed algorithm believed to be similar to Reciprocal Rank Fusion (RRF). Finally, the LLMs skim the pages and decide which snippets to cite. Our goal is to make sure brands have the right content to make it through this funnel.
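To make that ranking step concrete, here is a minimal sketch of Reciprocal Rank Fusion. The real engines' ranking is not public; k=60 follows the original RRF paper, and the page lists are made up:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of page URLs into one combined ranking.

    rankings: one ranked list of pages per fan-out query.
    k: smoothing constant; 60 is the value from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            # Pages ranked high in many fan-out queries accumulate the most score.
            scores[page] = scores.get(page, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical fan-out queries, each returning its own ranked page list:
fused = reciprocal_rank_fusion([
    ["a.com", "b.com", "c.com"],
    ["b.com", "a.com", "d.com"],
    ["b.com", "c.com", "a.com"],
])
print(fused[0])  # b.com: near the top in all three lists, so it wins the fusion
```

The point is that a page cited for only one fan-out query loses to a page that shows up consistently across several of them.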

Here is how sitefire works:

- The user defines a set of prompts they want to monitor. These are synthetic prompts - we generate them based on SEO keywords and their monthly search volume.

- We submit these prompts to ChatGPT, Gemini, Google AI Mode, etc. on a daily basis and capture the answers. We extract fan-out queries, sourced pages, citations, and brand mentions.

- For each topic, our agents analyze which web pages are sourced and cited the most, and why. They also consider similar pages that you already have.

- Based on the diagnosis, our content agents draft improvements or create new pages, and push them directly to the client’s CMS.

- We integrate with the client’s network logs and Google Analytics to monitor the increase in AI bot requests and human referrals to their page.
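The pipeline above can be sketched roughly like this (the data model and all names are illustrative, not our actual code):

```python
from dataclasses import dataclass, field

@dataclass
class PromptResult:
    """What we capture for one monitored prompt on one engine per day."""
    prompt: str
    engine: str                      # e.g. "chatgpt", "gemini", "google-ai-mode"
    fan_out_queries: list = field(default_factory=list)
    sourced_pages: list = field(default_factory=list)
    citations: list = field(default_factory=list)
    brand_mentions: int = 0

def summarize(results):
    """Aggregate citation counts per page across all prompts and engines."""
    counts = {}
    for r in results:
        for url in r.citations:
            counts[url] = counts.get(url, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

results = [
    PromptResult("best crm for startups", "chatgpt",
                 citations=["brand.com/pricing", "rival.com/blog"]),
    PromptResult("best crm for startups", "gemini",
                 citations=["brand.com/pricing"]),
]
top = summarize(results)
print(top[0])  # ('brand.com/pricing', 2): cited by both engines
```

The diagnosis and drafting agents then work off aggregates like this one.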

This system is continuously updated, so it always shows which content works, and how to adapt the existing sitemap. For one client that used sitefire to optimize their blog, the AI-optimized articles increased their AI bot requests from ~200/day to ~570/day within ten days.

A risk we recognize is that AI-generated content can fill brands’ websites with slop. While it’s still early days and we don’t claim to have figured everything out, our intention is to mitigate this by focusing the content on specific, unique information: real product capabilities, real pricing, honest comparisons. Clients still review every page before it goes live, so they can ensure the content is true to their brand.

Some clients use our platform themselves. For others we act more like an agency, automating steps as we go. The goal is for sitefire to run mostly on its own, with clients approving changes via Slack, Claude or their CMS.

Here's a video demo: https://screen.studio/share/fw7VQQak

If you'd like to try what we've built so far, sign up at https://sitefire.ai.

19 points | by vincko 1 hour ago

10 comments

  • yunyu 1 hour ago
    What do you guys do differently than Profound or Airops?
    • vincko 1 hour ago
      That's a super valid question; we get it a lot. There is a lot of overlap.

      In our view Profound and Airops are aimed at existing marketing teams. Our goal is to be more hands-off, so you don't need a team. With many of our clients we act more like an agency, communicating via Slack and automating step by step. That's the experience we want to create. We aren't there yet though.

    • debarshri 1 hour ago
      Add peec to that list.
      • methyl 4 minutes ago
        And Surfer, the OG content optimization platform.
      • vincko 1 hour ago
        True, it is very competitive.

        Our view on Peec is that it is an analytics solution. They recently launched an actions feature, but it does not actually take any actions (yet). Creating content takes a lot of resources, and agencies are expensive.

        As an analytics solution it is a good option.

  • onecommit 59 minutes ago
    How do models deal with assessing the quality of content and its accuracy/veracity when recommending products currently? What do the providers do to avoid a situation where more content === more traffic? Would love to see links to relevant research on this, if you have them. much success to you, appreciate your ai slop risk awareness.
    • vincko 29 minutes ago
      There is the preselection, which depends on the fan-out queries the model comes up with and the content's performance across those queries on the search index.

      After that, the content is actually assessed by the model. This paper tried different strategies to improve performance at this last step: https://arxiv.org/pdf/2311.09735. Adding statistics, sources, and original data are all strategies we apply.

      In classic SEO, creating more and more content leads to "cannibalization". Generally this hurts the performance of all overlapping pages so much that it is not worth it.

  • Gobhanu 1 hour ago
    how do you track where users are coming from?
    • vincko 1 hour ago
      We currently simply integrate with your Google Analytics and filter by Source. This tends to be a lower bound, since the source isn't always set correctly: users coming from some of the native apps may be categorized as direct visitors.
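      As a rough illustration of that filtering (the referrer domain list is an assumption and far from exhaustive):

```python
# Hypothetical referrer domains for AI assistants; real lists change often.
AI_REFERRERS = {"chatgpt.com", "chat.openai.com", "gemini.google.com",
                "perplexity.ai", "copilot.microsoft.com"}

def classify(referrer: str) -> str:
    """Bucket a session by its referrer. Empty referrers count as direct,
    which is why this method only gives a lower bound on AI traffic."""
    if not referrer:
        return "direct"
    # Extract the host from a full URL, otherwise treat the value as a domain.
    domain = referrer.split("/")[2] if "://" in referrer else referrer
    return "ai" if domain in AI_REFERRERS else "other"

print(classify("https://chatgpt.com/"))  # ai
print(classify(""))                      # direct (may hide native-app AI traffic)
```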

      There are other data sources we want to enable in the future like Cloudflare.

  • ceejayoz 1 hour ago
    Ugh. The worst of SEO, but a bunch more of it? Noooooo.
    • vincko 1 hour ago
      I get it, there is a lot of worry about slop.

      We think about it like this: all of these agents will be most useful to users if they provide valuable answers. So they will be looking for valuable content for grounding their answer.

      There are exploits: you can overfit on whatever they currently use as an objective function. But those tend to be temporary, so in the long run, valuable content will win. That's what we aim to create. It's a fine line.

      • ceejayoz 1 hour ago
        > all of these agents will be most useful to users if they provide valuable answers

        This is a bald assertion.

        • vincko 1 hour ago
          Do you doubt the statement on how to maximize usefulness? Or do you mean that the companies behind the models might not optimize (exclusively) for usefulness to the user?

          I do share doubts about the latter.

          • ceejayoz 59 minutes ago
            > Do you doubt the statement on how to maximize usefulness?

            Yes; the customer here is the site using it, not Google end users, who'll tend to accept whatever's the top search result even if it's deeply wrong or complete slop.

            The wellbeing of search users isn't really the priority here, right?

            • vincko 18 minutes ago
              Yes, that is correct. We help the brands, not the end user.

              Let me try to rephrase the line of thinking:

              To maximize value to the end user, the models generally aim to be helpful. The companies building these models are incentivized to make the model use helpful content.

              Our goal is to be aligned with their objective function long term. And that incentivizes us to create helpful content.

              Not all of this is a given. We don't know for sure how it will play out. There will always be ways to game the system. But we think those will get fixed over time.

  • a13n 1 hour ago
    Please don't override the browser's default scroll behavior. It's so jarring and basically never a good idea.
    • vincko 1 hour ago
      Thank you for the feedback. We'll launch our new site soon where this is fixed.
  • vahar 33 minutes ago
    Regarding the topic of ambient agents, what’s the impact of your product? It’s hard for me to imagine the impact but I guess it must be a necessity if we have ambient agents to get discovered at all right? Nice to see a player from Europe on the market too!
    • vincko 4 minutes ago
      Do you mean agents not answering short specific user prompts?

      For those types of agents, prompt tracking is less accurate since the context of the queries is so large. But it's still relevant to understand what web searches they tend to perform and if you do show up in those.

      That's another reason why we want to integrate other data sources, especially network logs.
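      For the network-log side, counting AI crawler requests can be as simple as matching user agents (the substring list below is partial; real bot lists change and need maintenance):

```python
# Partial list of AI crawler user-agent tokens; new bots appear regularly.
AI_BOT_UAS = ("GPTBot", "OAI-SearchBot", "PerplexityBot",
              "ClaudeBot", "Google-Extended")

def count_ai_bot_hits(log_lines):
    """Count access-log lines whose user-agent field matches an AI crawler."""
    return sum(1 for line in log_lines
               if any(ua in line for ua in AI_BOT_UAS))

logs = [
    '1.2.3.4 - - "GET /pricing HTTP/1.1" 200 "-" "Mozilla/5.0 ... GPTBot/1.1"',
    '5.6.7.8 - - "GET / HTTP/1.1" 200 "-" "Mozilla/5.0 (Macintosh)"',
]
print(count_ai_bot_hits(logs))  # 1: only the first line is an AI crawler
```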
