This is set up for the same fate as DNT in browsers. Collecting all the "do not track" env vars into a single "do_not_track.env" file, however, may not be a bad idea...
Love it. This is an annoying problem, and this is likely more of an actual solution than asking folks to use a universal one. I'll put something together as a starting point.
Advertisers chose to ignore DNT because they claimed that Microsoft enabling DNT by default took agency away from the user. In reality, they probably weren't going to honor it anyway.
Microsoft is too sophisticated to plead ignorance; they are responsible for that outcome and I think we can assume they knowingly chose it. (Though now Microsoft browsers are such a small portion of the market that it doesn't matter.)
The biggest failure of DNT was browser makers - including Mozilla - removing it. It has zero performance impact (1 bit?) or development cost. As long as it was out there, when there was momentum against tracking, advocates had evidence of both demand for privacy and of trackers ignoring user wishes.
I was surprised how hard it was to stop the Python transformers library from phoning home to Hugging Face. I set HF_HUB_DISABLE_TELEMETRY=1, and when I called Wav2Vec2CTCTokenizer.from_pretrained I explicitly passed local_files_only=True, but I still got a warning about not having a valid HF_TOKEN. It wasn't until I stumbled upon HF_HUB_OFFLINE=1 that I became somewhat confident that I'm not making outgoing connections to HF every time I load a wav2vec2 model from disk.
I wouldn't have realized this was happening at all if it weren't for the obnoxious HF_TOKEN warning.
HF is notorious for making it difficult to work offline (or at least not waste time trying to connect when everything needed is offline) and is constantly changing how it is being handled. Previously, there was TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE, etc.
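For anyone fighting the same thing, here is a minimal sketch of the workaround, using only the variable names mentioned in these comments. The helper name `force_hf_offline` is made up for illustration, and I'm assuming the variables need to be in the environment before the library is imported; whether the older TRANSFORMERS_OFFLINE and HF_DATASETS_OFFLINE names are still read depends on the version you have installed.

```python
import os

# Variable names come from the comments above; the helper is hypothetical.
HF_OFFLINE_VARS = {
    "HF_HUB_OFFLINE": "1",
    "HF_HUB_DISABLE_TELEMETRY": "1",
    "TRANSFORMERS_OFFLINE": "1",   # older name, may be ignored by newer versions
    "HF_DATASETS_OFFLINE": "1",    # older name, may be ignored by newer versions
}


def force_hf_offline(env=None):
    """Set the offline/telemetry variables in `env` (defaults to os.environ)."""
    target = os.environ if env is None else env
    for name, value in HF_OFFLINE_VARS.items():
        target[name] = value
    return target


if __name__ == "__main__":
    force_hf_offline()
    # Import transformers only after this point (assumption: the settings
    # are read when the library is imported).
    print({name: os.environ[name] for name in HF_OFFLINE_VARS})
```

Setting all four is belt and braces: it costs nothing if a name is no longer recognized.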
Looks like a helpful honeypot! Any tool that publicly announces support for this spec is a tool I know to avoid, because it collects telemetry without explicit opt-in in the first place.
DO_NOT_TRACK support doesn't mean tracking is not an explicit opt-in.
Example: the software crashes, and there is a crash handler that asks you if you want to send a crash dump. With DO_NOT_TRACK, the crash handler is disabled entirely, no question, no dump.
If it gets some adoption, that's probably how it will work. Those who have a financial interest in using tracking (ex: ads) probably won't support such an option.
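The crash-handler example could look roughly like this. It is only a sketch: the function names are hypothetical, and treating any value other than empty, "0", or "false" as "set" is my assumption about how a tool might read DO_NOT_TRACK.

```python
import os


def tracking_disabled(env=None):
    """True when DO_NOT_TRACK looks set (assumed convention, see note above)."""
    env = os.environ if env is None else env
    value = env.get("DO_NOT_TRACK", "").strip().lower()
    return value not in ("", "0", "false")


def handle_crash(dump, env=None, ask=input, send=print):
    """Return 'disabled', 'declined', or 'sent'.

    `ask` and `send` are injectable stand-ins for the real prompt and uploader.
    """
    if tracking_disabled(env):
        return "disabled"  # no question, no dump
    answer = ask("Send crash dump? [y/N] ")
    if answer.strip().lower() != "y":
        return "declined"  # still explicit opt-in when DO_NOT_TRACK is unset
    send(dump)
    return "sent"
```

The point is that the opt-in prompt and the DO_NOT_TRACK check are independent: the variable removes the question entirely, it does not replace the consent step.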
I don't think there is any way to stop people from tracking you. Technically speaking, you can pretty much always be tracked. Even if you eliminated all third-party requests you could still be tracked. Downloads, logins, queries, etc. can all be tracked. Virtually all software now has the "continuously upgrade to the latest version" bullshit, so you are tracked every time you open the app. Even if you turn it off, they stop the app from working until you upgrade, so they force you to be tracked.
I think the only solution is to make it law that you can't track anyone for any reason without their consent, and can't sell consensual tracking data without an additional consent agreement. It would be a huge blow to the advertising industry, so it will never be made law, but it's the only thing that would work.
> Many CLI tools, SDKs, and frameworks collect telemetry data by default.
All of those are using a dark pattern, and before exploring new ways to opt out you should spend your energy looking for an alternative that respects your freedoms upfront.
This is just sad. Luckily I do not use any of the listed programs. I threw out Homebrew many years ago when they started this nonsense.
The only tool I currently have installed that does %/"($& like this is Deno (required for yt-dlp now). It happily phones home even if you wrap it in a wrapper script that forces the env variable (there's no way I'll pollute my default environment with stuff like this):
I wish bad dreams to whoever puts such crap into their software! Thankfully I have Little Snitch to catch most of those kinds of invasions of my privacy.
Could you provide more details? Many applications use multiple processes, and use some intermittently. It seems like quite a bit of work to enumerate every process used and then to keep the white/blacklist updated as usage and software changes - every new application or command you use, every update, every OS change that affects networking or system calls etc ...
The issue is that it is not enforced. My version of My IP will tell you if 'Do Not Track' and 'Global Privacy Control' are set by your browser, but it is up to the website to honour your requests. Check if your browser is sending them by visiting: https://fshot.org/utils/myip.php
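Checking for the two signals on the server side is the easy part. A rough sketch, assuming the request headers arrive as a plain dict and that the browser sends them as `DNT: 1` and `Sec-GPC: 1`; the function name is made up, and this is not the code behind the page linked above:

```python
def privacy_signals(headers):
    """Report which privacy signals a request carries.

    `headers` is assumed to be a plain mapping of header name to value.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return {
        "do_not_track": normalized.get("dnt") == "1",
        "global_privacy_control": normalized.get("sec-gpc") == "1",
    }
```

Detecting the signal takes a few lines; honouring it is a policy decision that no code on the client can enforce.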
It's probably easier to run your own DNS and blacklist the offending domains. There are good blacklists with millions of telemetry domains, e.g. https://github.com/hagezi/dns-blocklists.
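The matching a DNS blocker does is simple in principle. A minimal sketch, assuming the blocklist has already been loaded into a set of plain domain names (real lists ship in several formats that need parsing first); `is_blocked` and the example.com names are placeholders:

```python
def is_blocked(hostname, blocklist):
    """True if `hostname` or any of its parent domains is in `blocklist`."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the full name, then each parent: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    blocklist = {"telemetry.example.com"}  # hypothetical entry
    for host in ("telemetry.example.com", "eu.telemetry.example.com", "example.com"):
        print(host, is_blocked(host, blocklist))
```

Matching on whole labels rather than string suffixes avoids blocking an unrelated domain that merely ends with the same characters.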
I’m morally opposed to the notion of optimizing the opt-out mechanism. I want a standardized opt-in mechanism, like:
export ALLOW_TRACKING=telemetry,crash_dumps
and the absence of such a setting means “fuck off, don’t spy on me”. It’s not my responsibility to turn off apps wanting to track me. It’s their responsibility to get me to authorize their specific flavor of tracking.
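On the tool side, honoring something like that would be trivial. A sketch, assuming the hypothetical ALLOW_TRACKING variable is a comma-separated list of categories; the function names and the category names beyond the two in the example are up to whoever would standardize it:

```python
import os


def allowed_tracking(env=None):
    """Parse ALLOW_TRACKING into a set of categories; unset means nothing is allowed."""
    env = os.environ if env is None else env
    raw = env.get("ALLOW_TRACKING", "")
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def may_track(kind, env=None):
    """True only if the user explicitly listed this category."""
    return kind.lower() in allowed_tracking(env)
```

With this shape the default path is the private one: a tool that never looks at the variable and a user who never sets it both end up with no tracking.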
Honest question, what's the problem with crash dumps that include no personal info? They just help make the software less buggy. I also don't see an issue with anonymized usage patterns (this feature was used X times this month, this one Y times, etc).
Can someone expound on what they see as a problem?
> Honest question, what's the problem with crash dumps that include no personal info?
In addition to the other response: crash dumps are difficult to anonymize, both because useful crash dumps include something like a minidump (or some other small alternative to a core file), and because even without that, any random information from a backtrace may be sensitive (e.g. a URL).
There's nothing wrong with saving a crash dump and giving the user control of whether to submit a bug report.
I would suggest that enrolling people by default in supplying such information is the issue. In a world driven by surveillance capitalism, even "anonymous" data can be used for much broader purposes (think, for example, of when and where people are using tools geographically and at what times: you can start to track the behaviour of people in this way).
Users should never be opted in through usage alone of free or paid-for tooling to supply information that isn't part of the function of the tool. Where that is required for a service or product, you should opt-in explicitly, not implicitly.
Default opt-in tracking should be illegal and enforced with such fines and prison sentences that companies wouldn't even dare to have anything remotely capable of tracking in the runtime.
Unfortunately, big corporations can always find a way to make regulators see no problem.
Though if you just want a simple ENV var that handles this WHILE honoring the specification on this page: https://github.com/alloydwhitlock/do-not-track-cli
[0]: https://github.com/renovatebot/renovate/discussions/42932
The proposed way just normalizes tracking.
Is anyone maintaining a more complete list of those?
https://dpaste.com/E7RZ34MVD
https://github.com/StevenBlack/hosts
Everyone proclaiming a "standard" is just adding to the long list of (unofficial) alternatives.
export SEMGREP_SEND_METRICS=off
export COLLECT_LEARNINGS_OPT_OUT=true
export STORYBOOK_DISABLE_TELEMETRY=1
export NEXT_TELEMETRY_DISABLED=1
export SLS_TELEMETRY_DISABLED=1
export SLS_NOTIFICATIONS_MODE=off
export DISABLE_OPENCOLLECTIVE=true
export NPM_CONFIG_UPDATE_NOTIFIER=false
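If you'd rather keep these in one sourceable file (the "do_not_track.env" idea from the top of the thread), something like this would generate it. It is a sketch: the variable names and values are copied from the list above, I haven't verified that each tool still honors them, and `render_env_file` is a made-up helper.

```python
import shlex

# Names and values copied from the comment above; not independently verified.
OPT_OUT_VARS = {
    "SEMGREP_SEND_METRICS": "off",
    "COLLECT_LEARNINGS_OPT_OUT": "true",
    "STORYBOOK_DISABLE_TELEMETRY": "1",
    "NEXT_TELEMETRY_DISABLED": "1",
    "SLS_TELEMETRY_DISABLED": "1",
    "SLS_NOTIFICATIONS_MODE": "off",
    "DISABLE_OPENCOLLECTIVE": "true",
    "NPM_CONFIG_UPDATE_NOTIFIER": "false",
}


def render_env_file(variables):
    """Render one `export NAME=value` line per variable, shell-quoting values."""
    lines = [f"export {name}={shlex.quote(value)}" for name, value in variables.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("do_not_track.env", "w", encoding="utf-8") as fh:
        fh.write(render_env_file(OPT_OUT_VARS))
```

You would then source do_not_track.env from your shell profile, or only from the wrapper scripts of the tools you don't trust if you'd rather not touch your default environment.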
And they do by burying it in the user agreement you probably agreed to.
Like it or not, it is your responsibility. I agree it shouldn’t be, but let’s be realistic.
They didn't opt out of my data, after all.