I'd say Hz is quite a regular choice for _this_ ... it's just not usually referred to as "Hertz" by IT practitioners. Technically, Bq and Hz are the same unit, 1/s; the difference is that Bq is used for random physical events (comparable to web requests) and Hz for periodic physical events.
The Bq suggestion doesn't actually fix anything. The becquerel is defined as one decay event per second and is dimensionally identical to Hz. Using Bq typically signals that a Poisson process is being measured, which is itself an assumption about the arrival statistics, and one that is likely wrong for real web traffic (which tends to be bursty rather than memoryless).
More importantly, the claim that Hz is inappropriate for non-periodic phenomena is false. Many random processes have a well-defined Fourier transform, and reporting the intensity of random fluctuations in a frequency range is standard across signal processing, neuroscience, finance, and physics. The unit doesn't imply periodicity of the process itself. It implies that we are working in the Fourier domain, which applies as much to periodic signals as to stochastic processes.
If you want to characterize web request traffic properly, the right question is what the arrival process actually looks like. A single scalar, whether in Hz or Bq, throws away almost all of that. In all cases, you have to think carefully about what your underlying assumptions are and what the reported number actually measures.
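The "single scalar throws away almost all of that" point can be made concrete with a small simulation (a sketch; the rates, burst shape, and sample length below are invented for illustration): two arrival processes with the same average requests per second but wildly different dispersion.

```python
import random

random.seed(0)

def poisson_counts(rate, seconds):
    """Per-second request counts for a memoryless (Poisson) arrival process."""
    counts = []
    for _ in range(seconds):
        # Count exponential inter-arrivals that fall inside one second.
        n, t = 0, random.expovariate(rate)
        while t < 1.0:
            n += 1
            t += random.expovariate(rate)
        counts.append(n)
    return counts

def bursty_counts(rate, seconds, burst_fraction=0.1):
    """Same average rate, but all traffic arrives in occasional bursts."""
    burst_rate = rate / burst_fraction
    return [poisson_counts(burst_rate, 1)[0] if random.random() < burst_fraction else 0
            for _ in range(seconds)]

a = poisson_counts(50, 2000)   # smooth, memoryless traffic
b = bursty_counts(50, 2000)    # bursty traffic, same mean rate

def mean(xs): return sum(xs) / len(xs)
def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both report roughly 50 "requests per second", but the variance-to-mean
# ratio (index of dispersion) is ~1 for Poisson and far larger for bursts.
print(mean(a), var(a) / mean(a))
print(mean(b), var(b) / mean(b))
```

The scalar "50 rps" is identical in both cases; the dispersion is what tells you whether your servers see a steady trickle or a stampede.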
Becquerel (or counts per second) has the same problem, in that it doesn't measure the "energy" of each request.
I do like the analogy though. Actual radiation has many forms and energy levels.
Decay chains are a nice analogy you could use too (i.e. subsequent processes and work that branch out later as a consequence of the initial request).
And yes, like Sieverts, some types of incoming request, and some "organs" are more consequential than others. There's even an analogy to "committed dose" as the database accumulates things.
The authority on the definition of SI units (the BIPM SI Brochure) is very clear:
> The hertz shall only be used for periodic phenomena and the becquerel shall only be used for stochastic processes in activity referred to a radionuclide
Usually, no radionuclides are involved in web requests.
https://www.bipm.org/documents/d/guest/si-brochure-9-en-pdf
Oh, that's kinda fun. I got the same thing that I get for every Mastodon (and Anubis-protected) link: a page telling me that it won't work without JavaScript. I guess since AI scrapers these days do run some amount of JS, that acts as some second layer of defense?
At least for Twitter there are proxies that work without JS. For Mastodon, none that I'm aware of. I usually just audibly sigh and remark that they shall "keep their secrets then", and move on.
We are not talking about the same thing, it seems. I can understand a web page that doesn't work without javascript.
What I do not understand is someone going through all this work of putting an AI-scraper tarpit on Mastodon, a system that fundamentally needs to have its data distributed to other servers. It's just signalling and posturing, because that content is available on any server that has someone following the account.
(Tip to AI scrapers: if you want to slurp all the data from the fediverse, just create an account on mastodon.social and pull the data from the "Federated timeline" stream.)
It's not you. It's the people that were somehow convinced that serving crap is gonna "hurt" the models. These are people who have zero clue about how models are trained and how they work, but have been riled up by others who similarly don't understand the technical details, yet have strong biases against them. This is ignorance signalling at its finest.
And, as expected, it's hurting their (regular) users more than they'll hurt the model trainers. Oh well..
To those who automatically assume humans with "weird" setups are "AI scrapers" (also a bit of a boogeyman these days): FUCK YOU. I'm a human, not a stupid mindless sheeple.
Some people feel like we are being turned into content producers for large corporations to monetize, and they're not entirely wrong. I don't mind when people take a stance, even if their methods aren't perfect and may inconvenience me personally.
If you're in a rush to the airport or hospital and you are delayed by protesters for a cause you don't understand, that's one thing; I could understand a bit of cursing. However, this is someone's web resource; they are free to do with it whatever they want, and they owe you and me nothing.
> I don't mind when people take a stance, even if their methods aren't perfect and may inconvenience me personally.
The problem is that their methods are not only "not perfect" but completely ineffective.
If you are posting on a public social network, your data will be available to the public, one way or another. The whole protest becomes a "performance art" kind of thing: it might be useful for creating awareness, but in most cases the people who will be seeing it are the ones who are aware of it in the first place.
We don't use units of measurement.
We use metrics, because we have a lot more context.
RPS, requests per second, is a commonly used unit, but it has no defined standard: you could, and often do, average it over time for reporting, but no one says you have to. For scaling, however, you'll probably want to use the max rather than the average, because no one wants a web application where, in business as usual, 60% of the time it works every time.
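The average-vs-max distinction can be sketched in a few lines (the timestamps and traffic shape below are invented for illustration): the mean RPS over a window can look tame while the per-second peak, which is what you actually provision for, is an order of magnitude higher.

```python
from collections import Counter

def rps_stats(timestamps):
    """Average vs. peak requests-per-second from raw arrival timestamps."""
    per_second = Counter(int(t) for t in timestamps)
    duration = int(max(timestamps)) - int(min(timestamps)) + 1
    avg = len(timestamps) / duration      # what reports usually show
    peak = max(per_second.values())       # what capacity planning needs
    return avg, peak

# Hypothetical traffic: a quiet minute with one three-second spike.
ts = [t / 10 for t in range(0, 600, 10)]   # 1 rps background for 60 s
ts += [30 + i / 100 for i in range(300)]   # 100 rps burst for 3 s

avg, peak = rps_stats(ts)
print(avg, peak)   # 6.0 average, 101 peak: same data, very different numbers
```

Scale for the 6 and your users meet the 101; scale for the 101 and the fleet idles most of the day. Neither number alone is "the" RPS.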
We should use Sievert, i.e. how the speed is affecting my UX. That may depend on how much I give a fuck about the site, multiplied by how many requests are needed to render it.
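This "Sievert" idea could be sketched as a weighted request count, by analogy with how sieverts weight raw absorbed dose by biological impact; all the request types and weights below are made up for illustration.

```python
# Hypothetical "tissue weighting factors" for request types: how much each
# request actually costs the backend (or the user's patience).
WEIGHTS = {
    "static_asset": 0.01,
    "api_read": 0.1,
    "api_write": 1.0,
    "report_export": 10.0,
}

def effective_load(counts):
    """Weighted 'dose' of a traffic mix, in arbitrary cost units."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())

light = {"static_asset": 1000, "api_read": 100}   # high RPS, low impact
heavy = {"api_write": 50, "report_export": 10}    # low RPS, high impact

print(effective_load(light))  # 1100 requests, but a small weighted dose
print(effective_load(heavy))  # only 60 requests, yet a larger weighted dose
```

Same flaw as the real sievert, of course: the weights are a modeling choice, not a measurement.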
Counterpoint: let's say we connect a speaker to the HTTP server, and every time there's a request, the speaker produces a click. This setup will make an audible sound. If it's OK to measure this sound in Hz, then it's OK to measure the HTTP requests in Hz, because in this case they're explicitly === sound.
[R] = Ohm
Never [Ohms]
All the talk about "putting the human first" and "embracing diversity" goes out of the window the moment you are not diverse in the way they want.