“Free” as in “mattress”.
While the advice in this post is all still quite correct – wholly independent backups are your only realistic hope of long-term data integrity – there’s a quirk here that will be of interest to anyone trying to set up a large home storage array.
Finally, after all these failures, I split my storage array up into three pieces so that they'd be independent. But I was still experiencing an unusually high rate of physical device failure. Even going by data from as long ago as 2007, hard drives fail at somewhere between 2% and 10% annually. So, with 10 drives, I should have been replacing maybe one per year. Instead, I was replacing them at a rate of more like 60%-70% per year.
As a final Hail Mary, I went out and bought a UPS, plugged all of the enclosures into it, and…
…I haven’t lost a single disk in the intervening 2 years.
After a couple of months of good experience with this setup (i.e., after going roughly 3x my previous mean time between failures with zero failures), I went back and looked at the numerous reviews of various JBOD enclosures. The bad reviews almost all describe power-related issues; "turning off randomly" is highly correlated with data loss.
So my working hypothesis here is that most consumer-grade JBOD enclosures are simply not conditioning their power adequately to support hard disks, and require an external UPS to ensure even a baseline level of data integrity.
Consider this fairly typical Python class:
class Publisher(object):
    def __init__(self):
        self._subscribers = []

    def publish(self, event):
        for subscriber in self._subscribers:
            subscriber.published(event)

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)
If a user of this class has a bug where they accidentally pass something that isn't a Subscriber to subscribe, it fails slow. You don't find out that you screwed up until the next call to publish, and by that point, it's likely too late to figure out who, exactly, screwed up.

The only way to deal with this is to insert tedious isinstance checks everywhere, raise your own TypeErrors with your own error messages, and so on. The result: nobody ever bothers.
I want to be able to do, instead, something like this:
self._subscribers = list[Subscriber]()
Then I could get a nice TypeError at subscribe time instead of publish time.
It would also be great if I could do:
self._subscribers = dict[str:Subscriber]()
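Until something like that exists in the language, here is a minimal sketch of the kind of run-time-checked container I mean; TypeCheckedList and its constructor argument are my invention, not an existing API, and for brevity it only guards append:

class TypeCheckedList(list):
    """A list that raises TypeError as soon as a non-instance is appended."""

    def __init__(self, item_type):
        super(TypeCheckedList, self).__init__()
        self._item_type = item_type

    def append(self, item):
        if not isinstance(item, self._item_type):
            raise TypeError("expected %s instance, got %r"
                            % (self._item_type.__name__, item))
        super(TypeCheckedList, self).append(item)

With self._subscribers = TypeCheckedList(Subscriber) in Publisher.__init__, passing the wrong thing to subscribe fails immediately, at the call site of the actual mistake, rather than at the next publish.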
Formal correctness proofs – much like another of Dijkstra's bugaboos, static type systems – require a substantial amount of effort to use. Formal correctness proofs are so difficult to use that I'm not aware of any research project that has validated their effectiveness, but static type systems are the closest thing we have in the software industry to something provably worthless.
(To be fair, my favorite bugaboo, test-driven development, also has relatively weak empirical support for its benefits. But of course in the case of the thing that I like, the study was flawed and in the case of the thing that I don't like, one study is definitive.)
Critically, there are times when, given the constraints of a problem (both political and technical), PHP is actually the better option. For example, for all its warts, it's generally a better idea to build a web application in PHP than in C. This despite the fact that C is quite a reasonable programming language for certain problems; for building web applications it is considerably more labor-intensive than PHP and can result in even more security-critical mistakes.
Inheritance is bad because Wheel and Tortilla should not both inherit from Circle. – David Reid

One very common problem with class inheritance, of course, is that you don't get anything so clean as Tortilla and Wheel inheriting from Circle. Generally, in a large, real-world project, you get Circle inheriting from both Tortilla and Wheel, because you already had a Tortilla and a Wheel at the time you discovered you needed a Circle, and it was so much easier to just do that for now.

So now all Circles have numberOfSpokes and also cornOrFlour attributes.
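A minimal sketch of how that plays out (the class names come from the joke above; the attribute values are invented):

class Tortilla(object):
    def __init__(self):
        self.cornOrFlour = "corn"


class Wheel(object):
    def __init__(self):
        self.numberOfSpokes = 12


class Circle(Tortilla, Wheel):
    """'It was so much easier to just do that for now.'"""

    def __init__(self):
        Tortilla.__init__(self)
        Wheel.__init__(self)


c = Circle()
c.cornOrFlour      # "corn"  - every Circle is now also a snack
c.numberOfSpokes   # 12      - and also a wheel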
To quote Keith:
“… it was a criticism about the sound synchronization in a Playstation emulator. I would have expected to find some buffering system implemented in the audio code, but instead it was only limited by how often the call to write(/dev/dsp) returned. That had the effect of delaying all your sound by the size of the kernel audio buffer.”
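To put a number on that (my arithmetic, not Keith's, and the buffer size is an assumption): CD-quality audio at 44.1 kHz, 16-bit stereo is 176,400 bytes per second, so if the kernel buffer is 64 KiB, relying on the blocking write() alone lets the sound trail the picture by roughly 65,536 / 176,400 ≈ 0.37 seconds, which is easily noticeable in an emulator.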
No more arbitrary licensing on copyright. Given that it's a state-granted monopoly, you can live with state-granted pricing.
Here's how damages work: if you make a copyrighted work, and someone else sells media (where "media" could be paper, could be a service where you can download things, could be CDs; anything where you transfer information via some mechanism for a fee), you are then entitled to a percentage of their gross revenue. Not their net profit: they can't just pay themselves all the gains as "costs" like salary and then pretend that the information is worthless. You get a fixed percentage of the gross.
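To make that concrete with invented numbers: if the fixed percentage were, say, 5%, and a distributor took in $2,000,000 in gross revenue selling media containing your work, you would be entitled to $100,000, no matter how much of that revenue they claim to have spent on salaries or other costs.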
And if the marginal cost of reproduction is zero, and nobody is running a business to copy your content, people are just doing it for free as gifts to each other, then TOUGH. Get another job.
I encourage all developers to participate in Julython. It's always been a great motivator for me to get back to old projects or explore new ideas.
It's also a great excuse to spend a bit of time cleaning up projects and getting them up to date, because it rewards breadth of work more than depth.
In case you didn't think there were enough nails in SSL's 15-year-old-at-this-point coffin, SSLv3 is now completely broken. SSLv2 has been broken for a long time.
There is no SSL. There is only TLS.
There's a whole confusing taxonomy of the various kinds of things your tests can test against which aren't "real", so I probably am not going to repeat that here. Many of the reasons I covered above indicate why you might want to have more or less real fakes.
One particular term I use a lot which I don't see covered elsewhere in the literature is verified fake.
When you write a library, you provide an implementation of the thing the library does. But if your library does I/O (makes an HTTP request, generates an HTTP response, pops up a window, logs a message, whatever), you've just introduced a new barrier to testing: callers of your library might want to test their code that is talking to your thing, and how are they supposed to figure out if your thing did what they wanted it to?
A good library - and the libraries that I maintain are struggling to be "good" in this sense; for the most part they're not - will provide you with a real (i.e. not a fake, double, stub, mock, or dummy) in-memory implementation of its functionality. One of the best examples of this is SQLite. If you need to test code that uses SQLite, you just make an in-memory SQLite database and supply it; there's virtually no reason to fake out the database.
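A minimal sketch of what that looks like in a test, using only the standard library (the table and values are invented for illustration):

import sqlite3

def test_records_an_event():
    # A real SQLite database that lives only in memory and disappears
    # when the connection is closed or garbage-collected.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (name TEXT)")

    # Hand `db` to whatever code you're actually testing; here we use it
    # directly just to keep the sketch self-contained.
    db.execute("INSERT INTO events VALUES (?)", ("subscribed",))
    assert db.execute("SELECT name FROM events").fetchone() == ("subscribed",)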
One step removed from this is providing a verified fake - an implementation of your functionality which doesn't do anything "useful" (like an in-memory SQLite database does) but nevertheless is verified against (a subset of) the same test suite as the real implementation, as well as providing an introspection API that allows test cases to verify that it did the right thing. This allows client code to import the fake from your library, test against it, and have a reasonable level of assurance that their code is correct in terms of how it's using the API. When they upgrade your library and its interface has changed, their tests will start failing.
Tests which use an unverified fake have a maintenance burden: they must manually keep the fake up to date with every version bump on the real implementation.
Tests which use a real implementation end up relying on lots of unimportant details, and will potentially be unreliable and flaky, since real external systems (even systems you might not usually think about as "external", like the filesystem, or your operating system's clock) have non-deterministic failure modes.
Tests which use a verified fake get the benefits of a unit test (reliability, speed, simplicity) along with the benefits of an integration test (assurance that it "really works", notification of breakage in the event of an upgrade), because they place responsibility for maintaining the fake with the same party that maintains the interface and its real implementation.
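Here's a minimal sketch of the pattern with invented names (a hypothetical SystemClock as the real implementation and MemoryClock as its verified fake): one contract suite runs against both implementations, and only the fake offers the extra control method that client test suites use.

import time
import unittest


class SystemClock(object):
    """The real implementation: actually reads the operating system clock."""

    def now(self):
        return time.time()


class MemoryClock(object):
    """The verified fake: no I/O, plus a test-only control method."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        # Introspection/control API for client test suites.
        self._now += seconds


class ClockContract(object):
    """The shared suite: every clock implementation must pass it."""

    def test_now_returns_a_float(self):
        self.assertIsInstance(self.make_clock().now(), float)


class SystemClockTests(ClockContract, unittest.TestCase):
    def make_clock(self):
        return SystemClock()


class MemoryClockTests(ClockContract, unittest.TestCase):
    def make_clock(self):
        return MemoryClock()

    def test_advance_is_visible_to_callers(self):
        clock = self.make_clock()
        clock.advance(5.0)
        self.assertEqual(clock.now(), 5.0)


if __name__ == "__main__":
    unittest.main()

A client that tests against MemoryClock gets fast, deterministic tests, and because MemoryClock is checked by the same contract suite as SystemClock, an interface change in a new release breaks the contract tests rather than silently invalidating the fake.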
The issue is what happens when there's packet loss. If you blindly blast out all of your players' movements with UDP, then when there is some packet loss, their movements will start to appear jerky, they'll pause periodically, and they'll take impossible paths, but, assuming their network recovers, they'll generally end up in the right place.

If you blindly send your players' movements over TCP, then when there's packet loss, they'll pause, then rapidly catch up, but they'll take an accurate path.
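A minimal sketch of the two approaches with the standard socket module (the host, port, and one-JSON-object-per-line wire format are all invented for illustration):

import json
import socket

HOST, PORT = "127.0.0.1", 9999   # hypothetical game server


def send_movement_udp(position):
    # Fire-and-forget datagram: if it's lost, it's simply gone, so the
    # remote view of this player jumps and jerks but never stalls.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(json.dumps(position).encode("utf-8"), (HOST, PORT))
    sock.close()


def send_movements_tcp(positions):
    # Reliable, ordered stream: lost segments are retransmitted, so the
    # remote view stalls during loss and then replays every update.
    sock = socket.create_connection((HOST, PORT))
    try:
        for position in positions:
            sock.sendall(json.dumps(position).encode("utf-8") + b"\n")
    finally:
        sock.close()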
It would be nicer if the preview "button" were instead a live side-by-side view of the rendered markdown, à la Stack Overflow. (Except make it responsive, so if I have a full-screen window with scads of white space on the left and right, I can glance over to the left or right rather than scrolling down.)
Do you remember when I used to be Merlin Mann?
Yes! You used to talk into your shoe…
That's not to say I would, precisely, endorse Node. I think there are better options. Other languages can provide some of these advantages, but not all of them in combination, simply because no language has achieved the ubiquity of JavaScript.
Clojure has a fairly serious effort in ClojureScript. They have a page that nicely compares JavaScript and ClojureScript.
Python has PyJS. I've played with it enough to know that, in combination with Twisted, it's possible to share logic between client and server in a manner very similar to Node. I even gave a talk last year which sneakily introduced people to this concept. PyJS is a project which is struggling to regain its footing, but I have found that it's worth dealing with its idiosyncrasies just to avoid the rigamarole of comparing two arrays in JavaScript.
Hopefully the next time you see a PHP developer headed for your inputs with some backslashes and extra quotes, you'll have a clear mental picture of a concerned-looking medieval physician headed for you with a nice, sharp corkscrew.
React accordingly.
P.S.: If you still insist on pseudo-latinate prescriptivist grammatical dogmatism, but are still sensitive to the gender-neutral language issue and prefer to use some nonsense like Spivak, I will still think you're a bit daft. However, that's a clear improvement upon the alternative, which is for me to think you're rude and inconsiderate.
I should also probably stipulate that defects in a payment processor and errors in payment are a serious problem, and therefore it is important to avoid them. Sufficiently bad problems can lose you money, or cause you to steal money from your customers, or enable someone else to steal money from them, which can (at least hypothetically) land you in jail.
Unfortunately, while the hypothetical consequences of Bazaar's mutability are great, the practical consequences are more problematic: years of format-migration bugs and headaches.
Thus, one of our atavistic tendencies as software developers reinforces another: because diff and patch tools all operate at the level of line additions and deletions, any tool that manipulates the underlying graph structure must contort itself to preserve artificial formatting constraints, or else it produces giant, incomprehensible deltas, so we can't use such tools.
No more vague patents on software. A vague description of an application of a concept to programming is not a "design", it's a daydream. If you want patent protection, you submit a fully working program, including a functioning build system. Then the patent office makes that available, and you only get to bring litigation against people who have actually downloaded that program and either used it or read it. If they had a parallel invention, no deal. If they wrote a program that did something similar but without using your exact source code as a reference, no deal.
No more “Terms And Conditions”. Contracts of adhesion are bullshit. You want to obligate some people to a gigantic pile of fine print that you just pulled out of your ass? Fine, you have two options:
If you can do all of that, then great, but cascading hardware/software failures or malware infections are both dramatically more likely than your whole house burning down or blowing up, so the most important aspect of this is to have backups on a separate volume that can be (and often is) disconnected and moved somewhere else when there's a problem.
Ideally, “wholly independent” means:
The issue that I was facing here was not that disk hardware is unreliable, or that consumer disks fail.
Btrfs was compensating quite nicely for both the failure rate of the physical drives (by having raid1 level redundancy) and the errors introduced by USB weirdness, resets, and general unreliability (by checksumming all reads and relying on replicas when checksums failed).
I've said it already, but I must stress that numerous physical drives had failed in this filesystem before, and the recovery process from those failures was as seamless as advertised.
Hopefully many of the ephemera here will have been useful to readers, but by far the most important lesson to take away from this unfortunate experience is this:
Make sure you have wholly independent backups.
Luckily, there's a workaround: plug the drives into a USB hub, so they're not directly connected to the root hub. It seems that the BIOS will helpfully disregard them in that configuration, leading to normal POSTing. Right now I'm using a fairly old USB 3.0 hub, and it's shaving 10 megs a second off my write performance and 30 megs a second off my read performance (measured via btrfs→openssh→ssh→osxfuse→sshfs→Blackmagic Disk Speed Test), but it's still a good 3x faster than USB 2.0, and a better hub might have better results.
In a final, ironic twist, it turns out that with both drive bays plugged in and turned on … the server won’t POST. It just sits there at a black screen forever, no BIOS logo.
I've had both bays plugged in, both churning away on I/O, for the better part of 6 hours now, and there is no appreciable error rate or interesting log traffic or anything. It seems as though the whole problem might have been this one ever so slightly outdated JMicron chip.
After plugging in the aforementioned DATOpic adapter… I'm cautiously optimistic; it appears that this has straight-up solved the unreliability issue (at least on Linux, I haven't tested on a Mac yet).
In fact, not only is it from JMicron, but my NexStar HX4 is a "JMS 539 PM" chipset and this one is a "JMS 539 B", which suggests that they're quite similar. Nevertheless, it seems that several people with my exact issue have switched over and it's been working well for them.
The modification times on the files from various sketchy firmware sites strongly suggest that the "JMS 539 B" is a significantly more recent chip though, which gives me hope (sort of)?
I guess I'm still going to try this experiment, but this review indicates that it still has the "occasional reset" problem – although perhaps only during a hot-swap? – and this one indicates it's still from the cursed land of JMicron.
After 48 hours of nearly continuous web searching for solutions to this problem, it appears that the most reliable guidance is to buy a discrete USB3/eSATA translation device with substantially better reviews than any of the enclosures, and then connect everything with eSATA cables.
So I guess I'm going to spend $60 to find out if that's a viable solution.
Generator comprehensions are a not-terribly-effective workaround for the lack of anonymous blocks.
While they work nicely enough for some common use-cases, they also lead to heinous hacks when trying to satisfy reasonable use-cases.
Maybe this means that you need to be able to express suites as expressions. That's fine.
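To make the "heinous hacks" concrete, here is one small, entirely invented example: because a lambda or comprehension can't contain statements, per-item logic gets squeezed into a generator comprehension evaluated purely for its side effects.

seen = set()
lines = ["a", "b", "a"]

# The hack: a generator comprehension run only for its side effects,
# standing in for the anonymous block Python doesn't have.
list(seen.add(line) for line in lines if line not in seen)

# What it is standing in for: a plain statement suite, which can't be
# passed around or embedded in an expression.
seen2 = set()
for line in lines:
    if line not in seen2:
        seen2.add(line)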
There is a popular misconception that "HTTPS" stands for "HTTP over SSL", which reinforces this confusion. For the record, it doesn't; it stands for "HTTP Secure".
If you're a game programmer who wants to start adding some networking to a game, just use TCP for now.
Then, learn about:
When you are done with all of that, then it's time to start investigating UDP.