Neural Networks as we know them are fundamentally flawed. Here are some thoughts on how to improve them.
This paper gives some technical attacks on neural networks.
To understand what’s going on here, consider this plot for a Hollywood caper (which somebody should totally produce):
An employee of a company wishes to steal a McGuffin from their vault. To do this she follows instructions from a super-spy phone app on how to apply a few smudges to her makeup to make the vault’s security systems think she’s the company’s CEO despite appearing almost unchanged to other people. She then walks through the lobby and automatically gets let in. The actual CEO gets a notification on his phone that he’s entered, freaks out, and races over to the vault, not knowing that a confederate of the thief previously used a super-spy phone app for guidance to make a few carefully placed marks on his glasses which make the security system think he’s the thief when he puts them on. The security system goes off when he tries to go through the lobby, causing the police to show up almost immediately, at which point they get a report from a human watching the security cameras that the thief has broken in impersonating the CEO. Misunderstanding this report as applying to the person in front of them, the police arrest and haul off the actual CEO. After they leave the thief calmly walks out, McGuffin in hand, and escapes.
While getting the timing of this caper to work would be very Hollywood, the technology for fooling the security system exists today, and the problem of neural networks being weak against adversarial threat models is very deeply embedded in their architecture and hard to fix. This is just as terrifying as it sounds.
(This post now gets a bit technical)
This latest analysis, while clearly a negative result, at least hints at avenues for improvement. For starters, it suggests a real, well-defined criterion for a high-performing neural network: if a network gives a level of certainty about whether a particular image is or is not of a cat, then altering the image a bit should not change that certainty by more than a constant factor of the size of the alteration, measured more precisely by its L2 norm. (Commentary on the differences between the L0, L2, and Linf norms is at the bottom of this post. For now I’ll just assume that the L2 norm is the One True Norm.)
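To make that concrete, here’s a minimal sketch of what checking the criterion empirically might look like, with cat_confidence as a hypothetical stand-in for whatever classifier is under test; random perturbations only give a lower bound on the worst-case ratio, so this is a sanity check, not a proof:

```python
import numpy as np

def cat_confidence(image: np.ndarray) -> float:
    # Placeholder for a trained network's reported probability that the
    # image contains a cat; swap in whatever model you're actually testing.
    return float(np.clip(image.mean(), 0.0, 1.0))

def empirical_lipschitz_ratio(image: np.ndarray, trials: int = 100,
                              epsilon: float = 0.01) -> float:
    """Largest observed |change in confidence| / ||change in image||_2
    over random perturbations (a lower bound on the true worst case)."""
    base = cat_confidence(image)
    rng = np.random.default_rng(0)
    worst = 0.0
    for _ in range(trials):
        delta = rng.normal(scale=epsilon, size=image.shape)
        ratio = abs(cat_confidence(image + delta) - base) / np.linalg.norm(delta)
        worst = max(worst, ratio)
    return worst
```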
Thinking through this criterion leads to some interesting conclusions about our training set and what our network must do, even treating it as a black box with perfect behavior. If a picture contains a cat but that cat is too small/far away, then the picture as a whole can’t be classified as definitely containing a cat, because only a very small change is necessary to alter the picture to contain no cat. Likewise, as the cat gets further away the change from ‘definitely cat’ to ‘definitely not cat’ has to be fairly smooth, because if there’s a cliff threshold of distance then the delta between the cat being just within and just outside of that threshold would be a small delta in the image with a big delta in confidence, which is not allowed. Current training sets don’t hint that this cat-in-the-distance effect exists, much less give guidance as to how distance correlates with what level of confidence, so we can’t fault the trained neural networks for getting it wrong, even if they were based on technology which didn’t have the current weaknesses. Likewise, cats in super-closeup have the same issues and falloff in confidence (is a picture of a cat’s head a picture of a cat? How about the eye? A portion of the retina?). Adding in these test cases is reasonably straightforward by zooming the same images in and out, but you had better be using a pinhole camera with super high resolution, because the network is likely to cheat and guess the zoom based on focal depth and zoom artifacts.
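Here’s a rough sketch of that zoom augmentation using Pillow, assuming one very high resolution photo plus a bounding box for the cat; the way the label falls off with zoom is my own guess at “fairly smooth”, not something prescribed above, and this ignores the focal-depth/artifact caveat:

```python
from PIL import Image

def zoom_crops(photo: Image.Image, cat_box: tuple, out_size: int = 224,
               zoom_levels=(1.0, 2.0, 4.0, 8.0)):
    """Yield (crop, label) pairs, one per zoom level.

    cat_box is (left, top, right, bottom) around the cat; zoom 1.0 crops
    tightly to the cat, larger zooms include more and more background so
    the cat shrinks relative to the frame.
    """
    left, top, right, bottom = cat_box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    cat_size = max(right - left, bottom - top)
    for zoom in zoom_levels:
        half = cat_size * zoom / 2
        crop = photo.crop((int(cx - half), int(cy - half),
                           int(cx + half), int(cy + half)))
        crop = crop.resize((out_size, out_size))
        # Confidence falls off smoothly as the cat occupies less of the
        # frame; this particular schedule is just one plausible choice.
        label = min(1.0, 1.0 / zoom)
        yield crop, label
```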
Even after we’ve improved our training set that way, there’s another question: if we have an image consisting entirely of 10% cats, meaning a whole pile of cats in the distance, does that add up to the image as a whole being almost certainly cat, or still 10% cat? Current training sets again provide no guidance about this edge case whatsoever, so we can’t blame the networks for getting it wrong. The correct answer is that it should rate the image as a whole as 10% cat, because we want recognition of pictures of cats, not pictures full of cat textures. Also, you can’t artificially make these photos using image editing, because neural networks are vastly better at noticing and evaluating image editing artifacts than they are at cat pictures, mostly because current image editing tools are tailored very specifically to fooling human cognition. Hope you like taking pictures of lots of cats. Is it also necessary to have training set pictures whose backgrounds are completely filled with 50% cat but whose foreground is a 99% cat, to teach it that this should rate 99% cat? Having training data with a foreground 99% cat and a background 0% cat probably hints at this enough, but maybe not.
Texture recognition is a lot of the problem here. The above training set augmentations would do a decent job of explaining that far-away textures aren’t quite right and ultra-close-up textures aren’t quite right, but the network is still likely to rate correct-distance cat textures which aren’t actually arranged into correct cat anatomy as cats (are six-legged cats cats?), and making training set images to explain the negative is likely to run into the aforementioned image editing recognition problems. Good luck with that.
Now that we’ve firmly established that our current training sets are garbage, we can move on to the problems which are specific to neural networks as they exist today.
The paper linked at the top gives a negative result, showing fundamental limitations of current designs. While it’s a refreshingly rigorous result, arguably the very first one, in a field completely dominated by woowoo bullshit of the form ‘eh, seems to work well in practice’, it leaves completely open the question of why these things work at all, which is now even more mysterious. Everything I’m going to propose below is about getting around the newly articulated problems, but is completely handwavy/optimistic about why these alterations should preserve the aspects of neural networks which work. Hopefully someone can get this all to function well using the current very empirical approach.
That caveat aside, here are some ideas:
(This post now gets really really technical.)
The most straightforward way to stop these sorts of attacks is to make it so that instead of a small constant number of features in the output we have 2N features in the output, since the N inputs can’t control the overall direction of the sum of those in a controlled way, for simple information theoretic reasons. This may be why neural networks sometimes seem to work better when they’re seemingly badly overparameterized, in exact contradiction with the intuition behind normal statistical models: they’re hitting on something like this technique accidentally. Arguably this is a gimmick which doesn’t generalize, or results in a one-time rather than a fundamental/asymptotic improvement, but at least it hits the desired criterion, and if it results in empirical performance improvements that’s a win.
Making 2N outputs has an obvious problem: the outputs are all likely to be highly correlated with each other. Trying to use training to decorrelate them seems highly problematic, but there’s a blunt-force approach which might work: for each output, pick some fraction of the inputs, for example N^(2/3), and train a neural network for that output based on only those inputs. This can be made a little cleaner by partitioning the inputs into completely non-overlapping sets of the correct size for the first N^(1/3) outputs, then repeating that N^(1/3) times. It also may be helpful to select the sets using a Gaussian distribution around one point instead of uniformly. In any case this looks like another non-generalizing brute force hammer, but at least it has a certain amount of intuitive justification for why it should help with the criterion we’re aiming for.
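Here’s a rough sketch of that subset-per-output scheme, assuming PyTorch; the sizes and the uniform (rather than Gaussian-clustered) subset choice are purely illustrative, and nothing here is tuned for actually working well:

```python
import torch
import torch.nn as nn

class SubsetOutputs(nn.Module):
    """2N outputs, each trained on a fixed random subset of ~N**(2/3) inputs."""
    def __init__(self, n_inputs: int, hidden: int = 32):
        super().__init__()
        n_outputs = 2 * n_inputs
        subset_size = max(1, round(n_inputs ** (2 / 3)))
        # The wiring is chosen once, up front, and never altered by training.
        self.subsets = [torch.randperm(n_inputs)[:subset_size]
                        for _ in range(n_outputs)]
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(subset_size, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_outputs)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_inputs) -> (batch, 2 * n_inputs)
        return torch.cat([head(x[:, idx])
                          for head, idx in zip(self.heads, self.subsets)], dim=1)
```

Note that 2N heads, each looking at roughly N^(2/3) inputs, is exactly where the N^(5/3) cost in the next paragraph comes from.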
Obviously this has a serious problem: it requires N^(5/3) computation and space for the neural networks, which is completely outrageous even for small images. (If you use a different number of inputs for each output this value will be different, but I’ll be sticking with a single example for clarity.) But don’t worry, I have another gimmicky trick. We can stick a layer of 2N intermediate outputs in the middle, and make it so that each of the middle values is based off N^(1/3) of the inputs and each of the final outputs is based off N^(1/3) of the middle values. As before, the training system is not allowed to alter which of the inputs are factored into which outputs, but is free to structure the traditional neural networks in the middle however it wants. This approach reduces the asymptotic cost to N^(4/3), which is much more reasonable. Putting in more layers would reduce that even more, asymptotically approaching N¹, presumably with tradeoffs between computation required and how well the resulting network performs.
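To spell out the arithmetic: the flat version has 2N outputs each looking at N^(2/3) inputs, for roughly 2N^(5/3) connections’ worth of sub-networks, while the layered version has 2N middle values times N^(1/3) inputs each plus 2N final outputs times N^(1/3) middle values each, or roughly 4N^(4/3) in total, and each additional layer pushes the exponent further down toward 1.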
Sidebar about norms: the result linked at the top talks about L0 norms, but that’s a very weak and artificial criterion, and anything which can work against L2 norms can work against L0 norms. My commentary about training sets works on L2 norms but is cheesecloth against the Linf norm. The techniques given likely/hopefully work on the L2 norm and might provide some limited defense on the Linf norm. My suspicion is that trying to be defended in the Linf norm is profoundly misguided. It’s possible to force an Linf norm change to make a meaningful L2 norm change, without very much change to human perception, just by dropping some of the low-order information, but that’s meaningful information which is being dropped. Aiming for the Linf norm is probably just an attempt to force neural networks to replicate a particular artifact of the limitations of human cognition, because we can’t admit that our perception sucks and that techniques which can make better use of that information are valuable.
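For anyone who wants the distinction between the norms to be concrete, here’s a quick numeric illustration; the perturbations are made up, and pixel values are assumed to live in the 0–1 range:

```python
import numpy as np

def norms(delta: np.ndarray) -> dict:
    flat = delta.ravel()
    return {
        "L0": int(np.count_nonzero(flat)),   # how many pixels were touched
        "L2": float(np.linalg.norm(flat)),   # overall size of the change
        "Linf": float(np.abs(flat).max()),   # biggest single-pixel change
    }

pixels = 256 * 256
one_big_change = np.zeros(pixels)
one_big_change[0] = 0.5                      # one pixel changed a lot
tiny_everywhere = np.full(pixels, 1 / 255)   # every pixel nudged by one step

print(norms(one_big_change))   # L0 = 1, L2 = 0.5, Linf = 0.5
print(norms(tiny_everywhere))  # L0 = 65536, L2 ≈ 1.0, Linf ≈ 0.004
```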