Posts

Showing posts from June, 2016

Do Neural Networks Dream of Strictly Convex Sheep?

A parting thought before leaving Herndon, written on a flight to Dallas after a week at Amazon Machine Learning University. About what we wanted from optimization landscapes and what we got instead.

Image courtesy: Amazon MLU

There is a moment in training a deep network on a g2.2xlarge when the loss does something I can only call insolent. It drops for a few epochs, plateaus for what feels like forever, jitters sideways, then drops again into a region where the gradient is essentially noise around a slow downward drift. My textbooks call this descending. My eyes call it wandering. The optimizer is moving through a country we do not have good language for, because the language we have was built for a country that does not exist. That country is convex.

I am writing this somewhere over Tennessee after a week in a windowless room learning what Amazon thinks its engineers should know about machine learning. The instructors were good. The math was rigorous. But there was a recurring tension...