
 
Below is a short overview of the Deep Learning Summer School 2016 in Montreal and of the high-impact papers mentioned there.
  
 Neural Networks, Hugo Larochelle
  Tips for training NNs:
  
- Prefer random search to grid search over hyperparameters: grid search repeats many experiments for every value of a parameter that turns out not to matter. In practice, Bayesian optimization will not work better than random search.
- Tuning regularization has much less benefit than tuning the learning rate and similar hyperparameters.
- Early stopping on the validation set comes at low cost and should be used. Divide the learning rate by 2 each time the validation error stops decreasing.
- Don't neglect dropout: bigger regularized models are more powerful than simpler ones.
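The first and third tips can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the hyperparameter names and ranges in `sample_config` are hypothetical, and `train_with_early_stopping` takes a ready-made list of per-epoch validation errors as a stand-in for a real training loop.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter configuration at random (hypothetical ranges)."""
    return {
        # Log-uniform draw, so each order of magnitude is equally likely.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "hidden_units": rng.choice([64, 128, 256, 512]),
        "dropout": rng.uniform(0.0, 0.7),
    }


def train_with_early_stopping(val_errors, lr, patience=3):
    """Walk through per-epoch validation errors, halving lr on each stall.

    Returns (best_epoch, best_error, final_lr). Stops once the error has
    failed to improve `patience` times in a row.
    """
    best_error, best_epoch, stalls = math.inf, -1, 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_error, best_epoch, stalls = err, epoch, 0
        else:
            lr /= 2.0  # validation error stopped decreasing: halve the rate
            stalls += 1
            if stalls >= patience:
                break  # early stopping
    return best_epoch, best_error, lr


# Random search: every trial draws all hyperparameters independently.
rng = random.Random(0)
trials = [sample_config(rng) for _ in range(5)]
```

Unlike a grid, five random trials try five different values of every hyperparameter, so no trial is wasted repeating a value of a parameter that has little effect.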
Recurrent Neural Networks, Yoshua Bengio
Large gradients are sensitive to noise, so we want to keep gradients below 1. However, the vanishing gradient phenomenon can then occur.
When the gradient is large, don't trust it: use gradient norm clipping (Mikolov's thesis).
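Gradient norm clipping can be sketched as follows. This is a minimal plain-Python illustration, not code from the lecture; the function name `clip_by_norm` and the threshold value are assumptions chosen for the example.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; the direction is kept."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]


# A gradient of norm 5 is rescaled to norm 1, pointing the same way.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
```

Only the length of the update changes, so an occasional exploding gradient cannot throw the parameters far away in a single step.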
  
Yoshua said that forward propagation could be used in the future instead of back-propagation.
Convolutional Neural Networks, Rob Fergus
Pooling is used for feature invariance and to get a larger receptive field.
Depth of the network is the key. In deeper layers, invariance to both vertical and horizontal translations improves.
Anneal the learning rate and use small batches. It is better to take a bigger model and regularize it well than to take a smaller model.
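The two roles of pooling can be seen on a toy 1-D signal. The sketch below is plain Python with a hypothetical helper `max_pool_1d`; real CNNs pool 2-D feature maps, so this only illustrates the idea.

```python
def max_pool_1d(signal, size):
    """Non-overlapping max pooling: one output per window of `size` inputs."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


# The same feature, shifted by one position inside a pooling window.
a = [0, 9, 0, 0, 0, 0, 0, 0]
b = [9, 0, 0, 0, 0, 0, 0, 0]

pooled_a = max_pool_1d(a, 2)
pooled_b = max_pool_1d(b, 2)

# Stacking two pooling layers: each output now covers 4 original inputs.
stacked = max_pool_1d(max_pool_1d(a, 2), 2)
```

Here `pooled_a` and `pooled_b` are identical, which is the invariance to small shifts, and each element of `stacked` summarizes four input positions, which is the larger receptive field.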
  The most interesting papers mentioned:
  
- Going Deeper with Convolutions, by Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. It describes the structure of the Inception model, the winner of ILSVRC 2014.
    
- Yet another nonlinearity proposed for CNNs, where the nonlinearity is learned for each layer: Parametric Exponential Linear Unit for Deep Convolutional Neural Networks. (These results suggest that varying the shape of the activations during training along with the other parameters helps to control vanishing gradients and bias shift, thus facilitating learning.)
  
- Residual networks (everybody talks about them now; used for CNNs, they won ILSVRC 2015): Deep Residual Learning for Image Recognition, which introduces shortcut connections from a layer's input to deeper layers.
    
- Segmentation: research done by Ross Girshick, who is now doing research at Facebook.
- Learning to Segment Object Candidates: detects fine-grained segmentation masks, not only borders, and is robust to scale.
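The shortcut connection from the residual networks entry in the list above can be sketched as y = relu(F(x) + x). The plain-Python version below is a deliberate simplification: it uses two small fully connected layers without biases for F, whereas the paper uses convolutional layers, and the helper names are made up for the example.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = w2 @ relu(w1 @ x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # Shortcut connection: the block input is added to the block output.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

passthrough = residual_block(x, zeros, zeros)    # F(x) = 0, so y = relu(x)
doubled = residual_block(x, identity, identity)  # F(x) = x, so y = 2x
```

With all weights at zero the block still passes its input through, which suggests why very deep stacks of such blocks remain trainable: each block only has to learn a correction to the identity.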
Computer vision, Antonio Garibaldi
Natural Language Processing, Kyunghyun Cho 
- Problems in NLP: sparsity and zero probabilities, which require smoothing the probabilities; and lack of generalization, addressed by neural language modelling (word2vec representations), which can generalize to unseen relations.
- The RNN concept realizes non-Markovian modelling, where the probability depends on the previous states and can take into account a whole sequence of previous words.
- The biggest problem in RNNs is the vanishing gradient. Cho says we can introduce shortcut connections.
- All we need in RNNs for now: LSTM or Gated Recurrent Unit (GRU) models. Both work well and are easier to train than classical RNNs.
- GRU model: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
- Effective Approaches to Attention-based Neural Machine Translation (global and local attention architectures, applied to translation between German and English in both directions).
- Larger Context Language Modelling with RNN (improved LSTM capturing context). This analysis suggests that a larger-context language model improves the unconditional language model by capturing the theme of a document better and more easily.
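The sparsity problem from the first bullet above can be illustrated with a count-based bigram model. The sketch below uses add-k smoothing, which is one simple smoothing scheme chosen here for illustration (the notes do not say which scheme was presented); the function name and the toy sentence are assumptions.

```python
from collections import Counter


def bigram_model(tokens, k=1.0):
    """Return prob(word, prev) = P(word | prev) with add-k smoothing."""
    vocab = set(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])

    def prob(word, prev):
        # Adding k to every count keeps unseen bigrams away from zero.
        return (pair_counts[(prev, word)] + k) / (prev_counts[prev] + k * len(vocab))

    return prob


tokens = "the cat sat on the mat".split()
smoothed = bigram_model(tokens, k=1.0)
unsmoothed = bigram_model(tokens, k=0.0)

seen = smoothed("cat", "the")      # bigram "the cat" occurs in the data
unseen = smoothed("sat", "the")    # bigram "the sat" never occurs
```

Without smoothing, `unsmoothed("sat", "the")` is exactly zero, so any sentence containing that bigram would get zero probability. Smoothing fixes the zero but still treats all unseen bigrams alike, which is the lack of generalization that neural language models address.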
The summary was written by Luiza Sayfullina. Please send all comments to luiza.sayfullina@aalto.fi.