
Talk Review: Deep Learning: Practice and Trends, NIPS 2017 – Part I

1 Introduction

Deep Learning has become an essential toolbox which is used in a wide variety of applications, research labs, industries, etc. In this tutorial given at NIPS 2017, the speakers provide a set of guidelines which will help newcomers to the field understand the most recent and advanced models and their application to diverse data modalities.

2 Practice

It is common to think of deep learning as a toolbox, backed by a rich supply of papers, source code, and tutorials available to anyone interested in the field. Generally, users decide the aspects of their neural network architecture at the model level: what the inputs and outputs are, what the task is, and how the network is optimized to perform the task. Users can put these together and obtain a working model. However, zooming out from the model level, there are also some important decisions to be made before constructing the model. These include the following issues:

  • Platform
    • How to deploy the model?
    • What hardware will the model be trained on, e.g., GPUs or CPUs?
  • Framework
    • What are the differences between the available frameworks, and which one to select?
    • What are the limitations of the selected framework?
    • Is the selected framework suitable for the platform of interest?
  • Dataset
    • There is a vast number of datasets to work with; which is most appropriate?
    • What does the dataset look like, how large are its dimensions, etc.?

Once these decisions are made, the user can zoom in on the model level, and focus on decisions which impact the neural network architecture. These include the following decisions:

  • Activations: Which non-linearities should be chosen, e.g., ReLU, sigmoid, tanh, GRU, etc.
  • Algorithms: Which optimizer should be chosen, e.g., SGD, Momentum, Adam, etc.
  • Connectivity patterns: Which type of connection does the neural network use, e.g., fully connected, convolutional, recurrent, recursive, etc.
  • Loss function: Which type of loss function should the network optimize, e.g., cross entropy, MSE, adversarial, etc.
  • Hyperparameters: These include learning rate, layer size, batch size, dropout rate, weight initialization, etc.

These are all important decisions that go into the process of building a neural network. Nando de Freitas categorizes these decisions into three main components: Inputs and Outputs, Architectures, and Losses.
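To make two of the choices listed above concrete, here is a minimal sketch of a few activation functions and of the SGD and Momentum update rules, using only the Python standard library. The toy objective f(w) = (w - 3)^2, the learning rates, and the function names are assumptions chosen for illustration; they do not come from the talk.

```python
import math

# --- Activations: the non-linearities mentioned in the list above ---
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# tanh is available directly as math.tanh

# --- Algorithms: two simple optimizer update rules ---
def sgd_step(w, grad, lr):
    """Plain gradient descent: move against the gradient."""
    return w - lr * grad

def momentum_step(w, v, grad, lr, beta=0.9):
    """Momentum: accumulate a running velocity of past gradients."""
    v = beta * v + grad
    return w - lr * v, v

# Toy objective (an assumption for illustration): f(w) = (w - 3)^2
def grad_f(w):
    return 2.0 * (w - 3.0)

def run_sgd(steps=100, lr=0.1):
    w = 0.0
    for _ in range(steps):
        w = sgd_step(w, grad_f(w), lr)
    return w

def run_momentum(steps=300, lr=0.05, beta=0.9):
    w, v = 0.0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, grad_f(w), lr, beta)
    return w
```

Both `run_sgd()` and `run_momentum()` should end up close to the minimizer w = 3; the learning rate and the momentum coefficient are exactly the kind of hyperparameters the list refers to.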

2.1 Inputs and Outputs (I/O)

The most common I/Os are in the form of vectors. The elements of these vectors are often the attributes of interest in the data. These vectors are often weakly structured, i.e., elements corresponding to different attributes may take on different types of data, or vary by orders of magnitude.
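When the attributes of a vector differ by orders of magnitude, a common preprocessing step is to rescale each attribute before feeding it to a network. The sketch below standardizes each column to zero mean and unit variance. The talk summary does not prescribe this step, and the example rows (age in years, income in dollars) are hypothetical.

```python
import math

def standardize_columns(rows):
    """Rescale each column to zero mean and unit variance.

    Columns with zero variance are left centered but unscaled.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]

# Hypothetical data: (age in years, income in dollars)
data = [[25.0, 30000.0], [35.0, 60000.0], [45.0, 90000.0]]
scaled = standardize_columns(data)
```

After scaling, both attributes live on the same scale even though the raw values differ by three orders of magnitude.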

Images are also an important type of I/O. Images generally have much higher dimensionality than vectorized inputs, and can be used for a wide range of applications, e.g., classification, segmentation, generative models, art, etc.

Another type of I/O is sequences. Some common sequences include words/letters, speech, videos, sequential decision making, etc. Sequences are, in a sense, an extension of images.

2.2 Architectures

Here, Nando de Freitas explains three key building blocks that are heavily used in deep learning. All of the architectures discussed share a common characteristic: they have the correct inductive biases. As is commonly known, deep learning tries to avoid hand-tuned or engineered features. However, it is still beneficial to build the correct inductive biases into the architecture. This concept of inductive biases will be important in the following subsections.

2.2.1 Convolutional Networks

Convolutional neural networks (ConvNets) have been around for quite a long time, and are very similar to ordinary neural networks. They are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. So what is the difference between a ConvNet and an ordinary neural network? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These properties make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

Given this assumption about the inputs, the key inductive bias that convolutional neural networks exploit is invariance. Two properties are of interest when dealing with images: locality and translation invariance. Locality means that nearby pixels are correlated. Translation invariance means that the appearance of an object does not depend on its location. By assuming locality, the architecture can go from fully connected to locally connected, reducing computation without losing much information. By assuming translation invariance, the network can apply the same filter everywhere in the image, whether an object is in the top-left or the bottom-right corner (the weight matrix of a convolutional layer is often called a convolution kernel, or filter).
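The two assumptions can be illustrated with a small sketch in plain Python. A single shared 2×2 kernel (a toy "blob detector", chosen here purely for illustration) is slid over a 6×6 image; the same weights respond to the object wherever it sits, and the location of the peak response moves with the object. The weight counts at the end compare fully connected, locally connected, and convolutional layers for this toy size.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def place_blob(size, top, left):
    """Return a size x size image of zeros containing a 2x2 block of ones."""
    img = [[0.0] * size for _ in range(size)]
    for a in range(2):
        for b in range(2):
            img[top + a][left + b] = 1.0
    return img

def argmax2d(grid):
    """Return the (row, col) of the first maximal entry."""
    best = max(max(row) for row in grid)
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == best:
                return (i, j)

kernel = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical toy filter

top_left = conv2d_valid(place_blob(6, 0, 0), kernel)
bottom_right = conv2d_valid(place_blob(6, 4, 4), kernel)

# Weight counts for mapping a 6x6 input to a 5x5 output (biases ignored):
inputs, outputs, kernel_weights = 6 * 6, 5 * 5, 2 * 2
fully_connected_weights = inputs * outputs            # every input to every output
locally_connected_weights = outputs * kernel_weights  # local, but not shared
shared_conv_weights = kernel_weights                  # local and shared
```

Here `argmax2d(top_left)` and `argmax2d(bottom_right)` point at the two blob locations even though the same four weights were used in both cases, which is the weight sharing that the translation invariance assumption justifies.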

The ImageNet challenge showcased the strength of ConvNets. In 2012, a ConvNet delivered the first major improvement in classification error, from 0.26 to 0.16 (the previous improvement, from 0.28 to 0.26, was achieved without ConvNets). By 2016, ConvNets had reached a classification error rate of 0.03, and they have since become the standard for processing images. Figure 1 shows the progress of ConvNets over the past 8 years.

Figure 1. ImageNet challenge winners and their corresponding architecture and performance.

Here, the reviewer will leave readers with several classic ConvNets to look up if interested.

  • AlexNet: The first work that popularized Convolutional Networks in Computer Vision was AlexNet, developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton (2012).
  • ZF Net: The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus.
  • GoogLeNet: The ILSVRC 2014 winner was a Convolutional Network from Szegedy et al. from Google.

Training a ConvNet is not an easy task. The depth (number of layers) of a ConvNet is an important design decision, and computational complexity is the main bottleneck introduced by adding layers. The computation within a convolutional layer can be parallelized, but layers must be evaluated one after another, so ConvNets become slower with increasing depth. One way to tackle this is to use smaller convolutions (current state-of-the-art ConvNets almost always use 3×3 convolutions). For example, on a 7×7 pixel image, instead of applying a single 7×7 filter that covers the whole image, a 3×3 filter is applied at each of the 25 (5×5) positions where it fits.
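The arithmetic behind the preference for small filters can be checked with a short sketch. It counts the positions of a 3×3 filter on a 7×7 image, and compares a single 7×7 layer with a stack of three 3×3 layers, which covers the same 7×7 receptive field. The channel count of 64 is an assumption chosen for illustration, and biases are ignored.

```python
def valid_positions(image_size, kernel_size):
    """Number of positions where a square kernel fits (no padding, stride 1)."""
    side = image_size - kernel_size + 1
    return side * side

def receptive_field(num_layers, kernel_size):
    """Receptive field of a stack of stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count with `channels` input and output channels per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels

channels = 64  # hypothetical channel count

positions_3x3_on_7x7 = valid_positions(7, 3)         # 25
stacked_field = receptive_field(3, 3)                # 7, same as one 7x7 filter
single_7x7 = conv_weights(7, channels)               # 49 * 64 * 64
three_3x3 = conv_weights(3, channels, num_layers=3)  # 27 * 64 * 64
```

Under these assumptions the stack of three 3×3 layers sees the same 7×7 region with roughly 45% fewer weights, at the cost of two additional sequential layers.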

2.2.2 Recurrent Networks

Recurrent neural networks (RecNets) are popular architectures that have shown great promise in many natural language processing (NLP) tasks. Two key ingredients for processing language are neural embeddings and recurrent language models. The discovery of these two ingredients is the reason deep learning has developed into a toolbox for efficient NLP.

Embedding vectors give rise to the encoder-decoder paradigm, in which the encoder encodes the input word and the decoder produces a target word. The key insight of neural embeddings is that a word, initially represented as a one-hot encoding, can be mapped to a vector. This allows systems to convert text into vectors and define a vector space representation. Recurrent language models, on the other hand, have been shown empirically to outperform other language modeling approaches. The key insight here is the vectorization of context. Each word in a sequence is one-hot encoded and used as input to the network, which predicts the next word in the sequence. The predicted next word is the one with the highest likelihood, Prob(w_t | w_1, w_2, …, w_(t-1)), according to some model. The problem is that the system must either keep track of all the previous words or keep only a predetermined, fixed number of words in memory. RecNets solve this issue in a very natural way.
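As a small illustration of the one-hot and embedding idea, the sketch below encodes a word as a one-hot vector and multiplies it by an embedding matrix, which amounts to selecting one row of that matrix. The three-word vocabulary and the embedding values are made up for this example; in practice the matrix is learned.

```python
def one_hot(index, size):
    """Vector of zeros with a single one at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical vocabulary and 2-dimensional embedding matrix (one row per word)
vocab = ["the", "cat", "sat"]
embedding = [[0.1, 0.3],
             [0.7, -0.2],
             [-0.5, 0.9]]

def embed(word):
    """Multiply the one-hot vector by the embedding matrix."""
    x = one_hot(vocab.index(word), len(vocab))
    dims = len(embedding[0])
    return [sum(x[i] * embedding[i][d] for i in range(len(vocab)))
            for d in range(dims)]

cat_vector = embed("cat")  # equals the row of `embedding` for "cat"
```

Because the one-hot vector has a single non-zero entry, the product simply picks out the row for "cat", which is why embeddings are usually implemented as a table lookup.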

RecNets embed the words one at a time and maintain a hidden state that is continuously updated, so the amount of history summarized is not fixed in advance. At each step, the RecNet takes the word embedding (multiplied by a matrix) and the previous hidden state (multiplied by another matrix), sums these two vectors, and applies a non-linearity to obtain the next hidden state. From the hidden state, the RecNet can predict the next word. Because the hidden state can summarize a history of any length, RecNets are more flexible than fixed-window models. RecNets are currently the state of the art for language models in terms of the performance they achieve on test sets.
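The recurrence described above can be sketched in a few lines of plain Python. The weight matrices `W_x` and `W_h` hold arbitrary illustrative values (a trained model would learn them), tanh is assumed as the non-linearity, and bias terms are omitted.

```python
import math

def matvec(matrix, vec):
    """Matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def rnn_step(x, h_prev, W_x, W_h):
    """Next hidden state: tanh(W_x x + W_h h_prev)."""
    a = matvec(W_x, x)
    b = matvec(W_h, h_prev)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

def run_rnn(inputs, W_x, W_h, hidden_size):
    """Fold a sequence of any length into one fixed-size hidden state."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h)
    return h

# Hypothetical weights and 2-dimensional word embeddings
W_x = [[0.5, -0.3], [0.2, 0.8]]
W_h = [[0.1, 0.4], [-0.6, 0.3]]
sequence = [[1.0, 0.0], [0.0, 1.0]]

h_forward = run_rnn(sequence, W_x, W_h, hidden_size=2)
h_reversed = run_rnn(sequence[::-1], W_x, W_h, hidden_size=2)
h_long = run_rnn(sequence * 5, W_x, W_h, hidden_size=2)
```

The hidden state has the same size whether the sequence has two words or ten, and it differs when the word order is reversed, so it carries information about the context that a bag of words would lose.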

2.2.3 Recurrent Networks with Attention

A slight extension to RecNets for language models is a network which can read in a sequence of words and output another sequence of words. The main idea is that, instead of generating word-by-word, the network will generate an output from a sequence of words (and hidden states). For instance, the network will take a sequence of French words, read it all in, and then start generating the translation in English. The sequence-to-sequence (Seq2Seq) framework also relies on the encoder-decoder paradigm, where the encoder encodes a sequence and the decoder outputs a sequence.

This simple idea, combined with RecNets, has become the cornerstone of machine translation and has been shown to be very successful. Figure 2 displays the progress of RecNet-based machine translation models over the past 4 years. Machine translation quality is measured using BLEU (bilingual evaluation understudy), an algorithm for evaluating the quality of text that has been machine translated from one natural language to another.

Figure 2. The BLEU progress of RecNet-based models for machine translation.

A performance increase can be seen in RecNets for machine translation which eventually outperform traditional statistical models (Moses SMT) and state-of-the-art (SOTA) statistical models.
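To give a feel for what a BLEU score measures, here is a simplified sentence-level sketch: clipped n-gram precisions are combined by a geometric mean and multiplied by a brevity penalty. It is deliberately reduced, assuming a single reference, n-grams up to length 2, and no smoothing, whereas BLEU as normally reported uses n-grams up to length 4 and is computed over a whole corpus. The example sentences are made up.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with counts clipped."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_mean)

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
score = bleu(candidate, reference)
```

In this example 5 of 6 unigrams and 3 of 5 bigrams match, so the score is the square root of (5/6 × 3/5), about 0.71. Clipping prevents a degenerate candidate such as "the the the the the the the" from being rewarded for repeating a common word.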

The reviewer will refer interested readers to papers proposing the idea of Seq2Seq type language processing.

  • Auli, M., et al. “Joint Language and Translation Modeling with Recurrent Neural Networks.” EMNLP (2013)
  • Cho, K., et al. “Learning Phrase Representations using RNN Encoder-Decoder for Statistical MT.” EMNLP (2014)
  • Sutskever, I., et al. “Sequence to Sequence Learning with Neural Networks.” NIPS (2014)
  • Bahdanau, D., et al. “Neural Machine Translation by Jointly Learning to Align and Translate.” ICLR (2015)

Recently, RecNets for machine translation have been developed further by combining them with the concept of attention. The motivation is the observation that fixed-size embeddings are easily overwhelmed by long inputs or long outputs, leading to a decrease in performance.

Attention relieves this bottleneck. Attention is a mechanism that lets the model learn to focus on specific parts of the input sequence when decoding, instead of relying solely on the final hidden state of the encoder. The decoder now receives a “context” vector as an additional input, computed as a weighted sum of the encoder hidden states, with one weight per hidden state. Intuitively, this allows the decoder to predict each output based on the most relevant hidden states, effectively shortening the part of the input it has to consider. The Bahdanau paper above goes into more detail on this.
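A minimal sketch of the context-vector computation follows. Each encoder hidden state is scored against the current decoder state, the scores are normalized with a softmax, and the context is the resulting weighted sum. Dot-product scoring is used here only because it is the simplest choice (it is one of the variants in the Luong paper listed below); the Bahdanau paper scores with a small feed-forward network instead. All state values are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector)."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Hypothetical encoder hidden states (one per input word) and decoder state
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 5.0]

weights, context = attend(decoder_state, encoder_states)
```

With these values the second and third encoder states receive almost all of the weight, so the context vector is dominated by them, regardless of how many other positions the input sequence contains.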

Finally, the reviewer will leave the reader with some references regarding attention.

  • Luong, M.-T. et al. “Effective Approaches to Attention-based Neural Machine Translation.” EMNLP (2015)
  • Xu, K., et al. “Show, attend and tell: Neural Image Caption Generation with Visual Attention.” ICML (2015)
  • Andrychowicz, M., et al. “Learning Efficient Algorithms with Hierarchical Attentive Memory.” arXiv preprint arXiv:1602.03218 (2016)


2.3 Losses

A loss function maps an event, or the values of one or more variables, to a real number that represents some “cost” associated with the event. The appropriate loss function depends on the I/Os and the architecture of the system. Nando de Freitas lists the common loss functions for the architectures discussed above. For convolutional architectures that perform classification, the commonly used loss is the softmax cross-entropy, optionally with an L2 regularization term. For recurrent architectures, the common losses are the softmax cross-entropy for discrete outputs and Gaussian (mixture) likelihood models for continuous outputs.
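The losses named above can be written down compactly. The sketch below gives the softmax cross-entropy for a single example, an L2 penalty that can be added to it, and the negative log-likelihood of a single Gaussian (the mixture case is omitted for brevity). The function names and the regularization strength are illustrative assumptions.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """-log softmax(logits)[target_index], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def l2_penalty(weights, strength):
    """L2 regularization term: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)

def gaussian_nll(target, mean, std):
    """Negative log-likelihood of `target` under N(mean, std^2)."""
    var = std * std
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)

# Classification: cross-entropy plus an L2 term on some hypothetical weights
classification_loss = (softmax_cross_entropy([2.0, 0.5, -1.0], 0)
                       + l2_penalty([0.3, -0.4], strength=0.01))

# Continuous output: the model predicts a mean and a standard deviation
regression_loss = gaussian_nll(target=1.2, mean=1.0, std=0.5)
```

With a fixed standard deviation, minimizing the Gaussian negative log-likelihood reduces to minimizing squared error, which connects it to the MSE loss mentioned earlier in this review.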

3 Summary

De Freitas introduced the current practices of deep learning in image processing and natural language processing. He discussed the progress made in image processing, with a focus on ConvNet architectures. Despite their success, there are still challenges regarding computational complexity and optimization techniques in training deep ConvNets, and these remain open research areas. In natural language processing, current practice is to combine RecNets with the attention mechanism to achieve the best performance. For machine translation, the performance of the current best models still has a lot of room for improvement. We are excited to see how much progress can be made in the near future.

Author: Joshua Chou | Editor: Hao Wang, Michael Sarazen

