Seminar in Probability Theory
The Seminar in Probability Theory takes place during the semester, normally on Wednesday at 11:00.
Program FS 2019
Date/Time  Speaker  Title  Location 

27 February 2019
Mo Dick Wong University of Cambridge 
Universal tail profile of Gaussian multiplicative chaos  Spiegelgasse 5 Room 05.002
We study the tail probability of the mass of Gaussian
multiplicative chaos and establish
a formula for the leading order asymptotics under very mild assumptions,
resolving a recent conjecture of Rhodes and Vargas. The leading order
coefficient can be described by the product of two constants, one
capturing the dependence on the test set and any nonstationarity and
the other encoding the universal properties of multiplicative chaos.
This may be seen as a first step in understanding the full
distributional properties of Gaussian multiplicative chaos.


20 March 2019 
David Belius University of Basel 
Theory of Deep Learning 1: Introduction to the main questions
slides 
Spiegelgasse 5 Room 05.002 
This is the first talk in a five-part series of talks on deep learning from a theoretical point of view, held jointly
by the probability theory and machine learning groups of the Department of Mathematics and Computer Science. The four
invited speakers that follow after this talk are young researchers who are contributing in different ways to what
will hopefully eventually be a comprehensive theory of deep neural networks.
In this first talk I will introduce the main theoretical questions about deep neural networks: 1. Representation: what can deep neural networks represent? 2. Optimization: why, and under what circumstances, can we successfully train neural networks? 3. Generalization: why do deep neural networks often generalize well despite their huge capacity? As a preface I will review the basic models and algorithms (neural networks, (stochastic) gradient descent, ...) and some important concepts from machine learning (capacity, overfitting/underfitting, generalization, ...).
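As a generic illustration of the (stochastic) gradient descent mentioned in the preface (not material taken from the talk), here is a minimal sketch of plain SGD with batch size 1 on a linear least-squares model. The model `y ≈ w*x + b`, the step size `lr` and the number of epochs are assumptions chosen only for the example.

```python
import random


def sgd_least_squares(xs, ys, lr=0.05, epochs=200, seed=0):
    """Plain SGD for the model y ~ w * x + b with squared loss.

    Each epoch visits the samples in a random order; each step uses the
    gradient of the loss 0.5 * (prediction - target)**2 on one sample.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            # residual of the current prediction on sample i
            r = (w * xs[i] + b) - ys[i]
            # gradient of 0.5 * r**2 is r * x for w and r for b
            w -= lr * r * xs[i]
            b -= lr * r
    return w, b
```

For example, with `xs = [i / 10 for i in range(-10, 11)]` and noiseless targets `ys = [2.0 * x - 1.0 for x in xs]`, every per-sample loss can be driven to zero simultaneously, so the iterates approach `w = 2`, `b = -1`.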

27 March 2019 
Levent Sagun EPFL 
Theory of Deep Learning 2: Overparametrization in neural networks: an overview and a definition
slides 
Spiegelgasse 5 Room 05.002 
An exploration of why the stochastic gradient descent (SGD) algorithm works well for training deep neural networks leads to considerations about the geometry of the associated loss function. Recently, we have gained a lot of insight into how tuning SGD leads to better or worse generalization for a given model and task. Furthermore, a reasonably large set of observations suggests that more parameters typically lead to better accuracy, as long as the training process is not hampered. In this talk, I will speculatively argue that, as long as the model is overparameterized (OP), all solutions are equivalent up to finite-size fluctuations.
We will start by reviewing some of the recent literature on the geometry of the loss function and on how SGD navigates the landscape in the OP regime. Then we will see how to define OP by locating a sharp transition in the model's ability to fit its training set. Finally, we will discuss how this critical threshold is connected to the generalization properties of the model, and argue that life beyond this threshold is (more or less) as good as it gets.

3 April 2019 
Arthur Jacot EPFL 
Theory of Deep Learning 3: Neural Tangent Kernel: Convergence and Generalization of Deep Neural Networks
slides 
Spiegelgasse 5 Room 05.002 
We show that the behaviour of a Deep Neural Network (DNN) during gradient descent is described by a new kernel: the Neural Tangent Kernel (NTK). More precisely, as the parameters are trained using gradient descent, the network function (which maps the network inputs to the network outputs) follows a so-called kernel gradient descent with respect to the NTK. We prove that as the network layers get wider and wider, the NTK converges to a deterministic limit at initialization, which stays constant during training. This implies in particular that if the NTK is positive definite, the network function converges to a global minimum. The NTK also describes how DNNs generalise outside the training set: for a least-squares cost, the network function converges in expectation to ridgeless kernel regression with the NTK, explaining how DNNs generalise in the so-called overparametrized regime, which is at the heart of most recent developments in deep learning.
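To make the definition concrete: the NTK at parameters θ is Θ(x, x') = Σ_p ∂f(x)/∂θ_p · ∂f(x')/∂θ_p, summed over all parameters. The sketch below evaluates this sum for a hypothetical one-hidden-layer network with scalar input, f(x) = (1/√m) Σ_i a_i tanh(w_i x), with standard normal initialization. The architecture, the `tanh` nonlinearity and the function name are assumptions for illustration; this is not the construction from the talk.

```python
import math
import random


def empirical_ntk(x1, x2, width=1000, seed=0):
    """Empirical NTK of f(x) = (1/sqrt(m)) * sum_i a_i * tanh(w_i * x).

    Hypothetical toy network: scalar input, one hidden layer of width m,
    parameters a_i, w_i drawn i.i.d. standard normal at initialization.
    Theta(x1, x2) = sum over parameters of df/dtheta(x1) * df/dtheta(x2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(width):
        a, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        h1, h2 = math.tanh(w * x1), math.tanh(w * x2)
        # derivative of tanh(z) is 1 - tanh(z)**2
        d1, d2 = 1.0 - h1 * h1, 1.0 - h2 * h2
        # parameter a_i: df/da_i = tanh(w_i * x) / sqrt(m)
        total += h1 * h2
        # parameter w_i: df/dw_i = a_i * tanh'(w_i * x) * x / sqrt(m)
        total += a * a * d1 * d2 * x1 * x2
    # the two 1/sqrt(m) factors combine into 1/m
    return total / width
```

Comparing, say, `empirical_ntk(0.3, -0.8, width=100, seed=s)` for a few seeds `s` against the same calls with `width=100000` gives a rough numerical feel for the concentration of the kernel at initialization as the width grows.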


10 April 2019 
Lénaïc Chizat Université Paris-Sud
Theory of Deep Learning 4: Training Neural Networks in the Lazy and Mean Field Regimes
slides 
Spiegelgasse 5 Room 05.002 
The current successes achieved by neural networks are mostly driven by experimental exploration of various architectures, pipelines, and hyperparameters, motivated by intuition rather than precise theories. Focusing on the optimization/training aspect, we will see in this talk why pushing theory forward is challenging, but also why it matters and what key insights it may lead to. We will review some recent results on the phenomenon of "lazy training", on the role of overparameterization, and on training neural networks with a single hidden layer.


15 April 2019 (Monday) 13:00 
Marylou Gabrié ENS 
Theory of Deep Learning 5: Information theoretic approach to deep learning theory: a test using statistical physics methods  slides  Spiegelgasse 5 Room 05.002
The complexity of deep neural networks remains an obstacle to understanding their great efficiency. Their generalisation ability, a priori counterintuitive, is not yet fully accounted for. Recently, an information theoretic approach was proposed to investigate this question.
Relying on the heuristic replica method from statistical physics, we present an estimator for entropies and mutual informations in models of deep neural networks. Using this new tool, we numerically test the relation between generalisation and information.

TBA


8 May 2019 
Roland Bauerschmidt University of Cambridge
The geometry of random walk isomorphisms  Spiegelgasse 5 Room 05.002
The classical random walk isomorphism theorems relate the
local time of a random walk to the square of a Gaussian free field. I
will present non-Gaussian versions of these theorems, relating
hyperbolic and hemispherical sigma models (and their supersymmetric
versions) to non-Markovian random walks interacting through their local
time. Applications include a short proof of the Sabot-Tarrès limiting
formula for the vertex-reinforced jump process (VRJP) and a
Mermin-Wagner theorem for hyperbolic sigma models and the VRJP. This is
joint work with Tyler Helmuth and Andrew Swan.


15 May 2019 
Augusto Teixeira IMPA 
Random walk on a simple exclusion process  Spiegelgasse 5 Room 05.002
In this talk we will study the asymptotic behavior of a random walk that
evolves on top of a simple symmetric exclusion process. This nice
example of a random walk on a dynamical random environment presents its
own challenges due to the slow mixing properties of the underlying
medium. We will discuss a law of large numbers that has been proved
recently for this random walk. Interestingly, we can only prove this law
of large numbers for all but two exceptional densities of the exclusion
process. The main technique that we have employed is a multiscale
renormalization that has been derived from works in percolation theory.


Monday 17 June 2019 11:00 
Shuta Nakajima University of Nagoya 
Gaussian fluctuations in directed polymers  Spiegelgasse 5 Room 05.001
In this talk, we consider the discrete directed polymer
model with i.i.d. environment and we study the fluctuations of the
partition function. It was proven by Comets and Liu that for
sufficiently high temperature, the fluctuations converge in
distribution towards the product of the limiting partition function
and an independent Gaussian random variable. We extend the result to
the whole \(L^2\)-region, which is predicted to be the maximal
high-temperature region where the Gaussian fluctuations should occur
under the considered scaling. This is joint work with Clément Cosco.
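The abstract does not spell out the model, so the sketch below assumes the standard setting: a simple symmetric random walk \(S\) started at the origin, an i.i.d. standard Gaussian environment \(\omega(k,x)\), inverse temperature \(\beta\), and the normalized partition function \(W_n = E_S[\exp(\sum_{k=1}^n (\beta\omega(k,S_k) - \lambda(\beta)))]\) with \(\lambda(\beta) = \log E[e^{\beta\omega}] = \beta^2/2\), so that \(E[W_n] = 1\). For brevity the walk lives on \(\mathbb Z\) (dimension 1+1); this is an assumption for illustration only, since the \(L^2\)-region discussed in the talk is nontrivial only when the walk is transient (spatial dimension at least 3). Given the environment, \(W_n\) is computed exactly by a transfer-matrix recursion.

```python
import math
import random


def normalized_partition_function(n, beta, seed=0):
    """W_n of a directed polymer in dimension 1+1 (illustrative choice).

    Walk: simple symmetric random walk on Z started at 0.
    Environment: i.i.d. standard Gaussian omega(k, x), sampled lazily,
    with lambda(beta) = beta**2 / 2 so that E[W_n] = 1.
    """
    rng = random.Random(seed)
    lam = beta * beta / 2.0
    # site -> weighted probability mass of paths ending there
    weights = {0: 1.0}
    for _ in range(n):
        new = {}
        for x, mass in weights.items():
            # each neighbour is reached with probability 1/2
            for y in (x - 1, x + 1):
                new[y] = new.get(y, 0.0) + 0.5 * mass
        # multiply by exp(beta * omega(k, y) - lambda) at the new time k
        weights = {
            y: m * math.exp(beta * rng.gauss(0.0, 1.0) - lam)
            for y, m in new.items()
        }
    return sum(weights.values())
```

At \(\beta = 0\) every weight equals 1, so the function returns \(W_n = 1\); repeating the call over many seeds for fixed \(\beta > 0\) gives samples whose average should be close to 1.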

Program HS 2018
Date/Time  Speaker  Title  Location 

6 September 2018 
Lisa Hartung New York University 
The Ginibre ensemble and Gaussian multiplicative chaos
It was proven by Rider and Virág that the logarithm of the characteristic
polynomial of the Ginibre ensemble converges to a logarithmically correlated random
field. In this talk we will see how this connection can be established at the level
of powers of the characteristic polynomial by proving convergence to Gaussian
multiplicative chaos. We consider the range of powers in the \(L^2\) phase.
(Joint work in progress with Paul Bourgade and Guillaume Dubach.)
Spiegelgasse 1 Room 00.003 
19 September 2018 
Alexander Drewitz Universität Köln
Ubiquity of phases in some percolation models with long-range correlations
We consider two fundamental percolation models with long-range correlations: the
Gaussian free field and the vacant set of Random Interlacements. Both models have
been the subject of intensive research in recent years and decades, on
\(\mathbb Z^d\) as well as on more general graphs. We investigate some
structural percolative properties around their critical parameters, in particular
the ubiquity of the infinite components of complementary phases.
This talk is based on joint work with A. Prévost (Köln) and P.-F. Rodriguez (Bures-sur-Yvette).
Spiegelgasse 1 Room 00.003 
31 October 2018 
Anton Klimovsky Universität Duisburg-Essen
High-dimensional Gaussian fields with isotropic increments seen through spin glasses
Finding the (space-height) distribution of the (local) extrema of high-dimensional
strongly correlated random fields is a notoriously hard problem with many
applications. Following Fyodorov and Sommers (2007), we focus on Gaussian
fields with isotropic increments and take the viewpoint of statistical physics. By
exploiting various probabilistic symmetries, we rigorously derive the
Fyodorov-Sommers formula for the log-partition function in the high-dimensional
limit. The formula suggests a rich picture for the distribution of the local
extrema, akin to that of the celebrated spherical Sherrington-Kirkpatrick model with mixed
\(p\)-spin interactions.

Spiegelgasse 1 Room 00.003 
7 November 2018 
Dominik Schröder IST Austria
Cusp Universality for Wigner-type Random Matrices
For Wigner-type matrices, i.e. Hermitian random matrices with independent, not
necessarily identically distributed entries above the diagonal, we show that at any
cusp singularity of the limiting eigenvalue distribution the local eigenvalue
statistics are universal and form a Pearcey process. Since the density of states
typically exhibits only square root or cubic root cusp singularities, our work
complements previous results on bulk and edge universality, and thus
completes the resolution of the Wigner-Dyson-Mehta universality conjecture for the
last remaining universality type.

Spiegelgasse 1 Room 00.003 
14 November 2018 
Marius Schmidt Universität Basel 
Oriented first passage percolation on the hypercube
Consider the hypercube as a graph with vertex set \(\{0,1\}^N\) and an edge between two
vertices whenever they differ in exactly one coordinate. Choosing independent standard
exponentially distributed lengths for all edges and asking how long the shortest
directed path from \((0,\dots,0)\) to \((1,\dots,1)\) is defines oriented first passage
percolation on the hypercube. We will discuss the conceptual steps needed to answer
this question to the precision of the extremal process, following the two-paper series
"Oriented first passage percolation in the mean field limit" by Nicola Kistler,
Adrien Schertzer and Marius A. Schmidt: arXiv:1804.03117 [math.PR] and
arXiv:1808.04598 [math.PR].

Spiegelgasse 1 Room 00.003 
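The passage time defined in the abstract above can be computed exactly for small \(N\) by dynamic programming, since every directed path flips coordinates from 0 to 1 one at a time. The sketch below is only a brute-force illustration of the definition (it runs in time of order \(N 2^N\)), not the method of the cited papers; the function name, the integer encoding of vertices and the optional `weight` argument are assumptions for the example.

```python
import random


def oriented_fpp_hypercube(n, weight=None, seed=0):
    """Passage time of the shortest oriented path from (0,...,0) to (1,...,1).

    Vertices of {0,1}^n are encoded as n-bit integers. An oriented edge
    goes from u to v = u | (1 << i) whenever bit i of u is 0, i.e. each
    step flips one coordinate from 0 to 1.
    weight(u, v) gives the length of edge u -> v; by default the lengths
    are i.i.d. standard exponentials, each drawn exactly once.
    """
    rng = random.Random(seed)
    if weight is None:
        def weight(u, v):
            return rng.expovariate(1.0)
    size = 1 << n
    time = [float("inf")] * size
    time[0] = 0.0
    # Every predecessor u of v is v with one bit cleared, so u < v and
    # increasing integer order is a valid topological order.
    for v in range(1, size):
        for i in range(n):
            if (v >> i) & 1:
                u = v ^ (1 << i)
                # dynamic programming: T(v) = min over u of T(u) + length(u, v)
                time[v] = min(time[v], time[u] + weight(u, v))
    return time[size - 1]
```

Each oriented edge is visited exactly once in the double loop, so the default weights are genuinely independent. Calling `oriented_fpp_hypercube(n, seed=s)` for increasing `n` and several seeds gives simulated samples of the passage time.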
21 November 2018 
Antti Knowles University of Geneva 
Local law and eigenvector delocalization for supercritical Erdős-Rényi graphs
We consider the adjacency matrix of the Erdős-Rényi graph \(G(N,p)\) in the
supercritical regime \(pN > C \log N\) for some universal constant \(C\). We show that
the eigenvalue density is, with high probability, well approximated by the semicircle
law on all spectral scales larger than the typical eigenvalue spacing. We also show
that all eigenvectors are completely delocalized with high probability. Both
results are optimal in the sense that they are known to be false for \(pN < \log N\).
A key ingredient of the proof is a new family of large deviation estimates for
multilinear forms of sparse vectors. Joint work with Yukun He and Matteo Marcozzi.

Spiegelgasse 1 Room 00.003 
28 November 2018 
Gaultier Lambert University of Zurich
How much can the eigenvalues of a random matrix fluctuate?
The goal of this talk is to explain how much the eigenvalues of large Hermitian
random matrices deviate from certain deterministic locations. Bounds on these deviations are known as
“rigidity estimates” in the literature, and they play a crucial role in the proof of
universality. I will review some of the current results on eigenvalue
fluctuations and present a new approach which relies on the theory of Gaussian
Multiplicative Chaos and leads to optimal rigidity estimates for the Gaussian
Unitary Ensemble. I will also mention how a central limit theorem can be deduced
from our proof.
This is joint work with Tom Claeys, Benjamin Fahs and Christian Webb.

Spiegelgasse 1 Room 00.003 
12 December 2018 
Ioan Manolescu University of Fribourg
Uniform Lipschitz functions on the triangular lattice have logarithmic variations
Uniform integer-valued Lipschitz functions on a finite domain of the triangular
lattice are shown to have variations of logarithmic order in the radius of the
domain. The level lines of such functions form a loop \(O(2)\) model on the edges of
the hexagonal lattice with edge-weight one. An infinite-volume Gibbs measure for
the loop \(O(2)\) model is constructed as a thermodynamic limit and is shown to be
unique. It contains only finite loops and has properties indicative of
scale-invariance: macroscopic loops appearing at every scale. The existence of the
infinite-volume measure carries over to height functions pinned at 0; the
uniqueness of the Gibbs measure does not. The proof is based on a representation of
the loop \(O(2)\) model via a pair of spin configurations that are shown to satisfy the
FKG inequality. We prove RSW-type estimates for a certain connectivity notion in
the aforementioned spin model.
Based on joint work with Alexander Glazman.

Spiegelgasse 1 Room 00.003 