Language:
English
98,50 €*
Free shipping via Post / DHL
Currently not available
Categories:
Description
"The aim of this book is to provide the simplest formulations that can be derived "from first principles" with simple arguments"--
"The aim of this book is to provide the simplest formulations that can be derived "from first principles" with simple arguments"--
About the Author
Francis Bach
Table of Contents
Preface ix
I Preliminaries 1
1 Mathematical preliminaries 3
1.1 Linear algebra and differentiable calculus.................. 3
1.1.1 Minimization of quadratic forms................... 3
1.1.2 Inverting a 2 × 2 matrix........................ 4
1.1.3 Inverting matrices defined by blocks, matrix inversion lemma... 4
1.1.4 Eigenvalue and singular value decomposition............ 6
1.1.5 Differential calculus.......................... 7
1.2 Concentration inequalities........................... 7
1.2.1 Hoeffding’s inequality......................... 9
1.2.2 McDiarmid’s inequality........................ 12
1.2.3 Bernstein’s inequality (♦)....................... 14
1.2.4 Expectation of the maximum..................... 16
1.2.5 Estimation of expectations through quadrature (♦)......... 17
1.2.6 Concentration inequalities for matrices (♦♦)............. 18
2 Introduction to supervised learning 21
2.1 From training data to predictions....................... 22
2.2 Decision theory................................. 25
2.2.1 Loss functions............................. 25
2.2.2 Risks................................... 26
2.2.3 Bayes risk and Bayes predictor.................... 27
2.3 Learning from data............................... 30
2.3.1 Local averaging............................. 30
2.3.2 Empirical risk minimization...................... 31
2.4 Statistical learning theory........................... 33
2.4.1 Measures of performance....................... 35
2.4.2 Notions of consistency over classes of problems........... 35
2.5 No free lunch theorems (♦).......................... 36
2.6 Quest for adaptivity.............................. 38
2.7 Beyond supervised learning.......................... 39
2.8 Summary - book outline............................ 40
3 Linear least-squares regression 43
3.1 Introduction................................... 43
3.2 Least-squares framework............................ 44
3.3 Ordinary least-squares (OLS) estimator................... 45
3.3.1 Closed-form solution.......................... 45
3.3.2 Geometric interpretation........................ 46
3.3.3 Numerical resolution.......................... 47
3.4 Statistical analysis of OLS........................... 47
3.5 Fixed design setting.............................. 48
3.5.1 Statistical properties of the OLS estimator............. 50
3.5.2 Experiments.............................. 52
3.6 Ridge least-squares regression......................... 53
3.7 Lower-bound (♦)................................ 57
3.8 Random design analysis............................ 59
3.8.1 Gaussian designs............................ 61
3.8.2 General designs (♦♦)......................... 61
3.9 Principal component analysis (♦)....................... 63
3.10 Conclusion................................... 65
II Generalization bounds for learning algorithms 67
4 Empirical risk minimization 69
4.1 Convexification of the risk........................... 70
4.1.1 Convex surrogates........................... 71
4.1.2 Geometric interpretation of the support vector machine (♦).... 72
4.1.3 Conditional Φ-risk and classification calibration (♦)........ 74
4.1.4 Relationship between risk and Φ-risk (♦♦).............. 76
4.2 Risk minimization decomposition....................... 80
4.3 Approximation error.............................. 80
4.4 Estimation error................................ 81
4.4.1 Application of McDiarmid’s inequality................ 82
4.4.2 Easy case I: quadratic functions.................... 83
4.4.3 Easy case II: Finite number of models................ 84
4.4.4 Beyond finitely many models through covering numbers (♦)... 84
4.5 Rademacher complexity............................ 86
4.5.1 Symmetrization............................. 87
4.5.2 Lipschitz-continuous losses...................... 89
4.5.3 Ball-constrained linear predictions.................. 91
4.5.4 Putting things together (linear predictions)............. 92
4.5.5 From constrained to regularized estimation (♦)........... 93
4.5.6 Extensions and improvements..................... 96
4.6 Model selection (♦)............................... 98
4.6.1 Structural risk minimization...................... 98
4.6.2 Selection based on validation set................... 99
4.7 Relationship with asymptotic statistics (♦)................. 99
4.8 Summary.................................... 101
5 Optimization for machine learning 103
5.1 Optimization in machine learning....................... 103
5.2 Gradient descent................................ 105
5.2.1 Simplest analysis: ordinary least-squares............... 106
5.2.2 Convex functions and their properties................ 110
5.2.3 Analysis of GD for strongly convex and smooth functions..... 112
5.2.4 Analysis of GD for convex and smooth functions (♦)....... 117
5.2.5 Beyond gradient descent (♦)..................... 120
5.2.6 Non-convex objective functions (♦)................. 122
5.3 Gradient methods on non-smooth problems................. 123
5.4 Convergence rate of stochastic gradient descent (SGD)........... 127
5.4.1 Strongly convex problems (♦).................... 132
5.4.2 Adaptive methods (♦)......................... 134
5.4.3 Bias-variance trade-offs for least-squares (♦)............. 135
5.4.4 Variance reduction (♦)........................ 138
5.5 Conclusion................................... 143
6 Local averaging methods 145
6.1 Introduction................................... 145
6.2 Local averaging methods............................ 147
6.2.1 Linear estimators............................ 147
6.2.2 Partition estimators.......................... 148
6.2.3 Nearest-neighbors........................... 150
6.2.4 Nadaraya-Watson estimator a.k.a. kernel regression (♦)...... 151
6.3 Generic “simplest” consistency analysis................... 153
6.3.1 Fixed partition............................. 155
6.3.2 k-nearest neighbor........................... 158
6.3.3 Kernel regression (Nadaraya-Watson) (♦).............. 160
6.4 Universal consistency (♦)........................... 163
6.5 Adaptivity (♦♦)................................ 166
6.6 Conclusion................................... 167
7 Kernel methods 169
7.1 Introduction................................... 170
7.2 Representer theorem.............................. 170
7.3 Kernels..................................... 173
7.3.1 Linear and polynomial kernels.................... 175
7.3.2 Translation-invariant kernels on [0, 1]................. 176
7.3.3 Translation-invariant kernels on R^d.................. 179
7.3.4 Beyond vectorial input spaces (♦).................. 182
7.4 Algorithms................................... 184
7.4.1 Representer theorem.......................... 184
7.4.2 Column sampling............................ 185
7.4.3 Random features............................ 185
7.4.4 Dual algorithms (♦).......................... 186
7.4.5 Stochastic gradient descent (♦).................... 187
7.4.6 “Kernelization” of linear algorithms................. 188
7.5 Generalization guarantees - Lipschitz-continuous losses........... 189
7.5.1 Risk decomposition........................... 190
7.5.2 Approximation error for translation-invariant kernels on R^d.... 191
7.6 Theoretical analysis of ridge regression (♦)................. 194
7.6.1 Kernel ridge regression as a “linear” estimator........... 194
7.6.2 Bias and variance decomposition (♦)................ 195
7.6.3 Relating empirical and population covariance operators...... 198
7.6.4 Analysis for well-specified problems (♦)............... 200
7.6.5 Analysis beyond well-specified problems (♦)............. 201
7.6.6 Balancing bias and variance (♦)................... 202
7.7 Experiments................................... 203
7.8 Conclusion................................... 205
8 Sparse methods 207
8.1 Introduction................................... 207
8.1.1 Dedicated proof technique for constrained least-squares...... 209
8.1.2 Probabilistic and combinatorial lemmas............... 210
8.2 Variable selection by the ℓ0-penalty...................... 212
8.2.1 Assuming k is known.......................... 212
8.2.2 Estimating k (♦)............................ 214
8.3 Variable selection by ℓ1-regularization.................... 216
8.3.1 Intuition and algorithms........................ 217
8.3.2 Slow rates - random design...................... 221
8.3.3 Slow rates - fixed design........................ 221
8.3.4 Fast rates (♦).............................. 224
8.3.5 Zoo of conditions (♦♦)......................... 225
8.3.6 Random design (♦)........................... 227
8.4 Experiments................................... 228
8.5 Extensions.................................... 229
8.6 Conclusion................................... 230
9 Neural networks 233
9.1 Introduction................................... 233
9.2 Single hidden layer neural network...................... 235
9.2.1 Optimization.............................. 236
9.2.2 Rectified linear units and homogeneity................ 237
9.2.3 Estimation error............................ 239
9.3 Approximation properties........................... 241
9.3.1 Universal approximation property in one dimension........ 242
9.3.2 Infinitely many neurons and variation norm............. 243
9.3.3 Variation norm...
Details
Year of publication: | 2024 |
---|---|
Subject area: | IT / Computing |
Genre: | Imports, Computer Science |
Category: | Science & Technology |
Topic: | Reference works |
Medium: | Book |
ISBN-13: | 9780262049443 |
ISBN-10: | 0262049449 |
Language: | English |
Binding: | Hardcover |
Author: | Bach, Francis |
Publisher: | MIT Press Ltd |
Responsible person for the EU: | Product safety officer, Europaallee 1, D-36244 Bad Hersfeld, gpsr@libri.de |
Dimensions: | 188 x 243 x 38 mm |
By: | Francis Bach |
Publication date: | 24.12.2024 |
Weight: | 1.104 kg |