68,85 €*
Free shipping via Post / DHL
Delivery time 1-2 weeks
In the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203, accomplished data engineer and tech educator Benjamin Perkins delivers a hands-on, practical guide to preparing for the challenging Azure Data Engineer certification and for a new career in an exciting and growing field of tech.
In the book, you'll explore all the objectives covered on the DP-203 exam while learning the job roles and responsibilities of a newly minted Azure data engineer. From integrating and transforming data to consolidating it from various structured and unstructured data systems into a structure that is suitable for building analytics solutions, you'll get up to speed quickly and efficiently with Sybex's easy-to-use study aids and tools.
This Study Guide also offers:
* Career-ready advice for anyone hoping to ace their first data engineering job interview and excel on their first day in the field
* Indispensable tips and tricks to familiarize yourself with the DP-203 exam structure and help reduce test anxiety
* Complimentary access to Sybex's expansive online study tools, accessible across multiple devices, and offering access to hundreds of bonus practice questions, electronic flashcards, and a searchable, digital glossary of key terms
A one-of-a-kind study aid designed to help you get straight to the crucial material you need to succeed on the exam and on the job, the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203 belongs on the bookshelves of anyone hoping to increase their data analytics skills, advance their data engineering career with an in-demand certification, or make a career change into a popular new area of tech.
ABOUT THE AUTHOR
Benjamin Perkins is currently employed at Microsoft in Munich, Germany, as a Senior Escalation Engineer on the Azure team. He is a C# programming expert and cloud engineer who has been working professionally in the IT industry for almost three decades. His roles in IT have spanned the entire spectrum including programmer, system architect, technical support engineer, team leader, and mid-level management. While employed at Hewlett-Packard and Compaq Computer Corporation, he received numerous awards, degrees, and certifications.
Introduction xxvii
Part I Azure Data Engineer Certification and Azure Products 1
Chapter 1 Gaining the Azure Data Engineer Associate Certification 3
The Journey to Certification 7
How to Pass Exam DP-203 8
Understanding the Exam Expectations and Requirements 9
Use Azure Daily 17
Read Azure Articles to Stay Current 17
Have an Understanding of All Azure Products 20
Azure Product Name Recognition 21
Azure Data Analytics 23
Azure Synapse Analytics 23
Azure Databricks 26
Azure HDInsight 28
Azure Analysis Services 30
Azure Data Factory 31
Azure Event Hubs 33
Azure Stream Analytics 34
Other Products 35
Azure Storage Products 36
Azure Data Lake Storage 37
Azure Storage 40
Other Products 42
Azure Databases 43
Azure Cosmos DB 43
Azure SQL Server Products 46
Additional Azure Databases 46
Other Products 47
Azure Security 48
Azure Active Directory 48
Role-Based Access Control 51
Attribute-Based Access Control 53
Azure Key Vault 53
Other Products 55
Azure Networking 56
Virtual Networks 56
Other Products 59
Azure Compute 59
Azure Virtual Machines 59
Azure Virtual Machine Scale Sets 60
Azure App Service Web Apps 60
Azure Functions 60
Azure Batch 60
Azure Management and Governance 60
Azure Monitor 61
Azure Purview 61
Azure Policy 62
Azure Blueprints (Preview) 62
Azure Lighthouse 62
Azure Cost Management and Billing 62
Other Products 63
Summary 64
Exam Essentials 64
Review Questions 66
Chapter 2 CREATE DATABASE dbName; GO 69
The Brainjammer 70
A Historical Look at Data 71
Variety 73
Velocity 74
Volume 74
Data Locations 74
Data File Formats 75
Data Structures, Types, and Concepts 83
Data Structures 83
Data Types and Management 92
Data Concepts 95
Data Programming and Querying for Data Engineers 125
Data Programming 126
Querying Data 143
Understanding Big Data Processing 169
Big Data Stages 169
ETL, ELT, ELTL 174
Analytics Types 175
Big Data Layers 176
Summary 177
Exam Essentials 177
Review Questions 179
Part II Design and Implement Data Storage 181
Chapter 3 Data Sources and Ingestion 183
Where Does Data Come From? 185
Design a Data Storage Structure 189
Design an Azure Data Lake Solution 190
Recommended File Types for Storage 198
Recommended File Types for Analytical Queries 199
Design for Efficient Querying 200
Design for Data Pruning 203
Design a Folder Structure That Represents the Levels of Data Transformation 203
Design a Distribution Strategy 205
Design a Data Archiving Solution 206
Design a Partition Strategy 207
Design a Partition Strategy for Files 209
Design a Partition Strategy for Analytical Workloads 210
Design a Partition Strategy for Efficiency and Performance 211
Design a Partition Strategy for Azure Synapse Analytics 211
Identify When Partitioning Is Needed in Azure Data Lake Storage Gen2 212
Design the Serving/Data Exploration Layer 213
Design Star Schemas 214
Design Slowly Changing Dimensions 215
Design a Dimensional Hierarchy 219
Design a Solution for Temporal Data 220
Design for Incremental Loading 222
Design Analytical Stores 223
Design Metastores in Azure Synapse Analytics and Azure Databricks 224
The Ingestion of Data into a Pipeline 228
Azure Synapse Analytics 228
Azure Data Factory 268
Azure Databricks 275
Event Hubs and IoT Hub 301
Azure Stream Analytics 303
Apache Kafka for HDInsight 314
Migrating and Moving Data 316
Summary 317
Exam Essentials 317
Review Questions 319
Chapter 4 The Storage of Data 321
Implement Physical Data Storage Structures 322
Implement Compression 322
Implement Partitioning 325
Implement Sharding 328
Implement Different Table Geometries with Azure Synapse Analytics Pools 329
Implement Data Redundancy 331
Implement Distributions 341
Implement Data Archiving 342
Azure Synapse Analytics Develop Hub 346
Implement Logical Data Structures 360
Build a Temporal Data Solution 361
Build a Slowly Changing Dimension 365
Build a Logical Folder Structure 368
Build External Tables 369
Implement File and Folder Structures for Efficient Querying and Data Pruning 372
Implement a Partition Strategy 375
Implement a Partition Strategy for Files 376
Implement a Partition Strategy for Analytical Workloads 377
Implement a Partition Strategy for Streaming Workloads 378
Implement a Partition Strategy for Azure Synapse Analytics 378
Design and Implement the Data Exploration Layer 379
Deliver Data in a Relational Star Schema 379
Deliver Data in Parquet Files 385
Maintain Metadata 386
Implement a Dimensional Hierarchy 386
Create and Execute Queries by Using a Compute Solution That Leverages SQL Serverless and Spark Cluster 388
Recommend Azure Synapse Analytics Database Templates 389
Implement Azure Synapse Analytics Database Templates 389
Additional Data Storage Topics 390
Storing Raw Data in Azure Databricks for Transformation 390
Storing Data Using Azure HDInsight 392
Storing Prepared, Trained, and Modeled Data 393
Summary 394
Exam Essentials 395
Review Questions 396
Part III Develop Data Processing 399
Chapter 5 Transform, Manage, and Prepare Data 401
Ingest and Transform Data 402
Transform Data Using Azure Synapse Pipelines 404
Transform Data Using Azure Data Factory 410
Transform Data Using Apache Spark 414
Transform Data Using Transact-SQL 429
Transform Data Using Stream Analytics 431
Cleanse Data 433
Split Data 435
Shred JSON 439
Encode and Decode Data 445
Configure Error Handling for the Transformation 450
Normalize and Denormalize Values 451
Transform Data by Using Scala 461
Perform Exploratory Data Analysis 463
Transformation and Data Management Concepts 473
Transformation 473
Data Management 480
Azure Databricks 481
Data Modeling and Usage 485
Data Modeling with Machine Learning 486
Usage 494
Summary 500
Exam Essentials 500
Review Questions 502
Chapter 6 Create and Manage Batch Processing and Pipelines 505
Design and Develop a Batch Processing Solution 507
Design a Batch Processing Solution 510
Develop Batch Processing Solutions 512
Create Data Pipelines 538
Handle Duplicate Data 560
Handle Missing Data 569
Handle Late-Arriving Data 571
Upsert Data 572
Configure the Batch Size 578
Configure Batch Retention 581
Design and Develop Slowly Changing Dimensions 582
Design and Implement Incremental Data Loads 583
Integrate Jupyter/IPython Notebooks into a Data Pipeline 590
Revert Data to a Previous State 591
Handle Security and Compliance Requirements 592
Design and Create Tests for Data Pipelines 593
Scale Resources 593
Design and Configure Exception Handling 593
Debug Spark Jobs Using the Spark UI 594
Implement Azure Synapse Link and Query the Replicated Data 594
Use PolyBase to Load Data to a SQL Pool 595
Read from and Write to a Delta Table 595
Manage Batches and Pipelines 596
Trigger Batches 597
Schedule Data Pipelines 597
Validate Batch Loads 598
Implement Version Control for Pipeline Artifacts 604
Manage Data Pipelines 607
Manage Spark Jobs in a Pipeline 609
Handle Failed Batch Loads 610
Summary 610
Exam Essentials 611
Review Questions 612
Chapter 7 Design and Implement a Data Stream Processing Solution 615
Develop a Stream Processing Solution 617
Design a Stream Processing Solution 618
Create a Stream Processing Solution 630
Process Time Series Data 657
Design and Create Windowed Aggregates 658
Process Data Within One Partition 661
Process Data Across Partitions 663
Upsert Data 665
Handle Schema Drift 674
Configure Checkpoints/Watermarking During Processing 680
Replay Archived Stream Data 685
Design and Create Tests for Data Pipelines 688
Monitor for Performance and Functional Regressions 689
Optimize Pipelines for Analytical or Transactional Purposes 689
Scale Resources 690
Design and Configure Exception Handling 691
Handle Interruptions 694
Ingest and Transform Data 694
Transform Data Using Azure Stream Analytics 694
Monitor Data Storage and Data Processing 695
Monitor Stream Processing 695
Summary 695
Exam Essentials 696
Review Questions 697
Part IV Secure, Monitor, and Optimize Data Storage and Data Processing 699
Chapter 8 Keeping Data Safe and Secure 701
Design Security for Data Policies and Standards 702
Design a Data Auditing Strategy 711
Design a Data Retention Policy 716
Design for Data Privacy 717
Design to Purge Data Based on Business Requirements 719
Design Data Encryption for Data at Rest and in Transit 719
...
Year of publication: | 2023 |
---|---|
Genre: | Imports, Mathematics |
Category: | Science & Technology |
Medium: | Paperback |
Contents: | 1008 pp. |
ISBN-13: | 9781119885429 |
ISBN-10: | 1119885426 |
Language: | English |
Binding: | Softcover / Paperback |
Author: | Perkins, Benjamin |
Manufacturer: | John Wiley & Sons Inc |
Responsible person for the EU: | Wiley-VCH GmbH, Boschstr. 12, D-69469 Weinheim, amartine@wiley-vch.de |
Dimensions: | 234 x 186 x 53 mm |
By/With: | Benjamin Perkins |
Publication date: | 06.09.2023 |
Weight: | 1.832 kg |