Spark Overview
Programming Guides:
Quick Start: a quick introduction to the Spark API; start here!
RDD Programming Guide: overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
Spark Streaming: processing data streams using DStreams (old API)
Deployment Guides:
Cluster Overview: overview of concepts and components when running on a cluster
Submitting Applications: packaging and deploying applications
Deployment modes:
Standalone Deploy Mode: launch a standalone cluster quickly without a third-party cluster manager
YARN: deploy Spark on top of Hadoop NextGen (YARN)
Other Documents:
Configuration: customize Spark via its configuration system
Monitoring: track the behavior of your applications
Tuning Guide: best practices to optimize performance and memory use
Job Scheduling: scheduling resources across and within Spark applications
Security: Spark security support
Hardware Provisioning: recommendations for cluster hardware
Integration with other storage systems:
Cloud Infrastructures
OpenStack Swift
Migration Guide: Migration guides for Spark components
Building Spark: build Spark using the Maven system
Third Party Projects: related third party Spark projects