---
title: Spark By Examples
---

::::{grid}
:reverse:
:gutter: 2 1 1 1
:margin: 4 4 1 1

:::{grid-item}
:columns: 4

```{image} ./_static/apachespark.svg
:width: 150px
```
:::

:::{grid-item}
:columns: 8
:class: sd-fs-3

Spark By Examples | Learn Spark Tutorial with Examples.
:::

::::

# Spark By Examples

```{toctree}
:maxdepth: 1
:caption: Spark Base

base/installation-on-windows
base/installation-on-linux
base/spark-on-intelliJ
base/spark-session
base/spark-context
```

```{toctree}
:maxdepth: 1
:caption: Spark RDD

spark-rdd/spark-rdd-summary
Spark RDD - Creating RDDs
Spark RDD - Parallelize
Spark RDD - textFile
Spark RDD - Transformations
Spark RDD - Actions
Spark RDD - DAG
Spark RDD - Persistence
Spark RDD - Shared Variables
```

## What is Apache Spark?

Apache Spark is an open-source analytical processing engine for large-scale distributed data processing and machine learning applications. Spark was originally developed at the University of California, Berkeley, and was later donated to the Apache Software Foundation. In February 2014, Spark became a [Top-Level Apache Project](https://en.wikipedia.org/wiki/Apache_Spark). Thousands of engineers have contributed to it since, making it one of the most active open-source projects at Apache.

## Apache Spark Features

* In-memory computation
* Distributed processing using `parallelize`
* Can be used with many cluster managers (Spark standalone, YARN, Mesos, etc.)
* Fault-tolerant
* Immutable
* Lazy evaluation
* Cache & persistence
* Built-in optimization when using DataFrames
* Supports ANSI SQL
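
Three of the features listed above (immutability, lazy evaluation, and caching) are easier to see in code. The sketch below is a toy model in plain Python, not Spark's API: the `LazyDataset` class and its `evaluations` counter are hypothetical names invented for illustration. It only mimics, in a single process, how an RDD records transformations such as `map` and `filter` and defers the work until an action such as `collect` or `count` is called.

```python
class LazyDataset:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = tuple(source)   # input data is never modified
        self._steps = tuple(steps)     # recorded transformations (the lineage)
        self._cache_enabled = False
        self._cached = None
        self.evaluations = 0           # how many times the lineage was executed

    def map(self, fn):
        # Transformation: returns a NEW dataset and computes nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: returns a NEW dataset and computes nothing yet.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def cache(self):
        # Marks this dataset so the first action stores its result for reuse.
        self._cache_enabled = True
        return self

    def collect(self):
        # Action: this is the point where the recorded steps actually run.
        if self._cached is not None:
            return list(self._cached)
        self.evaluations += 1
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        if self._cache_enabled:
            self._cached = tuple(data)
        return data

    def count(self):
        # Action: reuses collect(), so it benefits from the cache as well.
        return len(self.collect())


numbers = LazyDataset(range(1, 11))
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

evaluations_before_action = squares_of_evens.evaluations  # 0: nothing has run yet
result = squares_of_evens.collect()                       # [4, 16, 36, 64, 100]
total = squares_of_evens.count()                          # 5, served from the cache
```

After both actions, `squares_of_evens.evaluations` is still 1 because the second action reads the cached result, and `numbers` itself still yields 1 through 10 because each transformation produced a new dataset. Real Spark adds what this toy leaves out: the data is split into partitions across a cluster, and the recorded lineage is also what lets Spark recompute a lost partition, which is the basis of its fault tolerance.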