Welcome to the Tutorials repo.
The "Tutorials_SparkSQL" folder has the Julia Pluto notebook tutorials and sample data. To run the notebook:
- Install Apache Spark 3.3.1 or later: http://spark.apache.org/downloads.html
- Install either OpenJDK 11 or 17.
- Set up your JAVA_HOME and SPARK_HOME environment variables:
export JAVA_HOME=/path/to/java
export SPARK_HOME=/path/to/Apache/Spark
- If using OpenJDK 11 on Linux, set processReaperUseDefaultStackSize to true:
export _JAVA_OPTIONS='-Djdk.lang.processReaperUseDefaultStackSize=true'
- Start Apache Spark (these commands use the default values):
/path/to/Apache/Spark/sbin/start-master.sh
/path/to/Apache/Spark/sbin/start-worker.sh spark://localhost:7077
- Start Julia with JULIA_COPY_STACKS=yes, which is required for JVM interop:
JULIA_COPY_STACKS=yes julia
- If using Julia on macOS, also pass --handle-signals=no:
JULIA_COPY_STACKS=yes julia --handle-signals=no
- Install SparkSQL.jl along with the other required Julia packages:
] add SparkSQL DataFrames Decimals Pluto
- Launch the Pluto notebook:
using Pluto; Pluto.run()
- Download the tutorial notebook and sample data from this repository. In Pluto, navigate to where you saved the notebook and open it.
- The notebook runs automatically once it is open. Its code demonstrates the commonly used features, so you can use it as the basis of your own SparkSQL.jl and Julia projects.
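
If Spark or Julia fails to start, a common cause is a JAVA_HOME or SPARK_HOME variable that is unset or points at the wrong directory. The following is a minimal preflight sketch for a POSIX shell, not part of Spark or SparkSQL.jl. The helper name check_home is hypothetical, and the script assumes that a JDK provides bin/java and that a Spark install provides sbin/start-master.sh (the script used in the steps above).

```shell
# check_home NAME RELATIVE_PATH
# Hypothetical helper: succeeds when the variable called NAME is set
# and the path $NAME/RELATIVE_PATH exists on disk.
check_home() {
    ch_name="$1"
    ch_rel="$2"
    # Look up the value of the variable whose name is stored in ch_name.
    eval "ch_value=\${$ch_name:-}"
    if [ -z "$ch_value" ]; then
        echo "$ch_name is not set"
        return 1
    fi
    if [ ! -e "$ch_value/$ch_rel" ]; then
        echo "$ch_name=$ch_value does not contain $ch_rel"
        return 1
    fi
    echo "$ch_name looks OK ($ch_value)"
}

# Run both checks and report a summary without aborting the shell.
preflight_status=0
check_home JAVA_HOME bin/java || preflight_status=1
check_home SPARK_HOME sbin/start-master.sh || preflight_status=1

if [ "$preflight_status" -eq 0 ]; then
    echo "Environment looks ready."
else
    echo "Fix the variables above before starting Spark."
fi
```

Run it in the same shell session where you exported the variables, before running the start-master.sh and start-worker.sh commands. It only checks that the expected files exist; it does not verify the Java or Spark version.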