
Monday, February 26, 2024

Scala — Transpose or Pivot | Rows to Columns in Dataframe | Databricks

In this tutorial, you will learn "How to Transpose or Pivot | Rows to Columns in Dataframe by using Scala" in Databricks.

Data integrity refers to the accuracy, consistency, and reliability of data throughout its life cycle. Data engineering pipelines are the processes and structures that collect, transform, store, and analyse data from many sources.

Scala is a programming language that combines the object-oriented and functional paradigms. It was created by Martin Odersky and first released publicly in 2004. The name "Scala" is short for "scalable language," reflecting the language's capacity to grow from simple scripts to complex systems.

Scala is designed to be productive, expressive, and concise, and it suits a wide range of tasks, from scripting to large-scale enterprise applications. It has become popular in sectors such as finance, where its strong type system and expressive syntax are particularly valuable.

If you want to transpose or pivot rows to columns in a DataFrame using Scala in Databricks, follow these steps:
💎 Import the necessary Spark classes for DataFrame operations.
💎 Group the DataFrame by the columns that identify each output row, pivot on the column whose values should become new columns, and apply an aggregation to fill each cell.
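As a minimal sketch of these steps, assuming a small sales dataset (the data, column names, and app name are illustrative, not taken from the original post):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// On Databricks a SparkSession named `spark` already exists; this line is for standalone runs
val spark = SparkSession.builder().appName("PivotExample").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one row per (Product, Quarter) pair
val sales = Seq(
  ("ProductA", "Q1", 100),
  ("ProductA", "Q2", 150),
  ("ProductB", "Q1", 200),
  ("ProductB", "Q2", 250)
).toDF("Product", "Quarter", "Revenue")

// groupBy fixes the row identifier, pivot turns the distinct Quarter values
// into columns, and the aggregate (sum here) fills each resulting cell
val pivoted = sales.groupBy("Product").pivot("Quarter").agg(sum("Revenue"))
pivoted.show()
// +--------+---+---+
// | Product| Q1| Q2|
// +--------+---+---+
// |ProductA|100|150|
// |ProductB|200|250|
// +--------+---+---+

An aggregation is required even when each (Product, Quarter) pair has exactly one row, because pivot must know how to combine values if duplicates ever appear.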

Saturday, February 24, 2024

Databricks - Change column names from CamelCase to Snake_Case using Scala

In this tutorial, you will learn "How to Change column names from CamelCase to Snake_Case by using Scala" in Databricks.

💡 Imagine we have an input DataFrame (as in the image). Our goal is to achieve the desired output DataFrame (also shown in the image).

Basically, you have to change the column names as follows:
Age -> Age,
FirstName -> First_Name,
CityName -> City_Name,
CountryName -> Country_Name


To create a DataFrame in Scala, you can use Apache Spark's DataFrame API.

In this example:
💎 Import the necessary Spark classes for DataFrame operations.
💎 Create a SparkSession, which is the entry point to Spark SQL functionality.
💎 Define a schema for the DataFrame using StructType and StructField.
💎 Define the data as a sequence of rows, where each row represents a record in the DataFrame.
💎 Create the DataFrame using the createDataFrame method of SparkSession, passing in the data and schema.
💎 Display the DataFrame using the show() method.
💎 Create a variable to store the regex pattern.
💎 Create a variable to store the new Snake_Case column names.
💎 Create a new DataFrame with the Snake_Case columns.
💎 Finally, display the data from the new DataFrame.
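Here is a minimal sketch that follows these steps; the sample rows are illustrative assumptions, since the post's actual data appears only in an image:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// On Databricks a SparkSession named `spark` already exists; this line is for standalone runs
val spark = SparkSession.builder().appName("CamelToSnake").master("local[*]").getOrCreate()

// Define a schema with the CamelCase column names from the example
val schema = StructType(Seq(
  StructField("Age", IntegerType, nullable = true),
  StructField("FirstName", StringType, nullable = true),
  StructField("CityName", StringType, nullable = true),
  StructField("CountryName", StringType, nullable = true)
))

// Hypothetical sample rows, standing in for the data shown in the post's image
val data = Seq(
  Row(30, "Alice", "London", "United Kingdom"),
  Row(25, "Bob", "Paris", "France")
)

// Create and display the input DataFrame
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.show()

// Regex pattern: a lowercase letter or digit followed by an uppercase letter
val camelCasePattern = "([a-z0-9])([A-Z])"

// Build the Snake_Case column names by inserting "_" at each matched boundary
val snakeCaseColumns = df.columns.map(_.replaceAll(camelCasePattern, "$1_$2"))

// Create a new DataFrame with the Snake_Case columns and display it
val snakeDf = df.toDF(snakeCaseColumns: _*)
snakeDf.show()
// Columns are now: Age, First_Name, City_Name, Country_Name

Because the pattern only matches a lowercase-to-uppercase boundary, Age stays Age while FirstName becomes First_Name, matching the mapping listed above.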