User Defined Functions 
… new in Apache Cassandra 3.0 
CASSANDRA-7395 + many more…
Me, Robert… 
• Contribute code to Apache Cassandra (UDFs, row cache + more) 
• Help customers to build Cassandra solutions 
• Freelancer, Coder 
Robert Stupp, Cologne, Germany 
@snazy snazy@snazy.de
Disclaimer 
• Apache Cassandra 3.0 is the next major release 
• Everything is under development 
• Things may change 
• Things may be different in final 3.0 release
Apache Cassandra 3.0 
• … will bring a lot of cool, new features ! 
• … will bring a lot of great improvements ! 
• UDFs are just one of these features :)
UDF - 
What’s that??
UDF 
• UDF means User Defined Function 
• You write the code that’s executed on Cassandra nodes 
• Functions are distributed transparently to the whole cluster 
• You may not have to wait for a new release for new functionality :)
UDF Characteristics 
• „Pure“ 
• just input parameters 
• no state, side effects, dependencies on other code, etc 
• Usually deterministic
Consider a Java function like… 
// no imports needed 
public final class MyClass 
{ 
public static int myFunction ( int argument ) 
{ 
return argument * 42; 
} 
} 
This would be your UDF
Example 
CREATE FUNCTION sinPlusFoo 
( 
  valueA double,     -- arguments 
  valueB double 
) 
RETURNS double       -- return type 
LANGUAGE java        -- define the UDF language 
AS 'return Math.sin(valueA) + valueB;';    -- Java code 
Java works out of the box!
Next Example 
CREATE FUNCTION sin ( 
  value double ) 
RETURNS double 
LANGUAGE javascript 
AS 'Math.sin(value);';    -- JavaScript code 
JavaScript works out of the box, too! 
Cassandra 3.0 targets Java 8 - so it’s „Nashorn“
JSR 223 
• “Scripting for the Java Platform“ 
• UDFs can be written in Java and JavaScript 
• Optionally: Groovy, JRuby, Jython, Scala 
• Not: Clojure (its JSR 223 implementation is broken)
Behind the scenes 
• Builds Java (or script) source 
• Compiles that code (Java class, or compiled script) 
• Loads the compiled code 
• Migrates the function to all other nodes 
• Done - UDF is executable on any node
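The first step above, building Java source from the body given in CREATE FUNCTION, can be sketched roughly as follows. This is only an illustration: the class UdfSourceBuilder, the generated method name execute and the layout of the generated class are assumptions, not Cassandra's actual code generator. Compiling and loading the result are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: wraps a UDF body in a Java class, as a
// "build source" step might do. Not Cassandra's real generator.
final class UdfSourceBuilder {

    // params maps argument name -> Java type, in declaration order.
    static String buildSource(String className, String returnType,
                              Map<String, String> params, String body) {
        StringJoiner args = new StringJoiner(", ");
        for (Map.Entry<String, String> p : params.entrySet()) {
            args.add(p.getValue() + " " + p.getKey());
        }
        // "execute" is an assumed name for the generated method.
        return "public final class " + className + " {\n"
             + "    public static " + returnType + " execute(" + args + ") {\n"
             + "        " + body + "\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] argv) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("valueA", "double");
        params.put("valueB", "double");
        // Prints the generated source for the sinPlusFoo example.
        System.out.println(buildSource("SinPlusFoo", "double", params,
                "return Math.sin(valueA) + valueB;"));
    }
}
```

The generated class has the same shape as the MyClass example: one static method, no state and no imports.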
Types for UDFs 
• Support for all Cassandra types for arguments and return value 
• All means 
• Primitives (boolean, int, double, uuid, etc) 
• Collections (list, set, map) 
• Tuple types, User Defined Types
UDF - 
For what?
UDF invocation 
SELECT sumThat ( colA, colB ) 
FROM myTable 
WHERE key = ... 
SELECT sin ( foo ) 
FROM myCircle 
WHERE pk = ... 
Now your application can sum two values in one row - or compute the sine of a value! 
GREAT NEW FEATURES! 
Okay - not really…
UDFs are good for… 
• UDFs on their own are just „nice to have“ 
• Nothing you couldn’t do better in your application
Real Use Case for UDFs ?
User Defined Aggregates ! 
CASSANDRA-8053
User Defined Aggregates 
Use UDFs to code your own aggregation functions 
(Aggregates are things like SUM, AVG, MIN, MAX, etc) 
Aggregates : 
consume values from multiple rows & 
produce a single result
Example 
CREATE AGGREGATE minimum ( int )    -- name of the aggregate, function argument types 
STYPE int                           -- state type 
SFUNC minimumState;                 -- name of the “state“ UDF 
Syntax similar to Postgres.
How an aggregate works 
SELECT minimum ( val ) FROM foo … 
1. Initial state is set to null 
2. for each row the state function is called with 
current state and column value - returns new state 
3. After all rows the aggregate returns the last state
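The three steps above can be sketched in plain Java. The slides do not show the body of minimumState, so the implementation here is an assumption, as are the class name MinimumAggregate and the choice to skip null column values.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of how a "minimum" aggregate folds rows into a single state.
final class MinimumAggregate {

    // State function: takes the current state and a column value,
    // returns the new state. The state starts out as null.
    static Integer minimumState(Integer state, Integer val) {
        if (val == null) return state;   // assumption: null values are skipped
        if (state == null) return val;   // first value becomes the state
        return Math.min(state, val);
    }

    static Integer aggregate(List<Integer> columnValues) {
        Integer state = null;                    // 1. initial state is null
        for (Integer val : columnValues) {       // 2. state function per row
            state = minimumState(state, val);
        }
        return state;                            // 3. last state is the result
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(7, 3, 9))); // prints 3
    }
}
```

With no rows the state is never updated, so this sketch returns null.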
More sophisticated 
CREATE AGGREGATE average ( int ) 
SFUNC averageState 
STYPE tuple<long,int> 
FINALFUNC averageFinal    -- UDF called after last row 
INITCOND (0, 0);          -- initial state value 
FINALFUNC + INITCOND are optional
How that aggregate works 
SELECT average ( val ) FROM foo … 
1. Initial state is set to (0,0) 
2. for each row the state function is called with 
current state + column value - returns new state 
3. After all rows the final function is called with last state 
4. final function calculates the aggregate
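A minimal Java sketch of these four steps follows. The slides name averageState and averageFinal but do not show their bodies, so both implementations are assumptions. The State class stands in for the tuple state (sum, count), and the double return type and the result for zero rows are also assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the "average" aggregate: state function plus final function.
final class AverageAggregate {

    // Stand-in for the tuple state: running sum and row count.
    static final class State {
        final long sum;
        final int count;
        State(long sum, int count) { this.sum = sum; this.count = count; }
    }

    // State function: current state + column value -> new state.
    static State averageState(State state, int val) {
        return new State(state.sum + val, state.count + 1);
    }

    // Final function: turns the last state into the aggregate result.
    static double averageFinal(State state) {
        if (state.count == 0) return 0.0;   // assumption: no rows yields 0
        return (double) state.sum / state.count;
    }

    static double aggregate(List<Integer> columnValues) {
        State state = new State(0L, 0);           // 1. INITCOND (0, 0)
        for (int val : columnValues) {            // 2. state function per row
            state = averageState(state, val);
        }
        return averageFinal(state);               // 3. + 4. final function
    }

    public static void main(String[] args) {
        System.out.println(aggregate(Arrays.asList(2, 4, 9))); // prints 5.0
    }
}
```

The state carries both the sum and the count because the average cannot be updated row by row from the previous average alone. That is also why a final function is needed here but not for minimum.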
Now everybody can execute evil 
code on your cluster :)
UDF permissions 
• There will be permissions to restrict (allow) 
• UDF creation (DDL) 
• UDF execution (DML) 
CASSANDRA-7557
Built in functions 
• All known built-in functions are called native functions 
• Native functions belong to SYSTEM keyspace 
• Native functions cannot be modified (or dropped) 
Note: you already know native functions like 
now, count, unixtimestampof
UDFs belong to a keyspace 
• User Defined Functions and User Defined Aggregates belong to a keyspace 
• If a function/aggregate name is not fully qualified, the SYSTEM keyspace is searched first (then the current keyspace)
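The lookup order described here can be illustrated with a small Java sketch. FunctionResolver, its in-memory registry and the "keyspace.function" string form are hypothetical; they only model the search order and are not Cassandra's internal API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of function name resolution across keyspaces.
final class FunctionResolver {
    private final Map<String, Set<String>> functionsByKeyspace = new HashMap<>();

    void register(String keyspace, String function) {
        functionsByKeyspace
            .computeIfAbsent(keyspace, k -> new HashSet<>())
            .add(function);
    }

    // Returns the fully qualified name, or empty if nothing matches.
    Optional<String> resolve(String name, String currentKeyspace) {
        if (name.contains(".")) {
            // Fully qualified: look only in the named keyspace.
            String[] parts = name.split("\\.", 2);
            return has(parts[0], parts[1]) ? Optional.of(name) : Optional.empty();
        }
        if (has("system", name)) {                 // SYSTEM keyspace first
            return Optional.of("system." + name);
        }
        if (has(currentKeyspace, name)) {          // then the current keyspace
            return Optional.of(currentKeyspace + "." + name);
        }
        return Optional.empty();
    }

    private boolean has(String keyspace, String function) {
        Set<String> functions = functionsByKeyspace.get(keyspace);
        return functions != null && functions.contains(function);
    }
}
```

In this model a user-defined function named like a native one is only reachable through its fully qualified name.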
UDF - some final words… 
Keep in mind: 
• JSR-223 has overhead - Java UDFs are much faster 
• Do not allow everyone to create UDFs (in production) 
• Keep your UDFs “pure“ 
• Test your UDFs and user defined aggregates thoroughly
For the geeks :) 
• UDFs and user defined aggregates are executed on the coordinator node 
• Prefer to use Java-UDFs for performance reasons
Let a man dream… 
UDFs could be useful for… 
• Functional indexes 
• Partial indexes 
• Filtering 
• Distributed GROUP BY 
• etc etc
Q & A 
THANK YOU FOR YOUR ATTENTION :) 
Robert Stupp, Cologne, Germany 
@snazy snazy@snazy.de
User defined-functions-cassandra-summit-eu-2014

