BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH MICROSERVICES
Tim Spann | Developer Advocate
● Introduction
● What is Apache Pulsar?
● Why Apache Pulsar?
● Remember Hadoop and Kafka?
● Functions
● Apache NiFi
● Apache Flink
● Demos
● Q&A
Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big
Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience
at both global conferences and through individual conversations.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices
Apache Pulsar has a vibrant community
● 560+ Contributors
● 10,000+ Commits
● 7,000+ Slack Members
● 1,000+ Organizations Using Pulsar
It is often assumed that Pulsar and Kafka have equal capabilities. In reality,
Pulsar offers a superset of Kafka.
● Pulsar is streaming and queuing together
● Pulsar is cloud-native with stateless brokers
● Natively includes geo-replication, multi-tenancy, and end-to-end
security out of the box
● Pulsar provides automated rebalancing
● Pulsar offers 100x lower latency with 2.5x greater throughput than Kafka
Advantages of Apache Pulsar
● 2012, CREATED: Originally developed inside Yahoo! as Cloud Messaging Service.
● 2016, OPEN SOURCE: Pulsar committed to open source.
● 2018, APACHE TLP: Pulsar becomes Apache top level project.
● TODAY, GROWTH: 10x Contributors, 10MM+ Downloads. Ecosystem Expands: Kafka on Pulsar, AMQP on Pulsar, Functions, . . .
Apache Pulsar Timeline
Evolution of Pulsar Growth
Pulsar Has a Built-in Superset of OSS Features
● Durability
● Scalability
● Geo-Replication
● Multi-Tenancy
● Unified Messaging Model
● Reduced Vendor Dependency
● Functions
Open-Source Features
Apache Pulsar is built to support legacy applications, handle the needs of modern apps, and support NextGen applications.
● Legacy: Support legacy workloads. Compatible with popular messaging and streaming tools.
● Modern: Built for today's real-time event driven applications.
● NextGen: Scalable, adaptive architecture ready for the future of real-time streaming.
Apache Pulsar features
● Cloud native with decoupled storage and compute layers.
● Built-in compatibility with your existing code and messaging infrastructure.
● Geographic redundancy and high availability included.
● Centralized cluster management and oversight.
● Elastic horizontal and vertical scalability.
● Seamless and instant partitioning rebalancing with no downtime.
● Flexible subscription model supports a wide array of use cases.
● Compatible with the tools you use to store, analyze, and process data.
Component: Description
● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer that produced the message. If you do not specify a producer name, the default name is used.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
Messages - the Basic Unit of Apache Pulsar
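The components listed above can be pictured as a small record type. The following is only an illustrative sketch: the `Message` class and its field names are hypothetical and are not the Pulsar client API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical model of the message components listed above."""
    value: bytes                          # raw payload bytes (may follow a schema)
    sequence_id: int                      # order of the message in its sequence
    producer_name: Optional[str] = None   # None stands for "use the default name"
    key: Optional[str] = None             # optional; used for partitioning and compaction
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined key/value map


msg = Message(
    value=b'{"comment": "hello"}',
    sequence_id=0,
    key="user-42",
    properties={"source": "chat"},
)
```

Only the value and the sequence ID are required in this sketch; the key, properties, and producer name are optional, as in the list above.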
● “Bookies”
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Storage
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
[Diagram: a Pulsar cluster in which bookies store messages and the metadata storage provides metadata and service discovery.]
Pulsar Cluster
Different subscription modes
have different semantics:
Exclusive/Failover -
guaranteed order, single active
consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
Producer 1, Producer 2 → Pulsar Topic
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; messages are split as <K1,V10>, <K2,V21>, <K1,V12> and <K2,V20>, <K1,V11>, <K2,V22>
● Subscription D (Key_Shared): Consumer D-1 and Consumer D-2; messages are split as <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22>
Apache Pulsar Subscription Modes
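The difference between Shared and Key_Shared dispatch can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not how the broker is implemented: the round-robin rule, the use of `zlib.crc32` as the key hash, and the consumer names are all stand-ins chosen for illustration.

```python
from collections import defaultdict
from itertools import cycle
import zlib

messages = [("K1", "V10"), ("K1", "V11"), ("K2", "V20"),
            ("K1", "V12"), ("K2", "V21"), ("K2", "V22")]
consumers = ["consumer-1", "consumer-2"]


def dispatch_shared(msgs, consumers):
    """Round-robin: any consumer may get any message, so per-key order is not kept."""
    received = defaultdict(list)
    for consumer, msg in zip(cycle(consumers), msgs):
        received[consumer].append(msg)
    return dict(received)


def dispatch_key_shared(msgs, consumers):
    """Hash the key to pick a consumer, so each key always lands on one consumer."""
    received = defaultdict(list)
    for key, value in msgs:
        index = zlib.crc32(key.encode()) % len(consumers)  # stand-in hash (assumption)
        received[consumers[index]].append((key, value))
    return dict(received)


shared = dispatch_shared(messages, consumers)
key_shared = dispatch_key_shared(messages, consumers)
```

In the Shared result, messages for K1 and K2 are spread across both consumers. In the Key_Shared result, every message for a given key goes to a single consumer and stays in its original order.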
[Diagram: a single Pulsar topic/partition serving Exclusive, Failover, Shared, and Key-Shared subscriptions, covering both streaming and messaging consumers.]
Messaging Use Cases
● Service x commands service y to make some change. Example: order service removing item from inventory service.
● Distributing messages that represent work among n workers. Example: order processing not in main “thread”.
● Sending “scheduled” messages. Example: notification service for marketing emails or push notifications.
Streaming Use Cases
● Moving large amounts of data to another service (real-time ETL). Example: logs to Elasticsearch.
● Periodic jobs moving large amounts of data and aggregating to more traditional stores. Example: logs to S3.
● Computing a near real-time aggregate of a message stream, split among n workers, with order being important. Example: real-time analytics over page views.
Messaging vs Streaming
Retention
● Messaging use case: The amount of data retained is relatively small, typically only a day or two of data at most.
● Streaming use case: Large amounts of data are retained, with higher ingest volumes and longer retention periods.
Throughput
● Messaging use case: Messaging systems are not designed to manage big “catch-up” reads.
● Streaming use case: Streaming systems are designed to scale and can handle use cases such as catch-up reads.
Differences in Consumption
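import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Reader;

// msgIdBytes holds a message ID previously serialized with MessageId.toByteArray()
MessageId id = MessageId.fromByteArray(msgIdBytes);

// Build a reader on the topic that starts at that message ID
Reader<byte[]> reader = pulsarClient.newReader()
        .topic(topic)
        .startMessageId(id)
        .create();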
Create a reader that will read from
some message between earliest and
latest.
Reader
Apache Pulsar Reader Interface
● New Consumer type added in Pulsar 2.10 that provides a
continuously updated key-value map view of compacted topic data.
● An abstraction of a changelog stream from a primary-keyed table,
where each record in the changelog stream is an update on the
primary-keyed table with the record key as the primary key.
● READ ONLY DATA STRUCTURE!
Apache Pulsar TableView
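The idea behind a table view, a changelog folded into the latest value per key, can be sketched in a few lines. This is a conceptual illustration only and does not use the Pulsar client: `build_table_view`, the sample keys, and the choice of `None` as a delete marker are assumptions made for the example.

```python
from types import MappingProxyType


def build_table_view(changelog):
    """Fold a changelog of (key, value) updates into the latest value per key."""
    view = {}
    for key, value in changelog:
        if value is None:
            view.pop(key, None)  # None is treated as a delete marker (assumption)
        else:
            view[key] = value
    return view


changelog = [
    ("sensor-1", 20.5),
    ("sensor-2", 18.0),
    ("sensor-1", 21.0),   # later update for the same key wins
    ("sensor-2", None),   # sensor-2 is removed
]

# MappingProxyType exposes the result as a read-only mapping
table = MappingProxyType(build_table_view(changelog))
```

Wrapping the result in `MappingProxyType` mirrors the read-only nature of the structure: callers can look keys up but cannot assign to them.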
Schema Registry: schema-1, schema-2, schema-3 (each with value=Avro/Protobuf/JSON)
Each message carries a schema ID together with the data.
Producers (with a local cache for schemas):
● Send (register) schema (if not in local cache)
● Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID
Consumers (with a local cache for schemas):
● Get schema by ID (if not in local cache)
● Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID
Schema Registry
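The register-once, look-up-once flow described above can be sketched with an in-memory registry. Everything here is hypothetical and for illustration only: the `SchemaRegistry`, `Producer`, and `Consumer` classes, the dictionary schema format, and the JSON encoding are assumptions, not Pulsar's schema registry API.

```python
import json


class SchemaRegistry:
    """Hypothetical in-memory registry that assigns an ID to each schema."""

    def __init__(self):
        self._by_id = {}
        self.lookups = 0  # counts how often consumers had to ask the registry

    def register(self, schema):
        for schema_id, existing in self._by_id.items():
            if existing == schema:
                return schema_id
        schema_id = len(self._by_id) + 1
        self._by_id[schema_id] = schema
        return schema_id

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema (as JSON text) -> schema ID

    def send(self, schema, record):
        cache_key = json.dumps(schema, sort_keys=True)
        if cache_key not in self.cache:  # register only if not in local cache
            self.cache[cache_key] = self.registry.register(schema)
        return {"schema_id": self.cache[cache_key], "payload": json.dumps(record)}


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema ID -> schema

    def receive(self, message):
        schema_id = message["schema_id"]
        if schema_id not in self.cache:  # fetch only if not in local cache
            self.cache[schema_id] = self.registry.get(schema_id)
        record = json.loads(message["payload"])
        return {name: record[name] for name in self.cache[schema_id]["fields"]}


registry = SchemaRegistry()
producer, consumer = Producer(registry), Consumer(registry)
schema = {"name": "chat", "fields": ["id", "comment"]}
first = consumer.receive(producer.send(schema, {"id": "1", "comment": "hi"}))
second = consumer.receive(producer.send(schema, {"id": "2", "comment": "bye"}))
```

After two messages the consumer has contacted the registry only once, because the second message reuses the cached schema.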
● Utilizing JSON Data with a JSON Schema
● Consistency, Contracts, Clean Data
● This enables easy SQL:
○ Pulsar SQL (Presto SQL)
○ Flink SQL
○ Spark Structured Streaming
Use Schemas
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
Sources, Sinks and Processing
Kafka on Pulsar (KoP)
MQTT on Pulsar (MoP)
AMQP on Pulsar (AoP)
The FLiPN Kitten crosses the stream,
4 ways with Pulsar
MoP AoP KoP WebSockets
Use Apache Pulsar For Ingest
Use Apache Pulsar To Stream to Lakehouses
● Lightweight computation similar
to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
● Java Functions
A serverless event
streaming framework
Pulsar Functions
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries
to support the execution of ML
models on the edge.
Pulsar Functions
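The consume, process, and publish cycle listed above can be sketched without a broker. This is a simplified illustration, assuming in-memory queues in place of topics; `topics`, `shout`, and `run_function` are hypothetical names and not part of Pulsar.

```python
from collections import deque

# In-memory stand-ins for Pulsar topics (assumption for illustration)
topics = {"chat-in": deque(["hello", "pulsar"]), "chat-out": deque()}


def shout(message):
    """User-supplied processing logic applied to each message."""
    return message.upper()


def run_function(process, input_topics, output_topic):
    """Consume from the input topics, apply the logic, publish each result."""
    for name in input_topics:
        while topics[name]:
            result = process(topics[name].popleft())
            if result is not None:
                topics[output_topic].append(result)


run_function(shout, ["chat-in"], "chat-out")
```

The user code is only the `shout` function; the surrounding loop stands in for what the function runtime does on the developer's behalf.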
● Route
● Enrich
● Convert
● Lookups
● Run Machine Learning
● Logging
● Auditing
● Parse
● Split
● Convert
Pulsar Functions
ML Java Coding (Deep Java Library)
Java Pulsar Functions
from pulsar import Function
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        # Log the incoming message through the function's logger
        logger = context.get_logger()
        logger.info("Message Content: {0}".format(input))
        # Build a JSON record keyed by the ID of the current message
        msg_id = context.get_message_id()
        row = { }
        row['id'] = str(msg_id)
        json_string = json.dumps(row)
        return json_string
Python Pulsar Functions
from pulsar import Function
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json

class Chat(Function):
    def __init__(self):
        # Create the analyzer once instead of once per message
        self.sid = SentimentIntensityAnalyzer()

    def process(self, input, context):
        fields = json.loads(input)
        ss = self.sid.polarity_scores(fields["comment"])
        # The message ID comes from the function context
        msg_id = context.get_message_id()
        row = { }
        row['id'] = str(msg_id)
        # A negative compound score is labeled Negative, anything else Positive
        if ss['compound'] < 0.00:
            row['sentiment'] = 'Negative'
        else:
            row['sentiment'] = 'Positive'
        row['comment'] = str(fields["comment"])
        json_string = json.dumps(row)
        return json_string
Entire Function
Pulsar Python NLP Function
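A function of this shape can be exercised locally without a broker by handing it a stand-in context. The sketch below is an assumption-laden illustration: `FakeContext` and `toy_compound_score` are hypothetical, the class drops the `pulsar` base class so that it runs with the standard library alone, and the keyword scorer only stands in for VADER's compound score.

```python
import json


class FakeContext:
    """Hypothetical stand-in for the context object passed to process()."""

    def __init__(self, message_id):
        self._message_id = message_id

    def get_message_id(self):
        return self._message_id


def toy_compound_score(text):
    """Hypothetical keyword scorer used here in place of VADER."""
    positive = {"good", "fast", "great"}
    negative = {"bad", "slow", "broken"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


class Chat:
    """Same shape as the NLP function above, minus the pulsar base class."""

    def process(self, input, context):
        fields = json.loads(input)
        compound = toy_compound_score(fields["comment"])
        row = {
            "id": str(context.get_message_id()),
            "sentiment": "Negative" if compound < 0 else "Positive",
            "comment": str(fields["comment"]),
        }
        return json.dumps(row)


happy = json.loads(Chat().process('{"comment": "Pulsar is fast and great"}',
                                  FakeContext("1:0:0")))
unhappy = json.loads(Chat().process('{"comment": "the build is broken"}',
                                    FakeContext("1:1:0")))
```

Because the function only depends on its input and the context, the same approach works with any existing test framework.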
Why Pulsar Functions for Microservices?
Desired Characteristic: Pulsar Functions…
● Highly maintainable and testable: Are small pieces of code written in popular languages such as Java, Python, or Go. They can be easily maintained in source control repositories and tested with existing frameworks automatically.
● Loosely coupled with other services: Are not directly linked to one another and communicate via messages.
● Independently deployable: Are designed to be deployed independently.
● Can be developed by a small team: Are often developed by a single developer.
● Inter-service Communication: Support all message patterns using Pulsar as the underlying message bus.
● Deployment & Composition: Can run as individual threads, processes, or K8s pods. The Function Mesh allows you to deploy multiple Pulsar Functions as a single unit.
Function Mesh
Pulsar Functions, along with Pulsar
IO/Connectors, provide a powerful API for
ingesting, transforming, and outputting data.
Function Mesh, another StreamNative
project, makes it easier for developers to
create entire applications built from sources,
functions, and sinks all through a declarative
API.
Edge AI
● Apache Pulsar’s two-tier architecture separates the compute and storage layers, which interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices.
● Pulsar’s serverless computing framework, known as Pulsar Functions, can run inside the Broker as threads, effectively “stretching” the data processing layer.
Edge Computing with Pulsar
● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a
thread pool. This framework can be used as the execution environment for ML
models.
● With the MQTT on Pulsar (MoP) protocol handler, the Apache Pulsar Broker supports the MQTT protocol and can therefore directly receive incoming data from the sensor hubs and store it in a topic.
Benefits of Running Pulsar Broker on the Edge
● You can leverage 3rd party libraries within Pulsar Functions
○ DeepLearning4J
○ JPMML
○ DJL.AI
○ Keras
● Pulsar Functions are able to support:
○ A variety of ML model types.
○ Models developed with different languages and toolkits
Pulsar Function – Third Party Library Support
Building Real-Time Apps Requires a Team
https://www.influxdata.com/integration/mqtt-monitoring/
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over 300 components
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
Apache NiFi Basics
Apache NiFi - Apache Pulsar Connector
https://github.com/streamnative/pulsar-nifi-bundle
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
Apache Flink
Apache Flink Job Dashboard
NLP Streaming Architecture
IoT Streaming Architecture
Streaming FLiPN Java App
Learn More
Apache Pulsar Training
● Instructor-led courses
○ Pulsar Fundamentals
○ Pulsar Developers
○ Pulsar Operations
● On-demand learning with labs
● 300+ engineers, admins and
architects trained!
Now Available
On-Demand
Pulsar Training
Academy.StreamNative.io
StreamNative Academy
● https://github.com/tspannhw/pulsar-pychat-function
● https://streamnative.io/apache-nifi-connector/
● https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/pulsar/
● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
● https://github.com/streamnative/flink-example
● https://pulsar.apache.org/docs/en/adaptors-spark/
Apache Pulsar Links
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
Apache Pulsar Examples
Deploying AI With an
Event-Driven
Platform
https://dzone.com/trendreports/enterprise-ai-1
Apache Pulsar in Action
http://tinyurl.com/bdha5p4r
Please enjoy David’s complete book which is the ultimate guide to Pulsar.
https://streamnative.io/blog/engineering/2021-11-17-building-edge-applications-with-apache-pulsar/
Scan the QR code
to learn more about
Apache Pulsar and
StreamNative.
Scan the QR code
to build your own
apps today.
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
Let’s Keep in Touch
Ad

Recommended

bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
Timothy Spann
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
Edunomica
 
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Timothy Spann
 
Apache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann
 
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Python Web Conference 2022 - Apache Pulsar Development 101 with Python (FLiP-Py)
Timothy Spann
 
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
OSA Con 2022 - Streaming Data Made Easy - Tim Spann & David Kjerrumgaard - St...
Altinity Ltd
 
OSA Con 2022: Streaming Data Made Easy
OSA Con 2022: Streaming Data Made Easy
Timothy Spann
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
Timothy Spann
 
Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
Timothy Spann
 
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Timothy Spann
 
(Current22) Let's Monitor The Conditions at the Conference
(Current22) Let's Monitor The Conditions at the Conference
Timothy Spann
 
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
HostedbyConfluent
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Timothy Spann
 
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
Timothy Spann
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Timothy Spann
 
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
Timothy Spann
 
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
Timothy Spann
 
Open keynote_carolyn&matteo&sijie
Open keynote_carolyn&matteo&sijie
StreamNative
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and Spring
Timothy Spann
 
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
biruktresehb
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann
 
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann
 
Unified Messaging and Data Streaming 101
Unified Messaging and Data Streaming 101
Timothy Spann
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
Matteo Merli
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
Timothy Spann
 
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 

More Related Content

Similar to Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices (20)

Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
Timothy Spann
 
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Timothy Spann
 
(Current22) Let's Monitor The Conditions at the Conference
(Current22) Let's Monitor The Conditions at the Conference
Timothy Spann
 
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
HostedbyConfluent
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Timothy Spann
 
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
Timothy Spann
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Timothy Spann
 
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
Timothy Spann
 
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
Timothy Spann
 
Open keynote_carolyn&matteo&sijie
Open keynote_carolyn&matteo&sijie
StreamNative
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and Spring
Timothy Spann
 
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
biruktresehb
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann
 
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann
 
Unified Messaging and Data Streaming 101
Unified Messaging and Data Streaming 101
Timothy Spann
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
Matteo Merli
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
Timothy Spann
 
Python web conference 2022 apache pulsar development 101 with python (f li-...
Python web conference 2022 apache pulsar development 101 with python (f li-...
Timothy Spann
 
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar
Timothy Spann
 
(Current22) Let's Monitor The Conditions at the Conference
(Current22) Let's Monitor The Conditions at the Conference
Timothy Spann
 
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
HostedbyConfluent
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann
 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann
 
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Timothy Spann
 
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
Timothy Spann
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Timothy Spann
 
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
Timothy Spann
 
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
[AerospikeRoadshow] Apache Pulsar Unifies Streaming and Messaging for Real-Ti...
Timothy Spann
 
Open keynote_carolyn&matteo&sijie
Open keynote_carolyn&matteo&sijie
StreamNative
 
The Dream Stream Team for Pulsar and Spring
The Dream Stream Team for Pulsar and Spring
Timothy Spann
 
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
Apache Pulsar in Action MEAP V04 David Kjerrumgaard
biruktresehb
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann
 
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann
 
Unified Messaging and Data Streaming 101
Unified Messaging and Data Streaming 101
Timothy Spann
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
Matteo Merli
 
Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar
Timothy Spann
 

More from Timothy Spann (20)

14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
Timothy Spann
 
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
Timothy Spann
 
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
Timothy Spann
 
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
Timothy Spann
 
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
Timothy Spann
 
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
Timothy Spann
 
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
Timothy Spann
 
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
Timothy Spann
 
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
Timothy Spann
 
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
Timothy Spann
 
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
Timothy Spann
 
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
Timothy Spann
 
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
Timothy Spann
 
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
Timothy Spann
 
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
Timothy Spann
 
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
Timothy Spann
 
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
Timothy Spann
 
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
Timothy Spann
 
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
Timothy Spann
 
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
Timothy Spann
 
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
Timothy Spann
 
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
Timothy Spann
 
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
Timothy Spann
 
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
Timothy Spann
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications with Java and Python Microservices

  • 1. BUILD ML ENHANCED EVENT STREAMING APPLICATIONS WITH MICROSERVICES Tim Spann | Developer Advocate
  • 2. ● Introduction ● What is Apache Pulsar? ● Why Apache Pulsar? ● Remember Hadoop and Kafka? ● Functions ● Apache NiFi ● Apache Flink ● Demos ● Q&A
  • 3. Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community by sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. FLiP Stack Weekly: This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 6. Apache Pulsar has a vibrant community 560+ Contributors 10,000+ Commits 7,000+ Slack Members 1,000+ Organizations Using Pulsar
  • 7. Advantages of Apache Pulsar: It is often assumed that Pulsar and Kafka have equal capabilities. In reality, Pulsar offers a superset of Kafka. ● Pulsar is streaming and queuing together ● Pulsar is cloud-native with stateless brokers ● Natively includes geo-replication, multi-tenancy, and end-to-end security out of the box ● Pulsar provides automated rebalancing ● Pulsar offers 100X lower latency w/ 2.5x greater throughput than Kafka
  • 8. Apache Pulsar Timeline
    2012 - CREATED: Originally developed inside Yahoo! as Cloud Messaging Service.
    2016 - OPEN SOURCE: Pulsar committed to open source.
    2018 - APACHE TLP: Pulsar becomes Apache top level project.
    TODAY - GROWTH: 10x Contributors, 10MM+ Downloads, Ecosystem Expands (Kafka on Pulsar, AMQ on Pulsar, Functions, . . .)
  • 10. Pulsar Has a Built-in Super Set of OSS Features. Open-Source Features: Durability, Scalability, Geo-Replication, Multi-Tenancy, Unified Messaging Model, Reduced Vendor Dependency, Functions
  • 11. Apache Pulsar is built to support legacy applications, handle the needs of modern apps, and support NextGen applications. Legacy: Support legacy workloads. Compatible with popular messaging and streaming tools. Modern: Built for today's real-time event driven applications. NextGen: Scalable, adaptive architecture ready for the future of real-time streaming.
  • 12. Apache Pulsar features Cloud native with decoupled storage and compute layers. Built-in compatibility with your existing code and messaging infrastructure. Geographic redundancy and high availability included. Centralized cluster management and oversight. Elastic horizontal and vertical scalability. Seamless and instant partitioning rebalancing with no downtime. Flexible subscription model supports a wide array of use cases. Compatible with the tools you use to store, analyze, and process data.
  • 13. Messages - the Basic Unit of Apache Pulsar
    Component: Description
    Value / Data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
    Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
    Properties: An optional key/value map of user-defined properties.
    Producer name: The name of the producer who produces the message. If you do not specify a producer name, the default name is used.
    Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
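As a rough illustration of how the components in the table map onto a client call, the sketch below packs them into keyword arguments for a send. It assumes a producer object whose send method accepts properties, partition_key and sequence_id keyword arguments, as the Pulsar Python client's Producer.send does; Envelope, publish and RecordingProducer are hypothetical names used only here, and the recording stub lets the sketch run without a broker.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Envelope:
    """Hypothetical container mirroring the message components in the table."""
    value: dict                                   # payload, serialized to raw bytes before sending
    key: Optional[str] = None                     # optional key (partitioning, compaction)
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined key/value map
    sequence_id: Optional[int] = None             # position in the producer's ordered sequence


def publish(producer, envelope: Envelope) -> None:
    """Send one envelope through a producer-like object.

    Assumption: producer.send(content, properties=..., partition_key=..., sequence_id=...)
    is available, matching the keyword names of the Pulsar Python client.
    """
    payload = json.dumps(envelope.value).encode("utf-8")
    producer.send(
        payload,
        properties=envelope.properties or None,
        partition_key=envelope.key,
        sequence_id=envelope.sequence_id,
    )


class RecordingProducer:
    """Stand-in producer that only records what it was asked to send."""

    def __init__(self):
        self.sent = []

    def send(self, content, properties=None, partition_key=None, sequence_id=None):
        self.sent.append((content, properties, partition_key, sequence_id))
```

The producer name from the table is not a per-message argument in this sketch; with a real client it would typically be chosen once, when the producer is created.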
  • 14. Pulsar Cluster
    ● “Bookies” (store messages): Store messages and cursors. Messages are grouped in segments/ledgers. A group of bookies forms an “ensemble” to store a ledger.
    ● “Brokers”: Handle message routing and connections. Stateless, but with caches. Automatic load-balancing. Topics are composed of multiple segments.
    ● Metadata Storage (metadata & service discovery): Stores metadata for both Pulsar and BookKeeper. Service discovery.
  • 15. Apache Pulsar Subscription Modes
    Different subscription modes have different semantics:
    Exclusive/Failover - guaranteed order, single active consumer
    Shared - multiple active consumers, no order
    Key_Shared - multiple active consumers, order for given key
    [Diagram: Producer 1 and Producer 2 publish <K1,V10>, <K1,V11>, <K1,V12>, <K2,V20>, <K2,V21>, <K2,V22> to a Pulsar topic. Subscription A (Exclusive): Consumer A. Subscription B (Failover): Consumer B-1 and Consumer B-2, with B-2 used in case of failure in Consumer B-1. Subscription C (Shared): Consumer C-1 and Consumer C-2 receive <K1,V10>, <K2,V21>, <K1,V12> and <K2,V20>, <K1,V11>, <K2,V22>. Subscription D (Key-Shared): Consumer D-1 and Consumer D-2 receive <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22>.]
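The ordering semantics of the subscription modes can be sketched with a small simulation. This is a simplified model for illustration only, not Pulsar's actual dispatch algorithm: dispatch is a hypothetical function, round-robin stands in for Shared delivery, and a CRC32 hash of the key stands in for Key_Shared assignment.

```python
import zlib
from collections import defaultdict
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Assign (key, value) messages to consumers under a simplified model of each mode."""
    assigned = defaultdict(list)
    if mode in ("exclusive", "failover"):
        # One active consumer receives everything, so topic order is preserved.
        assigned[consumers[0]].extend(messages)
    elif mode == "shared":
        # Round-robin across consumers: load is spread, ordering is not guaranteed.
        for consumer, message in zip(cycle(consumers), messages):
            assigned[consumer].append(message)
    elif mode == "key_shared":
        # A stable hash of the key picks the consumer, so each key stays ordered on one consumer.
        for key, value in messages:
            index = zlib.crc32(key.encode("utf-8")) % len(consumers)
            assigned[consumers[index]].append((key, value))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(assigned)
```

Calling dispatch with the six keyed messages from the diagram and two consumers shows the trade-off: "shared" splits the work evenly, while "key_shared" keeps every message for a given key on a single consumer.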
  • 16. [Diagram: a Pulsar Topic/Partition with Exclusive, Failover (in case of failure in Consumer B-0), Shared, and Key-Shared subscriptions and their consumers, labeled Streaming and Messaging.]
  • 19. Messaging vs Streaming
    Messaging Use Cases:
    ● Service x commands service y to make some change. Example: order service removing item from inventory service
    ● Distributing messages that represent work among n workers. Example: order processing not in main “thread”
    ● Sending “scheduled” messages. Example: notification service for marketing emails or push notifications
    Streaming Use Cases:
    ● Moving large amounts of data to another service (real-time ETL). Example: logs to elasticsearch
    ● Periodic jobs moving large amounts of data and aggregating to more traditional stores. Example: logs to s3
    ● Computing a near real-time aggregate of a message stream, split among n workers, with order being important. Example: real-time analytics over page views
  • 20. Differences in Consumption
    Retention
      Messaging Use Case: The amount of data retained is relatively small - typically only a day or two of data at most.
      Streaming Use Case: Large amounts of data are retained, with higher ingest volumes and longer retention periods.
    Throughput
      Messaging Use Case: Messaging systems are not designed to manage big “catch-up” reads.
      Streaming Use Case: Streaming systems are designed to scale and can handle use cases such as catch-up reads.
  • 21. Apache Pulsar Reader Interface
    Reader: Create a reader that will read from some message between earliest and latest.

      byte[] msgIdBytes = // Some byte array
      MessageId id = MessageId.fromByteArray(msgIdBytes);
      Reader<byte[]> reader = pulsarClient.newReader()
          .topic(topic)
          .startMessageId(id)
          .create();
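The same reader pattern can be sketched in Python. The read_from helper below is a hypothetical function that assumes a client object offering create_reader(topic, start_message_id) and a reader offering has_message_available(), read_next() and close(), which are method names the Pulsar Python client provides. The Fake* classes are stand-ins so the sketch runs without a broker; they use a plain list index as the start position rather than a real MessageId.

```python
def read_from(client, topic, start_message_id, max_messages=10):
    """Read up to max_messages payloads, starting at start_message_id."""
    reader = client.create_reader(topic, start_message_id)
    payloads = []
    try:
        # Stop when the limit is reached or the backlog is exhausted.
        while len(payloads) < max_messages and reader.has_message_available():
            payloads.append(reader.read_next().data())
    finally:
        reader.close()
    return payloads


class FakeMessage:
    """Stand-in for a received message; only data() is modeled."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def data(self) -> bytes:
        return self._payload


class FakeReader:
    """Stand-in reader serving a fixed backlog from a given list index."""

    def __init__(self, backlog, start):
        self._pending = list(backlog[start:])

    def has_message_available(self) -> bool:
        return bool(self._pending)

    def read_next(self):
        return FakeMessage(self._pending.pop(0))

    def close(self):
        self._pending = []


class FakeClient:
    """Stand-in client; the start position is a list index, not a real MessageId."""

    def __init__(self, backlog):
        self._backlog = backlog

    def create_reader(self, topic, start_message_id):
        return FakeReader(self._backlog, start_message_id)
```

With a real cluster, the client would be a pulsar.Client and the start position a MessageId restored from previously saved bytes, as in the Java snippet on the slide.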
  • 22. ● New Consumer type added in Pulsar 2.10 that provides a continuously updated key-value map view of compacted topic data. ● An abstraction of a changelog stream from a primary-keyed table, where each record in the changelog stream is an update on the primary-keyed table with the record key as the primary key. ● READ ONLY DATA STRUCTURE! Apache Pulsar TableView
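The idea behind a table view, a changelog folded into the latest value per key and exposed read-only, can be modeled in a few lines. This is a conceptual sketch in plain Python, not the Pulsar TableView API; materialize is a hypothetical name, and treating a None value as a delete marker is an assumption made for this example.

```python
from types import MappingProxyType


def materialize(changelog):
    """Fold (key, value) updates into a read-only latest-value-per-key view."""
    table = {}
    for key, value in changelog:
        if value is None:
            # Assumption for this sketch: None acts as a delete marker for the key.
            table.pop(key, None)
        else:
            # Later updates for the same key overwrite earlier ones.
            table[key] = value
    # MappingProxyType exposes the dict without allowing writes through the view.
    return MappingProxyType(table)
```

For example, materialize([("a", 1), ("b", 2), ("a", 3)]) yields a view in which "a" maps to 3, and assigning to the view raises a TypeError.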
  • 23. Schema Registry
    Schema Registry: holds schema-1, schema-2, schema-3 (value=Avro/Protobuf/JSON), each with a Schema Data ID.
    Producers: keep a local cache for schemas; send (register) the schema if it is not in the local cache; send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID.
    Consumers: keep a local cache for schemas; get the schema by ID if it is not in the local cache; read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID.
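The register-once, look-up-by-ID, cache-locally flow can be sketched with a toy in-memory registry. Everything here is hypothetical and for illustration: SchemaRegistry and CachingConsumer are not Pulsar classes, a schema is reduced to a dict with a "fields" list, and payloads are plain JSON.

```python
import json


class SchemaRegistry:
    """Toy registry: assigns an integer ID to each distinct schema definition."""

    def __init__(self):
        self._ids = {}       # canonical schema text -> schema ID
        self._schemas = {}   # schema ID -> schema definition
        self.lookups = 0     # counts get() calls, to make cache hits visible

    def register(self, schema: dict) -> int:
        canonical = json.dumps(schema, sort_keys=True)
        if canonical not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[canonical] = schema_id
            self._schemas[schema_id] = schema
        return self._ids[canonical]

    def get(self, schema_id: int) -> dict:
        self.lookups += 1
        return self._schemas[schema_id]


class CachingConsumer:
    """Consumer side: fetch a schema by ID only when it is not in the local cache."""

    def __init__(self, registry: SchemaRegistry):
        self._registry = registry
        self._cache = {}

    def decode(self, schema_id: int, payload: bytes) -> dict:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.get(schema_id)
        schema = self._cache[schema_id]
        record = json.loads(payload)
        # Minimal contract check: every declared field must be present.
        missing = [name for name in schema["fields"] if name not in record]
        if missing:
            raise ValueError(f"payload missing fields: {missing}")
        return record
```

Decoding two messages with the same schema ID touches the registry once; the second decode is served from the consumer's local cache.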
  • 24. ● Utilizing JSON Data with a JSON Schema ● Consistency, Contracts, Clean Data ● This enables easy SQL: ○ Pulsar SQL (Presto SQL) ○ Flink SQL ○ Spark Structured Streaming Use Schemas
  • 25. • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) Sources, Sinks and Processing
  • 27. MQTT on Pulsar (MoP)
  • 28. AMQP on Pulsar (AoP)
  • 29. The FLiPN Kitten crosses the stream, 4 ways with Pulsar MoP AoP KoP WebSockets
  • 30. Use Apache Pulsar For Ingest
  • 31. Use Apache Pulsar To Stream to Lakehouses
  • 32. ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Java Functions A serverless event streaming framework Pulsar Functions
  • 33. ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge. Pulsar Functions
  • 34. Pulsar Functions: ● Route ● Enrich ● Convert ● Lookups ● Run Machine Learning ● Logging ● Auditing ● Parse ● Split
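Two of the patterns listed above, Route and Enrich, can be sketched as plain functions plus a thin process step. This assumes a context object with a publish(topic_name, message) method, which the Pulsar Functions Python context provides; the topic names, the comment_length field and RecordingContext are hypothetical and exist only so the sketch runs on its own.

```python
import json

# Hypothetical output topics, chosen only for illustration.
NEGATIVE_TOPIC = "persistent://public/default/chat-negative"
DEFAULT_TOPIC = "persistent://public/default/chat-processed"


def route(event: dict) -> str:
    """Route: choose an output topic from a field of the event."""
    return NEGATIVE_TOPIC if event.get("sentiment") == "Negative" else DEFAULT_TOPIC


def enrich(event: dict) -> dict:
    """Enrich: add a derived field without mutating the input."""
    return {**event, "comment_length": len(event.get("comment", ""))}


def process(input: str, context) -> None:
    """Parse, enrich, and route one JSON message via context.publish."""
    event = enrich(json.loads(input))
    context.publish(route(event), json.dumps(event))


class RecordingContext:
    """Stand-in context that records published messages instead of sending them."""

    def __init__(self):
        self.published = []

    def publish(self, topic_name, message):
        self.published.append((topic_name, message))
```

Keeping route and enrich free of any Pulsar dependency means they can be unit tested directly, and the same logic could be called from the process method of a Function class.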
  • 35. ML Java Coding (Deep Java Library)
  • 37. Python Pulsar Functions

      from pulsar import Function
      import json

      class Chat(Function):
          def __init__(self):
              pass

          def process(self, input, context):
              logger = context.get_logger()
              logger.info("Message Content: {0}".format(input))
              msg_id = context.get_message_id()
              row = { }
              row['id'] = str(msg_id)
              json_string = json.dumps(row)
              return json_string
  • 38. Pulsar Python NLP Function (entire function)

      from pulsar import Function
      from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
      import json

      class Chat(Function):
          def __init__(self):
              pass

          def process(self, input, context):
              # Parse the incoming JSON chat message
              fields = json.loads(input)
              # Score the comment text with VADER
              sid = SentimentIntensityAnalyzer()
              ss = sid.polarity_scores(fields["comment"])
              # Use the message ID from the function context as the record ID
              msg_id = context.get_message_id()
              row = { }
              row['id'] = str(msg_id)
              # A negative compound score is labeled Negative; anything else Positive
              if ss['compound'] < 0.00:
                  row['sentiment'] = 'Negative'
              else:
                  row['sentiment'] = 'Positive'
              row['comment'] = str(fields["comment"])
              json_string = json.dumps(row)
              return json_string
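The compound-score check in the function above can be pulled out into a small pure function, which makes it testable without Pulsar or VADER installed. The sketch below is a variant, not the slide's logic: it adds a Neutral band of 0.05 around zero, a threshold commonly suggested in the VADER documentation, and label_sentiment is a hypothetical name.

```python
def label_sentiment(compound: float, neutral_band: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a label.

    Assumption: scores within +/- neutral_band of zero are treated as Neutral,
    unlike the two-way Negative/Positive split used on the slide.
    """
    if compound >= neutral_band:
        return "Positive"
    if compound <= -neutral_band:
        return "Negative"
    return "Neutral"
```

Inside the function, row['sentiment'] could then be set from label_sentiment(ss['compound']); passing neutral_band=0.0 gives a near two-way split in which only an exactly zero score is Neutral.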
  • 39. Why Pulsar Functions for Microservices?
    Desired Characteristic: Pulsar Functions…
    Highly maintainable and testable: Are small pieces of code written in popular languages such as Java, Python, or Go. They can be easily maintained in source control repositories and tested with existing frameworks automatically.
    Loosely coupled with other services: Are not directly linked to one another and communicate via messages.
    Independently deployable: Are designed to be deployed independently.
    Can be developed by a small team: Are often developed by a single developer.
    Inter-service Communication: Support all message patterns using Pulsar as the underlying message bus.
    Deployment & Composition: Can run as individual threads, processes, or K8s pods. The Function Mesh allows you to deploy multiple Pulsar Functions as a single unit.
  • 40. Function Mesh Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data. Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
  • 42. Edge Computing with Pulsar ● Apache Pulsar’s two-tier architecture separates the compute and storage layers, which interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Pulsar’s serverless computing framework, known as Pulsar Functions, can run inside the Broker as threads, effectively “stretching” the data processing layer.
  • 43. ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. Benefits of Running Pulsar Broker on the Edge
  • 44. ● You can leverage 3rd party libraries within Pulsar Functions ● DeepLearning4J ● JPMML ● DJL.AI ● Keras ● Pulsar Functions are able to support: ● A variety of ML model types. ● Models developed with different languages and toolkits Pulsar Function – Third Party Library Support
  • 45. Building Real-Time Apps Requires a Team
  • 46. Apache NiFi Basics https://www.influxdata.com/integration/mqtt-monitoring/ • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over 300 components • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  • 47. Apache NiFi - Apache Pulsar Connector
  • 49. Apache NiFi - Apache Pulsar Connector
  • 50. Apache NiFi - Apache Pulsar Connector
  • 51. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite Apache Flink
  • 52. Apache Flink Job Dashboard
  • 60. Apache Pulsar Training ● Instructor-led courses ○ Pulsar Fundamentals ○ Pulsar Developers ○ Pulsar Operations ● On-demand learning with labs ● 300+ engineers, admins and architects trained! Now Available On-Demand Pulsar Training Academy.StreamNative.io StreamNative Academy
  • 61. Apache Pulsar Links
    ● https://github.com/tspannhw/pulsar-pychat-function
    ● https://streamnative.io/apache-nifi-connector/
    ● https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/pulsar/
    ● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
    ● https://github.com/streamnative/flink-example
    ● https://pulsar.apache.org/docs/en/adaptors-spark/
  • 62. Apache Pulsar Examples
    ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
    ● https://github.com/tspannhw/FLiP-Pi-Thermal
    ● https://github.com/tspannhw/FLiP-Pi-Weather
    ● https://github.com/tspannhw/FLiP-RP400
    ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
    ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
    ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
    ● https://github.com/tspannhw/PythonPulsarExamples
    ● https://github.com/tspannhw/pulsar-pychat-function
    ● https://github.com/tspannhw/FLiP-PulsarDevPython101
  • 63. Deploying AI With an Event-Driven Platform https://dzone.com/trendreports/enterprise-ai-1
  • 65. Apache Pulsar in Action http://tinyurl.com/bdha5p4r Please enjoy David’s complete book which is the ultimate guide to Pulsar.
  • 67. Scan the QR code to learn more about Apache Pulsar and StreamNative.
  • 68. Scan the QR code to build your own apps today.