Slides from the Meetup on Monday, March 7, 2016, just before the beginning of #GeodeSummit, where we cover an introduction to the technology and community that is Apache Geode, the in-memory data grid.
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement - VMware Tanzu
This document provides an agenda for a hands-on introduction and hackathon kickoff for Apache Geode. The agenda includes details about the hackathon, an introduction to Apache Geode including its history and key features, a hands-on lab to build, run, and use Geode, and a Q&A session. It also outlines how to contribute to the Geode project through code, documentation, issue tracking, and mailing lists.
This document discusses implementing a Redis adaptor using Apache Geode. It provides an overview of Redis data structures and commands, describes how Geode partitioned regions and indexes can be used to store and access Redis data, outlines advantages like scalability and high availability, and presents a roadmap for further development including supporting additional commands and performance optimization.
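As a rough illustration of the idea, the sketch below shows how Redis-style GET/SET semantics map naturally onto a Geode partitioned region. It is not the adaptor's actual code: the region name and bootstrap are illustrative, and the package names assume a recent Geode release.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: backing Redis-style GET/SET with a Geode partitioned region.
public class RedisStringsSketch {
    public static void main(String[] args) {
        Cache cache = new CacheFactory()
                .set("log-level", "warn")
                .create();

        // One partitioned region holds all "string" keys; partitioning spreads
        // the keyspace across servers and gives horizontal scalability.
        Region<String, String> strings = cache
                .<String, String>createRegionFactory(RegionShortcut.PARTITION)
                .create("redisStrings");

        strings.put("user:42:name", "Ada");          // roughly SET user:42:name Ada
        String name = strings.get("user:42:name");   // roughly GET user:42:name
        System.out.println(name);

        cache.close();
    }
}
```

Hashes, lists, and sorted sets would map onto further regions or structured values in a similar fashion, which is where the roadmap items above (additional commands, indexing, performance work) come in.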
Apache Geode is an open source in-memory data grid that provides data distribution, replication and high availability. It can be used for caching, messaging and interactive queries. The presentation discusses Geode concepts like cache, region and member. It provides examples of how large companies use Geode for applications requiring real-time response, high concurrency and global data visibility. Geode's performance comes from minimizing data copying and contention through flexible consistency and partitioning. The project is now hosted by Apache and the community is encouraged to get involved through mailing lists, code contributions and example applications.
Apache Geode Meetup, Cork, Ireland at CIT - Apache Geode
This document provides an introduction to Apache Geode (incubating), including:
- A brief history of Geode and why it was developed
- An overview of key Geode concepts such as regions, caching, and functions
- Examples of interesting large-scale use cases from companies like Indian Railways
- A demonstration of using Geode with Apache Spark and Spring XD for a stock prediction application
- Information on how to get involved with the Geode open source project community
How to use the WAN Gateway feature of Apache Geode to implement multi-site and active-active failover, disaster recovery, and global scale applications.
Build your first Internet of Things app today with Open Source - Apache Geode
This document provides an overview of Apache Geode, an in-memory data management platform. It discusses using Geode for high-performance and scalable applications that require fast access to critical datasets. Key concepts explained include regions, caching of data, and the use of functions to enable distributed processing across a Geode cluster. The document also mentions integrations with Spark and Cloud Foundry that allow persisting RDDs in Geode and exposing regions as RDDs.
1. The document discusses Project Geode, an open source distributed in-memory database for big data applications. It provides scale-out performance, consistent operations across nodes, high availability, powerful developer features, and easy administration of distributed nodes.
2. The document outlines Geode's architecture and roadmap. It also discusses why the project is being open sourced under Apache and describes some key use cases and customers of Geode.
3. The presentation includes a demo of Geode's capabilities including partitioning, queries, indexing, colocation, and transactions.
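For a flavor of what such a demo involves, here is a minimal sketch of an OQL query against an indexed partitioned region. It assumes an embedded cache and an illustrative Customer value type, not the presentation's actual demo code.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.query.SelectResults;

// Minimal sketch of querying and indexing a region with OQL.
public class QueryDemoSketch {
    public static void main(String[] args) throws Exception {
        Cache cache = new CacheFactory().create();
        Region<String, Customer> customers = cache
                .<String, Customer>createRegionFactory(RegionShortcut.PARTITION)
                .create("customers");

        customers.put("c1", new Customer("c1", "Cork"));
        customers.put("c2", new Customer("c2", "Dublin"));

        QueryService qs = cache.getQueryService();
        // An index on the queried field avoids a full region scan.
        qs.createIndex("cityIdx", "city", "/customers");

        SelectResults<Customer> results = (SelectResults<Customer>) qs
                .newQuery("SELECT * FROM /customers WHERE city = $1")
                .execute(new Object[] { "Cork" });
        System.out.println(results.size() + " customer(s) in Cork");
        cache.close();
    }

    public static class Customer {
        public final String id;
        public final String city;
        Customer(String id, String city) { this.id = id; this.city = city; }
        public String getCity() { return city; }
    }
}
```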
An Introduction to Apache Geode (incubating) - Anthony Baker
Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures.
Geode pools memory (along with CPU, network and optionally local disk) across multiple processes to manage application objects and behavior. It uses dynamic replication and data partitioning techniques for high availability, improved performance, scalability, and fault tolerance. Geode is both a distributed data container and an in-memory data management system providing reliable asynchronous event notifications and guaranteed message delivery.
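A minimal sketch of how that partitioning and replication is expressed in the Java API, assuming a recent Geode release; the region name and redundancy level are illustrative.

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.PartitionAttributesFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: a partitioned region whose buckets keep one redundant copy
// on another member, so a member failure does not lose data or availability.
public class HaRegionSketch {
    public static void main(String[] args) {
        Cache cache = new CacheFactory().create();

        Region<Long, String> orders = cache
                .<Long, String>createRegionFactory(RegionShortcut.PARTITION)
                .setPartitionAttributes(new PartitionAttributesFactory<Long, String>()
                        .setRedundantCopies(1)   // each bucket has a primary plus one backup
                        .create())
                .create("orders");

        orders.put(1L, "widget");
        System.out.println(orders.get(1L));
        cache.close();
    }
}
```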
Pivotal GemFire has had a long and winding journey, starting in 2002, winding through VMware and Pivotal, and finding its way to Apache in 2015. Companies using GemFire have deployed it in some of the most mission-critical, latency-sensitive applications in their enterprises, making sure tickets are purchased in a timely fashion, hotel rooms are booked, trades are made, and credit card transactions are cleared. This presentation discusses:
- A brief history of GemFire
- Architecture and use cases
- Why we are taking GemFire Open Source
- Design philosophy and principles
But most importantly: how you can join this exciting community to work on the bleeding edge in-memory platform.
In April 2015, Apache Geode (incubating) was born from Pivotal's GemFire, the distributed in-memory database. However, the donation of over 1M LOC was just the beginning of the journey. In this talk we discuss how the GemFire engineering team has adapted their development infrastructure, processes, and culture to embrace the "Apache Way". We present lessons learned and best practices for new and incubating open source projects in areas of initial code submission, IP clearance, governance policies, code review, and community building. We discuss the challenges the team faced and how we changed internal communication and software design processes to a community-driven model. In particular, we highlight effective strategies for growing a project community and embracing new members. Finally, we show how changing to the open source model has increased both productivity and quality.
#GeodeSummit - Where Does Geode Fit in Modern System Architectures - PivotalOpenSourceHub
The document discusses how Apache Geode fits into modern system architectures using the Command Query Responsibility Segregation (CQRS) pattern. CQRS separates reads and writes so that each can be optimized independently. Geode is well-suited as the read store in a CQRS system due to its ability to efficiently handle queries and cache data through regions. The document provides references on CQRS and related patterns to help understand how they can be applied with Geode.
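As a sketch of what the read side could look like, the example below registers a Geode continuous query from a client so a read model is refreshed as the write side changes data. The locator address, region name, and query are assumptions, not the document's own code.

```java
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqListener;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

// Minimal sketch: the read side of a CQRS system subscribes to a continuous
// query so its view is refreshed as the write side puts new events.
public class ReadSideSketch {
    public static void main(String[] args) throws Exception {
        ClientCache client = new ClientCacheFactory()
                .addPoolLocator("localhost", 10334)
                .setPoolSubscriptionEnabled(true)   // required for CQ event delivery
                .create();

        CqAttributesFactory caf = new CqAttributesFactory();
        caf.addCqListener(new CqListener() {
            @Override public void onEvent(CqEvent e) {
                // update the local read model / projection here
                System.out.println("order changed: " + e.getKey());
            }
            @Override public void onError(CqEvent e) { System.err.println(e.getThrowable()); }
            @Override public void close() { }
        });

        QueryService qs = client.getQueryService();
        CqQuery openOrders = qs.newCq("openOrders",
                "SELECT * FROM /orders o WHERE o.status = 'OPEN'", caf.create());
        openOrders.execute();   // the servers now push matching changes to this client
    }
}
```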
The document discusses security models in Apache Kafka. It describes the PLAINTEXT, SSL, SASL_PLAINTEXT and SASL_SSL security models, covering authentication, authorization, and encryption capabilities. It also provides tips on troubleshooting security issues, including enabling debug logs, and common errors seen with Kafka security.
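For reference, a client using the SASL_SSL model typically needs configuration along these lines; the broker address, truststore path, and credentials below are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Minimal sketch of a producer configured for SASL_SSL with the PLAIN mechanism.
public class SecureProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Authentication + encryption: SASL over TLS
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"app\" password=\"app-secret\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "key-1", "hello"));
        }
    }
}
```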
Apache Geode is a distributed, memory-based data management platform that provides high performance, scalability, resiliency and continuous availability for data-oriented applications. It originated from Pivotal's open sourcing of GemFire in 2015. Some key features of Geode include fast access to critical datasets, location-aware distributed data processing, and an event-driven data architecture. It has been used in many large-scale production systems, and its adoption continues to increase.
Highly available databases are essential to organizations depending on mission-critical, 24/7 access to data. Postgres is widely recognized as an excellent open-source database, with critical maturity and features that allow organizations to scale and achieve high availability.
This webinar will explore:
- Evolution of replication in Postgres
- Streaming replication
- Logical replication
- Replication for high availability
- Important high availability parameters
- Options to monitor high availability (a minimal JDBC monitoring sketch follows this list)
- HA infrastructure to patch the database with minimal downtime
- EDB Postgres Failover Manager (EFM)
- EDB tools to create a highly available Postgres architecture
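One simple way to cover the monitoring point above is to poll pg_stat_replication on the primary. The sketch below assumes a JDBC connection and illustrative credentials; the exact view columns vary slightly across Postgres versions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal monitoring sketch: ask the primary which standbys are streaming
// and what state they are in.
public class ReplicationMonitorSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://primary-host:5432/postgres", "monitor", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT client_addr, state, sync_state FROM pg_stat_replication")) {
            while (rs.next()) {
                System.out.printf("standby=%s state=%s sync=%s%n",
                        rs.getString("client_addr"), rs.getString("state"),
                        rs.getString("sync_state"));
            }
        }
    }
}
```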
Apache Hive 3 introduces new capabilities for data analytics including materialized views, default columns, constraints, and improved JDBC and Kafka connectors to enable real-time streaming and integration with external systems like Druid; Hive 3 also improves performance and query optimization through a new query result cache, workload management, and cloud storage optimizations. Data Analytics Studio provides self-service analytics on top of Hive 3 through a visual interface to optimize queries, monitor performance, and manage data lifecycles.
Development of concurrent services using In-Memory Data Grids - jlorenzocima
Prepared as part of OTN Tour 2014, this presentation covers the basics of an in-memory data grid (IMDG) solution, explains how it works and how it can be used within an architecture, and shows some use cases. Enjoy.
This document provides an overview of GemFire, an in-memory data grid that pools memory across processes to manage application data and behavior. Some key points:
- GemFire allows distributed applications to achieve low-latency data access through an in-memory shared cache. It supports features like caching, querying, transactions, and event notifications.
- Data in GemFire is organized into regions, which allow data to be stored across multiple servers without regard to location. Region types include replicated, partitioned, and local.
- The CAP theorem states that at most two of the three properties - consistency, availability, and partition tolerance - can be guaranteed simultaneously in a distributed system. GemFire aims to balance availability and partition tolerance.
This presentation will be useful to those who would like to get acquainted with the lifetime history of a successful monolithic Java application.
It shows the architectural and technical evolution of one Java web startup beyond the daily coding routine, and contains a lot of simplifications, Captain Obvious moments, and internet memes.
This presentation is not, however, intended as a comparison of monolithic vs. microservices architectures.
Running secured Spark job in Kubernetes compute cluster and integrating with ... - DataWorks Summit
This presentation provides technical design and development insights for running a secured Spark job in a Kubernetes compute cluster that accesses job data from a Kerberized HDFS cluster. Joy will show how to run a long-running machine learning or ETL Spark job in Kubernetes and how to access data from HDFS using a Kerberos principal and delegation token.
The first part of this presentation presents the design and best practices for deploying and running Spark in Kubernetes integrated with HDFS: creating an on-demand multi-node Spark cluster during job submission, installing and resolving software dependencies (packages), executing and monitoring the workload, and finally disposing of the resources on job completion. The second part covers the design and development details for setting up a Spark+Kubernetes cluster that supports long-running jobs accessing data from secured HDFS storage by creating and renewing Kerberos delegation tokens seamlessly from the end user's Kerberos principal.
All the techniques covered in this presentation are essential for setting up a Spark+Kubernetes compute cluster that accesses data securely from a distributed storage cluster such as HDFS in a corporate environment. No prior knowledge of any of these technologies is required to attend this presentation.
Speaker
Joy Chakraborty, Data Architect
This document discusses database as a service and cloud computing. It introduces concepts like software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). It also covers topics like virtualization, multi-tenancy, service level agreements, storage models, distributed storage, replication, and security in the context of database as a service. The document will be covering these topics in more depth throughout the seminar.
Deploying MariaDB databases with containers at Nokia Networks - MariaDB plc
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
HBaseCon 2013: High-Throughput, Transactional Stream Processing on Apache HBase - Cloudera, Inc.
This document discusses high-throughput transactional stream processing on HBase using Continuuity Reactor. Continuuity Reactor is a platform built on Hadoop and HBase that allows collecting, processing, storing and querying data with ACID guarantees using a flow-based programming model. Flows in Reactor are composed of flowlets connected by queues, with all processing occurring within transactions for consistency. The document outlines how Reactor implements transactions using an optimistic concurrency control approach based on multi-version concurrency control and HBase timestamps. It also discusses queue design and optimizations for performance.
An Expert Guide to Migrating Legacy Databases to PostgreSQL - EDB
This webinar will review the challenges teams face when migrating from Oracle databases to PostgreSQL. We will share insights gained from running large-scale Oracle compatibility assessments over the last two years, including the over 2,200,000 Oracle DDL constructs that were assessed through EDB’s Migration Portal in 2020.
During this session we will address:
Storage definitions
Packages
Stored procedures
PL/SQL code
Proprietary database APIs
Large scale data migrations
We will end the session demonstrating migration tools that significantly simplify and aid in reducing the risk of migrating Oracle databases to PostgreSQL.
The document discusses eBay's cloud configuration management system (CMS). It provides an overview of eBay's scale and need for cloud technologies. It then describes the architecture and functionality of CMS, including its use of MongoDB for data storage. CMS uses a metadata-driven model and provides APIs and services for configuration persistence, querying, and management. The document also addresses some challenges in using MongoDB and how CMS resolves issues related to performance and scalability.
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti... - Yahoo Developer Network
Monte Zweben, Co-Founder and CEO of Splice Machine, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
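To make the observer idea concrete, here is a minimal sketch of a RegionObserver that maintains a secondary-index table on every put. It uses the HBase 1.x-era API, the table and column names are illustrative, and it is not Splice Machine's implementation; a production version must also handle deletes, failures, and transactional consistency.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal observer sketch: after every put to the primary table, write an
// index row keyed by the indexed column value.
public class SecondaryIndexObserver extends BaseRegionObserver {
    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] EMAIL = Bytes.toBytes("email");

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, Durability durability) throws IOException {
        if (!put.has(CF, EMAIL)) {
            return;
        }
        byte[] email = CellUtil.cloneValue(put.get(CF, EMAIL).get(0));
        try (Table index = ctx.getEnvironment().getTable(TableName.valueOf("users_by_email"))) {
            // index row key = indexed value; cell value = primary row key
            Put indexPut = new Put(email);
            indexPut.addColumn(CF, Bytes.toBytes("row"), put.getRow());
            index.put(indexPut);
        }
    }
}
```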
Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing.
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth... - Christian Tzolov
When working with Big Data and IoT systems we often feel the need for a common query language. System-specific languages usually require longer adoption time and are harder to integrate into existing stacks.
To fill this gap, some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help with the heavy lifting. Calcite allows you to integrate a SQL parser, a cost-based optimizer, and JDBC with your NoSQL system.
We will walk through the process of building a SQL access layer for Apache Geode (an in-memory data grid). I will share my experience, pitfalls, and technical considerations such as balancing SQL/RDBMS semantics against the design choices and limitations of the data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
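To give a feel for the result, the sketch below connects through Calcite's JDBC driver using a JSON model that names a Geode schema factory. The model path, factory class, and region names are illustrative rather than the talk's exact code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: Calcite's JDBC driver loads a JSON model that names a schema
// factory which, in turn, talks to Geode.
//
// geode-model.json (illustrative):
// {
//   "version": "1.0",
//   "defaultSchema": "geode",
//   "schemas": [{
//     "name": "geode",
//     "type": "custom",
//     "factory": "org.apache.calcite.adapter.geode.rel.GeodeSchemaFactory",
//     "operand": { "locatorHost": "localhost", "locatorPort": "10334",
//                  "regions": "BookMaster" }
//   }]
// }
public class CalciteOverGeodeSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:calcite:model=src/main/resources/geode-model.json");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT \"title\" FROM \"geode\".\"BookMaster\" WHERE \"retailCost\" < 20")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```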
Introducing Apache Geode and Spring Data GemFire - John Blum
This document introduces Apache Geode, an open source distributed in-memory data management platform. It discusses what Geode is, how it is implemented, and some key features like high availability, scalability and low latency. It also introduces Spring Data GemFire, which simplifies using Geode with Spring applications through features like repositories and caching. Finally, it outlines the project roadmap and opportunities to get involved in the Geode community.
The document discusses the evolution of Pivotal Gemfire, now known as Apache Geode, from a proprietary product to an open source project. It provides an overview of Gemfire/Geode's capabilities including elastic scalability, high performance, and flexibility for developers. It also outlines Geode's role as a potential in-memory data exchange layer and integration point across modern data infrastructure technologies. Key aspects of Geode like its PDX serialization and asynchronous events are highlighted as building blocks that position it well for this role.
Here are the slides for Greenplum Chat #8. You can view the replay here: https://www.youtube.com/watch?v=FKFiyJDgdQk
The increased frequency and sophistication of high-profile data breaches and malicious hacking is putting organizations at continued risk of data theft and significant business disruption. Complicating this scenario is the unbounded growth of Big Data and petabyte-scale data storage, new open source database and distribution schemes, and the continued adoption of cloud services by enterprises.
Pivotal Greenplum customers often look for additional encryption of data-at-rest and data-in-motion. The massively parallel processing (MPP) architecture of Pivotal Greenplum is unlike traditional OLAP on an RDBMS for data warehousing, and encryption capabilities must address this scale-out architecture.
The Zettaset Big Data Encryption Suite has been designed for optimal performance and scalability in distributed Big Data systems like Greenplum Database and Apache HAWQ.
Here is a replay of our recent Greenplum Chat with Zettaset:
00:59 What is Greenplum’s approach for encryption and why Zettaset?
02:17 Results of field testing Zettaset with Greenplum
03:50 Introduction to Zettaset, the security company
05:36 Overview of Zettaset and their solutions
14:51 Different layers for encrypting data at rest
16:50 Encryption key management for big data
20:51 Zettaset BD Encrypt for data at rest and data in motion
22:19 How to mitigate encryption overhead with an MPP scale-out system
24:12 How to deploy BD Encrypt
25:50 Deep dive on data at rest encryption
30:44 Deep dive on data in motion encryption
36:72 Q: How does Zettaset deal with encrypting Greenplums multiple interfaces?
38:08 Q: Can I encrypt data for a particular column?
40:26 How Zettaset fits into a security strategy
41:21 Q: What is the performance impact on queries by encrypting the entire database?
43:28 How Zettaset helps Greenplum meet IT compliance requirements
45:12 Q: How authentication for keys is obtained
48:50 Q: How can Greenplum users try out Zettaset?
50:53 Q: What is a ‘Zettaset Security Coach’?
The document introduces the JBoss Community, which was started in 1999 by Marc Fleury with a focus on middleware. It grew popular as the EJB container as Java grew, and now has over 100 open source projects focused on Java standards and middleware development. Red Hat acquired JBoss in 2006 and supports enterprise middleware subscriptions. The community's principles are standards-driven innovation and rapid technology adoption, as seen in technologies like parallel loading in JBoss 7. Major projects include Hibernate and Drools.
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347 - Manik Surtani
Manik Surtani is the founder and project lead of Infinispan, an open source data grid platform. He discussed data grids, NoSQL, and their role in cloud storage. Data grids evolved from distributed caches to provide features like querying, task execution, and co-location control. NoSQL systems are alternative data stores that are scalable and distributed but lack relational structure. JSR 347 aims to standardize data grid APIs for the Java platform. Infinispan implements JSR 107 and will support JSR 347, acting as the reference backend for Hibernate OGM.
Building Wall St Risk Systems with Apache Geode - Andre Langevin
In this talk from the 2016 Apache Geode Summit, I discuss how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
Apache conbigdata2015 christiantzolov-federated sql on hadoop and beyond- lev... - Christian Tzolov
Slides from ApacheCon BigData 2015 HAWQ/GEODE talk: http://sched.co/3zut
In the Big Data space, two powerful data processing tools complement each other, namely HAWQ and Geode. HAWQ is a scalable OLAP SQL-on-Hadoop system, while Geode is an OLTP-like in-memory data grid and event processing system. This presentation shows different approaches that allow integration and data exchange between HAWQ and Geode. It walks you through the implementation of the different integration strategies, demonstrating the power of combining various OSS technologies for processing big and fast data, and touches upon OSS technologies like HAWQ, Geode, Spring XD, Hadoop and Spring Boot.
Infinispan Servers: Beyond peer-to-peer data grids - Galder Zamarreño
In this session, Infinispan developer Galder Zamarreño will:
- Provide a brief introduction to peer-to-peer and client/server architectures.
- Describe the benefits of using Infinispan in a client/server mode, particularly in cloud-style environments.
- Introduce the audience to Infinispan’s selection of server modules that provide varied access methods: REST and WebSocket for HTTP access, Memcached protocol access, and Hot Rod, Infinispan’s very own highly efficient binary protocol which supports smart clients (a minimal Hot Rod client sketch follows this list).
- Demonstrate an Infinispan client/server example showing how geographically separated Infinispan data grids could be linked together via Hot Rod client/server modules in order to provide different disaster recovery strategies.
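A minimal Hot Rod client sketch, where the host, port, and cache name are placeholders:

```java
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

// Minimal sketch: a smart client connects to a remote Infinispan server and
// uses a cache over the Hot Rod binary protocol.
public class HotRodClientSketch {
    public static void main(String[] args) {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("datagrid.example.com").port(11222);

        RemoteCacheManager manager = new RemoteCacheManager(builder.build());
        RemoteCache<String, String> cache = manager.getCache("sessions");

        cache.put("session-1", "user=ada");
        System.out.println(cache.get("session-1"));
        manager.stop();
    }
}
```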
Infinispan: In-memory data grid meets NoSQL - Manik Surtani
The document discusses Infinispan, an open source in-memory data grid. It describes Infinispan's features like its map-like API, persistence, querying support and transactional capabilities. It also discusses how Infinispan can be used to build elastic, scalable data services and compares data grids to NoSQL databases.
Keeping Infinispan In Shape: Highly-Precise, Scalable Data Eviction - Galder Zamarreño
The Java Collections Framework represents one of the key building blocks of any Java application. Although the standard JDK devoted a lot of attention to developing a coherent and easy-to-use collections framework, one important aspect remains overlooked: collection element eviction. A collection's memory footprint cannot grow indefinitely because we would eventually run out of memory; we either have to remove elements from a collection or periodically evict certain elements according to a chosen eviction algorithm. Since day one, eviction has been a key feature of Infinispan, and in the latest 4.1 release, thanks to event update batching, Infinispan has reduced the eviction overhead to such an extent that it hardly affects application performance. On top of that, Infinispan implements LIRS, a more precise eviction algorithm compared to the traditional LRU, making it the first open source project to implement this revolutionary algorithm in the data grid space. In this session, Galder and Vladimir will present the details behind these changes, performance measurements and third-party use case testimonies.
This document discusses in-memory data grids and JBoss Infinispan. It begins with an overview of in-memory data grids, their uses for caching, performance boosting, scalability, and high availability. It then discusses Infinispan specifically, describing it as an open-source, distributed in-memory key-value data grid and cache. The document outlines Infinispan's architecture, features like persistence, transactions, querying, distributed execution, and map-reduce capabilities. It also provides a case study on using Infinispan for session clustering in a web application.
Infinispan is an in-memory data grid that provides a distributed key-value store. It allows for data replication across nodes for high availability and partitions data using consistent hashing to enable horizontal scalability. Infinispan supports transactions, caching, querying and more. It can be configured programmatically or via XML and integrates with various Java technologies like JPA, CDI and Spring.
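As a small illustration of the programmatic route, the sketch below defines a clustered, distributed cache with two owners per entry; the cache name and settings are illustrative.

```java
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

// Minimal sketch: a clustered cache manager hosting a distributed cache where
// each entry is owned by two nodes (consistent hashing decides which).
public class DistributedCacheSketch {
    public static void main(String[] args) {
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        Configuration distributed = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(2)
                .build();
        manager.defineConfiguration("orders", distributed);

        Cache<String, String> orders = manager.getCache("orders");
        orders.put("o-1", "widget");
        System.out.println(orders.get("o-1"));
        manager.stop();
    }
}
```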
The document summarizes the journey of HAWQ and MADlib from being proprietary Pivotal technologies to becoming Apache open source projects. It provides an overview of HAWQ, including its key features like SQL compliance, performance advantages over other SQL-on-Hadoop systems, and flexible deployment options. It also summarizes MADlib, describing its machine learning functions and advantages of scalable in-database machine learning. Both projects are now available on open source platforms like Hadoop and aim to advance SQL and machine learning on big data through open collaboration.
Capital One Delivers Risk Insights in Real Time with Stream Processing - confluent
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One
Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operations teams and bank tellers to assist with assessing risk and protecting customers in a myriad of ways.
Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space.
Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.
-Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers.
-Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions.
-Understand how Capital One manages Kafka Docker containers using Kubernetes.
Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
This document provides an overview of best practices and lessons learned for deploying XenDesktop in an enterprise environment. It discusses the scalability of various components in the XenDesktop architecture including the Web Interface, XenDesktop Controllers, SQL database, hypervisors, Provisioning Server, storage, and virtual desktop operating systems. Key recommendations include load balancing critical services, properly sizing SQL and storage infrastructure to handle load, and testing to determine optimal virtual desktop density based on workload.
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc... - Databricks
Spark SQL is a highly scalable and efficient relational processing engine with easy-to-use APIs and mid-query fault tolerance. It is a core module of Apache Spark. Spark SQL can process, integrate and analyze data from diverse data sources (e.g., Hive, Cassandra, Kafka and Oracle) and file formats (e.g., Parquet, ORC, CSV, and JSON). This talk dives into the technical details of Spark SQL, spanning the entire lifecycle of query execution. The audience will get a deeper understanding of Spark SQL and understand how to tune Spark SQL performance.
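As a tiny illustration of the tuning theme, the sketch below runs a SQL aggregation and adjusts one common knob, the number of shuffle partitions; the input path and view name are placeholders, and the right values depend on the workload.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal sketch: run a SQL query through Spark SQL and tune shuffle parallelism.
public class SparkSqlTuningSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-tuning-sketch")
                .master("local[*]")
                .getOrCreate();

        // Fewer shuffle partitions than the default 200 can help small datasets.
        spark.conf().set("spark.sql.shuffle.partitions", "50");

        Dataset<Row> events = spark.read().parquet("/data/events");
        events.createOrReplaceTempView("events");

        Dataset<Row> perUser = spark.sql(
                "SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id");
        perUser.explain();   // inspect the optimized physical plan
        perUser.show(10);

        spark.stop();
    }
}
```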
The document discusses various concepts related to Apache Geode including clients, functions, serialization, transactions, and data colocation. It describes how clients can maintain local caches, run queries, and register for notifications. The document also covers how functions allow for distributed concurrent processing by pushing code to servers and executing across the distributed system.
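A minimal sketch of that function-execution model, assuming a recent Geode release (where Function is a generic interface with default methods) and illustrative region and function ids:

```java
import java.util.List;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.execute.ResultCollector;
import org.apache.geode.cache.partition.PartitionRegionHelper;

// Minimal sketch of "pushing code to the data": a function registered on the
// servers counts the entries it hosts locally, and the caller sums one partial
// result per member.
public class CountLocalEntries implements Function<Object> {

    @Override
    public String getId() { return "countLocalEntries"; }

    @Override
    public boolean hasResult() { return true; }

    @Override
    public void execute(FunctionContext<Object> context) {
        RegionFunctionContext rfc = (RegionFunctionContext) context;
        Region<?, ?> localData = PartitionRegionHelper.getLocalDataForContext(rfc);
        context.getResultSender().lastResult(localData.size());
    }

    // Caller side; the function must be registered on each server, e.g. via
    // FunctionService.registerFunction(new CountLocalEntries()).
    @SuppressWarnings("unchecked")
    public static long totalSize(Region<?, ?> region) {
        ResultCollector<?, ?> collector =
                FunctionService.onRegion(region).execute("countLocalEntries");
        List<Integer> partials = (List<Integer>) collector.getResult();
        return partials.stream().mapToLong(Integer::longValue).sum();
    }
}
```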
MySQL is commonly used as the default database in OpenStack. It provides high availability through options like Galera and MySQL Group Replication. Galera is a third party active/active cluster that provides synchronous replication, while Group Replication is a native MySQL plugin that also enables active/active clusters with built-in conflict detection. MySQL NDB Cluster is an alternative that provides in-memory data storage with automatic sharding and strong consistency across shards. Both Galera/Group Replication and NDB Cluster can be used to implement highly available MySQL services in OpenStack environments.
This document provides an overview of Kea DHCP, an open source DHCP server. It discusses Kea's modular design, configuration options at different levels, database backend support for information storage, high availability capabilities including load balancing and hot standby, and extension points through hook libraries. The webinar aims to introduce administrators to Kea's features and capabilities.
Patterns and Pains of Migrating Legacy Applications to Kubernetes - Josef Adersberger
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue chip company onto a Kubernetes cluster within one year.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way.
Patterns and Pains of Migrating Legacy Applications to Kubernetes - QAware GmbH
Open Source Summit 2018, Vancouver (Canada): Talk by Josef Adersberger (@adersberger, CTO at QAware), Michael Frank (Software Architect at QAware) and Robert Bichler (IT Project Manager at Allianz Germany)
Abstract:
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud-native apps. But what to do if you’ve no shiny new cloud-native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a German blue chip company onto a Kubernetes cluster within one year.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way.
OpenStack Days East -- MySQL Options in OpenStack - Matt Lord
In most production OpenStack installations, you want the backing metadata store to be highly available. For this, the de facto standard has become MySQL+Galera. In order to help you meet this basic use case even better, I will introduce you to the brand new native MySQL HA solution called MySQL Group Replication. This allows you to easily go from a single instance of MySQL to a MySQL service that's natively distributed and highly available, while eliminating the need for any third-party libraries or implementations.
If you have an extremely large OpenStack installation in production, then you are likely to eventually run into write scaling issues and the metadata store itself can become a bottleneck. For this use case, MySQL NDB Cluster can allow you to linearly scale the metadata store as your needs grow. I will introduce you to the core features of MySQL NDB Cluster--which include in-memory OLTP, transparent sharding, and support for active/active multi-datacenter clusters--that will allow you to meet even the most demanding of use cases with ease.
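Once Group Replication is up, each member reports the group's health through performance_schema; a minimal JDBC check (connection details are placeholders) might look like this:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch: list the members of a MySQL Group Replication group and
// their current state (ONLINE, RECOVERING, ...).
public class GroupReplicationCheckSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://mysql-node1:3306/performance_schema", "monitor", "secret");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE "
                     + "FROM replication_group_members")) {
            while (rs.next()) {
                System.out.printf("%s:%d is %s%n",
                        rs.getString("MEMBER_HOST"), rs.getInt("MEMBER_PORT"),
                        rs.getString("MEMBER_STATE"));
            }
        }
    }
}
```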
Database failover from client perspective - Priit Piipuu
In this presentation we will look deep into high availability technologies Oracle RAC provides for database clients, what actually happens during database instance failover or planned maintenance and how to configure database services so that Java applications experience no or minimal disruption during planned maintenance or unplanned downtime. This presentation will mainly focus on JDBC and UCP clients.
Lyft's streaming platform uses Apache Flink for stream processing and Apache Kafka for messaging. Flink was chosen for its capabilities around state management, exactly-once processing, and flexible APIs. Kafka was chosen for its durability, low latency, and consumer fanout. However, open problems remain around rescaling Kafka while preserving per-key ordering, enabling dynamic stream computations, long-term event storage, and zero downtime deployments. Lyft is working to solve these challenges as it builds out its next generation streaming platform.
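The core consume-and-process pattern looks roughly like the sketch below, with checkpointing enabled so Flink can restore operator state after a failure. The broker, topic, and group id are placeholders, and the Kafka connector class name differs across Flink versions.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// Minimal sketch: a Flink job reading a Kafka topic with periodic checkpoints.
public class RideEventsJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);   // checkpoint every minute

        Properties kafka = new Properties();
        kafka.setProperty("bootstrap.servers", "kafka:9092");
        kafka.setProperty("group.id", "ride-events-job");

        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("ride-events", new SimpleStringSchema(), kafka));

        events
            .map(value -> "seen: " + value)
            .print();

        env.execute("ride-events-sketch");
    }
}
```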
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... - Josef Adersberger
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a major German insurance company onto a Kubernetes cluster within one year. We're now close to the finish line and it worked pretty well so far.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way. We'll provide our answers to life, the universe and a cloud native journey like:
- What technical constraints of Kubernetes can be obstacles for applications and how to tackle these?
- How to architect a landscape of hundreds of containerized applications with their surrounding infrastructure like DBs, MQs, and IAM, and with heavy requirements on security?
- How to industrialize and govern the migration process?
- How to leverage the possibilities of a cloud native platform like Kubernetes without challenging the tight timeline?
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ... - QAware GmbH
CloudNativeCon North America 2017, Austin (Texas, USA): Talk by Josef Adersberger (@adersberger, CTO at QAware)
Abstract:
Running applications on Kubernetes can provide a lot of benefits: more dev speed, lower ops costs, and a higher elasticity & resiliency in production. Kubernetes is the place to be for cloud native apps. But what to do if you’ve no shiny new cloud native apps but a whole bunch of JEE legacy systems? No chance to leverage the advantages of Kubernetes? Yes you can!
We’re facing the challenge of migrating hundreds of JEE legacy applications of a major German insurance company onto a Kubernetes cluster within one year. We're now close to the finish line and it worked pretty well so far.
The talk will be about the lessons we've learned - the best practices and pitfalls we've discovered along our way. We'll provide our answers to life, the universe and a cloud native journey like:
- What technical constraints of Kubernetes can be obstacles for applications and how to tackle these?
- How to architect a landscape of hundreds of containerized applications with their surrounding infrastructure like DBs, MQs, and IAM, and with heavy requirements on security?
- How to industrialize and govern the migration process?
- How to leverage the possibilities of a cloud native platform like Kubernetes without challenging the tight timeline?
Introducing new features in Confluent Platform 5.4 and Apache Kafka 2.4...
CP 5.4 (based on AK 2.4)
Security:
Role-Based Access Control (RBAC)
Structured Audit Logs
Resilience:
Multi-Region Clusters (MRC)
Data Compatibility:
Server-side Schema Validation
Management & Monitoring:
Control Center enhancements
RBAC management
Replicator monitoring
Performance & Elasticity:
Tiered Storage (preview)
Stream Processing:
New ksqlDB features like Pull Queries and Kafka Connect Integration (preview)
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... - ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Announcing the next-generation dA Platform 2, which includes open source Apache Flink and the new Application Manager. dA Platform 2 makes it easier than ever to operationalize your Flink-powered stream processing applications in production.
Google and Intel speak on NFV and SFC service delivery
The slides are as presented at the meet up "Out of Box Network Developers" sponsored by Intel Networking Developer Zone
Here is the Agenda of the slides:
How DPDK, RDT and gRPC fit into SDI/SDN, NFV and OpenStack
Key Platform Requirements for SDI
SDI Platform Ingredients: DPDK, IntelⓇ RDT
gRPC Service Framework
IntelⓇ RDT and gRPC service framework
Here are the details of the new security framework for Apache Geode, based on Apache Shiro. Watch the video at: https://youtu.be/AhUPT3wfAMM
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode - PivotalOpenSourceHub
The document provides steps for becoming a contributor to the Apache Geode project, beginning with joining online conversations about the project, then test-driving it by building and running examples, and finally improving the project by reporting findings, fixing bugs, or adding new features through submitting code. The key steps are to join mailing lists or chat forums to participate in discussions, quickly get started with the project by building and testing examples in 5 minutes, and then test release candidates and report any issues found on the project's issue tracker or documentation pages. Contributions to the codebase are also welcomed by forking the GitHub repository and submitting pull requests with bug fixes or new features.
#GeodeSummit Keynote: Creating the Future of Big Data Through "The Apache Way" - PivotalOpenSourceHub
Keynote at Geode Summit 2016 by Dr. Justin Erenkrantz, Bloomberg LP: Creating the Future of Big Data Through "The Apache Way", and why this matters to the community.
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R... - PivotalOpenSourceHub
This document discusses combining stream processing and in-memory data grids for near-real-time aggregation and notifications. It describes storing immutable event data and filtering and aggregating events in real-time based on requested perspectives. Perspectives can be requested at any time for historical or real-time event data. The solution aims to be scalable, resilient, and low latency using Apache Storm for stream processing, Apache Geode for the event log and storage, and deployment patterns to collocate them for better performance.
In this session we review the design of the newly released off heap storage feature in Apache Geode, and discuss use cases and potential direction for additional capabilities of this feature.
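In configuration terms, using the feature amounts to giving each member an off-heap pool and marking regions off-heap. The sketch below is illustrative; the property and attribute names assume a recent Geode release.

```java
import java.util.Properties;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

// Minimal sketch: reserve an off-heap pool for the member, then place a region's
// entry values in it so they live outside the JVM heap (reducing GC pressure).
public class OffHeapRegionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("off-heap-memory-size", "2g");   // per-member off-heap pool

        Cache cache = new CacheFactory(props).create();

        Region<String, byte[]> blobs = cache
                .<String, byte[]>createRegionFactory(RegionShortcut.PARTITION)
                .setOffHeap(true)                            // store values off-heap
                .create("blobs");

        blobs.put("b1", new byte[1024]);
        System.out.println(blobs.get("b1").length);
        cache.close();
    }
}
```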
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode - PivotalOpenSourceHub
In this session we review the design of the current state of support for Apache Geode by Spring Cloud Data Flow, and explore additional use cases and future direction that Spring Cloud Data Flow and Apache Geode might evolve.
In this session we review the design of the current capabilities of the Spring Data GemFire API that supports Geode, and explore additional use cases and future direction that the Spring API and underlying Geode support might evolve.
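A minimal sketch of that programming model, an annotated entity, a repository, and an annotation-driven client configuration, assuming a recent Spring Data GemFire release and a running Geode cluster that hosts the region; all names are illustrative.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.data.annotation.Id;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.data.gemfire.config.annotation.EnableEntityDefinedRegions;
import org.springframework.data.gemfire.mapping.annotation.Region;
import org.springframework.data.gemfire.repository.config.EnableGemfireRepositories;
import org.springframework.data.repository.CrudRepository;

// Entity mapped to a Geode region named "Customers".
@Region("Customers")
class Customer {
    @Id Long id;
    String name;
    Customer(Long id, String name) { this.id = id; this.name = name; }
}

// Spring Data repository backed by the region; no implementation needed.
interface CustomerRepository extends CrudRepository<Customer, Long> { }

// Client cache + entity-defined regions + repositories, all via annotations.
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackageClasses = Customer.class)
@EnableGemfireRepositories
class GeodeClientConfig { }

public class SpringDataGeodeSketch {
    public static void main(String[] args) {
        try (AnnotationConfigApplicationContext ctx =
                     new AnnotationConfigApplicationContext(GeodeClientConfig.class)) {
            CustomerRepository customers = ctx.getBean(CustomerRepository.class);
            customers.save(new Customer(1L, "Ada"));
            System.out.println(customers.findById(1L).map(c -> c.name).orElse("missing"));
        }
    }
}
```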
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode - PivotalOpenSourceHub
This document summarizes a presentation about how TEKsystems Global Services helps modern manufacturing industries address challenges through big data solutions. It outlines TEKsystems' services and capabilities, as well as real-world applications for manufacturing, financial services, and life sciences. The presentation describes reference architectures and customer success stories in marine seismic data and gaming industries. It positions TEKsystems as having expertise, proven track records, and packaged offerings to provide big data solutions from pilot to production.
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...PivotalOpenSourceHub
One of the largest retailers in North America is considering Apache Geode for its new mobile loyalty application, to support its digital transformation effort. It would use Geode to provide operational data services for its mobile cloud service. This retailer needs to replace sluggish response times with sub-second response, which will improve conversion rates. It also wants to be able to close the loop between data science findings and the app experience, so that the right customer interaction is suggested when it is needed, such as when customers are looking at the mobile app while walking in the store, or when sending notifications at an individual's most likely shopping times. The final benefits of using Geode will include faster development cycles, increased customer loyalty, and higher revenue.
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...PivotalOpenSourceHub
In this session we explore a case study of a large-scale government fraud detection program that prevents billions of dollars in fraudulent payments each year leveraging the beta release of the GemFire+Greenplum Connector, which is planned for release in GemFire 9. Topics will include an overview of the system architecture and a review of the new GemFire+Greenplum Connector features that simplify use cases requiring a blend of massively parallel database capabilities and accelerated in-memory data processing.
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)PivotalOpenSourceHub
Today, if events change the decision model, we wait until the next batch model build for new insights. By extending fast “time-to-decisions” into the world of Big Data Analytics to get fast “time-to-insights”, apps will get what used to be batch insights in near real time. The technology enabling this includes smart in-memory data storage, new storage class memory, and products designed to do one or more parts of an analysis pipeline very well. In this talk we describe how Ampool is building on Apache Geode to allow Big Data analysis solutions to work together with a scalable smart storage class memory layer to allow fast and complex end-to-end pipelines to be built -- closing the loop and providing dramatically lower time to critical insights.
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...PivotalOpenSourceHub
This talk introduces an open-source solution that integrates cloud native apps running on Cloud Foundry with an open-source hybrid transactions + analytics real-time solution. The architecture is based on the fastest scalable, highly available and fully consistent In-Memory Data Grid (Apache Geode / GemFire), natively integrated to the first open-source massive parallel data warehouse (Greenplum Database) in a hybrid transactional and analytical architecture that is extremely fast, horizontally scalable, highly resilient and open source. This session also features a live demo running on Cloud Foundry, showing a real case of real-time closed-loop analytics and machine learning using the featured solution.
Apache Apex and Apache Geode are two of the most promising incubating open source projects. Combined, they promise to fill gaps in existing big data analytics platforms. Apache Apex is an enterprise-grade, native YARN, big-data-in-motion platform that unifies stream and batch processing. Apex is highly scalable, performant, fault tolerant, and strong in operability. Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency with high-concurrency processing. We will also look at some use cases showing how these two projects can be used together to form a distributed, fault-tolerant, reliable in-memory data processing layer.
How Southwest Airlines Uses Geode
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore: common development pitfalls, environment capacity planning, streaming data patterns like consumer checkpointing, support roles, and production lessons learned.
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
#GeodeSummit - Wall St. Derivative Risk Solutions Using GeodePivotalOpenSourceHub
In this talk, Andre Langevin discusses how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
GPORCA is a newly open-sourced advanced query optimizer that is a subproject of the Greenplum Database open source project. GPORCA is the query optimizer used in commercial distributions of both Greenplum and HAWQ. In these distributions GPORCA has achieved up to 1000x performance improvement across TPC-DS queries by focusing on three distinct areas: Dynamic Partition Elimination, SubQuery Unnesting, and Common Table Expressions.
Now that GPORCA is open source, we are looking for collaborators to help us realize the ultimate dream for GPORCA - to work with any database.
The new breed of data management systems in Big Data have to process so much data that optimization mistakes are magnified in traditional optimizers. Furthermore, coding and manual optimization of complex queries has proven to be hard.
In this session, Venkatesh will discuss:
- Overview of GPORCA
- How to add GPORCA to HAWQ with a build option
- How GPORCA could be made to work with any database
- Future vision for GPORCA and more immediate plans
- How to work with GPORCA, and how to contribute to GPORCA
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ and that, when run together, form a distributed stream processing application. This allows you to scale, version and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Motivation and goals for off-heap storage
Off-heap features and usage
Implementation overview
Preliminary benchmarks: off-heap vs. heap
Tips and best practices
Zeppelin Interpreters
PSQL (to become JDBC in 0.6.x)
Geode
SpringXD
Apache Ambari
Zeppelin Service
Geode, HAWQ and Spring XD services
Webpage Embedder View
5. What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• High performance, scalability, resiliency and continuous availability
• Fast access to critical data sets
• Location-aware distributed data processing
• Event-driven data architecture
6. High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: a peer-to-peer distributed system of Servers coordinated by a Locator, accessed by Java and REST clients]
* Experimental and awaiting community feedback
7. Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting edge use cases
Timeline (<2000, 2004, 2008, 2012, 2016):
• Early drivers: data volumes, margins/transactions, IT maintenance costs, elasticity needs
• Real-time needs: real-time response, time to market, flexible data models, persistent + in-memory
• Global data: visibility across data centers, fast ingest, device to enterprise, uptime (always on)
• Open source: Apache Incubation, GemFire → Geode, M1 release, 1st Geode Summit
Example domains: financial services, US DoD, trade clearing, travel portals, online gambling, telcos, manufacturing, auto insurance, payroll processing, rail systems
8. …with both SCALE and SPEED, …
• 40K transactions per second
• 3TB of data in-memory
• 17B records in-memory
• 120K concurrent users
9. … and impacting a LOT of people!
• China Railway Corporation (19% of the world population) and Indian Railways (17%) together serve about 36% of the world population
11. …and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % plotted against the number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
12. What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
13. Let’s talk about a few (basic) CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
• Events & Listeners
• High Availability
• Serialization
14. What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI (a Java API sketch follows below)
• A collection of Regions
[Diagram: a Cache inside a JVM containing multiple Regions]
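As a minimal sketch of the Java API route (the class and member name are illustrative; recent releases use the org.apache.geode packages, while incubator-era builds used com.gemstone.gemfire), creating a server-side cache programmatically looks roughly like this:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

public class CacheExample {
  public static void main(String[] args) {
    // Joins the distributed system and creates this member's Cache
    Cache cache = new CacheFactory()
        .set("name", "example-member")
        .create();
    // ... create Regions and work with data here ...
    cache.close();
  }
}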
15. What is a REGION?
• A distributed java.util.Map on steroids (Key/Value); a short example follows below
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
[Diagram: a Region inside a Cache (JVM), backed by a java.util.Map of key/value pairs such as K01→May, K02→Tim]
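As a rough illustration (the region name and values are taken from the slides; package names assume a recent Geode release), creating and using a replicated Region on a server member might look like:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class RegionExample {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();
    // A replicated region: every member hosting it keeps a full copy
    Region<String, String> region = cache
        .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
        .create("myRegion");
    region.put("K01", "May");               // Map-style write, distributed to peers
    System.out.println(region.get("K01"));  // Map-style read: prints "May"
    cache.close();
  }
}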
16. Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region shortcuts:
LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
17. Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
[Diagram: a put (k4→v7) on a region is appended as a modify record to the member's operation logs (Oplog2.crf, Oplog3.crf) on each server hosting the data; a gfsh setup sketch follows below]
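As a hedged sketch of how a persistent region can be set up from gfsh (the disk-store name, directory, and region name here are invented for the example):

gfsh> create disk-store --name=myDiskStore --dir=/data/geode/disk
gfsh> create region --name=myPersistentRegion --type=PARTITION_PERSISTENT --disk-store=myDiskStore

Entries written to the region are appended to the disk-store's operation logs, so the data survives a restart and can be recovered consistently.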
18. What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator and Server processes participating as members]
19. What is a CLIENT CACHE?
• A process connected to the Geode server(s); a client sketch follows below
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers
[Diagram: an application's Client Cache holding a local Region, connected to Regions on a GemFire Server]
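A minimal client sketch (the locator host/port and region name are assumptions for illustration; the server-side region must already exist):

import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class ClientExample {
  public static void main(String[] args) {
    // Connect to the cluster through a locator
    ClientCache cache = new ClientCacheFactory()
        .addPoolLocator("localhost", 10334)
        .create();
    // PROXY keeps no local copy; use CACHING_PROXY to hold one locally
    Region<String, String> region = cache
        .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
        .create("myRegion");
    region.put("K01", "May");               // forwarded to the servers
    System.out.println(region.get("K01"));  // fetched from the servers
    cache.close();
  }
}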
20. • Clone & Build
•
• Start Services
• Create & Monitor Region
How to START? Easy as !!
20
git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true
cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion —type=REPLICATE
gfsh> start [pulse | jconsole]
1
2
3
'
1 2 3
34. Events & Notifications
• Register Interest
  • Individual keys OR a regex for keys
  • Updates the local copy
  • Examples:
    region.registerInterest("key-1");
    region1.registerInterestRegex("[a-z]+");
• Continuous Query
  • Receive a notification when the query condition is met on the server
  • Example: SELECT * FROM /tradeOrder t WHERE t.price > 100.00 (a client-side sketch follows below)
• Can be DURABLE
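A hedged sketch of registering the continuous query above from a client (assumes a subscription-enabled client pool and an existing /tradeOrder region; the CQ name is invented for the example):

import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.query.CqAttributesFactory;
import org.apache.geode.cache.query.CqEvent;
import org.apache.geode.cache.query.CqListener;
import org.apache.geode.cache.query.CqQuery;
import org.apache.geode.cache.query.QueryService;

public class TradeOrderCq {
  public static void start(ClientCache cache) throws Exception {
    QueryService qs = cache.getQueryService();
    CqAttributesFactory caf = new CqAttributesFactory();
    caf.addCqListener(new CqListener() {
      public void onEvent(CqEvent e) {   // fired on the client when a server-side entry matches
        System.out.println("Matching trade: " + e.getNewValue());
      }
      public void onError(CqEvent e) {
        System.err.println("CQ error: " + e.getThrowable());
      }
      public void close() { }
    });
    CqQuery cq = qs.newCq("bigTrades",
        "SELECT * FROM /tradeOrder t WHERE t.price > 100.00", caf.create());
    cq.execute();                        // the server starts pushing matching events
  }
}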
35. Listeners
• CacheWriter / CacheListener (a listener sketch follows below)
• AsyncEventListener (queue / batch)
  • Parallel or Serial
  • Conflation
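As a small illustrative sketch (the region name and entry values are made up), a CacheListener can react to entry creation on any member hosting the region:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.util.CacheListenerAdapter;

public class ListenerExample {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();
    Region<String, String> orders = cache
        .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
        .addCacheListener(new CacheListenerAdapter<String, String>() {
          @Override
          public void afterCreate(EntryEvent<String, String> event) {
            System.out.println("New entry: " + event.getKey() + " -> " + event.getNewValue());
          }
        })
        .create("orders");
    orders.put("K01", "May");   // triggers afterCreate on members hosting the region
    cache.close();
  }
}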
40. But HOW to serialize data?
Benchmark: https://github.com/eishay/jvm-serializers
41. Schema Evolution
[Diagram: Application #1 writes v1 objects and Application #2 writes v2 objects on Members A and B, sharing distributed type definitions]
• v2 objects preserve data from missing fields
• v1 objects use default values to fill in new fields
• PDX provides forwards and backwards compatibility, no code required (a PDX sketch follows below)
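Although PDX can be used with no code at all (via auto-serialization), a class can also implement PdxSerializable directly. A rough sketch, where the TradeOrder class and its fields are invented for the example:

import org.apache.geode.pdx.PdxReader;
import org.apache.geode.pdx.PdxSerializable;
import org.apache.geode.pdx.PdxWriter;

public class TradeOrder implements PdxSerializable {
  private String symbol;
  private double price;

  public TradeOrder() { }                    // no-arg constructor needed for deserialization

  public void toData(PdxWriter writer) {     // write fields by name so readers can evolve
    writer.writeString("symbol", symbol);
    writer.writeDouble("price", price);
  }

  public void fromData(PdxReader reader) {   // fields missing from older data get default values
    symbol = reader.readString("symbol");
    price = reader.readDouble("price");
  }
}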
43. How to CONTRIBUTE?
Code
• New features
• Bug fixes
• Writing tests
Documentation
• Wiki
• Web site
• User guide
Community
• Join the mailing list
• Ask or answer questions
• Join our HipChat
• Become a speaker
• Find bugs
• Test an RC/Beta