Next CERN Accelerator Logging Service Architecture
Jakub Wozniak, CERN
Agenda
• What is (Next) CALS?
• NXCALS Architecture
• Meta-data Service & Ingestion API
• Spark Extraction API
Controls Data Logging
• Provide access to current & historical device state
– Monitoring & controls of the machines
– Improve machine/beam performance
– Various studies (new beam types, experiments, machines)
• Required to deliver quality beam to experiments
• Not physics data from experiments!
CERN Accelerator Logging Service
• Old system (CALS) based on Oracle (2 DBs)
– ~20,000 devices (from ~120,000 devices)
– 1,500,000 signals
– 5,000,000 extractions per day
– 71,000,000,000 records per day
• 2 TB / day (unfiltered data, 2 DBs)
– 1 PB of total data storage (data heavily filtered, up to 95%)
Current Controls Data Storage
[Chart: logged data volume across Run 1, LS 1, and Run 2, growing to ~900 GB/day]
Current Issues With CALS
• Performance / scalability problems
– Difficult to scale horizontally
– “… to extract 24h of data takes 12h”
• Other issues
– Problems with big payloads (payloads vary from KB to GB)
– Limited & rigid table structure & limited types (no nested types)
– Limited integration with heterogeneous analytics tools (Python, Matlab, R, Java, …)
• CALS & tools not ready for Big Data!
– Have to extract data to do analysis!
Big Data For Controls?
[Diagram: from CALS on Oracle, via Impala/Kudu(?), to Next CALS (NXCALS)]
Next CERN Accelerator Logging Service (Kafka, Hadoop, Spark)
Controls Data
Readings from devices / properties (with fields inside)
Timeseries of records
Device X / Property Y (time & values): t0: { f1, f2, f3 } (schema 1)
t1: { f1, f2, f3 }
t2: { f1, f2, f3 }
…
t3: { f1, f2, f3, f4 } (schema 2)
t4: { f1, f2, f3, f4 }
…
tN: { f1, f2, f3, f5, …, fN } (schema N)
Devices get updated, so the schema changes over time!
Generic Storage System
• Different Controls Systems for different domains
• Not only Device/Property model
Let's generalize and define an abstraction
Call it Entity…
…and just arbitrary Records
Record: Key -> Values (with timestamp & partition); see the sketch below
Not limited to Controls nor CERN!
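A minimal sketch of this Entity/Record abstraction, assuming Java 16+ record syntax; the type and field names are illustrative, not the actual NXCALS API:

import java.util.Map;

// A generic record: entity keys identify the Entity, partition keys drive
// storage grouping, and the payload fields are arbitrary key -> value pairs.
record GenericRecord(Map<String, Object> entityKeys,
                     Map<String, Object> partitionKeys,
                     long timestamp,
                     Map<String, Object> fields) {}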
Some Requirements
• Discover entities from records
– Avoids static / offline registration in advance
• Allow searching for entity meta-data
– What are the known entities?
– How are they partitioned?
– With what schemas?
• Store & extract data
• Data access
– Online monitoring (simple extraction but must have low latency data access)
– Offline analysis (provide visualization tools for more complex analysis)
NXCALS Architecture
[Architecture diagram: data sources publish via the NXCALS API and ETL into Kafka; log processing moves data into Hadoop (HDFS with Avro & Parquet, and HBase); a Meta-data service DB tracks entities; Spark serves extraction for clients such as Jupyter, the old API, applications, scientists, and programmers.]
Design Choices
• Why Hadoop?
– Provided as a service at CERN (IT/DB group)
• Why Kafka?
– Redundancy & data safety (if Hadoop is not available)
– Low latency streaming API for extraction
• Why HBase?
– Fast, low latency for online monitoring queries
– Gives time for data deduplication & compaction into Parquet files
• Why Parquet as final storage?
– Open, columnar, storage-efficient format with good compression
– Good performance for extraction (see the sketch after this list)
• predicate push-down
• column projection
– Easy to understand, access (even outside of the system), backup, etc.
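A minimal sketch of those two optimizations as seen from Spark; the path, column names, and session setup are illustrative:

import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();

// Only two columns are read from disk (column projection) and the equality
// filter can be evaluated inside the Parquet reader (predicate push-down).
Dataset<Row> rows = spark.read()
    .parquet("hdfs:///project/nxcals/example/data.parquet")
    .select("device", "field1")
    .filter(col("device").equalTo("NXCALS_MONITORING_DEV1"));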
Data Flow
• Ingestion API to send data to Kafka (as Avro)
• ETL extracts it from Kafka towards
– HDFS (as Avro, into staging folders)
– HBase (as Avro, for low latency)
• Avro files are deduplicated & compacted into larger Parquet files (with Spark); see the sketch after this list
– Hadoop-friendly process, avoids many small files
• Spark Extraction API for data access
• Meta-data service knows the location of objects in files
– Avoids scanning many files
– “Replacement” for missing indexes
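A minimal sketch of such a compaction job, assuming Spark 2.4+ with the built-in Avro data source available; the paths are illustrative:

// Read the small staged Avro files, drop duplicate records, and rewrite
// them as fewer, larger Parquet files (reusing the SparkSession from above).
Dataset<Row> staged = spark.read().format("avro")
    .load("hdfs:///project/nxcals/staging/2017-10-10/");
staged.dropDuplicates()
    .coalesce(8)                      // fewer, larger output files
    .write().mode("append")
    .parquet("hdfs:///project/nxcals/data/2017-10-10/");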
Devops?
• Microservice architecture
• Monitoring is crucial, done using
– Prometheus
– Alertmanager
– Grafana
– Logs sent to Elastic (external)
• Fully automated CI/CD with
– Jenkins pipelines
– Ansible deployment
Meta-data Service
Data Types
• Data (records):
– Kafka -> Hadoop (HBase, HDFS)
• Meta-data (info about data)
– RDBMS (Oracle)
Domain Description
• The system stores state changes of abstract entities in the form of records
– Data identified by entity keys and timestamp
– “Extended” timeseries data
• Record = { f1=v1 ,…, fn=vn } (at t1)
– Any fields
– Some fields are special (entity keys, partition keys, timestamp)
– Set of fields => Schema
• Records are split (grouped in different files on disk) by:
– Time, partition (classifier), schema
• Fields can change over time {f1…fm} (at tx)
– History of record structure changes (schema changes)
Meta Data Objects
• ENTITY – abstract object we store data for
– Identified by known record fields (primary key)
• PARTITION – classifier used to group data into files on disk
– Identified by known record fields (primary key)
• SCHEMA – a given set of all of a record's fields
Meta Data Objects
• SYSTEM – defines record type (special fields)
– Field names identifying ENTITY
– Field names identifying PARTITION
– Field names identifying TIMESTAMP
• ENTITY-HISTORY – history of SCHEMA & PARTITION changes of an ENTITY over time
• VARIABLE – alias for an ENTITY
– the whole record
– or a single field in the record
• VARIABLE-HISTORY – VARIABLE configuration over time
– Pointer (alias) to an entity and field, with time information
Java Ingestion API Example
// Create data publisher
Publisher<ImmutableData> publisher =
    PublisherFactory.newInstance().createPublisher("MOCK-SYSTEM", (d) -> d);
// Create data (ImmutableData == Map<String, Object>)
ImmutableData data = ImmutableData.builder()
    .add("device", "NXCALS_MONITORING_DEV1") // entity key
    .add("property", "Setting")              // entity key (and partition key)
    .add("class", "MONITORING")              // partition key
    .add("timestamp", Instant.now())         // timestamp key
    .add("byteField1", (byte) 2)
    .add("shortField1", (short) 1)
    .build();
// Publish data
CompletableFuture<Void> future = publisher.publish(data);
// Handle Future completion or error
future.whenComplete((v, e) -> { if (e != null) { /* handle errors */ } });
Data Partitioning
System [sid], { entity_keys, partition_keys, timestamp, field1…fieldN } = record
hdfs:///project/nxcals/sid/partition_id/schema_id/date/data.parquet (see the path sketch below)
A simple example for the device domain (CMW):
• System CMW defines:
– Entity keys as device, property
– Partition keys as class, property
– Timestamp keys (acq or cycle stamp)
So one data.parquet file will contain data for devices from the same class/property.
A file always contains records of the same schema!
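A minimal sketch of how the layout rule above could resolve to a concrete file path; the identifiers and their values are hypothetical stand-ins for values assigned by the meta-data service:

// Hypothetical identifiers resolved via the meta-data service.
long systemId = 1;      // sid of the CMW system
long partitionId = 42;  // a given class/property combination
long schemaId = 7;      // current schema of the records
String date = "2017-10-10";
String path = String.format(
    "hdfs:///project/nxcals/%d/%d/%d/%s/data.parquet",
    systemId, partitionId, schemaId, date);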
Meta Store Efficiency
• Meta-data is cached
• Ingestion API calls the meta-store only on:
– Entity creation
– Entity change (schema change / rename / …)
– Cache misses
• So calls are rare compared to the data rate (see the caching sketch below)
– Calls to the meta-store are expensive (10-50 ms)
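A minimal sketch of that client-side caching pattern, assuming the Caffeine library; EntityInfo and metaDataService are hypothetical names, not the actual NXCALS types:

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

// The remote meta-store is contacted only on a cache miss (new or changed
// entity); hot entities are served from local memory.
LoadingCache<String, EntityInfo> entityCache = Caffeine.newBuilder()
    .maximumSize(100_000)
    .build(entityKey -> metaDataService.findEntity(entityKey)); // 10-50 ms call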
Meta-Store Features
• Entities are created dynamically from records
• Schemas are discovered and saved with history
• Records (entities) can change schemas over time
• Schema changes handled at extraction
– using history from meta-data service
Spark Extraction API
API for Spark Extraction
• Extension to Spark sources package
– Extends BaseRelation, implements PrunedFilteredScan
– sparkSession.read().format("cern.accsoft.nxcals.data.access.api").load()
• Hides data source & implementation details
– HBase for most recent data (<36 hours)
– HDFS for older data (>36 hours, due to compaction)
• Merges schemas using schema history
• Greatly simplifies data access
Spark Extraction Example
SparkSession sparkSession = … // create session
Dataset<Row> dataset = DataAccessQueryBuilder
    .system("MOCK-SYSTEM")
    .keyValue("device", "NXCALS_MONITORING_DEV1") // entity key
    .keyValue("property", "Setting")              // entity key
    .startTime("2017-10-10 00:00:00.0")           // time window start
    .duration(Duration.ofDays(2))                 // time window length
    .fields("device", "intField1", "doubleField")
    .buildDataset(sparkSession);
Record Schema, Spark Default
Entity A evolves over time:
Record 1: {acqStamp, field1 (double), field2 (integer)}
…
Record 2: {acqStamp, field1 (float), field21 (long)} // rename: field2 = field21
…
Record 3: {acqStamp, field3 (double)} // only field3
Can you quickly extract & union datasets containing those records?
org.apache.spark.sql.AnalysisException:
Union can only be performed on tables with the same number of columns
It can be done, but it is troublesome for scientists! (see the repro sketch below)
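A minimal repro of that failure with plain Spark SQL, using illustrative literal rows:

// Two snapshots of entity A with different column sets.
Dataset<Row> r1 = sparkSession.sql(
    "SELECT 1L AS acqStamp, 1.0D AS field1, 2 AS field2");
Dataset<Row> r2 = sparkSession.sql(
    "SELECT 2L AS acqStamp, CAST(1.5 AS FLOAT) AS field1, 7L AS field21");
// Positional union requires the same number of columns:
r1.union(r2); // throws org.apache.spark.sql.AnalysisException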
Schema Merging
For the same three records, NXCALS merges all fields into one schema:
Schema: {acqStamp (long), field1 (double), field2 (integer), field21 (long), field3 (double)}
Each record is mapped into this merged schema; fields a record does not have are left null (see the sketch below).
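A minimal sketch of the same merge expressed with plain Spark, assuming Spark 3.1+ where unionByName can pad missing columns with nulls (reusing r1 and r2 from the repro above):

Dataset<Row> r3 = sparkSession.sql("SELECT 3L AS acqStamp, 2.5D AS field3");
// Name-based union fills missing columns with nulls; compatible numeric
// types are widened (field1: float -> double), yielding the merged schema.
Dataset<Row> merged = r1.unionByName(r2, true)   // true = allowMissingColumns
                        .unionByName(r3, true);
merged.printSchema(); // acqStamp, field1, field2, field21, field3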
… With Field Aliases
With new_field defined as an alias of both field2 and field21, the same three records merge into a more compact schema:
Schema: {acqStamp (long), field1 (double), new_field (long), field3 (double)}
Variables
• A pointer to a field of an entity record within a time window
• Can point to different entities over time
• No need for real entity
• Useful for abstractions (“LHC_Beam_Intensity”)
Variable Extraction API
SparkSession sparkSession = … // create session
Dataset<Row> dataset = VariableQueryBuilder
    .variable("NXCALS_MONITORING_VARIABLE")
    .startTime("2017-10-10 00:00:00.0")
    .duration(Duration.ofDays(2))
    .buildDataset(sparkSession);
Variables Configuration
Schema: {variable (String), acqStamp (long), value (double)}
Entity 1: {acqStamp, field1 (float), field21 (long)}
Entity 2: {acqStamp, field2 (double)}
Entity 3: {acqStamp, field1 (array2D), field3 (float)}
Variable configuration changes over time (see the projection sketch below)
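A minimal sketch of how a variable could project one entity's field into the uniform {variable, acqStamp, value} shape above; entityData and the chosen field are illustrative:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

// Project the configured field of the underlying entity (here Entity 2's
// field2) into the uniform variable schema.
Dataset<Row> variableRows = entityData.select(
    lit("LHC_Beam_Intensity").as("variable"),
    col("acqStamp"),
    col("field2").cast("double").as("value"));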
Why Simplified Extraction?
• Data producers ≠ data consumers
• At CERN, different groups handle
– Equipment & Device/Property design (low level)
– Physics & beam-oriented analysis (high level)
Summary
• NXCALS is a generic Big Data storage system
• Timeseries-like records of changing structure
– Arbitrary entity & partition keys
• Java Ingestion API
• Spark Extraction API (Java, Python, Scala)
Questions?
• NXCALS code:
– https://gitlab.cern.ch/acc-logging-team/nxcals
• Contact us:
– jakub.wozniak@cern.ch
– acc-logging-team@cern.ch