Improve performance of extremely large datasets

When I search for plotting libraries that can handle a large number of data points, people are talking about 50k; I'm talking about 10M. I made a simple React demo that generates 10 random sinusoids of 1M points each. 10 * 100k works fine, but 10 * 1M becomes unusable.

```typescript
  // Shared x values: 0 .. numpoints - 1
  const numpoints = 1e6;
  const time: number[] = [];
  for (let i = 0; i < numpoints; i++) {
    time.push(i);
  }
  // Ten sinusoids with random frequencies, each on its own subplot
  const traces: object[] = [];
  for (let i = 0; i < 10; i++) {
    const points: number[] = [];
    const freq = Math.random() / 1000;
    for (let j = 0; j < numpoints; j++) {
      points.push(Math.sin(j * freq));
    }
    traces.push({
      x: time,
      y: points,
      type: 'scatter',
      mode: 'lines',
      yaxis: `y${i + 1}`,
      xaxis: `x${i + 1}`,
    });
  }
// ...
<Plot
  data={traces}
  layout={{
    width: 2000,
    height: 1000,
    grid: { rows: 5, columns: 2, pattern: 'independent' },
    title: 'A Fancy Plot',
  }}
/>
```

I did [a bit of profiling](https://github.com/plotly/plotly.js/files/6440431/profiles.zip), and there are two issues. The first one is relatively simple: hovering over the plot lags because it loops over every single data point, and the comment in the code linked here already describes what needs to be done about it.
https://github.com/plotly/plotly.js/blob/623fcd1fea9d9bfb86e5e0d44d8047cd8636881c/src/components/fx/helpers.js#L59-L62
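
To make the pre-sorted idea concrete, here is a rough sketch of what a hover lookup could look like when x is already sorted in ascending order. This is not plotly.js code: `nearestIndex` is a hypothetical helper, and it only considers distance along x (so it would not cover the 'closest' hover mode on its own).

```typescript
// Hypothetical helper, not part of plotly.js.
// Returns the index of the point whose x value is closest to `target`,
// assuming `xs` is sorted in ascending order. Returns -1 for empty input.
function nearestIndex(xs: ArrayLike<number>, target: number): number {
  if (xs.length === 0) return -1;
  let lo = 0;
  let hi = xs.length - 1;
  // Narrow [lo, hi] down to two neighbouring candidates: O(log n) steps
  // instead of applying a distance function to every point.
  while (hi - lo > 1) {
    const mid = (lo + hi) >>> 1;
    if (xs[mid] < target) {
      lo = mid;
    } else {
      hi = mid;
    }
  }
  // Pick whichever of the two remaining candidates is closer.
  return Math.abs(xs[lo] - target) <= Math.abs(xs[hi] - target) ? lo : hi;
}
```

With 1M points per trace this needs about 20 comparisons per hover event rather than 1M distance evaluations.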

The second issue seems to be just the drawing of the plot itself after a drag or zoom action. It spends all its time in `plot`, `plotOne` and `linePoints`.
What's interesting is that even if you zoom in, where it would only have to draw a small subset of the line, it's still just as slow.
https://github.com/plotly/plotly.js/blob/623fcd1fea9d9bfb86e5e0d44d8047cd8636881c/src/traces/scatter/line_points.js#L346
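
As an illustration of why zooming in could be much cheaper, here is a minimal sketch that finds the slice of a trace falling inside the visible x range. It assumes x is sorted in ascending order; `lowerBound` and `visibleRange` are hypothetical names, not existing plotly.js functions.

```typescript
// Hypothetical helpers, not part of plotly.js. Both assume `xs` is ascending.

// First index i with xs[i] >= value, or xs.length if there is none.
function lowerBound(xs: ArrayLike<number>, value: number): number {
  let lo = 0;
  let hi = xs.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (xs[mid] < value) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Half-open index range [start, end) covering the visible window
// [xMin, xMax], padded by one point on each side so that line segments
// crossing the edge of the viewport are still drawn.
function visibleRange(
  xs: ArrayLike<number>,
  xMin: number,
  xMax: number
): [number, number] {
  const start = Math.max(0, lowerBound(xs, xMin) - 1);
  const end = Math.min(xs.length, lowerBound(xs, xMax) + 1);
  return [start, end];
}
```

Only the points in `[start, end)` would then need to go through the line-drawing code, so the cost would scale with what is on screen rather than with the full trace.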

So it seems like both problems could be solved with some sort of index to avoid looping over all the datapoints. Some suggestions to jumpstart the discussion:

 * For the common case of a monotonic x axis, implement a simple binary/interpolation search (monotonicity could be detected or specified).
 * Store points in a quadtree to allow fast spatial indexing for any type of data (such as https://github.com/plotly/point-cluster ).
 * Automatic downsampling. If I have 10M points, all the detail is lost when zoomed out anyway, but I still want to be able to zoom in and inspect it.
 * Offload operations to a Web Worker. At some point you're going to need to do a thing 10M times, but don't freeze the UI to do it.
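
To illustrate the downsampling suggestion, here is one possible approach: min/max decimation, which keeps the lowest and highest point in each bucket so that peaks stay visible when zoomed out. This is just a sketch of one technique, not something plotly.js provides; `minMaxDecimate`, `Decimated` and the `buckets` parameter are names made up for the example.

```typescript
// Hypothetical sketch, not part of plotly.js.
interface Decimated {
  x: number[];
  y: number[];
}

// Reduce points [start, end) to at most 2 * buckets points by keeping the
// minimum and maximum y of each bucket, in their original order.
function minMaxDecimate(
  x: ArrayLike<number>,
  y: ArrayLike<number>,
  start: number,
  end: number,
  buckets: number
): Decimated {
  const out: Decimated = { x: [], y: [] };
  const n = end - start;
  if (n <= 0 || buckets <= 0) return out;
  // Few enough points already: copy them through unchanged.
  if (n <= 2 * buckets) {
    for (let i = start; i < end; i++) {
      out.x.push(x[i]);
      out.y.push(y[i]);
    }
    return out;
  }
  for (let b = 0; b < buckets; b++) {
    // n > 2 * buckets guarantees every bucket holds at least two points.
    const from = start + Math.floor((b * n) / buckets);
    const to = start + Math.floor(((b + 1) * n) / buckets);
    let iMin = from;
    let iMax = from;
    for (let i = from + 1; i < to; i++) {
      if (y[i] < y[iMin]) iMin = i;
      if (y[i] > y[iMax]) iMax = i;
    }
    // Emit the two extremes in x order so the line is drawn correctly.
    const first = Math.min(iMin, iMax);
    const second = Math.max(iMin, iMax);
    out.x.push(x[first]);
    out.y.push(y[first]);
    if (second !== first) {
      out.x.push(x[second]);
      out.y.push(y[second]);
    }
  }
  return out;
}
```

With a bucket count around the plot width in pixels (for example 2000 for the demo above), a 1M-point trace would shrink to at most 4000 points per redraw. Re-running the function on the visible slice after each zoom would bring the detail back when zooming in.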

If I end up using Plotly in production I'd be happy to try and contribute towards this, but for now just some suggestions to see how the maintainers feel about the issue and what their preferred approach would be.

For reference, the comment in `src/components/fx/helpers.js` mentioned above:

	// apply the distance function to each data point
	// this is the longest loop... if this bogs down, we may need
	// to create pre-sorted data (by x or y), not sure how to
	// do this for 'closest'
