Every now and again, in the wonderfully cyclical world of information technology, new concepts emerge challenging us to rethink what we thought we already knew. It’s happening again, right now. The instigator, this time, is streaming.

Driven by the desire to shrink to zero the time it takes to turn massive volumes of raw data into useful information and action, streaming is deceptively simple: just process and act on data as it arrives, quickly, and in a continuous and infinite fashion.

For use cases from Industrial IoT to Connected Cars to Real-Time Fraud Detection and more, we’re increasingly looking to build new applications and experiences that react quickly to customer interests and actions and that learn and adapt to changing behavior patterns. But the reality is that most of us don’t yet have the tools to do this with production-level data volumes, ingestion rates, and fault resiliency.

Streaming demands disruptive compute and storage technologies


Streaming is hard because it assumes three disruptive systems capabilities:

  • Ability to treat data as continuous and infinite rather than finite and static
  • Ability to deliver consistently fast results by dynamically scaling data ingest, storage, and processing in coordination with data arrival volume
  • Ability to deliver accurate results while processing data continuously, even with late-arriving or out-of-order data
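To make the third capability concrete, here is a minimal Python sketch of one common way to get accurate counts from out-of-order data: group events into fixed windows by event time, and only finalize a window once a watermark (the largest event time seen, minus an allowed lateness) has passed its end. The class name `TumblingWindowCounter` and its parameters are hypothetical and chosen for illustration; this is not how Pravega or Apache Flink implement it, just the general idea under simplified assumptions (integer timestamps, a single process, everything in memory).

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time window, tolerating out-of-order arrival.

    Illustrative only: names and behavior are assumptions, not a real API.
    """

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.open_windows = defaultdict(int)  # window start -> event count
        self.max_event_time = None            # largest event time seen so far
        self.dropped = 0                      # events that arrived too late

    def _watermark(self):
        # Windows ending at or before this time are considered complete.
        return self.max_event_time - self.allowed_lateness

    def add(self, event_time):
        """Record one event; return the (window_start, count) pairs finalized."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        start = (event_time // self.window_size) * self.window_size
        if start + self.window_size <= self._watermark():
            # The window was already finalized, so the event is too late.
            self.dropped += 1
        else:
            self.open_windows[start] += 1
        return self._close_ready()

    def _close_ready(self):
        wm = self._watermark()
        ready = sorted(s for s in self.open_windows
                       if s + self.window_size <= wm)
        return [(s, self.open_windows.pop(s)) for s in ready]


counter = TumblingWindowCounter(window_size=10, allowed_lateness=5)
results = [counter.add(t) for t in (1, 12, 3, 27, 4)]
# The event at time 3 arrives after the event at time 12 but is still counted
# in window 0; the event at time 4 arrives after window 0 closed and is dropped.
```

The allowed lateness is the trade-off knob here: a larger value tolerates more disorder but delays results, which is why a streaming system needs storage and compute that can hold open state reliably while it waits.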

Here’s the good news: Streaming is forcing systems designers to rethink fundamental computational and storage principles. As passionate storage people, we’re doing our part by designing a new storage primitive, called a stream, purpose-built for streaming architecture and implemented in a new open source project named Pravega.

A stream is the storage foundation for reliable streaming systems: a high-performance, durable, elastic, and infinite append-only byte stream with strict ordering and consistency.
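As a rough mental model only, the "append-only byte stream with strict ordering" part of that definition can be sketched in a few lines of Python. The class `InMemoryStream` and its methods are hypothetical names invented for this illustration and are not Pravega's API; the sketch deliberately ignores durability, elasticity, and distribution, which are the hard parts a real stream store has to solve.

```python
class InMemoryStream:
    """Toy append-only byte stream (illustrative, not a real Pravega class)."""

    def __init__(self):
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        """Append bytes at the tail and return the offset they were written at."""
        offset = len(self._data)
        self._data.extend(payload)
        return offset

    def read(self, offset: int, max_bytes: int) -> bytes:
        """Read up to max_bytes starting at offset; existing bytes never change."""
        return bytes(self._data[offset:offset + max_bytes])

    def tail(self) -> int:
        """Current end of the stream, where the next append will land."""
        return len(self._data)


stream = InMemoryStream()
first = stream.append(b"hello ")
second = stream.append(b"world")

# A reader keeps its own position and catches up to the tail at its own pace.
position = 0
chunk = stream.read(position, 1024)
position += len(chunk)
```

Because writes only ever go to the tail and each reader tracks its own offset, readers see bytes in exactly the order they were written, and writers and readers can proceed independently of one another.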

By combining Pravega streams with a stateful stream processor like Apache Flink, we realize a system where writers, processors, readers, and storage are independently, elastically, and dynamically scalable in coordination with the volume of data arriving. That enables all of us to build streaming apps we could not build before, and to seamlessly scale them from prototype to production.

And by refactoring and externalizing previously internal and proprietary log storage, streams will greatly simplify the development and operation of a new generation of distributed middleware reimagined as streaming infrastructure.

We think Pravega will make the benefits of streaming easily accessible to anyone. If you’re passionate about this, we encourage you to join our community (in #pravega) to see how Pravega will help your streaming solutions. You may also download Pravega, read my full blog, and visit our website.

Please check back to keep up with new concepts and ideas related to Pravega’s mission to reimagine data middleware as streaming infrastructure.

If you’re interested in learning more about {code}’s DevHigh5 program, visit the {code} Wiki.

Blog by guest author Salvatore DeSimone
