[ Pagewise preview ]
| Category | Value | ||
| Available via | http://dbpubs.stanford.edu/pub/2003-19 | ||
| Next version(s) | 2003-68 | ||
| Submitted on | 9th of March 2003 | ||
| Author | Babcock, Brian; Babu, Shivnath; Datar, Mayur; Motwani, Rajeev | ||
| Title | Chain: Operator Scheduling for Memory Minimization in Data Stream Systems | ||
| Date of publication | 2003 | ||
| Published in | Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003) | ||
| Citation | Babcock, Brian; Babu, Shivnath; Datar, Mayur; Motwani, Rajeev. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems, Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003) | ||
| Number of pages | 12 | ||
| Language | English | ||
| Project | STREAM | ||
| Type | Conference or Journal Paper | ||
| Subject group | Data Streams; Miscellaneous | ||
| Abstract | In many applications involving continuous data streams, data arrival is bursty and data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance. We discuss one strategy for processing bursty streams --- adaptive, load-aware scheduling of query operators to minimize resource consumption during times of peak load. We show that the choice of an operator scheduling strategy can have significant impact on the run-time system memory usage. We then present Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing run-time memory usage for single-stream queries involving selections, projections, and foreign-key joins with stored relations. Chain scheduling also performs well for queries with sliding-window joins over multiple streams, and multiple queries of the above types. A thorough experimental evaluation is provided where we demonstrate the potential benefits of Chain scheduling, compare it with competing scheduling strategies, and validate our analytical conclusions. | ||
| Keywords | Data streams, scheduling | ||
| Contact address | shivnath@stanford.edu | ||
| Notes | An extended version of this paper titled "Operator Scheduling in Data Stream Systems" appears on this publications server as technical report 2003-68 at http://dbpubs.stanford.edu/pub/2003-68. This technical report proves an NP-completeness result showing the intractability of the problem of minimizing memory. The report also contains theoretical results and experiments for miminizing run-time memory requirements subject to user-specified latency constraints. | ||
| Fulltext source |
| Management of the document by | rwesley@stanford.edu
| |
[ Pagewise preview ]