SSS Abstracts |
Autograph: Toward Distributed, Automated Worm Signature Detection
Friday, September 17th, 2004 from 12-1 pm in WeH 4601.
Presented by Hyang-Ah Kim,
Today's Internet intrusion detection systems (IDSes) monitor edge networks' DMZs to identify and/or filter malicious flows. While an IDS helps protect the hosts on its local edge network from compromise and denial of service, it cannot alone effectively intervene to halt and reverse the spreading of novel Internet worms. Generation of the worm signatures required by an IDS - the byte patterns sought in monitored traffic to identify worms -today entails non-trivial human labor, and thus significant delay: as network operators detect anomalous behavior, they communicate with one another and manually study packet traces to produce a worm signature. Yet intervention must occur early in an epidemic to halt a worm's spread. In this talk, we describe Autograph, a system that automatically generates signatures for novel Internet worms that propagate using TCP transport. Autograph generates signatures by analyzing the prevalence of portions of flow payloads, and thus uses no knowledge of protocol semantics above the TCP level. It is designed to produce signatures that exhibit high sensitivity (high true positives) and high specificity (low false positives); our evaluation of the system on real DMZ traces validates that it achieves these goals. We extend Autograph to share port scan reports among distributed monitor instances, and using trace-driven simulation, demonstrate the value of this technique in speeding the generation of signatures for novel worms. Our results elucidate the fundamental trade-off between early generation of signatures for novel worms and the specificity of these generated signatures.
This talk is in partial fulfillment of the speaking requirement.
Finding Frequent Items in Distributed Data Streams
Friday, September 24th, 2004 from 12-1 pm in WeH 4601.
Presented by Amit Manjhi,
We consider the problem of maintaining frequency counts for items occurring frequently in the union of multiple distributed data streams. Naive methods of combining approximate frequency counts from multiple nodes tend to result in excessively large data structures that are costly to transfer among nodes. To minimize communication requirements, the degree of precision maintained by each node while counting item frequencies must be managed carefully.
We introduce the concept of a precision gradient for managing precision when nodes are arranged in a hierarchical communication structure. We then study the optimization problem of how to set the precision gradient so as to minimize communication, and provide solutions that minimize worst-case communication load over all possible inputs. We then introduce a variant designed to perform well in practice, with input data that does not conform to worst-case characteristics. We verify the effectiveness of our approach empirically using real-world data, and show that our methods incur substantially less communication than naive approaches while providing the same error guarantees on answers.
This talk is in partial fulfillment of the speaking requirement.
An Efficient Parts-based Near-duplicate and Sub-image Retrieval System
Friday, October 1st, 2004 from 12-1 pm in WeH 4601.
Presented by Yan Ke,
We introduce a system for near-duplicate detection and sub-image retrieval. Such a system is useful for finding copyright violations and detecting forged images. We define near-duplicates as images altered with common transformations such as changing contrast, saturation, scaling, cropping, framing, etc. Our system builds a parts-based representation of images using distinctive local descriptors which give high quality matches even under severe transformations. To cope with the large number of features extracted from the images, we employ locality-sensitive hashing to index the local descriptors. This allows us to make approximate similarity queries that only examine a small fraction of the database. Although locality-sensitive hashing has excellent theoretical performance properties, a standard implementation would still be unacceptably slow for this application. We show that, by optimizing layout and access to the index data on disk, we can efficiently query indices containing millions of keypoints. Our system achieves near-perfect accuracy (100% precision at 99.85% recall) on the tests used by Meng et al., and consistently strong results on our own, significantly more challenging experiments. Query times are interactive even for collections of thousands of images.
This talk is in partial fulfillment of the speaking requirement.
Available Bandwidth Measurement Using Packet Train Probing
Friday, October 29th, 2004 from 12-1 pm in WeH 4601.
Presented by Ningning Hu,
Available bandwidth is a key metric for network performance in that it helps average network users and applications to better control data transmission, and enables network administrators to effectively provision network resources. This talk will discuss how to measure available bandwidth using packet train probing technique. Specifically, it will describe PTR (Packet Transmission Rate) and Pathneck--two network measurement tools using packet train probing. PTR measures end-to-end available bandwidth by searching the highest probing rate that does not disturb background traffic. It demonstrates the importance of packet train configuration, especially probing rate, in available bandwidth measurement. Pathneck locates network bottleneck using an unique way of composing the probing packet train, combining measurement packets and load packets into a single packet train. This design allows Pathneck to obtain available bandwidth information from all the links along a network path, thus locate the bottleneck link. Using both Internet experiments and testbed emulations, we show that both tools significantly reduce the measurement time and overhead comparing with the other tools, therefore making it really useful for regular network users and applications.
This talk is in partial fulfillment of the speaking requirement.
Seurat: A Pointillist Approach to Anomaly Detection
Friday, November 5th, 2004 from 12-1 pm in WeH 4601.
Presented by Yinglian Xie,
Correlation is a recognized technique for improving the effectiveness of intrusion detection by combining information from multiple sources. In this talk, I will describe a new approach to detecting aggregated anomalous events by correlating host file system changes across space and time. Our approach is based on a key observation that many host state transitions of interest have both temporal and spatial locality. Abnormal state changes, which may be hard to detect in isolation, become apparent when they are correlated with similar changes on other hosts. Based on this intuition, we have developed a method to detect similar coincident changes to the patterns of file updates that are shared across multiple hosts. We have implemented this approach in a prototype system called Seurat and demonstrated its effectiveness using a combination of real workstation cluster traces, simulated attacks, and a manually launched Linux worm.
This talk is in partial fulfillment of the speaking requirement.
Space-Efficient Block Storage Integrity
Friday, November 19th, 2004 from 12-1 pm in WeH 4601.
Presented by Alina Oprea,
Storage systems have evolved recently from direct attached storage (DAS) to network attached storage (NAS) and storage area network (SAN)architectures. In these new architectures, client data is stored remotely on potentially untrusted machines, and, thus, new concerns for the privacy and security of the data arise. Two security properties need to be guaranteed: data confidentiality (i.e., the secrecy of the data) and data integrity (i.e., modifications to the original data are detectable by the data owner). Data confidentiality has received a lot of attention in the storage community. Several length-preserving encryption algorithms have been proposed and are currently considered for standardization.
In this talk, I will present new methods to provide block-level integrity in encrypted storage systems. We give two new cryptographic definitions for storage integrity and a new construction for each of them. Our schemes are designed to be used in conjunction with length-preserving encryption algorithms. The constructions are novel in exploiting the fact that distributions of block contents and of block access patterns are not random in practice. We evaluate our constructions through experiments and demonstrate their storage efficiency, compared to solutions that store integrity information for each block.
This talk is in partial fulfillment of the speaking requirement.
Synopsis Diffusion for Robust Aggregation in Sensor Networks
Friday, December 3rd, 2004 from 12-1 pm in WeH 4601.
Presented by Suman Nath,
Previous approaches for computing duplicate-sensitive aggregates in sensor networks (e.g., in TAG) have used a tree topology, in order to conserve energy and to avoid double-counting sensor readings. However, a tree topology is not robust against node and communication failures, which are common in sensor networks. In this talk, I present synopsis diffusion, a general framework for achieving significantly more accurate and reliable answers by combining energy-efficient multi-path routing schemes with techniques that avoid double-counting. Synopsis diffusion avoids double-counting through the use of order- and duplicate-insensitive (ODI) synopses that compactly summarize intermediate results during in-network aggregation. We provide a surprisingly simple test that makes it easy to check the correctness of an ODI synopsis. We show that the properties of ODI synopses and synopsis diffusion create implicit acknowledgments of packet delivery. We show that this property can, in turn, enable the system to adapt message routing to dynamic message loss conditions, even in the presence of asymmetric links. Finally, we illustrate, using extensive simulations, the significant robustness, accuracy, and energy-efficiency improvements of synopsis diffusion over previous approaches.
This talk is in partial fulfillment of the speaking requirement.
Web contact: sss+www@cs