Department of Computer Science
Cornell University
In the case of connection-oriented packets, a channel specifies an endpoint-to-endpoint connection (i.e. a connection from one process to another). If process A connects to process B, then A can only communicate with B, and B can only communicate with A on that channel. If A wants to communicate with C, it must open a separate channel.
In the case of connection-less packets, process A can instead open a broadcast or multicast channel. Any other process listening on that channel can receive packets sent by A.
Cells and hash tables are connected together to form a directed acyclic graph (DAG). When a packet comes in off the ethernet wire, each protocol in the composite packet header is broken down into fields. These fields are then compared against the information stored in the cells; as matches are found, we follow the corresponding path through the DAG.
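As a rough illustration of this structure, the sketch below shows how the cells at one DAG level might be chained and matched. The cell layout, field names, and the match_level() helper are hypothetical; the real definitions are given on the Cell Specification Page and in the UPF source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a UPF cell; the real layout is given on the Cell
 * Specification Page, and these field names are illustrative only. */
enum cell_type { ETH_TYPE, IP_PROTO, IP_CHANNEL_HASH, IP_CHANNEL };

struct upf_cell {
    enum cell_type   type;   /* which protocol level this cell filters on */
    unsigned int     value;  /* field value to match (e.g. 0x0800 for IP) */
    struct upf_cell *next;   /* sibling in this level's linked list */
    struct upf_cell *child;  /* next DAG level, followed on a match */
};

/* Walk one level's linked list for a cell matching the packet field;
 * NULL means the packet drops out of contention at this level. */
static struct upf_cell *match_level(struct upf_cell *head, unsigned int field)
{
    for (; head != NULL; head = head->next)
        if (head->value == field)
            return head;
    return NULL;
}
```

On a match, traversal continues down the matching cell's child pointer to the next protocol level.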
The next section breaks down the current architecture of the U-Net Packet Filter, and discusses the protocols it currently supports.
These types of channels are not part of the UPF system. When a Raw U-Net channel is detected, the UPF system is bypassed, and the packet is demultiplexed based strictly on the U-Net port number. If the ethernet card is running in promiscuous or multicast mode, then only the U-Net Type and U-Net Port identify an endpoint/channel. These channels are not guaranteed to be unique.
The combination of these two fields uniquely specifies an endpoint (unless the Source Addr is an ethernet multicast or broadcast address). The ethernet type can be any 16-bit value, provided it is not a reserved type (e.g. the IP ethernet type, 0x0800, is reserved).
In the case where the only open channels consist of Raw U-Net (not in the DAG) or Raw Ethernet channels, the DAG is merely a linked-list of ETH-TYPE cells (see the Cell Specification Page).
When a Raw Ethernet packet is received, the ethernet source and raw ethernet type fields are compared with the corresponding cell values for each element in the linked-list until a match is found. In cases where a large number of Raw Ethernet channels are open, performance could be improved by replacing this linked list with a hash table.
This is a composite point-to-point protocol. The outer protocol is IP, and the inner is TCP. I refer to this as a single filter because only the two protocols together identify an endpoint; in reality, they are separate protocols.
The combination of the following seven fields uniquely identifies a TCP/IP endpoint: the ethernet type, the IP version number, the IP protocol, the IP source address, the IP destination address, the TCP source port, and the TCP destination port.
Notice that fields from three different protocols uniquely identify a channel. This corresponds to a traversal of the packet filter's DAG. UPF is designed to be fast, so in the interest of speed we want to drop a packet from contention as soon as it becomes clear that the packet will not demultiplex to any endpoint.
Of the 7 preceding fields, we know the packet is not IP if the ethernet type field does not match IP (0x0800). As an optimization, the IP ethernet cell is always located at the head of the ethernet-level linked-list. Since IP demultiplexing is expensive, this means an IP packet reaches its filter after just a single ethernet-type check.
At this point, we know the packet is of type IP. At the next level in the DAG, we check the protocol and the IP version number. If the protocol is undefined (i.e. is not TCP or UDP; in this case, TCP), or if the IP version number does not match the filter's IP version number, the packet is dropped. Again, this means traversing a linked-list of protocol cells (IP_PROTO cells, as they are referred to in the source code) until a match is or isn't found.
If a match is found, then we descend another level into the DAG. At this
point, we drop the packet if the destination port (TCP protocol) does not
exist in the filter. Since there can be as many as 2^16 possible ports, we
perform a lookup in a hash table. The hash table itself is connected to the
IP_PROTO cell corresponding to the TCP protocol. The hash function is
defined in the source code by the function hash_func_port().
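The definition of hash_func_port() lives in the source code and is not reproduced here. As a hedged stand-in, the sketch below shows only the modulo-style bucket selection that the benchmarks section attributes to the current hash functions; the PORT_HASH_BUCKETS constant and the trivial hash body are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_HASH_BUCKETS 64   /* assumed table size; the real HASH_ENTRIES
                                  value is defined in the UPF source */

/* Illustrative stand-in for hash_func_port(): only the modulo step
 * described in the text, not the actual hash from the source. */
static unsigned int hash_func_port(uint16_t dest_port)
{
    return dest_port % PORT_HASH_BUCKETS;
}
```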
Since it is possible for multiple channels to exist on the same destination port, the port lookup resolves to an IP_CHANNEL_HASH node rather than directly to a channel.
The 3 remaining fields, the IP source address, the IP destination address, and the
TCP source port, uniquely identify a TCP channel. A hash function, defined
by hash_func_chan() in the source code, is then used to hash into a table
connected to the IP_CHANNEL_HASH node. Again, there can be more
than one node connected to a single index in the hash table. The hash
function is meant to spread out the different channel combinations,
so that a long chain of IP_CHANNEL nodes will not have to be walked.
If a match is found in the IP_CHANNEL linked list, we demultiplex the
packet to the corresponding endpoint.
In summary, demultiplexing TCP/IP packets requires traversing the DAG
through 3 protocols.
Note that the current implementation does not support IP fragmentation.
This would be a natural extension to TCP/IP, as well as UDP/IP.
Traversal of the DAG for UDP/IP is nearly identical to that of
TCP/IP. The difference between the two protocols is that UDP/IP can be
connection-less, while TCP/IP is always connection-oriented. This means
that it is possible to establish not only unicast channels, but multicast and
broadcast channels as well.
Traversal of the DAG for UDP/IP requires the same seven fields that were
used in TCP/IP. The difference is that at channel setup time, a unicast,
multicast, or broadcast channel type can be selected. If the user creates a
unicast channel, then traversal of the DAG occurs exactly as it did for TCP/IP,
except that the transmission protocol is now UDP.
Choosing to open a multicast channel results in creating a new level of the
DAG. Graph traversal proceeds as it did for TCP/IP, all the way to the
IP_CHANNEL node. In UDP, the IP_CHANNEL node contains an x_cast
field, which is set to either UNICAST, MCAST, or BCAST. If the field is
MCAST, then we descend a level to a linked list of IP_MCAST cells.
Each node in this linked list corresponds to an IP_CHANNEL with the
same address information, but with a pointer to a different channel. When
a multicast IP_CHANNEL is matched, we walk the entire
IP_MCAST linked list and demultiplex at each IP_MCAST cell.
If the IP_CHANNEL node is of type BCAST, then we must demultiplex to
every active channel on this port (dest_port). This results in a traversal of
every hash-chain for every index in the hash table connected to the
IP_CHANNEL_HASH node. The packet is then demultiplexed to every
unicast and multicast channel open on the port. If another broadcast channel
is encountered, it is ignored (since a broadcast is already taking place).
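The multicast walk described above can be sketched as follows. The ip_mcast layout and the demux_mcast() helper are illustrative stand-ins for the actual IP_MCAST cells, and the per-channel packet copy is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative IP_MCAST list node: each entry shares the same address
 * information but points to a different channel. */
struct ip_mcast {
    int channel_id;                 /* stand-in for the channel pointer */
    struct ip_mcast *next;
};

/* On a multicast match, every node in the IP_MCAST list is visited; the
 * real code copies the packet to each channel, counted here instead. */
static int demux_mcast(const struct ip_mcast *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

The broadcast case behaves like this walk applied to every chain in the port's hash table, which is why it is the most expensive path.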
Multicast channels result in demultiplexing to many channels. This involves
copying the entire packet into multiple regions of user memory, once for
each open multicast channel. Broadcast channels perform even worse,
since the packet must be demultiplexed to every open channel
on the port.
In general, multicast and broadcast channels may be useful, but they incur a
severe performance overhead.
The combination of only a WILD IP source address forms the highest
priority, while the combination of all WILDS creates the lowest priority.
Wildcard channels will only pick up packets if there is no matching channel
from the next-highest priority level. This functionality is built into
hash_func_chan(). Each IP_CHANNEL hash table is partitioned into
HASH_ENTRIES - 7 indexed hash buckets, with the remaining 7 buckets
reserved for the wild-card priority levels. During graph traversal on
packet reception, if a packet cannot be demultiplexed via the indexed hash
chain, then we traverse each priority level on the corresponding destination
port. If a matching wild-card channel is found, UPF demultiplexes on that
channel, and graph traversal ends. If not, UPF continues searching each
next-lower priority level until all are exhausted, at which point the
packet is dropped.
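The fallback search can be sketched as follows, assuming a table whose last 7 buckets hold the wild-card priority levels in decreasing priority order. The find_channel() helper and the occupancy array are hypothetical simplifications of the real hash chains.

```c
#include <assert.h>

#define HASH_ENTRIES 64   /* assumed table size; the real value is in the source */
#define WILD_LEVELS   7   /* one bucket per wild-card priority level */

/* Hypothetical search: try the indexed bucket first, then the 7 wild-card
 * buckets from highest to lowest priority. occupied[] stands in for the
 * real hash chains; returns the matching bucket index, or -1 to drop. */
static int find_channel(const int occupied[HASH_ENTRIES], int indexed_bucket)
{
    int i;
    if (occupied[indexed_bucket])
        return indexed_bucket;                     /* exact-match chain hit */
    for (i = 0; i < WILD_LEVELS; i++)              /* highest priority first */
        if (occupied[HASH_ENTRIES - WILD_LEVELS + i])
            return HASH_ENTRIES - WILD_LEVELS + i;
    return -1;                                     /* all levels exhausted */
}
```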
Note that UDP Multicast and Broadcast channels are implemented as wild-card
channels. Since the channels are by definition connection-less, the IP source
address and source port fields are WILD. The down-side of this setup is that
matching a multicast or broadcast channel requires exactly 3 hash misses (1
for the missed index, and 2 for the preceding wild-card priority levels). If a
U-Net parallel program were to rely heavily on IP multicast or broadcast, this
scheme might require alteration to boost performance.
The source tree contains the following items:
Raw-Ethernet's performance was about as expected. Traversal of the DAG is not free,
and requires some cycles. TCP/IP's performance is a little disappointing, especially since
the channel test was connection-oriented and was the only open channel during testing.
However, the performance drain can be partially attributed to the
hash lookups implemented in the IP filter.
In order to demultiplex to a TCP/IP channel, a packet must
pass through two hash-table lookups. In the current implementation, both
hashing functions use a modulo operator: the hash result is taken modulo
the number of available hash buckets. The modulo operator incurs a large
expense, and might explain a significant portion of TCP/IP's performance drain.
One way the hash functions could be improved would be to replace the modulo
operation with a cheaper equivalent for selecting one of the available hash
buckets. Rather than use modulo, we could logically AND the result of the hash
function with a mask of (table size - 1). When the table size is a power of
two, this mask keeps exactly the low log-base-2-of-size bits of the hash, and
so selects the same bucket as the modulo operation while enormously reducing
the computational expense.
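A minimal sketch of the proposed replacement, assuming a power-of-two table size; the bucket_mod() and bucket_mask() names are illustrative, not from the source:

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 64               /* must be a power of two for the mask trick */
#define TABLE_MASK (TABLE_SIZE - 1) /* the low log2(TABLE_SIZE) bits set */

/* Modulo-based bucket selection, as in the current hash functions. */
static unsigned int bucket_mod(uint32_t hash)  { return hash % TABLE_SIZE; }

/* Mask-based replacement: one AND instead of a costly division, with an
 * identical result whenever TABLE_SIZE is a power of two. */
static unsigned int bucket_mask(uint32_t hash) { return hash & TABLE_MASK; }
```

Note the equivalence holds only for power-of-two table sizes, so HASH_ENTRIES would need to be chosen accordingly.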
There are undoubtedly numerous optimizations one could make to the packet filter
to enhance performance. This is left as future work.
The designer of a new filter should feel free to define their own cell types as
necessitated by the new filter, to modify the actual cell structure if doing so
will improve the overall packet filter design, and to define new hash
functions as needed.
The current implementation of UPF handles multicast and broadcast channels.
However, in order to properly join a multicast group, a process needs access
to an IP-multicast-to-Ethernet address resolver. The de facto way of doing
this now is via IGMP. Therefore, the addition of an IGMP filter to UPF might
be a good idea.
If a group of U-Net processes needed to dynamically create new IP channels,
then they would also require a way to resolve IP-to-ethernet address translations.
The addition of an Address Resolution Protocol (ARP) filter to UPF
would also be a logical extension.
In general, as many extensions can be made to UPF as the needs of the
U-Net user(s) require.
4.3 Wild-Card Channels in IP
Both TCP/IP and UDP/IP composite protocols support wild-card channels.
In order to demultiplex to an endpoint, the IP ethernet type, the IP
version number, the transport protocol, and a destination port must, at a
minimum, be specified. The remaining fields, the IP destination
address, the IP source address, and the source port, can remain unspecified
(they are declared as WILD during channel creation). The combination of 3
wild-card fields results in seven distinct wild-card priority levels, which are
illustrated below.
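As a quick sanity check on the arithmetic: three wild-able fields give 2^3 - 1 = 7 non-empty wild-card combinations, one per priority level. The helper below is purely illustrative and simply enumerates those combinations as 3-bit patterns.

```c
#include <assert.h>

/* Each of the 3 fields (IP source address, IP destination address, source
 * port) is either WILD or specified. Excluding the fully-specified case
 * leaves 2^3 - 1 = 7 wild-card combinations, one per priority level. */
static int count_wild_levels(void)
{
    int bits, n = 0;
    for (bits = 1; bits < (1 << 3); bits++)  /* skip 0: nothing WILD */
        n++;
    return n;
}
```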
5.0 Using UPF with U-Net, Benchmarks, and Demonstration
Programs
You can obtain a copy of the UPF U-Net source tree by saving
this link. The
UPF code is designed to run on a Linux system over Fast Ethernet. However,
it should be extremely portable to any U-Net system running over Fast
Ethernet. This version has been tested and benchmarked on Linux 1.3.71
using the U-Net kernel patch
(see the U-Net supporting documentation).
6.0 Performance Benchmarks
The U-Net Packet Filter was evaluated using a 133MHz Pentium and a 90MHz Pentium,
connected to a Fast Ethernet Switch. The programs pingpong_unet, pingpong_eth,
and pingpong_ip, were used to test the roundtrip times for sending messages
using Raw Unet, Raw Ethernet, and TCP/IP, respectively. 1000 forty-byte packets
were sent per test. The following results were taken from several runs.
Raw-Ethernet performed 4% slower than Raw-Unet. TCP/IP performed 17% slower than Raw-Unet,
and 13% slower than Raw-Ethernet.
7.0 How to Go About Adding a New Filter to UPF
Each filter is some combination of cells and hash tables connected together.
In meeting the second design goal, the integration of a new filter to UPF is
meant to be modular. Each module requires at least 3 routines.
The current implementation supports 4 different filters: Raw Ethernet, IP,
TCP, and UDP. For each segmented protocol, there are exactly 3 essential
routines (see the source code); that is, the aforementioned 3 routines exist for
every module. Actually, there is only one find_Protocol routine in the UPF system
(upf_findchannel()). However, this routine is broken up into C case statements,
essentially breaking down the task of demultiplexing composite headers by protocol.
8.0 Memory Management
All memory allocation in the UPF system is chained together off of a
pointer on the tulip-device structure. As new cells and hash-tables are allocated,
a new "wastebucket" structure is added to a garbage collecting list of
wastebuckets. Each wastebucket points towards the allocated structure. Each
structure in turn points back to the corresponding waste bucket. A call to
upf_shutdown() results in a traversal of each wastebucket. At each bucket,
the data structure it points to is de-allocated. Since each structure points to its
corresponding wastebucket, a call to upf_deactivate() results in proper
maintenance of the wastebucket linked-list.
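A hedged sketch of this scheme, using hypothetical structure and function names (the real wastebucket list hangs off the tulip-device structure in the source, and the back-pointer from each allocated structure to its bucket is omitted here for brevity):

```c
#include <stdlib.h>

/* Illustrative wastebucket: one bucket per allocation, chained into a
 * garbage-collecting list. */
struct wastebucket {
    void *payload;                 /* the allocated cell or hash table */
    struct wastebucket *next;
};

/* Allocate size bytes and chain a wastebucket for it onto *list. */
static void *upf_alloc(struct wastebucket **list, size_t size)
{
    struct wastebucket *wb = malloc(sizeof *wb);
    if (wb == NULL)
        return NULL;
    wb->payload = malloc(size);
    if (wb->payload == NULL) {
        free(wb);
        return NULL;
    }
    wb->next = *list;
    *list = wb;
    return wb->payload;
}

/* Walk the list, freeing every payload and bucket, in the style of
 * upf_shutdown(). */
static void upf_free_all(struct wastebucket **list)
{
    while (*list != NULL) {
        struct wastebucket *wb = *list;
        *list = wb->next;
        free(wb->payload);
        free(wb);
    }
}
```

Pairing every allocation with a bucket makes teardown a single list walk, at the cost of one extra small allocation per structure.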
9.0 Other Possible Extensions to UPF
Throughout this document, some possible extensions to UPF have been mentioned.
In this section, we will discuss a few more.
10.0 Conclusions
The attempt to create a packet filter that can be used
with U-Net was successful. In doing so, I believe that our original design goals were met.
This is, however, a beginning. There is still a lot of work to be done in
developing both U-Net and the U-Net Packet Filter. Particularly in
the case of UPF, performance can be improved. Improvements to the current
hashing implementation, and tweaks to the source code in general, will
probably boost the overall performance of the filters.
11.0 Acknowledgments
I would like to thank the following individuals for their advice, guidance, and
suggestions in designing and developing UPF:
12.0 References