/*
 * PCN Abstract Machine Emulator
 * Authors:     Steve Tuecke and Ian Foster
 *              Argonne National Laboratory
 *
 * Please see the DISCLAIMER file in the top level directory of the
 * distribution regarding the provisions under which this software
 * is distributed.
 *
 * sr_doc_stream.h  -   Documentation for stream versions
 *			of the sr_*.c Send/Receive routines.
 *
 * Please use the instructions in sr_doc.h for implementing your
 * own send/receive module.
 *
 * The stream versions were partially implemented in sr_bsdipc.c,
 * but never completed.  This was some of the doc for what is
 * in the #ifdef STREAMS in that file.
 */

/*

Each SR (send/receive) module consists of two files:

	sr_*.c	- All of the code that implements initialization,
			sends, and receives.
	sr_*.h	- Any header information that is needed by other 
			parts of the emulator.

The functions that the SR module should implement for the emulator
are:

	_p_sr_get_argdesc()
	_p_sr_init_node()
	_p_sr_node_initialized()
	_p_destroy_nodes()
	_p_abort_nodes()
	_p_alloc_msg_buffer()
	_p_msg_send()
	_p_msg_receive()

	_p_alloc_stream()
	_p_free_stream()
	_p_stream_send()
	_p_enable_stream_receive()


This file contains general documentation describing the use of the SR
module as a whole, as well as descriptions of each procedure that the
SR module must implement.

Argument parsing
================

_p_sr_get_argdesc() is called immediately before command line
arguments are parsed.  It is passed argv and argc, in case something
is needed directly out of them -- for example, sr_bsdipc.c saves a
pointer to argv[0] (the program name) so that it can use it later.

It should fill in its argument argdescp with a pointer to an argument
description array that contains the arguments needed by this SR
module, and it should set n_argdescp to the number of arguments held
in this array.


Initialization
==============
	
The parallel emulator is started up using the following sequence of
calls:

	_p_sr_init_node(...);
	<Do initialization stuff on each node>
	_p_sr_node_initialized();

It is the responsibility of _p_sr_init_node() to make sure all the
nodes are created and their send/receive primitives are initialized.
When _p_sr_init_node() returns, all send/receive operations should be
fully functional.

_p_sr_node_initialized() is basically just a debugging hook, though it
can also be used to verify initialization.  It need not do anything.
However, it is very useful when debugging a new SR module, because it
is called after all initialization, immediately before the main
emulator loop is entered.  It provides a good place to check out
initialization and test out the SR primitives.  It can also be used to
verify that all the nodes have actually initialized correctly, and if
not then it can shut things down.

There is one other function, sr_fatal_error(), that is used by
_p_sr_init_node(), but that should not be exported to the rest of the
emulator.  Once the emulator has been completely initialized,
_p_fatal_error() (in boot.c) should be used to kill the emulator in
the case of a fatal error.  However, _p_fatal_error() should not be
called until all of the SR routines are initialized and functional, so
it cannot be used if there is an error in _p_sr_init_node().
Therefore, sr_fatal_error() should be used during SR
initialization to kill everything in the case of an initialization
error.  It should try to kill off all nodes by whatever method
possible.

Global variables
================

_p_sr_init_node() is responsible for setting the following global
variables:

	_p_my_id
	_p_host_id
	_p_nodes
	_p_host
	_p_usehost
	_p_default_msg_buffer_size

All nodes of a parallel emulator run are given a unique integer.  If
there are N nodes in the system, they must be numbered 0..N-1, where
the host is always node N-1.  In addition, the system needs to be told
if it should use the host node (N-1) when mapping work to nodes.  In an
implementation like the Cosmic Environment, where there is a host
machine that is separate from all the other nodes, you would generally
not want to use the host for general work.  But in an implementation
like the Sequent Symmetry, where all nodes are the same, you would
want to use the host.

Thus, the first five variables listed above must be set to reflect the
parallel architecture:

_p_nodes :	The number of nodes (N) in the emulator on this run.
_p_my_id :	The node number (from 0..N-1) for my node.
_p_host_id :	The node number for the host (always _p_nodes - 1).
_p_host :	A boolean variable that should be set to TRUE if 
		this is the host (_p_my_id == _p_host_id), otherwise
		it should be set to FALSE.
_p_usehost :	A boolean variable that should be set to TRUE if
		the host should be used when mapping work to nodes,
		otherwise FALSE if it should not.
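
The relationships among these five values can be sketched as follows.
This is only an illustration: the struct node_identity_t and the
function set_node_identity() are hypothetical stand-ins, and the real
globals may have different types.

```c
#include <assert.h>

// Hypothetical stand-in for the emulator globals described above.
typedef struct {
    int my_id;    // plays the role of _p_my_id
    int host_id;  // plays the role of _p_host_id
    int nodes;    // plays the role of _p_nodes
    int host;     // plays the role of _p_host (1 = TRUE, 0 = FALSE)
    int usehost;  // plays the role of _p_usehost
} node_identity_t;

// Derive the identity values for node 'my_id' in a run of 'nodes' nodes.
static node_identity_t set_node_identity(int my_id, int nodes, int usehost)
{
    node_identity_t id;

    assert(nodes > 0 && my_id >= 0 && my_id < nodes);
    id.nodes = nodes;
    id.my_id = my_id;
    id.host_id = nodes - 1;            // the host is always node N-1
    id.host = (my_id == id.host_id);   // TRUE only on the host itself
    id.usehost = usehost ? 1 : 0;
    return id;
}
```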

_p_default_msg_buffer_size : The default message buffer size (in
cells) for message buffers.  This size should not include any header
information that the SR code might tack onto the message.  Thus, if
4096 bytes is a good default message size, cells are 4 bytes each,
and 4 cells are needed for header information, then
_p_default_msg_buffer_size should be set to 1020 (4096/4 - 4).
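
The arithmetic above can be written as a small helper.  The function
name and its parameters are hypothetical and exist only to make the
calculation explicit.

```c
#include <assert.h>

// Cells available for the message body once header cells are set aside.
// All three parameters are illustrative; none is an emulator name.
static int default_buffer_cells(int total_bytes, int bytes_per_cell,
                                int header_cells)
{
    assert(bytes_per_cell > 0);
    assert(total_bytes / bytes_per_cell >= header_cells);
    return total_bytes / bytes_per_cell - header_cells;
}
```

With the figures from the text, default_buffer_cells(4096, 4, 4)
evaluates to 1020.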

So what is a good value for _p_default_msg_buffer_size?  That's a good
question -- and one that doesn't have a pat answer.  It is used when
the emulator does not know exactly what size buffer should be allocated
before it starts packing stuff into that buffer.  

For example, if a tuple needs to be sent in the message, how much
space should be allocated?  Just enough to allow the first level of
the tuple to be copied?  Or do you allow additional space in case the
tuple contains other tuples (for example, it is a list), so that you
can pack more of the contents of the tuple into the message?

The emulator will always allocate enough space for the top level of
the tuple.  But, if it requires less than the
_p_default_msg_buffer_size to hold the top level, then it allocates a
space of size _p_default_msg_buffer_size, so that it can pack additional
levels of the tuple into the message, if those additional levels
exist.
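
In other words, the requested size is the larger of the two values.  A
minimal sketch of that rule, using a hypothetical helper name (the
emulator's own code may be organized differently):

```c
// Pick the buffer size in cells: never smaller than the tuple's top
// level, and never smaller than the default.
static int choose_buffer_cells(int top_level_cells, int default_cells)
{
    return (top_level_cells < default_cells) ? default_cells
                                             : top_level_cells;
}
```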

Finally, one last factor in determining a value for this variable.  As
mentioned, the emulator does not know how much space it needs to
allocate for the message before packing the message into the buffer.
However, after the message is packed into the buffer, it knows exactly
how many cells from the buffer it actually used.  And it is this value
(the number of cells actually used) that is passed to the
_p_msg_send() routine.

Therefore, the _p_msg_send() routine need not send the entire allocated
buffer.  It only needs to send the part that is used.  So it is ok to
allocate considerably more space than you actually send.

So, in general, this value should probably be at least 100-200 cells.
That way, at least a few levels of a tuple (such as a list) can be
packed into a single message.  But if memory is available, and your
send/receive routines allow allocation of buffers that are larger than
what is actually sent, then the _p_default_msg_buffer_size should be
made considerably bigger.  What is "considerably bigger"?  At least
1000 cells, and perhaps even more.


Sending messages
================

There are essentially two categories of messages that are sent by the
emulator: 

	normal messages	- The normal communication done between
				emulator nodes
	stream messages	- Communication done using the special
				stream primitives.

Each category of messages has its own way in which messages are sent.


Sending a normal message
------------------------

Normal messages are sent using the code:

	_p_alloc_msg_buffer(...);
	<Fill in the message buffer>
	_p_msg_send(...);

_p_alloc_msg_buffer() allocates a message buffer of the appropriate
size.  _p_msg_send() sends the message in that message buffer to a node
and frees the message buffer.
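
The sequence can be sketched with mock versions of the two routines.
Everything here is an assumption made for illustration: the mock_*
functions, the cell_t typedef, the recorded last_* variables, and the
buffer size of 1020 cells.  A real SR module would transmit the data
rather than record it.

```c
#include <assert.h>
#include <stdlib.h>

typedef long cell_t;            // assumption: the real cell_t may differ

static int last_node, last_size, last_type;  // recorded by the mock send
static cell_t last_first_cell;

// Mock of _p_alloc_msg_buffer(): 'size' is in cells, not bytes.
static cell_t *mock_alloc_msg_buffer(int size)
{
    cell_t *buf = malloc((size_t)size * sizeof(cell_t));
    assert(buf != NULL);        // a real module would call _p_malloc_error()
    return buf;
}

// Mock of _p_msg_send(): records the request, then frees the buffer.
static void mock_msg_send(cell_t *buf, int node, int size, int type)
{
    last_node = node;
    last_size = size;
    last_type = type;
    last_first_cell = (size > 0) ? buf[0] : 0;
    free(buf);                  // the send owns and releases the buffer
}

// Allocate generously, pack two cells, and send only the cells used.
static void send_two_cells(int node, int type)
{
    cell_t *buf = mock_alloc_msg_buffer(1020);
    buf[0] = 42;
    buf[1] = 43;
    mock_msg_send(buf, node, 2, type);
}
```

The caller never frees the buffer itself, because the send routine
does so once the message has gone out.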


Sending a stream message
------------------------

Stream messages are sent using the _p_stream_send() function.  No
message buffers are allocated because the arguments to
_p_stream_send() tell it exactly where on the heap it should grab its
data from.


Receiving a message
===================

There is only one function for receiving messages -- _p_msg_receive().
It must handle both normal and stream messages.  However, it must work
in conjunction with _p_enable_stream_receive() to handle stream
messages properly.  These functions will be described in more detail
below. 


Normal termination of the emulator
==================================

The only way the emulator will normally exit is if the host node
reaches a PCN exit instruction, which will cause the main emulator
loop to exit.

In this situation, the host will send MSG_EXIT messages to all other
nodes.  Upon receipt of a MSG_EXIT message, a node will return a
MSG_EXIT message to the host and then call _p_destroy_nodes().  Once
the host receives a MSG_EXIT message back from each node, it will also
call _p_destroy_nodes().

_p_destroy_nodes() need not do anything.  If it does not do anything,
then the host will send a second MSG_EXIT to each node, followed by an
exit(0).  Each node, upon receipt of the second MSG_EXIT, will also
do an exit(0).

Thus, under normal circumstances, every node will execute an exit(0)
to shut itself down.  If this is not the proper way for the nodes to
exit on a particular machine, the proper method should be implemented
in _p_destroy_nodes().


Aborting the emulator
=====================

If the emulator encounters a fatal error during its execution (a
signal, corrupt heap, etc.), it will call the _p_fatal_error()
function.  That function will try to cleanly shut all nodes of the
emulator down.

Along the way it will call _p_abort_nodes().  If a method exists for
killing all nodes of the emulator, then _p_abort_nodes() should use it.
For example, the Sequent Symmetry version uses a killpg() to kill the
entire process group which consists of all the nodes.  In other SR
modules (sr_machipc), _p_abort_nodes() sends a special abort message
to all other nodes before exiting.  In that case, the _p_msg_receive()
routine watches for an abort message and calls _p_fatal_error() if it
receives one.

In general, the goal of _p_abort_nodes() is to do everything possible to
kill all nodes of the emulator, so that under abortive circumstances
some nodes aren't left hanging around while others have terminated.

If a fatal error occurs in the emulator after it has been completely
initialized, there are two procedures (in boot.c) that should be used:

	_p_fatal_error("Error string");

and

	_p_malloc_error();

Neither of these procedures returns.  They will kill the node and
hopefully all other nodes as well.


sr_*.h
======

At a minimum, the following needs to be defined in sr_*.h:

#undef PARALLEL
#define PARALLEL

#undef ASYNC_MSG
#define ASYNC_MSG 0

The PARALLEL definition causes all of the parallel emulator code to be
compiled into the emulator.  Without this definition, the emulator
only has the code to run a 1 node emulator.

The ASYNC_MSG definition causes the proper message handling code to
get linked into the emulator.  It signals whether this SR module uses
synchronous (polled) message handling (ASYNC_MSG==0) or asynchronous
message handling (ASYNC_MSG==1).


Asynchronous message handling
=============================

When ASYNC_MSG is set to 0, the emulator will occasionally poll for
new messages.  It does this by calling _p_msg_receive(...,RCV_NOBLOCK) --
a non-blocking receive.  Unfortunately, this can be a relatively
expensive operation.

However, some systems can be set up so that when a message arrives,
the emulator can be asynchronously notified of this fact.  In this
situation, the emulator need not call _p_msg_receive(...,RCV_NOBLOCK) in
order to find out if there are messages.  Rather, the asynchronous
notification can set a variable that the emulator can check, instead
of having to call _p_msg_receive() each time.

When ASYNC_MSG is set to 1, this asynchronous notification is enabled.
Instead of calling _p_msg_receive(...,RCV_NOBLOCK) to check for new
messages, the emulator just checks the _p_msg_avail variable.

Thus, if an SR module uses asynchronous messaging, then it must set
_p_msg_avail to TRUE when a message arrives.  When the emulator finds
that _p_msg_avail has been set to TRUE, only then will it call
_p_msg_receive(...,RCV_NOBLOCK).  So, once _p_msg_receive() handles all
available messages, it should reset _p_msg_avail to FALSE.
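
The flag protocol can be sketched as a single-threaded simulation.
Nothing here uses real signals or a real transport: msg_avail stands
in for _p_msg_avail, and the other names (pending, handled,
receive_calls, and the three functions) are hypothetical.

```c
static int msg_avail = 0;      // stand-in for _p_msg_avail
static int pending = 0;        // messages waiting in the mock transport
static int handled = 0;        // messages processed so far
static int receive_calls = 0;  // how often the receive routine ran

// Simulates the SR module noticing that a message has arrived.
static void on_message_arrival(void)
{
    pending++;
    msg_avail = 1;
}

// Mock non-blocking receive: drains everything, then clears the flag.
static void mock_receive_noblock(void)
{
    receive_calls++;
    while (pending > 0) {
        pending--;
        handled++;
    }
    msg_avail = 0;
}

// One pass of the emulator's check: cheap flag test, receive only if set.
static void check_for_messages(void)
{
    if (msg_avail)
        mock_receive_noblock();
}
```

In a module with true asynchronous notification, a message could
arrive between the drain loop and the reset of the flag, so a real
implementation would need to guard that window.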


Aborting from the emulator
==========================

*/




/*****************************************************************
******************************************************************
**								**
**		PROCEDURE DESCRIPTIONS				**
**								**
******************************************************************
*****************************************************************/




/***********************************************************
void _p_sr_get_argdesc(argdesc_t **argdescp, int *n_argdesc)

Called by boot.c to get a pointer to argument description table.  For
ease of mind's sake, we can initialize the values here since they will
be filled in soon after this call.

********************* _p_sr_get_argdesc() ******************/


/***********************************************************
void sr_fatal_error(char *msg)

Used by _p_sr_init_node() to deal with fatal errors during the
worker creation process.  _p_fatal_error() cannot be called until
everything is up and running.  So sr_fatal_error() fills in until
then. 

********************* sr_fatal_error() *********************/



/***********************************************************
void _p_sr_init_node()

This procedure is responsible for setting up and initializing the SR
module on all nodes.  It is the first thing called.  When it returns,
the SR module should be fully functional.

This module usually works in one of two ways:

1) The host process must spawn off all the nodes (using fork, or rsh,
or some such means), and then initialize itself.

2) On some parallel machines, the OS takes care of loading the
executable onto all nodes simultaneously.  In this case, the procedure
must figure out how to initialize the SR module for the node it is
running on, and get everything setup so that it can communicate with
other nodes.

********************* _p_sr_init_node() ********************/



/***********************************************************
void _p_sr_node_initialized()

This function is called after the node has been completely
initialized.  It need not do anything.  However, it can be useful for
two things:

1) SR module debugging code can be put here.  For example, I often put
a simple ring test in here, just to see if the proper connections are
being made.

2) It can make a final check to make sure all the other nodes came up
ok.  If they did not, it can shut things down.

********************* _p_sr_node_initialized() *************/



/***********************************************************
void	_p_destroy_nodes()

This procedure is described above under the "Normal termination of
the emulator" section.

To recap, it is called on every node during normal termination of the
emulator.  It can kill all of the nodes.  Or it can do nothing, in
which case all nodes will proceed to execute an exit(0).

********************* _p_destroy_nodes() *******************/



/***********************************************************
void _p_abort_nodes()

This procedure is called from _p_fatal_error() -- when we encounter a
fatal error.  It should do what it can to kill off all of the nodes.

Some typical ways in which this is done:

1) A special (machine specific) procedure is called which will kill
off all the nodes.  For example, on the Sequent Symmetry, killpg() is
called to kill all the nodes.

2) An abort message is sent to the host.  When the _p_msg_receive()
routine on the host receives this abort message, it calls some special
procedure to kill all the nodes.

3) An abort message is sent to all other nodes.  When those other
nodes do a _p_msg_receive() and see the abort message, they will
shut down using _p_fatal_error().

********************* _p_abort_nodes() *********************/



/***********************************************************
cell_t *_p_alloc_msg_buffer(int size)

Allocate a message buffer that will later be used by _p_msg_send().
The 'size' argument specifies how many cells (NOT bytes) the message
buffer should contain.

Note: If this procedure uses malloc(), and the malloc fails, then it
should call _p_malloc_error(), not _p_fatal_error().  The difference
is that _p_malloc_error() does not use fprintf().  On many machines,
once a malloc fails once, it will fail from then on.  Unfortunately,
fprintf() usually uses malloc() for temporary space, so it fails after
a malloc error.  Therefore, _p_malloc_error() does not use fprintf().

Return:	A pointer to a message buffer with 'size' cells.
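
A minimal sketch of this pattern, assuming a POSIX system.  The source
does not say how _p_malloc_error() reports the failure; write(2) is
used here only as one way to avoid stdio.  The names
malloc_error_sketch() and alloc_cells_sketch() and the cell_t typedef
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef long cell_t;   // assumption: the real cell_t may differ

// Hypothetical stand-in for _p_malloc_error(): reports with write(2),
// which needs no heap space, and never returns.
static void malloc_error_sketch(void)
{
    static const char msg[] = "Fatal error: malloc failed\n";
    ssize_t n = write(STDERR_FILENO, msg, strlen(msg));
    (void)n;           // nothing useful can be done if the write fails
    _exit(1);
}

// Sketch of an allocator taking a size in cells, not bytes.
static cell_t *alloc_cells_sketch(int size)
{
    cell_t *buf;

    assert(size > 0);
    buf = malloc((size_t)size * sizeof(cell_t));
    if (buf == NULL)
        malloc_error_sketch();
    return buf;
}
```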

********************* _p_alloc_msg_buffer() ****************/


/***********************************************************
void _p_msg_send(cell_t *buf, int node, int size, int type)

Sends the message that is pointed to by 'buf' to 'node'.  Only the
first 'size' cells of the buffer need to be sent.  The message has the
specified 'type'.  If buf==NULL (and size==0), then an empty message
of the specified type is sent.

After the send is completed, free the message buffer.

This send will block until the message can be delivered.

If an error occurs, _p_fatal_error() or _p_malloc_error() should be
called to abort the program.

********************* _p_msg_send() ************************/


/***********************************************************
bool_t _p_msg_receive(int *node, int *size, int *type, int rcv_type)

Receive a message from ANY node. Place the message onto the heap.
(And make sure there is room for it on the heap.)

Valid 'rcv_type' arguments are:
	RCV_NOBLOCK	Do not block if no messages are waiting.
	RCV_BLOCK	Block until a message is received.
	RCV_PARAMS	Block until a MSG_PARAMS message arrives.
	RCV_COLLECT	When called from _p_garbage_collect().  Ignore
				MSG_COLLECT messages, and queue up
				MSG_DEFINE and MSG_VALUE.  Block until
				we get a MSG_CANCEL or MSG_READ.

Return:	TRUE if we read a message, otherwise FALSE.
		The node, size and type arguments are return values.

********************* _p_msg_receive() *********************/

