Status: Passed, as amended, Jun 89 X3j13
Issue:        PATHNAME-WILD
Forum:        Cleanup
References:   Pathnames (pp410-413)
Related issues: PATHNAME-COMPONENT-VALUE, PATHNAME-LOGICAL
Category:     ADDITION
Edit history: 21-Jul-88, Version 1 by Pitman
              06-Oct-88, Version 2 by Pitman
               9-May-89, Version 3 by Moon (small fixes)
              10-May-89, Version 4 by Moon (add two more functions)
              13-May-89, Version 5 by Moon (minor cleanups, add clarification)
              19-Jun-89, Version 6 by Moon (revise based on extensive
                        discussion in the cleanup subcommittee; rewrite
                        the description of TRANSLATE-PATHNAME so it is
                        possible to understand it)
              23-Jun-89, Version 7 by Moon (simplify TRANSLATE-PATHNAME
                        based on last minute discussion of logical pathnames)
	       2-Jul-89, Version 8 by Masinter (T -> True as per Jun89 X3J13)
	
Problem Description:

  Some file systems provide more complex conventions for wildcards than
  simple component-wise wildcards (:WILD). For example,

  "F*O" might mean:
    - a normal three character name
    - a three-character name, with the middle char wild
    - at least a two-character name, with the middle 0 or more chars wild
    - a wild match spanning multiple directories

  ">foo>*>bar" might imply:
    - the middle directory is named "*"
    - the middle directory is :WILD
    - there may be zero or more :WILD middle directories
    - the middle directory name matches any one-letter name

  ">foo>**>bar" might mean
    - the middle directory is named "**"
    - the middle directory is :WILD
    - there may be zero or more :WILD middle directories
    - the middle directory name matches any two-letter name

  Some file systems support even more complex wildcards, for example
  regular expressions.

  The CL pathname model does not specify a way to represent complex
  wildcards, which means, for example, that (MAKE-PATHNAME :NAME "F*O")
  cannot be recognized by portable code as containing a wildcard.

  Common Lisp provides only the first of these four common operations
  on wildcard pathnames:
  (1) Enumerate the set of existing files that match the pathname;
  this is provided by the DIRECTORY function.
  (2) Test whether a pathname contains wildcards.
  (3) Test whether a pathname matches a wildcard pathname.
  (4) Translate one pathname into another according to a mapping specified
  by a pair of wildcard pathnames.

Proposal (PATHNAME-WILD:NEW-FUNCTIONS):

  Introduce the following three functions:

  1. WILD-PATHNAME-P pathname &optional field-key
  
    Tests a pathname for the presence of wildcard components.  If the first
    argument is not a pathname, string, or file stream an error of type
    TYPE-ERROR is signalled.
  
    If no <field-key> is provided, or the <field-key> is NIL, the result is
    true if <pathname> has any wildcard components, NIL if <pathname> has none.
    If a non-null <field-key> is provided, it must be one of :HOST, :DEVICE,
    :DIRECTORY, :NAME, :TYPE, or :VERSION.  In this case, the result is true
    if the indicated component of <pathname> is a wildcard, NIL if the
    component is not a wildcard.  Note that not all implementations
    support wildcards in all fields, according to PATHNAME-COMPONENT-VALUE.


  2. PATHNAME-MATCH-P pathname wildcard

    true if <pathname> matches <wildcard>, otherwise NIL.  The matching rules
    are implementation-defined but should be consistent with the
    DIRECTORY function.  Missing components of <wildcard> default to :WILD.

    If either argument is not a pathname, string, or file stream an error
    of type TYPE-ERROR is signalled.  It is valid for <pathname> to be a
    wild pathname; a wildcard field in <pathname> will only match a
    wildcard field in <wildcard>, i.e. the function is not commutative.
    It is valid for <wildcard> to be a non-wild pathname.


  3. TRANSLATE-PATHNAME source from-wildcard to-wildcard &key

    Translates the pathname <source>, which matches <from-wildcard>, into
    a corresponding pathname <result>, which matches <to-wildcard>, and
    returns <result>.

    The pathname <result> is <to-wildcard> with each wildcard or missing
    field replaced by a portion of <source>.  A "wildcard field" is a
    pathname component with a value of :WILD, a :WILD element of a
    list-valued directory component, or an implementation-defined portion
    of a component, such as the "*" in the complex wildcard string
    "foo*bar" that some implementations support.  An implementation that
    adds other wildcard features, such as regular expressions, must define
    how TRANSLATE-PATHNAME extends to those features.  A "missing field" is
    a pathname component with a value of NIL.

    The portion of <source> that is copied into <result> is implementation
    defined.  Typically it is determined by the user interface conventions
    of the file systems involved.  Usually it is the portion of <source>
    that matches a wildcard field of <from-wildcard> that is in the same
    position as the wildcard or missing field of <to-wildcard>.  If there
    is no wildcard field in <from-wildcard> at that position, then usually
    it is the entire corresponding pathname component of <source>, or in
    the case of a list-valued directory component, the entire corresponding
    list element.  For example, if the name components of <source>,
    <from-wildcard>, and <to-wildcard> are "gazonk", "gaz*", and "h*"
    respectively, then in most file systems, the wildcard fields of the
    name component of <from-wildcard> and <to-wildcard> are each "*", the
    matching portion of <source> is "onk", and the name component of
    <result> is "honk".  However, the exact behavior of TRANSLATE-PATHNAME
    cannot be dictated by the Common Lisp language and must be allowed to
    vary, depending on the user interface conventions of the file systems
    involved.

    During the copying of a portion of <source> into <result>, additional
    implementation-defined translations of alphabetic case or file naming
    conventions might occur, especially when <from-wildcard> and
    <to-wildcard> are for different hosts.

    If any of the first three arguments is not a pathname, string, or file
    stream an error of type TYPE-ERROR is signalled.  It is valid for
    <source> to be a wild pathname; in general this will produce a wild
    result.  It is valid for <from-wildcard> and/or <to-wildcard> to be
    non-wild pathnames.  (PATHNAME-MATCH-P <source> <from-wildcard>) must
    be true or an error is signalled.
    
    There are no specified keyword arguments for TRANSLATE-PATHNAME, but
    implementations are permitted to extend it by adding keyword arguments.
    There is one specified return value from TRANSLATE-PATHNAME;
    implementations are permitted to extend it by returning additional
    values.

    Implementation guideline: one file system performs this operation by
    examining each piece of the three pathnames in turn, where a piece is a
    pathname component or a list element of a structured component such as
    a hierarchical directory.  Hierarchical directory elements in
    <from-wildcard> and <to-wildcard> are matched by whether they are
    wildcards, not by depth in the directory hierarchy.  If the piece in
    <to-wildcard> is present and not wild, it is copied into the result.
    If the piece in <to-wildcard> is :WILD or NIL, the piece in <source> is
    copied into the result.  Otherwise, the piece is <to-wildcard> might be
    a complex wildcard such as "foo*bar" and the piece in <from-wildcard>
    should be wild; the portion of the piece in <source> that matches the
    wildcard portion of the piece in <from-wildcard> replaces the wildcard
    portion of the piece in <to-wildcard> and the value produced is used in
    the result.

  4. Clarify that the functions OPEN (and WITH-OPEN-FILE), PROBE-FILE,
  FILE-WRITE-DATE, FILE-AUTHOR, and TRUENAME only accept non-wildcard
  pathnames and signal an error if given a pathname for which
  WILD-PATHNAME-P returns true.

  5. Clarify that the functions RENAME-FILE, DELETE-FILE, LOAD, and
  COMPILE-FILE have implementation-defined consequences when given a
  wildcard pathname.  Each function might signal an error or might operate
  on all files that match the wildcard pathname.


Examples:

  ;The following examples are not portable.  They are written to run
  ;with particular file systems and particular wildcard conventions.
  ;Other implementations will behave differently.  These examples are
  ;intended to be illustrative, not to be prescriptive.

  (WILD-PATHNAME-P (MAKE-PATHNAME :NAME :WILD)) => T
  (WILD-PATHNAME-P (MAKE-PATHNAME :NAME :WILD) :NAME) => T
  (WILD-PATHNAME-P (MAKE-PATHNAME :NAME :WILD) :TYPE) => NIL
  (WILD-PATHNAME-P (PATHNAME "S:>foo>**>")) => T   ;Lispm
  (WILD-PATHNAME-P (PATHNAME :NAME "F*O")) => T    ;Most places

  ;This example assumes one particular set of wildcard conventions
  ;Not all file systems will run this example exactly as written
  (DEFUN RENAME-FILES (FROM TO)
    (DOLIST (FILE (DIRECTORY FROM))
      (RENAME-FILE FILE (TRANSLATE-PATHNAME FILE FROM TO))))
  (RENAME-FILES "/usr/me/*.lisp" "/dev/her/*.l")
    ;Renames /usr/me/init.lisp to /dev/her/init.l
  (RENAME-FILES "/usr/me/pcl*/*" "/sys/pcl/*/")
    ;Renames /usr/me/pcl-5-may/low.lisp to /sys/pcl/pcl-5-may/low.lisp
    ;In some file systems the result might be /sys/pcl/5-may/low.lisp
  (RENAME-FILES "/usr/me/pcl*/*" "/sys/library/*/")
    ;Renames /usr/me/pcl-5-may/low.lisp to /sys/library/pcl-5-may/low.lisp
    ;In some file systems the result might be /sys/library/5-may/low.lisp
  (RENAME-FILES "/usr/me/foo.bar" "/usr/me2/")
    ;Renames /usr/me/foo.bar to /usr/me2/foo.bar
  (RENAME-FILES "/usr/joe/*-recipes.text" "/usr/jim/cookbook/joe's-*-rec.text")
    ;Renames /usr/joe/lamb-recipes.text to /usr/jim/cookbook/joe's-lamb-rec.text
    ;Renames /usr/joe/pork-recipes.text to /usr/jim/cookbook/joe's-pork-rec.text
    ;Renames /usr/joe/veg-recipes.text to /usr/jim/cookbook/joe's-veg-rec.text

  ;This example assumes one particular set of wildcard conventions
  (PATHNAME-NAME (TRANSLATE-PATHNAME "foobar" "foo*" "*baz")) => "barbaz"
  (PATHNAME-NAME (TRANSLATE-PATHNAME "foobar" "foo*" "*"))    => "foobar"
  (PATHNAME-NAME (TRANSLATE-PATHNAME "foobar" "*"    "foo*")) => "foofoobar"
  (PATHNAME-NAME (TRANSLATE-PATHNAME "bar"    "*"    "foo*")) => "foobar"

  ;Using Unix syntax and the wildcard conventions used by the
  ;particular version of Unix on which I tried this:
  (NAMESTRING
    (TRANSLATE-PATHNAME "/usr/dmr/hacks/frob.l"
                        "/usr/d*/hacks/*.l"
                        "/usr/d*/backup/hacks/backup-*.*"))
   => "/usr/dmr/backup/hacks/backup-frob.l"
  (NAMESTRING
    (TRANSLATE-PATHNAME "/usr/dmr/hacks/frob.l"
                        "/usr/d*/hacks/fr*.l"
                        "/usr/d*/backup/hacks/backup-*.*"))
   => "/usr/dmr/backup/hacks/backup-ob.l"

  ;This is similar to the above example but uses two different hosts,
  ;U: which is a Unix and V: which is a VMS.  Note the translation
  ;of file type and alphabetic case conventions.
  (NAMESTRING
    (TRANSLATE-PATHNAME "U:/usr/dmr/hacks/frob.l"
                        "U:/usr/d*/hacks/*.l"
                        "V:SYS$DISK:[D*.BACKUP.HACKS]BACKUP-*.*"))
   => "V:SYS$DISK:[DMR.BACKUP.HACKS]BACKUP-FROB.LSP"
  (NAMESTRING
    (TRANSLATE-PATHNAME "U:/usr/dmr/hacks/frob.l"
                        "U:/usr/d*/hacks/fr*.l"
                        "V:SYS$DISK:[D*.BACKUP.HACKS]BACKUP-*.*"))
   => "V:SYS$DISK:[DMR.BACKUP.HACKS]BACKUP-OB.LSP"

  ;This example presumes background information described in PATHNAME-LOGICAL
  (DEFUN TRANSLATE-LOGICAL-PATHNAME-1 (PATHNAME RULES)
    (LET ((RULE (ASSOC PATHNAME RULES :TEST #'PATHNAME-MATCH-P)))
      (UNLESS RULE (ERROR "No translation rule for ~A" PATHNAME))
      (TRANSLATE-PATHNAME PATHNAME (FIRST RULE) (SECOND RULE))))
  (TRANSLATE-LOGICAL-PATHNAME-1 "FOO:CODE;BASIC.LISP"
                        '(("FOO:DOCUMENTATION;" "MY-UNIX:/doc/foo/")
                          ("FOO:CODE;"          "MY-UNIX:/lib/foo/")
                          ("FOO:PATCHES;*;"     "MY-UNIX:/lib/foo/patch/*/")))
   => the pathname MY-UNIX:/lib/foo/basic.l

Rationale:

  1,2,3. These three functions provide a standardized interface to the
  idiosyncratic wildcard functionality of each host file system.

  1. WILD-PATHNAME-P makes it possible to detect wild pathnames reliably and
  do something useful (give up, merge out the bothersome components, call
  DIRECTORY for a list of matching pathnames, etc.)

  2,3. TRANSLATE-PATHNAME is needed by many application programs that deal
  with wildcard pathnames.  PATHNAME-MATCH-P and TRANSLATE-PATHNAME are
  needed by logical pathnames.  The PATHNAME-LOGICAL proposal cannot be
  implemented without these features.  Implementing PATHNAME-LOGICAL could
  involve adding additional capabilities to TRANSLATE-PATHNAME, depending
  on the type of file system used, but those capabilities do not need to
  be in the standard.

  4. Since these functions return a value connected with one file, there
  is no meaningful way to extend them to work on wildcard pathnames.  It
  seems best to specify that they signal an error, rather than leaving
  the consequences undefined.

  5. The consequences are proposed to be implementation-defined because
  current practice varies and no one wants to change.

Current Practice:

  Presumably no implementation supports the proposal exactly as stated.
  Symbolics Genera has had similar features under different names for many
  years:

    (SEND pathname :WILD-P) returns a value such as NIL, :NAME, :TYPE,
    etc., indicating the first wild field.

    (SEND pathname :NAME-WILD-P), (SEND pathname :DIRECTORY-WILD-P),
    etc. test individual fields.

    The :TRANSLATE-WILD-PATHNAME, :TRANSLATE-WILD-PATHNAME-REVERSIBLE, and
    :PATHNAME-MATCH messages resemble TRANSLATE-PATHNAME and
    PATHNAME-MATCH-P.

  The Explorer also supports the messages :WILD-P (although it only
  returns NIL or T), :NAME-WILD-P, etc., :TRANSLATE-WILD-PATHNAME, and
  :PATHNAME-MATCH.

  Points 4 and 5 are current practice as far as the authors are aware.
  The Explorer permits DELETE-FILE on a wild pathname, meaning to delete
  all files that match.

Cost to Implementors:

  Many implementations probably have a substrate which is capable of this
  or something similar already. In such cases, it's a relatively small
  matter to add the proposed interface.

  Even in cases where an implementation doesn't have ready code, it's clearly
  better for the implementor to write that code once and for all than to ask
  each user of wildcards to write it.

  Since the detailed behavior is at the implementor's discretion, the cost
  is unlikely to be large.  Some file systems will do all the work and the
  implementor need only provide an interface to the file system or to a
  standard library routine.  For other file systems the implementor has to
  write the actual matching and translation algorithms.

Cost to Users:

  None.  This change is upward compatible.

Cost of Non-Adoption:

  Wild pathnames would continue to be mistaken for ordinary pathnames in
  many situations.  User programs that deal with wildcard pathnames would
  have to operate on implementation-dependent representations and hence
  would not be easily portable.

  The biggest cost is that the logical pathnames proposal would be stymied.

Performance Impact:

  None.

Benefits:

  A more complete set of wildcard pathname operations.  Portable user
  programs that deal with wildcard pathnames will be more consistent
  and reliable.  A portable system construction tool can be written
  and the foundations are laid for a `logical pathname' facility
  (proposed separately in PATHNAME-LOGICAL).

Aesthetics:

  This change would make some portable code less kludgey.

Discussion:

  There was some question about the name. The name PATHNAME-WILD-P
  suggests a ``slot'' of a pathname (like PATHNAME-HOST),
  while WILD-PATHNAME-P suggests a type (like INPUT-STREAM-P).
  The committee was split on what to call it. Since it is more
  like a type than a slot, the name WILD-PATHNAME-P was chosen.

  It's been suggested that WILD-PATHNAME-P and PATHNAME-MATCH-P be allowed
  to return a value other than T to represent "truth", which would
  somehow encode some additional information.
