17-811 Self-Healing Systems: Class Discussion Summary

David Garlan
Spring Semester 2003

Summary of Class Discussion for March 12 by Owen Cheng

Scribes:
    * Two scribes picked for each meeting (Vahe's suggestion).
    * Today: Owen & Bhuricha

Goal for class:
    * Discussion to establish overall framework & taxonomy

Last class:
    * Ten-thousand foot view; 2 things
    * 1.  Design/run-time
      - Idea that you can't get run-time behavior right by wiring things in at design-time.
      - Many of the things now done at design-time must happen at run-time
    * 2.  Impact to user...

Discussion of Readings:
    * Why SHS?
      IBM perspective:
          Vahe:
              Labor is becoming relatively expensive
              Complexity management
              "Variable to fixed cost" in the levels
          David:
              Difference between predictive and adaptive
          Kevin:
              Level of trust; predictive doesn't do the adaptation, involves human decision; adaptive does the adaptation with little human intervention
          Paul:
              Another distinction is the SLA layer
              Adaptive codifies with SLA to drive adaptation
              Predictive doesn't have codification in SLA
          David:
              SLA similar to architectural requirements
              In autonomic:  IT goals, higher-level, task related, above SLA, more flexibility than SLA.
              From adaptive to autonomic, difference in level of abstraction.
          Owen:
              SLA involves two sides:  provider vs customer
          Kevin:
              Summarize:  essential point is that SLA is contract
          Paul:
              SLA still at the adaptive level
              The 5 levels are adaptation at different layers of abstraction
          Vahe:
              Reinforces Paul's point by pointing out that each level's IT task becomes automated by the next level.
          David:
              Still not clear about what's automated in level 5 and not in 4.
              A mismatch?
          Sanjit:
          David:
              What do we do if SLA isn't met at runtime?
              Basic:  We figure out that availability not met
                      We figure out what's the problem, how to fix, etc.
              Managed:  We can at least tell you SLA not satisfied
          Kevin:
              Level 2:  System provides enough data for humans to make decision
              Level 3:
          David:
              Level 4:  Humans decide how to fix it, given correlations, system can do the interaction automatically.
          Joao:
              Level 4 is individual service level
              Level 5 is more global perspective, across services, etc.
              The business' policy is encoded into the system.
              Cost-benefit analysis to maximize profit for whole company.
          David:
              Adaptive:  Might have system make decision, while
              Autonomic:  broader view of the services in system
              Interesting they only call level 5 Autonomic.
          Vahe:
              Example with market maker, busier on Friday than other days.
              Predictive:  based on history, on Thu suggest more servers
              Adaptive:  do the Fri addition of servers
              Autonomic:  not sure about it...
          David:
              Difference, big ideas:
                  Degree of automation
                  Scope of analysis in automation
                      Breadth of concern
                      Future vs past
          David:
              Self-* terms
              Orthogonal, or one built on previous?
          Sanjit and others:
              Orthogonal:  self-optimizing need not be self-healing
          Kevin:
              Example of self-optimizing Database not being able to fix itself
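
Scribe's illustration (not from the discussion): Vahe's market-maker example can be sketched in code to show the predictive/adaptive distinction as the class described it. Predictive forecasts from history and leaves the action to a human; adaptive takes the action itself. This is a minimal sketch only. The names `Level` and `plan_capacity` and the load numbers are hypothetical, and the autonomic level is omitted because the class did not settle on what it would do here.

```python
from enum import Enum


class Level(Enum):
    PREDICTIVE = "predictive"  # system recommends, human decides and acts
    ADAPTIVE = "adaptive"      # system acts with little human intervention


def plan_capacity(history, next_day, current_servers, level):
    """Return (servers_after, message) for the coming day.

    history maps a weekday name to past peak server demand on that day
    (hypothetical data; the notes give no numbers).
    """
    # Forecast from history: assume the worst peak seen on that weekday.
    forecast = max(history.get(next_day, [current_servers]))
    shortfall = forecast - current_servers
    if shortfall <= 0:
        return current_servers, "no change needed"
    if level is Level.PREDICTIVE:
        # Only a suggestion: the server count is left unchanged.
        return current_servers, f"suggest adding {shortfall} servers for {next_day}"
    # Adaptive: the system performs the change itself.
    return forecast, f"added {shortfall} servers for {next_day}"


history = {"Thu": [4, 5, 4], "Fri": [8, 9, 8]}
print(plan_capacity(history, "Fri", 5, Level.PREDICTIVE))
print(plan_capacity(history, "Fri", 5, Level.ADAPTIVE))
```

Both levels run the same analysis; they differ only in who carries out the change, which matches the "degree of automation" idea above.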

      Mary's Paper:
          David:
              So self-healing isn't self-configuring?  --> nice segue into Mary's paper
              Drawing on slide of value axis, with "Broken" in left extreme, and "Perfect" in right extreme; draw a red bar indicating user's "value" for system.
              Is Mary saying that the system should always drive the red bar toward the right? But that might not always be ideal, say, in terms of cost...
          Owen:
              "Normal" vs "Perfect"
          Sanjit:
              Normal:  Some equilibrium point, Mary's saying to bring system back into equilibrium point
          Owen:
              Mary's definition of sufficiency seems to suggest user's value as the "normal" region
          David:
              Added blue bar and shifted red bar down, where red is the "broken" value, while blue is the user's preferred "normal" value
          Jungsoo:
              [Missed]
              Brought up degree of self-adaptation: self-configuring, followed by self-healing, followed by self-optimizing.
          Kevin:
              Some point about self-healing vs self-optimizing; some easily categorized, some not
          Vahe:
              Self-healing can be done by the component itself, while self-optimizing needs an outside component.
          Kevin:
          Paul:
              Mary's paper doesn't suggest layer above?
          David:
              Mary's paper seems neutral on this
          Owen:
              Self-healing reactive, while self-optimizing proactive
          Joao:
              Disagrees:  Java GC, say, proactively prevents a problem from arising, yet it's still self-healing
          Kevin:
              Self-healing is in sense of immunology, you don't fix something until there is a problem
          David:
              Adds self-protecting into diagram
          Joao:
              Self-protecting seems to emphasize external attacks
          David:
              Question of the extent to which the "environment" is included in automation.
                  - Added whether the environment is included to the list of big ideas.
              Looking at things outside of the system, in the future; two dimensions:  Inside-outside, present-past-future.
              * Self-protecting seems to be external, future
              * Human
          Vahe:
              Human anatomy analogy
              - Self-configuring:  learning new skills (Kevin)?
              - Self-protecting:  you protect yourself from injury
              - Self-healing:  immune system fights disease
              - Self-optimizing:  you go to the gym (Joao)
          David:
              Would self-adaptation mechanisms still make sense without users?
              Could we imagine these mechanisms independent of users?
              (Kevin) Does not think such a system exists.
              There seems to be user utility based on which these mechanisms make sense.
              Maybe this doesn't make sense.
          Jungsoo:
              Think of user in terms of two aspects:  one as environment, so system responds to environment changes.
              [Missed]
          David:
              External vs internal decision-making
              External:  Rainbow
              Internal:  Self-stabilizing algorithms... no one place you would go to say this is broken; exceptions would be viewed as self-repair
          Owen:
              The external vs internal distinction can't be drawn hard and fast; it depends on scope.
              System context:  everything within system circle is internal, and outside is external.
              Bring up the ATAM scenario stimuli example, what would be internal stimuli and what external?
          David:
              This leads to open- & closed-loop systems.
              External has model of system different from the model of the system itself.
          Kevin:
              Watcher of the watcher issue
          David:
              Looking at this from a layered perspective, where each layer deals with some aspect of adaptation internally, but also has layer above that does adaptation on the layer below.

    * What are the essential characteristics of SHS?
    * What are some important distinctions?

Next Step:
    3 Ideas:
    - 1:  Foundational work; understand related domains
    - 2:  Continue discussion now, on layers, framework, dimensions; what category would we use?
    - 3:  Various approaches of adaptation

    Proceed with both:  foundational work and framework discussion
        Dependability and fault-tolerance
        Also immunology