The goal of the proposed thesis is a comprehensive multi-modal human-machine interface that allows non-experts to conveniently compose robot programs. Two key characteristics of this novel programming approach are that the system can interpret the user's intent and that the user can provide feedback interactively at any time. The proposed framework takes a three-step approach to the problem: multi-modal recognition, intention interpretation, and prioritized task execution. The multi-modal recognition module translates hand gestures and spontaneous speech into a structured symbolic data stream without abstracting away the user's intent. The intention interpretation module selects the appropriate primitives based on the user input, the current state, and robot sensor data. Finally, the prioritized task execution module selects and executes primitives based on the current state, sensor input, and the task provided by the previous step. Depending on the mode of operation, the system supports interactive robot control, adjustment and creation of primitives, and composition of robot programs. The proposed research is expected to significantly improve the state of the art in industrial robot programming and interactive personal robotics.
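The three-step pipeline described above could be sketched as follows. This is a minimal illustration, not the proposed implementation: all class names, primitive names, and the priority scheme are hypothetical stand-ins for the recognition, interpretation, and execution modules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical symbolic event emitted by the recognition module:
# a gesture or speech token plus any recognized parameters.
@dataclass
class SymbolicEvent:
    modality: str           # "gesture" or "speech"
    token: str              # e.g. "pick", "point", "stop"
    params: dict = field(default_factory=dict)

# Hypothetical task handed from interpretation to execution.
@dataclass
class Task:
    primitive: str          # name of the primitive to run
    priority: int           # higher values preempt lower ones
    params: dict = field(default_factory=dict)

class MultiModalRecognizer:
    """Step 1: translate raw gesture/speech input into symbolic events."""
    def recognize(self, raw: dict) -> SymbolicEvent:
        # Placeholder: a real system would run gesture and speech
        # recognizers here and fuse their outputs.
        return SymbolicEvent(raw["modality"], raw["token"],
                             raw.get("params", {}))

class IntentionInterpreter:
    """Step 2: map a symbolic event plus current state to a task."""
    def interpret(self, event: SymbolicEvent, state: dict) -> Task:
        # Illustrative rule: a "stop" command outranks everything else.
        if event.token == "stop":
            return Task("halt", priority=10)
        return Task(event.token, priority=1, params=event.params)

class TaskExecutor:
    """Step 3: keep a priority-ordered queue and execute the top task."""
    def __init__(self) -> None:
        self.queue: List[Task] = []

    def submit(self, task: Task) -> None:
        self.queue.append(task)
        self.queue.sort(key=lambda t: -t.priority)

    def step(self) -> Optional[str]:
        # "Execute" (here: just report) the highest-priority primitive.
        return self.queue.pop(0).primitive if self.queue else None

# Wiring the three stages together:
recognizer, interpreter, executor = (MultiModalRecognizer(),
                                     IntentionInterpreter(),
                                     TaskExecutor())
for raw in [{"modality": "speech", "token": "pick",
             "params": {"object": "bolt"}},
            {"modality": "gesture", "token": "stop"}]:
    event = recognizer.recognize(raw)
    executor.submit(interpreter.interpret(event, state={}))

print(executor.step())  # the "stop" task preempts the earlier "pick"
```

The sketch shows the interactive-feedback property in miniature: because execution is driven by a priority queue rather than a fixed script, a later high-priority user input (the "stop" gesture) preempts a task submitted earlier.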