Internet use is ubiquitous: it is the primary medium for everything from trading commodities to collecting stamps. The benefits of web applications are numerous: they are available anywhere without installation, new versions can be released frequently and instantaneously, collaboration is easier because data is stored centrally, and distribution costs are minuscule. Web development also appeals to a broader audience than any other programming paradigm: many students have websites or blogs, but few will ever write a desktop application.
Web programs typically involve three or more tiers, each written in its own set of languages. The front end is written in HTML, XML, and JavaScript; the middle tier in Java, C#, or Ruby; and the back end in SQL or XQuery. In complex applications, messages are also passed between sub-applications and to and from external systems via COM, CORBA, or SOAP.
Software written in multiple languages is more difficult to write correctly: it cannot take advantage of language features that ensure quality, and it is redundant. Each language operates in a different type system, so it is impossible to typecheck the system as a whole. An injection attack takes advantage of this by passing unexpected and unchecked values from the browser to the server. Without a type system that covers page structure, it is also impossible to verify statically that generated HTML is well formed. Complex webpages are especially problematic because their structure is constantly changing. A single page that relies heavily on AJAX can send the browser an enormous range of possible HTML, far more than a human can check by hand. In addition, the same logic often spans multiple tiers of the application to increase security, usability, or performance. For example, a bank will use JavaScript to validate that a withdrawal does not exceed the account balance, giving the user immediate feedback. The same check must be repeated on the server for security reasons (e.g. in case JavaScript is turned off). A developer might later change the validation logic in only one tier, breaking the application. This is a classic case of redundancy, and as always it reduces quality.
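To make the redundancy concrete, here is a minimal sketch of the bank example. Both tiers are written in Python purely for illustration (in practice the client check would be JavaScript), and the function names and the minimum-balance rule are hypothetical. It shows how a rule changed on the server alone lets the two tiers disagree.

```python
from decimal import Decimal


def client_allows(balance: Decimal, amount: Decimal) -> bool:
    """Stand-in for the browser-side check: the original rule."""
    return Decimal("0") < amount <= balance


# Hypothetical later change: the bank requires a minimum remaining balance.
MINIMUM_BALANCE = Decimal("50")


def server_allows(balance: Decimal, amount: Decimal) -> bool:
    """Server-side check, updated for the new rule; the client was not."""
    return Decimal("0") < amount <= balance - MINIMUM_BALANCE


def tiers_disagree(balance: Decimal, amount: Decimal) -> bool:
    """True when the browser and the server reach different verdicts."""
    return client_allows(balance, amount) != server_allows(balance, amount)


if __name__ == "__main__":
    balance = Decimal("100")
    for amount in (Decimal("40"), Decimal("80"), Decimal("120")):
        print(amount, client_allows(balance, amount), server_allows(balance, amount))
```

With a balance of 100, a withdrawal of 80 passes the client check but is refused by the server, so the user is told the request is valid and then sees it fail.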
Similarly, it is difficult to ensure that web applications conform to a security model, yet they face serious security threats. A web server cannot trust that a client has validated the data it sent. Even more ominously, the server must severely restrict the types of code that can be run in reaction to client requests. Further, the server must ensure that there is no open "back door" that allows unauthorized client access, such as the one exploited by a SQL injection attack. The most sophisticated programs are the most at risk because they send frequent and varied messages between client and server, increasing the number of potential security holes.
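The SQL injection "back door" can be sketched with Python's standard sqlite3 module. The accounts table, its contents, and the function names are assumptions made for this illustration. The first lookup splices client input into the query text, while the second passes it as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (owner TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("alice", 100), ("bob", 250)],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, owner: str) -> list:
    """Splices client input into the SQL text, where it can change the query."""
    query = f"SELECT owner, balance FROM accounts WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn: sqlite3.Connection, owner: str) -> list:
    """Passes client input as a bound parameter, so it is treated as data only."""
    query = "SELECT owner, balance FROM accounts WHERE owner = ?"
    return conn.execute(query, (owner,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    malicious = "nobody' OR '1'='1"
    print(len(lookup_unsafe(conn, malicious)))  # every row is returned
    print(len(lookup_safe(conn, malicious)))    # no row matches the literal string
```

The unsafe version fails because the host language sees only a string, so nothing checks that the client's value stays within the SQL literal it was meant to fill.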
Several aspects of web programming reduce developer productivity. Each browser implements HTML and JavaScript differently, a problem that is particularly acute in mobile browsers. Developers are forced to adjust their client code to maximize compatibility. In many cases, developers write separate front ends for each mobile platform. Also, since the web developer writes code in many languages, she must tediously encode/decode messages between tiers and she must make constant mental context switches, slowing down her work.
Programming Language (PL) researchers and industry have both noticed these problems and proposed solutions. Many PL solutions [1,2] unify web application development into a single functional language, but they have limitations. They cannot take advantage of powerful industry tools like application servers and IDEs. They do not support communication standards, preventing interoperability with external systems. Finally, functional languages avoid state and mutable data, but the focus of web programming is the manipulation of stateful page elements. On the other hand, industry solutions, like Google Web Toolkit and ASP.NET, ignore the most powerful tool available: language design. They are piecemeal solutions that cannot produce the safety guarantees or constructability gains of a unified approach. Simple tasks are elegant, but complex tasks must be written in native code outside the framework.
I plan to design and implement an imperative programming language for the web in which the application is written as one program, all expressions are typed, security model support is built in, and structured data are first-class elements. I hypothesize that this language will be adoptable and will enable high-quality, secure, and easily constructible web applications. In addition, my unified approach to web programming can serve as a teaching tool for undergraduate computer scientists, who currently are not taught to write web applications. Instead, they learn on the job and scramble for anything that works, yielding the messy results we see in practice.
I hope to improve quality by statically enforcing that language expressions are well typed. For the language and type system to be useful, they must be capable of building rich client applications in the browser. This requires that the language provide an interface to the DOM that allows the programmer to use all page elements (e.g. controls and events) as first-class components of the language. This will enable me to guarantee the well-formedness of HTML pages even when they are constructed dynamically. This is technically challenging, primarily because there is a mismatch between the XML-based structure of web pages and the objects of mainstream programming languages. I hope to build language features for the manipulation of structured data that extend the excellent, yet immutable, language features introduced by Cω [3]. In addition, unified code will reduce redundancy, eliminating the corresponding quality problems.
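The idea of treating page elements as structured values rather than strings can be approximated in an existing language. The sketch below uses Python's standard xml.etree.ElementTree with a deliberately tiny, hypothetical content model. Python can only reject a bad page at run time, whereas the proposed language aims to reject it at compile time, so this is an analogy and not the proposed design.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny content model: which children each tag may hold.
ALLOWED_CHILDREN = {
    "ul": {"li"},
    "li": {"a", "span"},
    "a": set(),
    "span": set(),
}


def add_child(parent: ET.Element, tag: str, text: str = "") -> ET.Element:
    """Append a child only if the content model permits it under this parent."""
    if tag not in ALLOWED_CHILDREN.get(parent.tag, set()):
        raise ValueError(f"<{tag}> is not allowed inside <{parent.tag}>")
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def render(root: ET.Element) -> str:
    """Serialize the tree; tags are always balanced and text is escaped."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    menu = ET.Element("ul")
    item = add_child(menu, "li")
    add_child(item, "span", "5 < 10 & rising")
    print(render(menu))
```

Because the page is built as a tree, an unclosed tag cannot be expressed at all, and a misplaced element is caught by the content model instead of reaching the browser.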
Security will be addressed by supporting language features that can verifiably implement a security model. In my language, the location where each piece of code runs and the messages exchanged between locations will be stated explicitly in the code. These features will allow me to develop a tool that analyzes the code statically to ensure that it is safe. For example, I will verify that all client-side validation (like the earlier bank example) is also performed on the server.
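One way to picture the client/server validation check is as a comparison of the rules declared at each location. The sketch below is only a run-time stand-in for the static analysis described above: the `validates` decorator, the rule names, and the location labels are all hypothetical, and it matches rules by name without comparing their logic.

```python
from collections import defaultdict

# Location -> names of the validation rules declared at that location.
RULES_BY_LOCATION: dict[str, set[str]] = defaultdict(set)


def validates(rule: str, *, at: str):
    """Hypothetical annotation: record that a function enforces `rule` at `at`."""
    def register(fn):
        RULES_BY_LOCATION[at].add(rule)
        return fn
    return register


@validates("withdrawal_within_balance", at="client")
def client_within_balance(amount: int, balance: int) -> bool:
    return amount <= balance


@validates("withdrawal_within_balance", at="server")
def server_within_balance(amount: int, balance: int) -> bool:
    return amount <= balance


@validates("amount_is_positive", at="client")
def client_amount_positive(amount: int, balance: int) -> bool:
    return amount > 0


def rules_missing_on_server() -> set[str]:
    """Rules checked in the browser that no server-side function re-checks."""
    return RULES_BY_LOCATION["client"] - RULES_BY_LOCATION["server"]


if __name__ == "__main__":
    print(rules_missing_on_server())
```

Here the check reports `amount_is_positive`, which is enforced only in the browser. In a language where locations are part of the code, the same question could be answered before the program ever runs.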
I will approach constructability via compilation. The language will compile to both server and client code with automatic encoding/decoding. All communication will conform to industry standards allowing integration with external systems. I will also provide an interface that will allow the application to be compiled so as to run standalone, within a browser, or natively on a large swath of smartphones and PDAs. Finally, I will maintain the currently supported separation between presentation and logic, allowing developers and designers to focus on their strengths.
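Automatic encoding and decoding can be illustrated by deriving the wire format from a message's declared fields, so that the developer never writes it by hand. This is a minimal sketch using Python dataclasses and JSON. The `WithdrawalRequest` type and its fields are hypothetical, JSON is an assumed format, and the type check handles only simple field types such as `str` and `int`.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class WithdrawalRequest:
    """Hypothetical message sent from the client tier to the server tier."""
    account_id: str
    amount_cents: int


def encode(message) -> str:
    """Derive the wire format from the message's declared fields."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(cls, payload: str):
    """Rebuild a typed message, rejecting missing, extra, or mistyped fields."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise TypeError("payload must be a JSON object")
    expected = {f.name: f.type for f in fields(cls)}
    if set(data) != set(expected):
        raise ValueError(f"unexpected fields: {sorted(set(data) ^ set(expected))}")
    for name, field_type in expected.items():
        # Exact type match, so a JSON string is never accepted as a number.
        if type(data[name]) is not field_type:
            raise TypeError(f"{name} must be {field_type.__name__}")
    return cls(**data)


if __name__ == "__main__":
    request = WithdrawalRequest(account_id="acct-1", amount_cents=2500)
    wire = encode(request)
    print(wire)
    print(decode(WithdrawalRequest, wire) == request)
```

Since both directions are generated from one declaration, a field added to the message cannot be encoded on one tier and forgotten on the other.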
I intend to test my hypothesis using four techniques: proofs, compilers, case studies, and benchmarks. I will build a formal type system for my language and prove that the language is type safe and enforces security constraints. I will build a compiler from the language to JavaScript, Java, and SQL to validate that the language is implementable. I intend to test my new language against standard industrial languages in a set of case studies by re-implementing existing web applications in it. The language will be successful if it captures these interesting examples succinctly, provides stronger security guarantees, and is flexible enough to allow common programming idioms.
The web has already had tremendous impact on communications, commerce, and journalism. My language will improve the quality, security, and construction of web applications. An elegant yet practical language for the web also provides a powerful tool for reaching a new audience for computer science: a broader audience that is more interested in building collaborative web applications than in a cutting-edge file system or hashing algorithm.
[1] COOPER E, ET AL. Web programming without tiers. In FMCO '06.
[2] MURPHY VII T, ET AL. Type-safe distributed programming with ML5. In TGC '07.
[3] BIERMAN G, ET AL. The Essence of Data Access in Cω. In ECOOP '05.