High-level parallel languages (HLPLs) make it easier to write correct parallel 
programs. Disciplined memory usage in these languages enables new optimizations
for hardware bottlenecks, such as cache coherence. In this work, we show how to
reduce the costs of cache coherence by integrating the hardware coherence
protocol directly with the programming language; no programmer effort or static
analysis is required.

We identify a new low-level memory property, WARD (WAW Apathy and RAW
Dependence-freedom), that holds by construction in HLPL programs. We design a
new coherence protocol, WARDen, that uses WARD to selectively disable
coherence. We
evaluate WARDen with a widely used HLPL benchmark suite on both current and
future x64 machine structures. WARDen both accelerates the benchmarks (by an
average of 1.46x) and reduces energy consumption (by 23%) by eliminating
unnecessary data movement and coherence messages.