The SCEC TeraShake Earthquake Simulation

Minster, J., Olsen, K. B., Moore, R., Day, S., Maechling, P., Jordan, T., Faerman, M., Cui, Y., Ely, G., Hu, Y., Shkoller, B., Marcinkovich, C., Bielak, J., Okaya, D., Archuleta, R., Wilkins-Diehr, N., Cutchin, S., Chourasia, A., Kremenek, G., Jagatheesan, A., Brieger, L., Majundar, A., Chukkapalli, G., Xin, Q., Banister, B., Thorp, D., Kovatch, P., Diegel, L., Sherwin, Thiebaux, M., Lopez, J.

In EOS Trans. AGU, 2004.

Abstract

The southern portion of the San Andreas fault, between Cajon Creek and Bombay Beach, has not seen a major event since 1690 and has therefore accumulated a slip deficit of 5-6 m. The potential for this portion of the fault to rupture in a single M7.7 event is a major component of seismic hazard in southern California and northern Mexico. TeraShake is a large-scale, fourth-order finite-difference simulation of such an event, based on Olsen's Anelastic Wave Propagation Model (AWM) code and conducted in the context of the Southern California Earthquake Center Community Modeling Environment (CME). The fault geometry is taken from the 2002 USGS National Hazard Maps. The kinematic slip function is transported and scaled from published inversions for the 2002 Denali (M7.9) earthquake. The three-dimensional crustal structure is the SCEC Community Velocity Model. The 600 km x 300 km x 80 km simulation domain extends from the Ventura Basin and Tehachapi region in the north to Mexicali and Tijuana in the south. It includes all major population centers in southern California and is modeled at 200 m resolution using a rectangular, 1.8 giganode, 3000 x 1500 x 400 mesh. The simulated duration is 200 seconds, with a temporal resolution of 0.01 seconds and a maximum frequency of 0.5 Hz, for a total of 20,000 time steps. The simulation is planned to run at the San Diego Supercomputer Center (SDSC) on 240 processors of the IBM Power4 DataStar machine.
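
The mesh dimensions and step count quoted above follow directly from the domain size, grid spacing, and time step. The short Python sketch below reproduces that arithmetic as a consistency check; the variable names are illustrative and are not taken from the AWM code.

```python
# Discretization parameters as stated in the abstract.
domain_km = (600, 300, 80)   # domain extent along the three axes, in km
spacing_m = 200              # uniform grid spacing, in m
duration_s = 200             # simulated duration, in s
dt_s = 0.01                  # time step, in s

# Number of grid cells along each axis: extent divided by spacing.
nodes_per_axis = tuple(round(length * 1000 / spacing_m) for length in domain_km)

# Total mesh size is the product of the three axis counts.
total_nodes = nodes_per_axis[0] * nodes_per_axis[1] * nodes_per_axis[2]

# Number of time steps: duration divided by step length.
time_steps = round(duration_s / dt_s)

print("mesh:", " x ".join(str(n) for n in nodes_per_axis))
print("total nodes:", total_nodes)
print("time steps:", time_steps)
```

The result is a 3000 x 1500 x 400 mesh of 1.8 billion nodes advanced over 20,000 steps, matching the figures in the text.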
Validation runs conducted at one sixteenth (4D) resolution have shown that the 240-processor configuration is the optimal trade-off between computational and I/O demands. The full run will consume about 18,000 CPU hours. Each time step produces a 21.6 GByte snapshot of the ground-motion velocity vectors for the entire mesh. A 4D wavefield containing 2,000 time steps, amounting to 43 Tbytes of data, will be stored at SDSC. Surface data will be archived for every time step for synthetic seismogram engineering analysis, totaling 1 Tbyte. The data will be registered with the SCEC Digital Library, supported by the SDSC Storage Resource Broker (SRB). Data collections will be annotated with simulation metadata, which will allow data discovery through metadata-based queries. The binary output will be described using HDF5 headers. Each file will be fingerprinted with MD5 checksums to preserve and validate data integrity. Data access, management, and data product derivation will be provided through a set of SRB APIs, including Java, C, web service, and data grid workflow interfaces. High-resolution visualizations of the wave propagation phenomena will be produced under diverse camera views. The surface data will be analyzed online by remote web clients plotting synthetic seismograms. Data mining operations, spectral analysis, and data subsetting are planned as future work. The TeraShake simulation project has provided insights into the cyberinfrastructure needed to advance computational geoscience, which we will discuss.
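
The storage figures above can be checked with a few lines of arithmetic. The Python sketch below is a rough consistency check, not a description of the actual output format: it assumes three velocity components per node stored as 4-byte single-precision values, and decimal units (1 GByte = 10^9 bytes). Neither assumption is stated in the abstract, but together they reproduce the quoted sizes.

```python
# Mesh and run parameters as stated in the abstract.
NX, NY, NZ = 3000, 1500, 400
TOTAL_STEPS = 20_000        # all time steps (surface data saved at each)
VOLUME_STEPS = 2_000        # time steps kept in the archived 4D wavefield
CPU_HOURS = 18_000
PROCESSORS = 240

# Assumptions (not stated in the abstract):
COMPONENTS = 3              # three velocity components per node
BYTES_PER_VALUE = 4         # single-precision floating point


def snapshot_bytes(nx, ny, nz):
    """Bytes needed to store one velocity snapshot of an nx x ny x nz grid."""
    return nx * ny * nz * COMPONENTS * BYTES_PER_VALUE


volume_snapshot = snapshot_bytes(NX, NY, NZ)       # full-mesh snapshot
volume_total = VOLUME_STEPS * volume_snapshot      # archived 4D wavefield
surface_snapshot = snapshot_bytes(NX, NY, 1)       # top layer of the mesh only
surface_total = TOTAL_STEPS * surface_snapshot     # surface data, every step
wall_clock_hours = CPU_HOURS / PROCESSORS          # ideal, perfectly parallel

print(f"volume snapshot:  {volume_snapshot / 1e9:.1f} GByte")
print(f"4D wavefield:     {volume_total / 1e12:.1f} TByte")
print(f"surface snapshot: {surface_snapshot / 1e6:.0f} MByte")
print(f"surface archive:  {surface_total / 1e12:.2f} TByte")
print(f"wall clock:       {wall_clock_hours:.0f} hours")
```

Under these assumptions the full-mesh snapshot comes to 21.6 GByte, the 2,000-step wavefield to about 43 TByte, and the surface archive to about 1 TByte, consistent with the text. The 18,000 CPU hours on 240 processors correspond to roughly 75 hours of wall-clock time if the run scales ideally.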