* Copyright (c) 1992 Carnegie Mellon University
*                    SCAL project: Guy Blelloch, Siddhartha Chatterjee,
*                                  Jonathan Hardwick, Jay Sipelstein,
*                                  Marco Zagha
* All Rights Reserved.
*
* Permission to use, copy, modify and distribute this software and its
* documentation is hereby granted, provided that both the copyright
* notice and this permission notice appear in all copies of the
* software, derivative works or modified versions, and any portions
* thereof, and that both notices appear in supporting documentation.
*
* CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
* CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR
* ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
*
* The SCAL project requests users of this software to return to
*
*  Guy Blelloch and Marco Zagha         guy.blelloch@cs.cmu.edu
*  School of Computer Science           marco.zagha@cs.cmu.edu
*  Carnegie Mellon University
*  5000 Forbes Ave.
*  Pittsburgh PA 15213-3890
*
* any improvements or extensions that they make and grant Carnegie Mellon
* the rights to redistribute these changes.
* This file generated automatically by assembler on 1/13/1993 21:07
* XMVMULT
* XMVMULT: A sparse matrix-vector operation (V-*F,V-+F) (internal version)
        IDENT   XMVMULT         (D, VALUES, IND, VEC, FLAGS, LASTIND, PROCSUM, VLENGTH, STRIDE, REMAIN)
        ENTRY   XMVMULT
        BASE    D               * base 10
XMVMULT ENTER   NP=10           * # parameters
* int *D;
* int *VALUES;
* int *IND;
* int *VEC;
* int *FLAGS;
* int *LASTIND;
* int *PROCSUM;
* int VLENGTH;
* int STRIDE;
* int REMAIN;
        B70     A6
        ARGADD  A1,1,ARGPTR=A6  * D
        ARGADD  A2,2,ARGPTR=A6  * VALUES
        ARGADD  A3,3,ARGPTR=A6  * IND
        ARGADD  S1,4,ARGPTR=A6  * VEC
        ARGADD  A4,5,ARGPTR=A6  * FLAGS
        ARGADD  S2,6,ARGPTR=A6  * LASTIND
        ARGADD  S3,7,ARGPTR=A6  * PROCSUM
        ARGADD  A5,8,ARGPTR=A6  * VLENGTH
        ARGADD  A7,9,ARGPTR=A6  * STRIDE
        ARGADD  S4,10,ARGPTR=A6 * REMAIN
        VL      A5
        A6      S1
        B71     A6
        T70     S2
        T71     S3
        V0      0
        S2      0
        S3      0
        S5      0
        S1      0
        A6      A7
        A6      A6-1
        B72     A6
        A5      A5-1
        S6      A7
        S4      S6-S4
        T73     S4
        VL      A5
        S6      1
        T72     S6
        S7      1
        A0      S4
        JAZ     L5
L1      =       P.*
        S6      S4&S7           * logical and
* Jump to start if the number of iterations is even
        A0      S6
        JAZ     L2
* Fix and Jump to middle if the number of iterations is odd
        J       L3
L2      =       P.*
* *** 1st body ***
        VM0     S2
        VM1     S3
        A0      A3+A6
        V3      ,A0,A7          * Vector load V3 with stride A7
        B73     A1
        A1      2
        VL      A1
        A0      A4
        V2      ,A0,1           * Vector load V2
        A1      B73
        VL      A5
        B73     A4
        A4      0
        A0      A2+A6
        V1      ,A0,A7          * Vector load V1 with stride A7
        A0      B71
        V4      ,A0,V3          * Gather into V4 using indices in V3
        V6      #VM&V0          * V6 = If mask zero else V0
        S2      V2,A4
        A4      A4+1
        S3      V2,A4
        A4      B73
        V7      V1*FV4          * V7 = V1 * V4
        V0      V6+FV7          * V0 = V6 + V7
        A4      A4-1
        A0      A1+A6
        ,A0,A7  V0              * Vector store V0 with stride A7
        A4      A4-1
        S5      S5!S2           * logical or packed mask
        S1      S1!S3
* *** End 1st body ***
        S4      S4-S7
        A6      A6-1
L3      =       P.*
* *** 2nd body ***
        VM0     S2
        VM1     S3
        A0      A3+A6
        V3      ,A0,A7          * Vector load V3 with stride A7
        B73     A1
        A1      2
        VL      A1
        A0      A4
        V2      ,A0,1           * Vector load V2
        A1      B73
        VL      A5
        B73     A4
        A4      0
        A0      A2+A6
        V1      ,A0,A7          * Vector load V1 with stride A7
        A0      B71
        V5      ,A0,V3          * Gather into V5 using indices in V3
        V6      #VM&V0          * V6 = If mask zero else V0
        S2      V2,A4
        A4      A4+1
        S3      V2,A4
        A4      B73
        V7      V1*FV5          * V7 = V1 * V5
        V0      V6+FV7          * V0 = V6 + V7
        A4      A4-1
        A0      A1+A6
        ,A0,A7  V0              * Vector store V0 with stride A7
        A4      A4-1
        S5      S5!S2           * logical or packed mask
        S1      S1!S3
* *** End 2nd body ***
        S4      S4-S7
        A6      A6-1
        A0      S4
        JAN     L2
L5      =       P.*
* Bail out if finished both parts
        S6      T72
        A0      S6
        JAZ     L4
* Clear flag to indicate done with 'full' part
        S6      0
        T72     S6
* Increment vector length for remainder
        A5      A5+1
        VL      A5
        S6      A7
        S4      T73
        S4      S6-S4
* Jump to end if the second part is empty
        A0      S4
        JAZ     L4
* Jump back up to handle the remainder
        J       L1
L4      =       P.*
        VL      A5
        S4      T73
        A6      B72
        VM0     S2
        VM1     S3
        V0      #VM&V0          * V0 = If mask zero else V0
        S2      T70
        S3      T71
* Phase 2: sum across row compressing adjustment vectors
* load lastindex vector
        A0      S2
        V4      ,A0,1           * Vector load V4
* expand flags
        VM0     S5
        VM1     S1
        V5      0
        T70     S1
        S1      1
        V3      S1!V5&VM        * If mask S1 else V5
        S1      T70
        A2      PS5             * Population count (of 1 bits)
        A4      PS1             * Population count (of 1 bits)
        A2      A2+A4
        S7      0
        T70     S1
        A6      A5
L7      =       P.*
        A6      A6-1
* execute body of serial loop
        S1      V3,A6
* Reset running sum if flag is set
        S2      V0,A6
        A0      S1
        JAZ     L6
        S6      V4,A6
        A2      A2-1
        V2,A2   S7
        S7      0
        V1,A2   S6
L6      =       P.*
        S7      S7+FS2
        A0      A6
        JAN     L7
        A6      S3
        ,A6     S7              * Scalar store S7
        S1      T70
* gather from lastindex, add adjustment, and scatter back
        A6      PS5             * Population count (of 1 bits)
        A4      PS1             * Population count (of 1 bits)
        A6      A6+A4
        A0      A6
        JAZ     L8
        VL      A6
        A0      A1
        V3      ,A0,V1          * Gather into V3 using indices in V1
        V4      V3+FV2          * V4 = V3 + V2
        A0      A1
        ,A0,V1  V4              * Scatter V4 using indices in V1
L8      =       P.*
        A6      B70
        EXIT
        END                     * END XMVMULT
* YMPMULT
* YMPMULT: A sparse matrix-vector operation (V-*F,V-+F) (internal version)
        IDENT   YMPMULT         (D, VALUES, IND, VEC, FLAGS, LASTIND, PROCSUM, VLENGTH, STRIDE, REMAIN)
        ENTRY   YMPMULT
        BASE    D               * base 10
YMPMULT ENTER   NP=10           * # parameters
* int *D;
* int *VALUES;
* int *IND;
* int *VEC;
* int *FLAGS;
* int *LASTIND;
* int *PROCSUM;
* int VLENGTH;
* int STRIDE;
* int REMAIN;
        B70     A6
        ARGADD  A1,1,ARGPTR=A6  * D
        ARGADD  A2,2,ARGPTR=A6  * VALUES
        ARGADD  A3,3,ARGPTR=A6  * IND
        ARGADD  S1,4,ARGPTR=A6  * VEC
        ARGADD  A4,5,ARGPTR=A6  * FLAGS
        ARGADD  S2,6,ARGPTR=A6  * LASTIND
        ARGADD  S3,7,ARGPTR=A6  * PROCSUM
        ARGADD  A5,8,ARGPTR=A6  * VLENGTH
        ARGADD  A7,9,ARGPTR=A6  * STRIDE
        ARGADD  S4,10,ARGPTR=A6 * REMAIN
        VL      A5
        A6      S1
        B71     A6
        T70     S2
        T71     S3
        V0      0
        S2      0
        S3      0
        S5      0
        S1      0
        A6      A7
        A6      A6-1
        B72     A6
        A5      A5-1
        S6      A7
        S4      S6-S4
        T73     S4
        VL      A5
        S6      1
        T72     S6
        S7      1
        A0      S4
        JAZ     L5
L1      =       P.*
        S6      S4&S7           * logical and
* Jump to start if the number of iterations is even
        A0      S6
        JAZ     L2
* Fix and Jump to middle if the number of iterations is odd
        J       L3
L2      =       P.*
* *** 1st body ***
        VM0     S2
        VM1     S3
        A0      A3+A6
        V3      ,A0,A7          * Vector load V3 with stride A7
        B73     A1
        A1      2
        VL      A1
        A0      A4
        V2      ,A0,1           * Vector load V2
        A1      B73
        VL      A5
        B73     A4
        A4      0
        A0      A2+A6
        V1      ,A0,A7          * Vector load V1 with stride A7
        A0      B71
        V4      ,A0,V3          * Gather into V4 using indices in V3
        V6      #VM&V0          * V6 = If mask zero else V0
        S2      V2,A4
        A4      A4+1
        S3      V2,A4
        A4      B73
        V7      V1*FV4          * V7 = V1 * V4
        V0      V6+FV7          * V0 = V6 + V7
        A4      A4-1
        A0      A1+A6
        ,A0,A7  V0              * Vector store V0 with stride A7
        A4      A4-1
        S5      S5!S2           * logical or packed mask
        S1      S1!S3
* *** End 1st body ***
        S4      S4-S7
        A6      A6-1
L3      =       P.*
* *** 2nd body ***
        VM0     S2
        VM1     S3
        A0      A3+A6
        V3      ,A0,A7          * Vector load V3 with stride A7
        B73     A1
        A1      2
        VL      A1
        A0      A4
        V2      ,A0,1           * Vector load V2
        A1      B73
        VL      A5
        B73     A4
        A4      0
        A0      A2+A6
        V1      ,A0,A7          * Vector load V1 with stride A7
        A0      B71
        V5      ,A0,V3          * Gather into V5 using indices in V3
        V6      #VM&V0          * V6 = If mask zero else V0
        S2      V2,A4
        A4      A4+1
        S3      V2,A4
        A4      B73
        V7      V1*FV5          * V7 = V1 * V5
        V0      V6+FV7          * V0 = V6 + V7
        A4      A4-1
        A0      A1+A6
        ,A0,A7  V0              * Vector store V0 with stride A7
        A4      A4-1
        S5      S5!S2           * logical or packed mask
        S1      S1!S3
* *** End 2nd body ***
        S4      S4-S7
        A6      A6-1
        A0      S4
        JAN     L2
L5      =       P.*
* Bail out if finished both parts
        S6      T72
        A0      S6
        JAZ     L4
* Clear flag to indicate done with 'full' part
        S6      0
        T72     S6
* Increment vector length for remainder
        A5      A5+1
        VL      A5
        S6      A7
        S4      T73
        S4      S6-S4
* Jump to end if the second part is empty
        A0      S4
        JAZ     L4
* Jump back up to handle the remainder
        J       L1
L4      =       P.*
        VL      A5
        S4      T73
        A6      B72
        VM0     S2
        VM1     S3
        V0      #VM&V0          * V0 = If mask zero else V0
        S2      T70
        S3      T71
* Phase 2: sum across row compressing adjustment vectors
* load lastindex vector
        A0      S2
        V4      ,A0,1           * Vector load V4
* expand flags
        VM0     S5
        VM1     S1
        V5      0
        T70     S1
        S1      1
        V3      S1!V5&VM        * If mask S1 else V5
        S1      T70
        A2      PS5             * Population count (of 1 bits)
        A4      PS1             * Population count (of 1 bits)
        A2      A2+A4
        S7      0
        T70     S1
        A6      A5
L7      =       P.*
        A6      A6-1
* execute body of serial loop
        S1      V3,A6
* Reset running sum if flag is set
        S2      V0,A6
        A0      S1
        JAZ     L6
        S6      V4,A6
        A2      A2-1
        V2,A2   S7
        S7      0
        V1,A2   S6
L6      =       P.*
        S7      S7+FS2
        A0      A6
        JAN     L7
        A6      S3
        ,A6     S7              * Scalar store S7
        S1      T70
* gather from lastindex, add adjustment, and scatter back
        A6      PS5             * Population count (of 1 bits)
        A4      PS1             * Population count (of 1 bits)
        A6      A6+A4
        A0      A6
        JAZ     L8
        VL      A6
        A0      A1
        V3      ,A0,V1          * Gather into V3 using indices in V1
        V4      V3+FV2          * V4 = V3 + V2
        A0      A1
        ,A0,V1  V4              * Scatter V4 using indices in V1
L8      =       P.*
        A6      B70
        EXIT
        END                     * END YMPMULT
* CLASTINDEX
* Finds index of last seg start and compresses flags to flagd (internal version)
        IDENT   CLASTIN         (INCD, FLAGD, OREDFLAGS, FLAGS, VLENGTH, STRIDE, REMAIN)
        ENTRY   CLASTIN
        BASE    D               * base 10
CLASTIN ENTER   NP=7            * # parameters
* int *INCD;
* int *FLAGD;
* int *OREDFLAGS;
* int *FLAGS;
* int VLENGTH;
* int STRIDE;
* int REMAIN;
        B70     A6
        ARGADD  S1,1,ARGPTR=A6  * INCD
        ARGADD  A1,2,ARGPTR=A6  * FLAGD
        ARGADD  A2,3,ARGPTR=A6  * OREDFLAGS
        ARGADD  A3,4,ARGPTR=A6  * FLAGS
        ARGADD  A4,5,ARGPTR=A6  * VLENGTH
        ARGADD  A5,6,ARGPTR=A6  * STRIDE
        ARGADD  S2,7,ARGPTR=A6  * REMAIN
        VL      A4
        V0      0
        S3      0
        S4      0
* set v-last-a[i] = i*stride
        V4      0
        V3,VM   V4,Z            * Compress Index Where V4 Zero
        S5      16
        A7      S5
        S5      A5
        S5      S5<31           * S5 = S5 left shift 31
        V3      V3