Parallelism
English prose on parallelism.
Some sources
A programming technique for non-rectangular data: I think this is the first paper to present the segmented operations that are the bread and butter of flattening nested data parallelism. The paper discusses a "partition operator", which applies a given function to each segment of an array, where the segments are given by what we would today call a "flag array" (a vector of booleans indicating where a new segment begins). However, the partition operator cannot be implemented in APL in its full generality; instead, the paper presents implementations specialised to various built-in functions (e.g. prefix sum, yielding segmented prefix sum).
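To make the flag-array idea concrete, here is a minimal Python sketch of a segmented inclusive scan. It is not the paper's APL formulation; it uses the later, now-standard trick of lifting the operator to (flag, value) pairs, and the names `segmented_scan` and `lifted` are my own. The scan itself runs sequentially here; the point is only that the lifted operator is associative whenever the underlying one is, which is what lets a parallel scan be used in its place.

```python
from itertools import accumulate

def segmented_scan(op, flags, values):
    """Inclusive scan that restarts wherever flags[i] is True."""
    # Lift `op` to (flag, value) pairs. If the right operand starts a new
    # segment, discard whatever was accumulated to its left.
    def lifted(a, b):
        flag_a, val_a = a
        flag_b, val_b = b
        return (flag_a or flag_b, val_b if flag_b else op(val_a, val_b))

    return [v for _, v in accumulate(zip(flags, values), lifted)]

# Three segments: [1, 2, 3], [4, 5], [6]
flags = [True, False, False, True, False, True]
values = [1, 2, 3, 4, 5, 6]
print(segmented_scan(lambda x, y: x + y, flags, values))
# prints [1, 3, 6, 4, 9, 6]
```

Passing `max` or multiplication instead of addition gives the corresponding segmented scans, which is roughly the generality the partition operator is after.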
An O(n² log n) parallel max-flow algorithm: to my knowledge, this is the first paper to use the now-common work/span model for characterising parallel algorithms. It was not yet a language-based parallel cost model, however; I think Blelloch may have been the first to have that idea.
More Fun with Monoids by Oleg Kiselyov describes the theory behind monoids in a map-reduce context, and gives a very nice explanation of vertical composition of monoids, which is essentially a kind of (manual) fusion. Apart from being theoretically beautiful, it is also eminently practical: the performance advantages can be quite dramatic. The most mind-bending example in this paper is implementing Boyer-Moore majority voting with an operator that is not strictly associative, but is associative under an extended definition of equality.
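As a rough illustration of the majority-voting example, here is a Python sketch of a Boyer-Moore style combining operator used in a map-reduce shape. This is my own reconstruction, not code from the paper, and the names `EMPTY`, `vote`, `combine` and `majority` are hypothetical. Each partial result is a (candidate, surplus) pair; combining two of them cancels opposing votes. Different groupings can yield different pairs when no majority exists, but they all agree on the candidate whenever a strict majority does exist, which is how I read the "extended equality". The sketch assumes `None` never occurs as an input value.

```python
from functools import reduce

EMPTY = (None, 0)  # identity element: no candidate, no surplus votes

def vote(x):
    """Map step: one element is one vote for itself."""
    return (x, 1)

def combine(a, b):
    """Reduce step: merge two (candidate, surplus) summaries."""
    (x, m), (y, n) = a, b
    if x == y:
        return (x, m + n)
    if m > n:
        return (x, m - n)
    if n > m:
        return (y, n - m)
    return EMPTY  # equal and opposite votes cancel completely

def majority(xs, chunk=3):
    """Strict-majority element of xs, or None if there is none."""
    # Reduce each chunk independently (this part could run in parallel),
    # then combine the per-chunk summaries.
    chunks = [xs[i:i + chunk] for i in range(0, len(xs), chunk)]
    partials = [reduce(combine, map(vote, c), EMPTY) for c in chunks]
    candidate, _ = reduce(combine, partials, EMPTY)
    # The candidate is only a guess; a second counting pass confirms it.
    if candidate is not None and 2 * xs.count(candidate) > len(xs):
        return candidate
    return None

print(majority(list("abacaabaa")))  # prints a
```

The final counting pass is needed because the reduction only guarantees the right candidate when a majority exists; it says nothing useful otherwise.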