It's taken a long time to get here, but there's some interesting discussion of the Java 8 Collector API, which provides stream processing with built-in parallelism.
It is rather like a generalisation of the Clojure reducers library, and it also provides standard Stream processors (because typing, OO, factories).
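
To make that a little more concrete, here is a minimal sketch of the two sides of the API: a hand-rolled collector built with `Collector.of`, and one of the standard ones from the `Collectors` factory class. The class name `CollectorSketch`, the helper methods, and the word list are all made up for illustration. The combiner function is the part that allows partial results from a parallel stream to be merged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical example class; none of these names come from the JDK.
public class CollectorSketch {

    // A custom collector that sums string lengths.
    // supplier:    creates a fresh mutable accumulator (one per parallel chunk)
    // accumulator: folds a single element into the accumulator
    // combiner:    merges two partial accumulators
    // finisher:    converts the accumulator into the final result
    static Collector<String, int[], Integer> lengthSummer() {
        return Collector.of(
            () -> new int[1],
            (acc, s) -> acc[0] += s.length(),
            (left, right) -> { left[0] += right[0]; return left; },
            acc -> acc[0]);
    }

    // Uses the custom collector on a parallel stream.
    static int totalLength(List<String> words) {
        return words.parallelStream().collect(lengthSummer());
    }

    // Uses standard collectors from the Collectors factory class.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream()
                    .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "fold", "filter", "into");
        System.out.println(totalLength(words));
        System.out.println(countByLength(words));
    }
}
```

The supplier, accumulator and combiner line up roughly with the seed, reducing function and combining function you would pass to a Clojure `fold`, which is where the resemblance to reducers shows.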