pigpen.fold

Fold operations for use with pig/fold, pig/group-by, and pig/cogroup.

See https://github.com/Netflix/PigPen/wiki/Folding-Data

avg

added in 0.2.0

(avg)
(avg fold)
Average the values. All values must be numeric. Optionally takes another
fold operation to compose.

  Example:
    (fold/avg)

    (->> (fold/map :foo) (fold/avg)) ; average the foo's
    (->> (fold/keep identity) (fold/avg)) ; avg non-nils
    (->> (fold/filter #(< 0 %)) (fold/avg)) ; avg positive numbers

    (->>
      (fold/map :foo)
      (fold/keep identity)
      (fold/avg))

  See also: pigpen.fold/count, pigpen.fold/sum
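An average cannot be merged across partitions directly, so a parallel version typically carries a [sum count] pair and divides at the end. The following is a minimal sketch of that idea using clojure.core.reducers/fold. It is not PigPen's implementation; avg-sketch is a hypothetical name, and returning nil for empty input is a choice made only for this sketch.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical helper, not part of pigpen.fold.
(defn avg-sketch [coll]
  (let [combinef (fn
                   ([] [0 0])                                  ; seed for each partition
                   ([[s1 c1] [s2 c2]] [(+ s1 s2) (+ c1 c2)]))  ; merge partial results
        reducef  (fn [[s c] x] [(+ s x) (inc c)])              ; add one value to a partial
        [s c]    (r/fold combinef reducef (vec coll))]
    ;; post step: turn the [sum count] pair into the final average
    (when (pos? c) (/ s c))))

(avg-sketch [1 2 3 4]) ;=> 5/2
```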

count

added in 0.2.0

(count)
(count fold)
Counts the values, including nils. Optionally takes another fold operation
to compose.

  Example:
    (fold/count)

    (->> (fold/keep identity) (fold/count)) ; count non-nils
    (->> (fold/filter #(< 0 %)) (fold/count)) ; count positive numbers

    (->>
      (fold/map :foo)
      (fold/keep identity)
      (fold/count))

  See also: pigpen.fold/sum, pigpen.fold/avg

distinct

added in 0.2.0

(distinct)
(distinct fold)
Returns the distinct set of values.

  Example:
    (fold/distinct)

    (->> (fold/map :foo)
         (fold/keep identity)
         (fold/distinct))

filter

added in 0.2.0

(filter f)
(filter f fold)
Pre-processes data for a fold operation. Same as clojure.core/filter.

first

added in 0.2.0

(first)
(first fold)
Returns the first output value. This is a post-reduce operation, meaning that
it can only be applied after a fold operation that produces a sequence.

  Example:
    (fold/first)

    (->> (fold/map :foo)
         (fold/sort)
         (fold/first))

  See also: pigpen.fold/last, pigpen.fold/min, pigpen.fold/max

fold-fn

added in 0.2.0

(fold-fn reducef)
(fold-fn combinef reducef)
(fold-fn combinef reducef post)
(fold-fn pre combinef reducef post)
Creates a pre-defined fold operation. Can be used with cogroup and group-by
to aggregate large groupings in parallel. See pigpen.core/fold for usage of
reducef and combinef.

  Example:

    (def count
      (fold/fold-fn + (fn [acc _] (inc acc))))

    (def sum
      (fold/fold-fn +))

    (defn sum-by [f]
      (fold/fold-fn + (fn [acc value] (+ acc (f value)))))
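To see how the combinef and reducef pairs above behave, the same functions can be run locally with clojure.core.reducers/fold. This is only a sketch and assumes that PigPen's fold follows the same contract: combinef with no arguments produces the seed, reducef folds values into a partial result, and combinef merges partial results. The names data and sum-by-sketch are hypothetical.

```clojure
(require '[clojure.core.reducers :as r])

(def data (vec (range 1 1001)))

;; count: reducef ignores the value and increments; + merges partial counts
(r/fold + (fn [acc _] (inc acc)) data) ;=> 1000

;; sum: + serves as both combinef and reducef
(r/fold + data) ;=> 500500

;; sum-by: apply f to each value before adding it to the accumulator
(defn sum-by-sketch [f coll]
  (r/fold + (fn [acc value] (+ acc (f value))) coll))

(sum-by-sketch :foo [{:foo 1} {:foo 2} {:foo 3}]) ;=> 6
```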

juxt

added in 0.2.0

(juxt & folds)
Applies multiple fold fns to the same data. Produces a vector of results.

  Example:
    (fold/juxt (fold/count) (fold/sum) (fold/avg))
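As a rough local analogy, clojure.core/juxt applied to an in-memory collection yields the same shape of result: one vector entry per function. The sketch below uses only clojure.core. It assumes that fold/juxt returns its results in the order the folds are given, and values is a hypothetical sample.

```clojure
(def values [2 4 6 8])

;; count, sum, and average of the same data, returned as one vector
((juxt count
       #(reduce + %)
       #(/ (reduce + %) (count %)))
 values)
;=> [4 20 5]
```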

keep

added in 0.2.0

(keep f)
(keep f fold)
Pre-processes data for a fold operation. Same as clojure.core/keep.
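Because the behavior is described as matching clojure.core/keep, the core function shows what to expect: results that are nil are dropped, but false is retained. A small sketch using only clojure.core, with rows as a hypothetical sample:

```clojure
(def rows [{:foo 1} {:foo nil} {:foo false} {} {:foo 4}])

;; keep drops only nil results; false survives
(keep :foo rows) ;=> (1 false 4)

;; map followed by filter identity drops both nil and false
(filter identity (map :foo rows)) ;=> (1 4)
```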

last

added in 0.2.0

(last)
(last fold)
Returns the last output value. This is a post-reduce operation, meaning that
it can only be applied after a fold operation that produces a sequence.

  Example:
    (fold/last)

    (->> (fold/map :foo)
         (fold/sort)
         (fold/last))

  See also: pigpen.fold/first, pigpen.fold/min, pigpen.fold/max

map

added in 0.2.0

(map f)
(map f fold)
Pre-processes data for a fold operation. Same as clojure.core/map.

mapcat

added in 0.2.0

(mapcat f)
(mapcat f fold)
Pre-processes data for a fold operation. Same as clojure.core/mapcat.

max

added in 0.2.0

(max)
(max fold)
(max comp)
(max comp fold)
Return the maximum (last) value of the collection. If a comparator is not
specified, clojure.core/compare is used. Optionally takes another fold
operation to compose.

  Example:
    (fold/max)
    (fold/max >)

    (->>
      (fold/map :foo)
      (fold/max >))

  See also: pigpen.fold/max-key, pigpen.fold/min, pigpen.fold/top

max-key

added in 0.2.0

(max-key keyfn)
(max-key keyfn fold)
(max-key keyfn comp)
(max-key keyfn comp fold)
Return the maximum (last) value of the collection based on (keyfn value).
If a comparator is not specified, clojure.core/compare is used. Optionally takes
another fold operation to compose.

  Example:
    (fold/max-key :foo)
    (fold/max-key :foo >)

  See also: pigpen.fold/max, pigpen.fold/min-key, pigpen.fold/top-by

min

added in 0.2.0

(min)
(min fold)
(min comp)
(min comp fold)
Return the minimum (first) value of the collection. If a comparator is not
specified, clojure.core/compare is used. Optionally takes another fold
operation to compose.

  Example:
    (fold/min)
    (fold/min >)

    (->>
      (fold/map :foo)
      (fold/min >))

  See also: pigpen.fold/min-key, pigpen.fold/max, pigpen.fold/top

min-key

added in 0.2.0

(min-key keyfn)
(min-key keyfn fold)
(min-key keyfn comp)
(min-key keyfn comp fold)
Return the minimum (first) value of the collection based on (keyfn value).
If a comparator is not specified, clojure.core/compare is used. Optionally takes
another fold operation to compose.

  Example:
    (fold/min-key :foo)
    (fold/min-key :foo >)

  See also: pigpen.fold/min, pigpen.fold/max-key, pigpen.fold/top-by

preprocess

added in 0.2.0

(preprocess f')
Takes a Clojure seq function, like map or filter, and returns a fold
preprocess function. The function must take two params: a function and a seq.

  Example:

    (def map (preprocess clojure.core/map))

    (pig/fold (map :foo))

remove

added in 0.2.0

(remove f)
(remove f fold)
Pre-processes data for a fold operation. Same as clojure.core/remove.

sort

added in 0.2.0

(sort)
(sort fold)
(sort c fold)
Sorts the data. This sorts the data after every element, so it's best to use
with take, which also limits the data after every value. If a comparator is not
specified, clojure.core/compare is used.

  Example:

    (fold/sort)

    (->>
      (fold/sort)
      (fold/take 40))

    (->> (fold/vec)
      (fold/sort >)
      (fold/take 40))

  See also: pigpen.fold/sort-by, pigpen.fold/top
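The advice to pair sort with take can be pictured as an accumulator that is re-sorted and truncated each time a value arrives, so it never holds more than n items. The sketch below illustrates that pattern in plain Clojure. It is not PigPen's implementation, and sorted-take-sketch is a hypothetical helper.

```clojure
;; Hypothetical helper: keep at most n values, re-sorting after every element.
(defn sorted-take-sketch [cmp n coll]
  (reduce (fn [acc x]
            (vec (take n (sort cmp (conj acc x)))))
          []
          coll))

(sorted-take-sketch compare 3 [5 1 4 2 3]) ;=> [1 2 3]
(sorted-take-sketch > 3 [5 1 4 2 3])       ;=> [5 4 3]
```

Without the truncation step the accumulator would grow to the full size of the data and be re-sorted on every element.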

sort-by

added in 0.2.0

(sort-by keyfn)
(sort-by keyfn fold)
(sort-by keyfn c fold)
Sorts the data by (keyfn value). This sorts the data after every element, so
it's best to use with take, which also limits the data after every value. If a
comparator is not specified, clojure.core/compare is used.

  Example:

    (fold/sort-by :foo)

    (->> (fold/vec)
      (fold/sort-by :foo)
      (fold/take 40))

    (->> (fold/vec)
      (fold/sort-by :foo >)
      (fold/take 40))

  See also: pigpen.fold/sort, pigpen.fold/top-by

sum

added in 0.2.0

(sum)
(sum fold)
Sums the values. All values must be numeric. Optionally takes another
fold operation to compose.

  Example:
    (fold/sum)

    (->> (fold/map :foo) (fold/sum)) ; sum the foo's
    (->> (fold/keep identity) (fold/sum)) ; sum non-nils
    (->> (fold/filter #(< 0 %)) (fold/sum)) ; sum positive numbers

    (->>
      (fold/map :foo)
      (fold/keep identity)
      (fold/sum))

  See also: pigpen.fold/count, pigpen.fold/avg

take

added in 0.2.0

(take n)
(take n fold)
Returns a sequence of the first n output values. This is a post-reduce
operation, meaning that it can only be applied after a fold operation that
produces a sequence.

  Example:

    (->>
      (fold/sort)
      (fold/take 40))

top

added in 0.2.0

(top n)
(top comp n)
Returns the top n items in the collection. If a comparator is not specified,
clojure.core/compare is used.

  Example:
    (fold/top 40)
    (fold/top > 40)

  See also: pigpen.fold/top-by

top-by

added in 0.2.0

(top-by keyfn n)
(top-by keyfn comp n)
Returns the top n items in the collection based on (keyfn value). If a
comparator is not specified, clojure.core/compare is used.

  Example:
    (fold/top-by :foo 40)
    (fold/top-by :foo > 40)

  See also: pigpen.fold/top

vec

added in 0.2.0

(vec)
Returns all values as a vector. This is the default fold operation if no
other is specified.

  Example:
    (fold/vec)

    (->> (fold/vec)
      (fold/take 5))