pigpen.parquet
Functions for reading and writing parquet data and for creating parquet
schemas. Start with `load-parquet` and `store-parquet`.
See: http://parquet.incubator.apache.org/
Note: These are currently only supported by the local, rx, and pig platforms.
Note: Strings and byte arrays do not round-trip consistently. Parquet stores
strings as byte arrays but does not keep the type of the original value, so
PigPen reads every byte array back as a string.
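The type loss described in the note can be illustrated with plain Clojure. This is only a sketch of the idea, not PigPen or parquet code: the UTF-8 encoding and the name `read-as-string` are assumptions made for the illustration.

```clojure
;; Plain Clojure/JVM illustration; no PigPen or parquet calls are involved.
;; Assumption: strings are encoded as UTF-8 when turned into bytes.
(def original-string "hello")
(def original-bytes (.getBytes "hello" "UTF-8"))

;; Once stored, a string and a byte array are the same sequence of bytes.
(def stored-from-string (.getBytes ^String original-string "UTF-8"))
(def stored-from-bytes original-bytes)

;; Hypothetical reader that, like the note describes, treats every
;; byte array as a string.
(defn read-as-string [^bytes b]
  (String. b "UTF-8"))

;; Both stored values read back as equal strings, so the reader cannot
;; tell which one started life as a byte array.
(= (read-as-string stored-from-string)
   (read-as-string stored-from-bytes))
```

If the original values must stay byte arrays, the caller has to convert them back after loading.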
binary
(binary name)
(binary name repetition)
Defines a field of type BINARY in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
boolean
(boolean name)
(boolean name repetition)
Defines a field of type BOOLEAN in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
double
(double name)
(double name repetition)
Defines a field of type DOUBLE in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
float
(float name)
(float name repetition)
Defines a field of type FLOAT in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
int32
(int32 name)
(int32 name repetition)
Defines a field of type INT32 in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
int64
(int64 name)
(int64 name repetition)
Defines a field of type INT64 in a parquet schema.
`repetition` is one of :required, :optional, or :repeated; defaults to :required.
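The field functions above (binary, boolean, double, float, int32, int64) all share the same two arities. A minimal sketch of both forms follows, assuming PigPen with parquet support is on the classpath; the `pq` alias, the var names, and the column names are hypothetical.

```clojure
(require '[pigpen.parquet :as pq])

;; One-argument form: the repetition defaults to :required.
(def id-field (pq/int64 "id"))

;; Two-argument form: the repetition is given explicitly.
(def active-field   (pq/boolean "active" :required))
(def nickname-field (pq/binary "nickname" :optional))
```

These values are only field definitions; they become a schema once passed to pigpen.parquet/message.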
load-parquet
added in 0.2.7
(load-parquet location schema)
Loads data from a parquet file. Returns data as maps with keywords matching
the parquet column names. The parameter `schema` is a parquet schema.
Example:
(load-parquet "input.pq" (message "example" (int64 "value")))
See also: pigpen.parquet/message for schema details
See also: https://github.com/apache/incubator-parquet-mr
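As a sketch of how load-parquet might sit in a larger query, the example below filters the loaded maps by a column value. It assumes PigPen with parquet support is on the classpath and follows the documented `(message name & fields)` signature by passing a schema name. The aliases, the schema name "example", the function `large-values`, the threshold, and the file location are all hypothetical.

```clojure
(require '[pigpen.core :as pig]
         '[pigpen.parquet :as pq])

;; Hypothetical schema with a single required INT64 column named "value".
(def schema
  (pq/message "example"
    (pq/int64 "value")))

(defn large-values
  "Builds a query that loads maps such as {:value 42} from `location`
  and keeps the ones whose :value is greater than 100."
  [location]
  (->> (pq/load-parquet location schema)
       (pig/filter (fn [{:keys [value]}] (> value 100)))))

;; Building the query describes the work; it is not the result data.
(def query (large-values "input.pq"))
```

Because the loaded records are maps keyed by column name, ordinary destructuring such as `{:keys [value]}` works inside the filter function.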
message
(message name & fields)
Defines a parquet schema. `name` is a string. To define fields, see:
pigpen.parquet/int32
pigpen.parquet/int64
pigpen.parquet/float
pigpen.parquet/double
pigpen.parquet/boolean
pigpen.parquet/binary (used for strings)
Note: Complex data structures (GroupType) are not supported at this time.
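Since nested groups are not supported, records with nested maps need to be reshaped into flat columns before they fit a schema. One possible workaround is sketched below in plain Clojure; it is an assumption of this example rather than something PigPen prescribes, and the function name and keys are hypothetical.

```clojure
;; Plain Clojure; no PigPen calls are involved.
;; Flattens a nested :address map into top-level keys so that every
;; value maps onto a primitive parquet column.
(defn flatten-user
  [{:keys [id address]}]
  {:id           id
   :address_city (:city address)
   :address_zip  (:zip address)})

(flatten-user {:id 1 :address {:city "Oslo" :zip "0150"}})
```

A matching schema would then declare one primitive field per flattened key, for example an int64 field for "id" and binary fields for the two address columns.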
store-parquet
added in 0.2.7
(store-parquet location schema relation)
Stores data to a parquet file. Each record in the relation passed to this
command must be a map with keywords matching the parquet columns to be stored.
The parameter `schema` is a parquet schema.
Example:
(store-parquet "output.pq" (message "example" (int64 "value")) foo)
See also: pigpen.parquet/message for schema details
See also: https://github.com/apache/incubator-parquet-mr
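A minimal sketch of building a store query from in-memory data follows. It assumes PigPen with parquet support is on the classpath and that `pigpen.core/return` is available in your PigPen version. The aliases, the schema name, the function `store-values`, and the output location are hypothetical.

```clojure
(require '[pigpen.core :as pig]
         '[pigpen.parquet :as pq])

;; Hypothetical schema: one required INT64 column named "value".
(def schema
  (pq/message "example"
    (pq/int64 "value")))

(defn store-values
  "Builds a query that stores three maps, each with a :value key that
  matches the schema's column, to `location`."
  [location]
  (->> (pig/return [{:value 1} {:value 2} {:value 3}])
       (pq/store-parquet location schema)))

;; The relation is the last argument, so store-parquet fits at the end
;; of a ->> pipeline.
(def query (store-values "output.pq"))
```

Clojure integer literals are longs, which is why int64 is used for the column here.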