pigpen.avro
*** ALPHA - Subject to change ***
Functions for reading avro data.
See: http://avro.apache.org/
Note: These are currently supported only by the local, rx, and pig platforms.
load-avro
added in 0.2.13
(load-avro location schema)
*** ALPHA - Subject to change ***
Loads data from an avro file. Returns the data as maps with keyword keys
corresponding to the avro field names. Fields with avro type 'map' are returned
as maps with string keys.
Example:
(pig-avro/load-avro "input.avro" (slurp "schemafile.json"))

(pig-avro/load-avro "input.avro"
  "{\"namespace\": \"example.avro\",
    \"type\": \"record\",
    \"name\": \"foo\",
    \"fields\": [{\"name\": \"wurdz\",
                  \"type\": \"string\"},
                 {\"name\": \"bar\",
                  \"type\": \"int\"}]}")
Notes:
* Avro schemas are defined on the project's website: http://avro.apache.org/docs/1.7.7/spec.html#schemas
* load-avro takes the schema as a string
* Make sure a piggybank.jar (http://mvnrepository.com/artifact/org.apache.pig/piggybank/0.14.0)
compatible with your version of Hadoop is on $PIG_CLASSPATH. For Hadoop v2,
see http://stackoverflow.com/a/21753749. Amazon's Elastic MapReduce comes
with a compatible piggybank.jar already on the classpath.
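
As a sketch of that classpath setup (the jar path below is hypothetical;
substitute wherever your piggybank.jar actually lives):

```shell
# Hypothetical location; point this at a piggybank.jar built for your
# version of Hadoop.
export PIG_CLASSPATH="$PIG_CLASSPATH:$HOME/lib/piggybank-0.14.0.jar"
```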