FAILED: Error in metadata: org.jets3t.service.S3ServiceException: Failed to sanitize XML document destined for handler class org.jets3t.service.impl.rest.XmlResponsesSaxParser$ListBucketHandler null 'null' -- ResponseCode: -1, ResponseStatus: null, RequestId: null, HostId: null
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
There's some discussion of this in the AWS forums. The underlying cause is that Hive runs out of memory while building the partition list from S3.
A workaround is to increase HADOOP_HEAPSIZE, which can be done by modifying hadoop-user-env.sh with an EMR bootstrap action. On an m1.large instance, a 2 GB heap does the trick for us.
Upload a script like the following somewhere in S3:
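A minimal sketch of such a bootstrap script, assuming the standard EMR layout where hadoop-user-env.sh lives in /home/hadoop/conf (verify the path for your AMI version):

```shell
#!/bin/bash
# Bootstrap action sketch: raise HADOOP_HEAPSIZE so Hive can build
# large partition lists without exhausting the heap.
# The config path is the usual one on EMR Hadoop 0.20 AMIs; adjust
# it if your AMI differs.
echo "export HADOOP_HEAPSIZE=2048" >> /home/hadoop/conf/hadoop-user-env.sh
```

Every Hadoop JVM started on the node after bootstrap will then use a 2 GB heap.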
You can now pass this bootstrap action when creating your job flow (the S3 path below is a placeholder for wherever you uploaded the script):
elastic-mapreduce --create --alive \
  --name "large partitions..." --hive-interactive \
  --num-instances 1 --instance-type m1.large \
  --hadoop-version 0.20 \
  --bootstrap-action s3://yourbucket/bootstrap-heapsize.sh
You should now be able to load your partitions.
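For example, if you were loading partitions with EMR Hive's RECOVER PARTITIONS extension (an assumption — any statement that lists many S3 partitions can hit this error), the statement should now succeed; the table name here is a placeholder:

```shell
# Re-run the partition load from the Hive CLI on the master node.
# "logs" is a hypothetical partitioned external table backed by S3.
hive -e "ALTER TABLE logs RECOVER PARTITIONS;"
```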