never-summer/emr-simple

emr-simple

This implementation parses domain names from msgtype=response records in raw WARC files on Amazon EMR.
The reduce step summarizes the domain names and sends the result to Amazon SQS. Input files are fetched from a URL. Written specially for Noah Silverman.
Example usage:
s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2015-40/segments/1443736672328.14/warc/CC-MAIN-20151001215752-00000-ip-10-137-6-227.ec2.internal.warc.gz
elasticmapreduce/outDir/
aws-logs-xxx-us-west-2
sqsQueueName
us-west-2
accessKey
secretKey
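
The map and reduce steps described above can be sketched in a few lines of Python. This is only an illustration, not the code in this repository. It assumes that msgtype=response corresponds to the `WARC-Type: response` header, that the domain is the host part of `WARC-Target-URI`, and that "summary" means a count per domain. The function names `iter_response_domains` and `count_domains` are hypothetical. The sketch works on already decompressed text lines and leaves out the SQS step.

```python
from collections import Counter
from urllib.parse import urlsplit


def iter_response_domains(lines):
    """Map step (sketch): yield the host of every WARC 'response' record."""
    headers = None  # dict while inside a WARC header block, otherwise None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("WARC/"):
            # A version line such as "WARC/1.0" opens a new record header.
            headers = {}
        elif headers is not None:
            if line == "":
                # A blank line closes the header block: decide whether to emit.
                if headers.get("warc-type") == "response":
                    uri = headers.get("warc-target-uri", "")
                    host = urlsplit(uri).hostname
                    if host:
                        yield host
                headers = None
            elif ":" in line:
                name, value = line.split(":", 1)
                headers[name.strip().lower()] = value.strip()


def count_domains(lines):
    """Reduce step (sketch, assumed to be a per-domain count)."""
    return Counter(iter_response_domains(lines))
```

Calling `count_domains` on the decoded lines of one WARC file returns a mapping from domain name to the number of response records seen for it, which is the kind of result the reduce step would then send to the queue.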
