8 changes: 8 additions & 0 deletions assembly/pom.xml
@@ -254,6 +254,14 @@
<artifactId>spark-hadoop-cloud_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<!--
Redeclare this dependency to force it into the distribution.
-->
<dependency>
<groupId>org.eclipse.jetty</groupId>
Contributor

This kinda sucks. Doesn't this also end up pulling up a bunch of other jetty stuff into the packaging?

I guess there's no way around it until Hadoop itself shades jetty in some way...

Contributor Author

@steveloughran, Apr 16, 2018

> Doesn't this also end up pulling up a bunch of other jetty stuff into the packaging?

It doesn't pull in anything else. There's already one of the jetty- JARs in the dist/jars directory, BTW.

> I guess there's no way around it until Hadoop itself shades jetty in some way...

Or when @aajisaka & colleagues implement the Java 9 support and everyone runs to it. This is one of those examples of why, from a packaging and deployment perspective, Java 9 is the good one.

Created HADOOP-15387 for the shading task and put my name to it, as Bikas has already been expressing a desire for it.
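
The claim about which Jetty JARs end up in the packaging can be checked against a dependency manifest such as `dev/deps/spark-deps-hadoop-3.1`, which lists one JAR file name per line. A minimal sketch, assuming that layout; the function name `list_jetty_jars` is hypothetical and not part of the Spark build scripts:

```shell
# Hypothetical helper, not part of the Spark build scripts.
# Prints every line of a manifest (one JAR file name per line) that
# starts with "jetty-"; prints nothing if there are none.
list_jetty_jars() {
  manifest="$1"
  # grep exits non-zero when nothing matches; "|| true" keeps the
  # helper usable under "set -e".
  grep '^jetty-' "$manifest" || true
}
```

Run against the manifest added in this PR (`list_jetty_jars dev/deps/spark-deps-hadoop-3.1`), this should print the `jetty-webapp` and `jetty-xml` entries.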

<artifactId>jetty-util</artifactId>
<scope>${hadoop.deps.scope}</scope>
</dependency>
</dependencies>
</profile>
</profiles>
6 changes: 6 additions & 0 deletions core/pom.xml
@@ -95,6 +95,12 @@
<groupId>org.apache.curator</groupId>
<artifactId>curator-recipes</artifactId>
</dependency>
<!-- With curator 2.12 SBT/Ivy doesn't get ZK on the build classpath.
Explicitly declaring it as a dependency fixes this. -->
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</dependency>

<!-- Jetty dependencies promoted to compile here so they are shaded
and inlined into spark-core jar -->
221 changes: 221 additions & 0 deletions dev/deps/spark-deps-hadoop-3.1
@@ -0,0 +1,221 @@
HikariCP-java7-2.4.12.jar
JavaEWAH-0.3.2.jar
RoaringBitmap-0.5.11.jar
ST4-4.0.4.jar
accessors-smart-1.2.jar
activation-1.1.1.jar
aircompressor-0.8.jar
antlr-2.7.7.jar
antlr-runtime-3.4.jar
antlr4-runtime-4.7.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.4.0-b34.jar
apache-log4j-extras-1.2.17.jar
arpack_combined_all-0.1.jar
arrow-format-0.8.0.jar
arrow-memory-0.8.0.jar
arrow-vector-0.8.0.jar
automaton-1.11-8.jar
avro-1.7.7.jar
avro-ipc-1.7.7.jar
avro-mapred-1.7.7-hadoop2.jar
base64-2.3.8.jar
bcprov-jdk15on-1.58.jar
bonecp-0.8.0.RELEASE.jar
breeze-macros_2.11-0.13.2.jar
breeze_2.11-0.13.2.jar
calcite-avatica-1.2.0-incubating.jar
calcite-core-1.2.0-incubating.jar
calcite-linq4j-1.2.0-incubating.jar
chill-java-0.8.4.jar
chill_2.11-0.8.4.jar
commons-beanutils-1.9.3.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.2.jar
commons-compiler-3.0.8.jar
commons-compress-1.4.1.jar
commons-configuration2-2.1.1.jar
commons-crypto-1.0.0.jar
commons-daemon-1.0.13.jar
commons-dbcp-1.4.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.5.jar
commons-logging-1.1.3.jar
commons-math3-3.4.1.jar
commons-net-3.1.jar
commons-pool-1.5.4.jar
compress-lzf-1.0.3.jar
core-1.1.2.jar
curator-client-2.12.0.jar
curator-framework-2.12.0.jar
curator-recipes-2.12.0.jar
datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
derby-10.12.1.1.jar
dnsjava-2.1.7.jar
ehcache-3.3.1.jar
eigenbase-properties-1.1.5.jar
flatbuffers-1.2.0-3f79e055.jar
generex-1.0.1.jar
geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gson-2.2.4.jar
guava-14.0.1.jar
guice-4.0.jar
guice-servlet-4.0.jar
hadoop-annotations-3.1.0.jar
hadoop-auth-3.1.0.jar
hadoop-client-3.1.0.jar
hadoop-common-3.1.0.jar
hadoop-hdfs-client-3.1.0.jar
hadoop-mapreduce-client-common-3.1.0.jar
hadoop-mapreduce-client-core-3.1.0.jar
hadoop-mapreduce-client-jobclient-3.1.0.jar
hadoop-yarn-api-3.1.0.jar
hadoop-yarn-client-3.1.0.jar
hadoop-yarn-common-3.1.0.jar
hadoop-yarn-registry-3.1.0.jar
hadoop-yarn-server-common-3.1.0.jar
hadoop-yarn-server-web-proxy-3.1.0.jar
hk2-api-2.4.0-b34.jar
hk2-locator-2.4.0-b34.jar
hk2-utils-2.4.0-b34.jar
hppc-0.7.2.jar
htrace-core4-4.1.0-incubating.jar
httpclient-4.5.4.jar
httpcore-4.4.8.jar
ivy-2.4.0.jar
jackson-annotations-2.6.7.jar
jackson-core-2.6.7.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.6.7.1.jar
jackson-dataformat-yaml-2.6.7.jar
jackson-jaxrs-base-2.7.8.jar
jackson-jaxrs-json-provider-2.7.8.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.6.7.jar
jackson-module-paranamer-2.7.9.jar
jackson-module-scala_2.11-2.6.7.1.jar
janino-3.0.8.jar
java-xmlbuilder-1.1.jar
javassist-3.18.1-GA.jar
javax.annotation-api-1.2.jar
javax.inject-1.jar
javax.inject-2.4.0-b34.jar
javax.servlet-api-3.1.0.jar
javax.ws.rs-api-2.0.1.jar
javolution-5.5.1.jar
jaxb-api-2.2.11.jar
jcip-annotations-1.0-1.jar
jcl-over-slf4j-1.7.16.jar
jdo-api-3.0.1.jar
jersey-client-2.22.2.jar
jersey-common-2.22.2.jar
jersey-container-servlet-2.22.2.jar
jersey-container-servlet-core-2.22.2.jar
jersey-guava-2.22.2.jar
jersey-media-jaxb-2.22.2.jar
jersey-server-2.22.2.jar
jets3t-0.9.4.jar
jetty-webapp-9.3.20.v20170531.jar
jetty-xml-9.3.20.v20170531.jar
Member

Hadoop 3.x profile does not shade Jetty any more.

This is different from Hadoop 2.x profile. See #4285.

cc @wangyum @yhuai

Member

Sorry, but what do you mean? Apache Spark 2.4.5 Hadoop 2.7 binary has jetty jars while Apache Spark 3.0.0 Hadoop 3.2 binary does not.

$ tar tvf spark-2.4.5-bin-hadoop2.7.tgz | grep jetty
-rw-r--r-- spark-rm/spark-rm   177131 2020-01-13 02:30 spark-2.4.5-bin-hadoop2.7/jars/jetty-util-6.1.26.jar
-rw-r--r-- spark-rm/spark-rm   539912 2020-01-13 02:30 spark-2.4.5-bin-hadoop2.7/jars/jetty-6.1.26.jar

$ tar tvf spark-3.0.0-bin-hadoop3.2.tgz | grep jetty
$

Member

Please see [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency

Member

cc @dbtsai

Member

The other required jetty jars are still shaded correctly. Please let me know if there is something missed.

$ jar tvf spark-core_2.12-3.0.0.jar | grep jetty | wc -l
    1308
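
The count above includes every entry whose path mentions jetty anywhere. To count only entries under one package path, the listing can be filtered by prefix. A sketch only: `count_entries_under` is a hypothetical name, and the prefix `org/sparkproject/jetty/` in the usage note is an assumption about where the shaded classes are relocated, not something stated in this thread.

```shell
# Hypothetical helper: reads a JAR listing (one entry path per line) on
# stdin and prints how many entries start with the given path prefix.
count_entries_under() {
  prefix="$1"
  # index() == 1 means the line begins with the literal prefix, so no
  # regex escaping is needed; "n + 0" prints 0 when nothing matched.
  awk -v p="$prefix" 'index($0, p) == 1 { n++ } END { print n + 0 }'
}
```

For example, `jar tf spark-core_2.12-3.0.0.jar | count_entries_under org/sparkproject/jetty/` would count the entries under that assumed relocation prefix.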

Member

Great! Thank you for answering the question.

jline-2.12.1.jar
joda-time-2.9.3.jar
jodd-core-3.5.2.jar
jpam-1.1.jar
json-smart-2.3.jar
json4s-ast_2.11-3.5.3.jar
json4s-core_2.11-3.5.3.jar
json4s-jackson_2.11-3.5.3.jar
json4s-scalap_2.11-3.5.3.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
jta-1.1.jar
jtransforms-2.4.0.jar
jul-to-slf4j-1.7.16.jar
kerb-admin-1.0.1.jar
kerb-client-1.0.1.jar
kerb-common-1.0.1.jar
kerb-core-1.0.1.jar
kerb-crypto-1.0.1.jar
kerb-identity-1.0.1.jar
kerb-server-1.0.1.jar
kerb-simplekdc-1.0.1.jar
kerb-util-1.0.1.jar
kerby-asn1-1.0.1.jar
kerby-config-1.0.1.jar
kerby-pkix-1.0.1.jar
kerby-util-1.0.1.jar
kerby-xdr-1.0.1.jar
kryo-shaded-3.0.3.jar
kubernetes-client-3.0.0.jar
kubernetes-model-2.0.0.jar
leveldbjni-all-1.8.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
log4j-1.2.17.jar
logging-interceptor-3.8.1.jar
lz4-java-1.4.0.jar
machinist_2.11-0.6.1.jar
macro-compat_2.11-1.1.1.jar
mesos-1.4.0-shaded-protobuf.jar
metrics-core-3.1.5.jar
metrics-graphite-3.1.5.jar
metrics-json-3.1.5.jar
metrics-jvm-3.1.5.jar
minlog-1.3.0.jar
mssql-jdbc-6.2.1.jre7.jar
netty-3.9.9.Final.jar
netty-all-4.1.17.Final.jar
nimbus-jose-jwt-4.41.1.jar
objenesis-2.1.jar
okhttp-2.7.5.jar
okhttp-3.8.1.jar
okio-1.13.0.jar
opencsv-2.3.jar
orc-core-1.4.3-nohive.jar
orc-mapreduce-1.4.3-nohive.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.1.jar
paranamer-2.8.jar
parquet-column-1.8.2.jar
parquet-common-1.8.2.jar
parquet-encoding-1.8.2.jar
parquet-format-2.3.1.jar
parquet-hadoop-1.8.2.jar
parquet-hadoop-bundle-1.6.0.jar
parquet-jackson-1.8.2.jar
protobuf-java-2.5.0.jar
py4j-0.10.6.jar
pyrolite-4.13.jar
re2j-1.1.jar
scala-compiler-2.11.8.jar
scala-library-2.11.8.jar
scala-parser-combinators_2.11-1.0.4.jar
scala-reflect-2.11.8.jar
scala-xml_2.11-1.0.5.jar
shapeless_2.11-2.3.2.jar
slf4j-api-1.7.16.jar
slf4j-log4j12-1.7.16.jar
snakeyaml-1.15.jar
snappy-0.2.jar
snappy-java-1.1.7.1.jar
spire-macros_2.11-0.13.0.jar
spire_2.11-0.13.0.jar
stax-api-1.0.1.jar
stax2-api-3.1.4.jar
stream-2.7.0.jar
stringtemplate-3.2.1.jar
super-csv-2.2.0.jar
token-provider-1.0.1.jar
univocity-parsers-2.5.9.jar
validation-api-1.1.0.Final.jar
woodstox-core-5.0.3.jar
xbean-asm5-shaded-4.4.jar
xz-1.0.jar
zjsonpatch-0.3.0.jar
zookeeper-3.4.9.jar
zstd-jni-1.3.2-2.jar
1 change: 1 addition & 0 deletions dev/test-dependencies.sh
@@ -34,6 +34,7 @@ MVN="build/mvn"
HADOOP_PROFILES=(
hadoop-2.6
hadoop-2.7
hadoop-3.1
)

# We'll switch the version to a temp. one, publish POMs using that new version, then switch back to
83 changes: 82 additions & 1 deletion hadoop-cloud/pom.xml
@@ -38,7 +38,32 @@
<sbt.project.name>hadoop-cloud</sbt.project.name>
</properties>

<build>
Contributor

Is this still needed after you removed the committer code?

Contributor Author

It's in an adjacent PR; I've just pulled all the POM dependency changes into this one, so that everything related to the dependency digraph can be audited in one go.
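
One way such an audit might start is by comparing the per-profile dependency manifests under `dev/deps`. A minimal sketch, assuming each manifest lists one JAR file name per line and that a manifest for the older profile exists alongside `dev/deps/spark-deps-hadoop-3.1`; the helper name `jars_only_in_new` is hypothetical:

```shell
# Hypothetical helper: prints the JAR names that appear in the second
# manifest but not in the first.
jars_only_in_new() {
  old="$1"
  new="$2"
  # -F: treat each line of $old as a fixed string, -x: match whole lines,
  # -v: keep only the lines of $new that match none of them.
  grep -v -x -F -f "$old" "$new" || true
}
```

For example, `jars_only_in_new dev/deps/spark-deps-hadoop-2.7 dev/deps/spark-deps-hadoop-3.1` would list the JARs that the new profile introduces or upgrades, since a version bump changes the file name.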

<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
</build>

<dependencies>
<!--used during compilation but not exported as transitive dependencies-->
Contributor

Is this still needed after you removed the committer code?

Contributor Author

see below

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
<!--
the AWS module pulls in jackson; its transitive dependencies can create
intra-jackson-module version problems.
@@ -147,7 +172,7 @@

<profile>
<id>hadoop-2.7</id>
-<!-- Hadoop Azure is a new Jar with -->
+<!-- 2.7+ adds the azure Jar to the set of dependencies -->
<dependencies>

<!--
@@ -180,6 +205,62 @@
</dependencies>
</profile>

<!--
Hadoop 3 simplifies the classpath, and adds a new committer base class which
enables store-specific committers.
-->
<profile>
<id>hadoop-3.1</id>
<dependencies>
<!--
There's now a hadoop-cloud-storage which transitively pulls in the store JARs,
but it still needs some selective exclusion across versions, especially 3.0.x.
-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-cloud-storage</artifactId>
<version>${hadoop.version}</version>
<scope>${hadoop.deps.scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!--
The jetty declarations are made
(a) to keep that jetty-util-ajax version in sync with the rest of Spark.
(b) to minimise the effects which Spark's jetty shading has on the
availability of the jetty JARs for hadoop-azure, which depends
on them.
-->
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>${hadoop.deps.scope}</scope>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util-ajax</artifactId>
<version>${jetty.version}</version>
<scope>${hadoop.deps.scope}</scope>
</dependency>
</dependencies>
</profile>

</profiles>

</project>
9 changes: 9 additions & 0 deletions pom.xml
@@ -2671,6 +2671,15 @@
</properties>
</profile>

<profile>
<id>hadoop-3.1</id>
Member

+1 for skipping Hadoop 3.0 and starting to support Hadoop 3.1+ only.

<properties>
<hadoop.version>3.1.0</hadoop.version>
<curator.version>2.12.0</curator.version>
<zookeeper.version>3.4.9</zookeeper.version>
</properties>
</profile>

<profile>
<id>yarn</id>
<modules>