- First, I installed Scala 2.11.6 from scala-lang.org.
- Modified pom.xml and commented out (using <!-- -->) the compiler plugin for quasi-quotes, which are built into the Scala 2.11 library, unlike in 2.10. See sql/catalyst/pom.xml.
- Excluded Kafka from the build command line, since it has no Maven artifact for Scala 2.11.
- Ran dev/change-version-to-2.11.sh
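The pom.xml change in the steps above looks roughly like the following fragment of sql/catalyst/pom.xml. This is a sketch from memory, not the literal diff; the exact coordinates and property names may differ between Spark versions, so match against what is actually in your tree:

```xml
<!-- Quasi-quotes are part of the Scala 2.11 compiler and library, so the
     2.10-only macro dependency can be commented out: -->
<!--
<dependency>
  <groupId>org.scalamacros</groupId>
  <artifactId>quasiquotes_${scala.binary.version}</artifactId>
  <version>${scala.macros.version}</version>
</dependency>
-->
```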
I built Spark with:

```shell
mvn -Phadoop-2.4 -Phive -Pyarn -Pscala-2.11 -pl \!external/kafka,\!external/kafka-assembly,\!examples -DskipTests clean package
```
The modules you can exclude are listed in the pom directly under the modules section. "-pl" is a Maven command-line option that takes a project list (that's an el, not a one); prefixing a module with "!" excludes it.
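One detail worth calling out: the backslashes in the command above exist only to protect "!" from the shell, which otherwise treats it as history expansion in interactive bash. Single quotes work just as well; both forms pass the identical literal argument to mvn, as this small demonstration shows:

```shell
# Both quoting styles produce the same literal project list.
echo '!external/kafka,!external/kafka-assembly,!examples'
echo \!external/kafka,\!external/kafka-assembly,\!examples
```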
You may want to use -Phadoop-provided if you are going to run on YARN directly, since the application master in that deployment model will already contain the Hadoop jars you need. I included -Pyarn so I could run on YARN, but startup is very slow with anything Hadoop, so you may want to just use the Spark standalone master for everything.
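For the YARN case described above, the invocation would look something like this. This is an untested sketch, just the earlier command with -Phadoop-provided added, to be run from the Spark source root:

```shell
# Sketch: same build as before, but leaving Hadoop jars out of the
# assembly because the YARN cluster already provides them.
mvn -Phadoop-2.4 -Phadoop-provided -Phive -Pyarn -Pscala-2.11 \
    -pl \!external/kafka,\!external/kafka-assembly,\!examples \
    -DskipTests clean package
```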
Update for GA:
In the GA release, it appears they have set a flag to exclude quasi-quotes for Scala 2.11. The build did not work for me, so I still had to comment out the dependency in the sql/catalyst/pom.xml file. The Kafka modules are supposed to be used only by the scala-2.10 profile; however, the pom.xml did not work. Essentially, I still had to do everything listed above, even for GA.
Note for 1.5.x:
You need to read the instructions at: http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211.
Spark now ships with its own Maven distribution, so you always use the version the Spark team uses; look for build/mvn. I had an older Maven install, so I had to set the M2_HOME variable to the Spark-supplied Maven directory to get the Spark-supplied Maven to run correctly.
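Concretely, the M2_HOME workaround looks something like the following. The directory name is an assumption, since the Maven version that build/mvn downloads varies by Spark release; check what actually appears under build/ after the first run:

```shell
# Hypothetical paths: build/mvn downloads Maven into build/ on first use.
cd /path/to/spark                                 # your Spark source checkout
export M2_HOME="$PWD/build/apache-maven-3.3.3"    # version is an assumption
./build/mvn -DskipTests clean package
```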