1. Install Docker
Install Docker by following https://docs.docker.com/engine/installation/linux/ubuntu/
2. Choose a base image
This guide uses ubuntu:16.04.
docker pull ubuntu:16.04
Run the ubuntu image:
docker run --rm -it ubuntu:16.04
root@mark-virtual-machine:/home/mark/dockerspace# docker run -it ubuntu:16.04
root@33afc5817cf2:/# ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
3. Create the Dockerfile
Download Spark:
root@mark-virtual-machine:/home/mark/dockerspace# mkdir sparkbuild
cd sparkbuild
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar xzvf spark-2.1.0-bin-hadoop2.7.tgz
Create the Dockerfile:
root@mark-virtual-machine:/home/mark/dockerspace# cd sparkbuild/
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# vi Dockerfile
with the following content:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openjdk-8-jdk
EXPOSE 8080
COPY spark-2.1.0-bin-hadoop2.7 /spark
CMD /bin/bash
Save the file and build. This produces a base image that contains Spark; the master and the workers can later be started from it with different startup commands:
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker build -t myspark:0.1 .
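The Dockerfile and build step above can also be scripted. The following is a minimal sketch, not the exact procedure used in this post: the function names write_dockerfile and build_image are made up for illustration, and it assumes the unpacked spark-2.1.0-bin-hadoop2.7 directory already sits in the build directory.

```shell
#!/bin/sh
# Write the Dockerfile from this section into the directory given as $1.
write_dockerfile() {
    cat > "$1/Dockerfile" <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y openjdk-8-jdk
EXPOSE 8080
COPY spark-2.1.0-bin-hadoop2.7 /spark
CMD /bin/bash
EOF
}

# Write the Dockerfile and build the image (hypothetical helper).
# Assumes Docker is installed and Spark is already unpacked in "$1".
build_image() {
    write_dockerfile "$1" && docker build -t myspark:0.1 "$1"
}

# Usage (not executed here, since it needs Docker):
#   build_image /home/mark/dockerspace/sparkbuild
```

Keeping the Dockerfile in a quoted heredoc means the shell does not expand anything inside it, so the file on disk matches the text shown above.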
Run the image and do a quick test:
docker run --rm -it -P myspark:0.1
Test starting the Spark master server:
root@0950cfbe650d:/# ./spark/sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /spark/logs/spark--org.apache.spark.deploy.master.Master-1-96a43ba2b88a.out
4. Start Spark
Start the Spark master (as a background process):
docker run --rm -itdP -h spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
Note: inside Docker, Spark has to run in the foreground; otherwise the container exits as soon as the startup script returns.
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker run --rm -itdP -h spark-master --name spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
4cc5a62207336320ea2e12d93a4bfd3b245a089b31ab3c0eafbef3f2c20cdb19
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS                     NAMES
4cc5a6220733   myspark:0.1   "./spark/bin/spark..."   3 seconds ago   Up 2 seconds   0.0.0.0:32770->8080/tcp   hardcore_lewin
The output shows that port 32770 on the VM is mapped to port 8080 inside the container. Open a browser in the VM and go to http://localhost:32770/
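Because -P picks a random host port on every run, the mapping can be looked up instead of read off the docker ps output. A minimal sketch, assuming the master container was started with --name spark-master; host_port and master_ui_url are hypothetical helper names, while docker port is the standard CLI subcommand for listing a container's port mappings.

```shell
#!/bin/sh
# Extract the host port from a mapping such as "0.0.0.0:32770"
# by stripping everything up to and including the last colon.
host_port() {
    printf '%s\n' "${1##*:}"
}

# Print the URL of the master web UI (hypothetical helper).
# Assumes Docker is running and a container named spark-master exists.
master_ui_url() {
    mapping=$(docker port spark-master 8080/tcp | head -n 1)
    printf 'http://localhost:%s/\n' "$(host_port "$mapping")"
}

# Usage (not executed here, since it needs the running container):
#   master_ui_url
```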
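Next, start a slave (worker). The --link spark-master option refers to the master container by name, so the master must have been started with --name spark-master: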
root@mark-virtual-machine:/home/mark/dockerspace/sparkbuild# docker run --rm -itdP -h spark-worker-1 --link spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
634d9c4d02397b49231d0fdf54d99f4d95f7a7c48023305d5876048222cd8256
Refresh the web UI; the new worker has joined successfully.
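To add several workers, the same command can be repeated with a different hostname each time. The sketch below only prints the commands rather than running them; worker_cmd and worker_cmds are hypothetical helper names, and the spark-worker-N hostname pattern simply continues the spark-worker-1 naming used above.

```shell
#!/bin/sh
# Print the docker command that starts worker number $1.
worker_cmd() {
    printf 'docker run --rm -itdP -h spark-worker-%s --link spark-master myspark:0.1 ./spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077\n' "$1"
}

# Print the commands for workers 1 through $1, one per line.
worker_cmds() {
    i=1
    while [ "$i" -le "$1" ]; do
        worker_cmd "$i"
        i=$((i + 1))
    done
}

# Usage: print the commands for three workers, then run one of them
# from an interactive terminal, for example:
#   worker_cmds 3
#   eval "$(worker_cmd 2)"
```

Printing first makes it easy to check the hostnames before any container is started.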
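Fixing "could not resolve archive.ubuntu.com" during docker build
During apt-get update, archive.ubuntu.com could not be resolved and the build stayed stuck at 0%. For the fix, see: http://stackoverflow.com/questions/24991136/docker-build-could-not-resolve-archive-ubuntu-com-apt-get-fails-to-install-a
The first answer did not work for me; the second one did. The right fix may differ from one environment to another. My setup is a Windows 10 host running an Ubuntu 16.04 virtual machine in VMware, with an ubuntu:16.04 Docker image running inside it. The second answer is reproduced below.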
First, let’s verify the problem:
$ docker run busybox nslookup google.com # takes a long time
nslookup: can't resolve 'google.com' # <--- appears after a long time
Server: 8.8.8.8
Address 1: 8.8.8.8
If the command appears to hang, but eventually spits out the error “can’t resolve ‘google.com’”, then you have the same problem as me.
The nslookup command queries the DNS server 8.8.8.8 in order to turn the text address of ‘google.com’ into an IP address. Ironically, 8.8.8.8 is Google’s public DNS server. If nslookup fails, public DNS servers like 8.8.8.8 might be blocked by your company (which I assume is for security reasons).
You’d think that adding your company’s DNS servers to DOCKER_OPTS in /etc/default/docker should do the trick, but for whatever reason, it didn’t work for me. I describe what worked for me below.
SOLUTION:
On the host (I’m using Ubuntu 16.04), find out the primary and secondary DNS server addresses:
$ nmcli dev show | grep 'IP4.DNS'
IP4.DNS[1]: 10.0.0.2
IP4.DNS[2]: 10.0.0.3
Using these addresses, create a file /etc/docker/daemon.json:
$ sudo su root
# cd /etc/docker
# touch daemon.json
Put this in /etc/docker/daemon.json:
{
"dns": ["10.0.0.2", "10.0.0.3"]
}
Exit from root:
# exit
Now restart docker:
$ sudo service docker restart
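The file contents can also be generated from the DNS addresses rather than typed by hand. This is a minimal sketch under stated assumptions: dns_daemon_json is a hypothetical helper name, it emits only the "dns" key, and writing its output to /etc/docker/daemon.json would overwrite any settings already in that file.

```shell
#!/bin/sh
# Print a daemon.json body whose "dns" list holds the given addresses.
dns_daemon_json() {
    list=""
    for addr in "$@"; do
        # Append a quoted address, comma-separated after the first one.
        list="${list:+$list, }\"$addr\""
    done
    printf '{\n  "dns": [%s]\n}\n' "$list"
}

# Usage (not executed here, since it needs root and overwrites the file):
#   dns_daemon_json 10.0.0.2 10.0.0.3 | sudo tee /etc/docker/daemon.json
#   sudo service docker restart
```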
VERIFICATION:
Now check that adding the /etc/docker/daemon.json file allows you to resolve ‘google.com’ into an IP address:
$ docker run busybox nslookup google.com
Server: 10.0.0.2
Address 1: 10.0.0.2
Name: google.com
Address 1: 2a00:1450:4009:811::200e lhr26s02-in-x200e.1e100.net
Address 2: 216.58.198.174 lhr25s10-in-f14.1e100.net
REFERENCES:
I based my solution on an article by Robin Winslow, who deserves all of the credit for the solution. Thanks, Robin!
“Fix Docker’s networking DNS config.” Robin Winslow. Retrieved 2016-11-09. https://robinwinslow.uk/2016/06/23/fix-docker-networking-dns/