parent
e44d5ee7bd
commit
5c4a466670
|
@ -0,0 +1,24 @@
|
|||
#!/bin/bash
|
||||
|
||||
langs="en zh"
|
||||
for lang in $langs
|
||||
do
|
||||
cp ${lang}/dev/api_doc.md ${lang}/dev/api_doc.html
|
||||
python3 scripts/mergeByTOC.py ${lang}/
|
||||
done
|
||||
|
||||
./scripts/genDoc.sh
|
||||
|
||||
for lang in $langs
|
||||
do
|
||||
xvfb-run wkhtmltopdf ./en/dev/api_doc.html api.pdf
|
||||
python3 scripts/mergePDF.py ${lang}
|
||||
done
|
||||
|
||||
for lang in $langs
|
||||
do
|
||||
rm ${lang}/dev/api_doc.html
|
||||
rm ${lang}/doc.md
|
||||
rm api.pdf
|
||||
rm output_${lang}.pdf
|
||||
done
|
|
@ -1,236 +0,0 @@
|
|||
# 示例 - 远程文件访问加速
|
||||
Fluid使用[Alluxio](https://www.alluxio.io)为用户提供了极其便捷的远程文件访问接口,使得程序能够像访问本地文件一样访问远程文件,同时,借助Alluxio提供的文件缓存能力,程序对于已访问过的文件重复访问能够获得大幅度的速度提升。本文档通过一个简单的例子演示了上述功能特性
|
||||
|
||||
## 前提条件
|
||||
在运行该示例之前,请参考[安装文档](../installation_cn/README.md)完成安装,并检查Fluid各组件正常运行:
|
||||
```shell script
|
||||
$ kubectl get pod -n fluid-system
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
controller-manager-7fd6457ccf-jnkvn 1/1 Running 0 60s
|
||||
csi-nodeplugin-fluid-6rhpt 2/2 Running 0 60s
|
||||
csi-nodeplugin-fluid-6zwgl 2/2 Running 0 60s
|
||||
```
|
||||
|
||||
## 运行示例
|
||||
**查看待创建的Dataset资源对象**
|
||||
```shell script
|
||||
$ cat samples/accelerate/dataset.yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
metadata:
|
||||
name: hbase
|
||||
spec:
|
||||
mounts:
|
||||
- mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.2.5/
|
||||
name: hbase
|
||||
```
|
||||
> 本示例将以Apache镜像站点上的Hbase v2.25相关资源作为演示中使用的远程文件
|
||||
|
||||
**创建Dataset资源对象**
|
||||
```shell script
|
||||
$ kubectl create -f samples/accelerate/dataset.yaml
|
||||
dataset.data.fluid.io/hbase created
|
||||
```
|
||||
|
||||
**查看Dataset资源对象状态**
|
||||
```shell script
|
||||
$ kubectl get dataset hbase -o yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
...
|
||||
status:
|
||||
conditions: []
|
||||
phase: NotBound
|
||||
```
|
||||
|
||||
该Dataset资源对象目前还未与任何AlluxioRuntime资源对象绑定,因此其`status`中的`phase`属性值为`NotBound`,这意味着该Dataset资源对象仍然处于不可用状态
|
||||
|
||||
**创建AlluxioRuntime资源对象**
|
||||
```shell script
|
||||
$ kubectl create -f samples/accelerate/runtime.yaml
|
||||
alluxioruntime.data.fluid.io/hbase created
|
||||
```
|
||||
|
||||
等待一段时间,让AlluxioRuntime资源对象中的各个组件得以顺利启动,看到类似以下状态:
|
||||
```shell script
|
||||
$ kubectl get pod
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
hbase-fuse-hvxgh 1/1 Running 0 27s
|
||||
hbase-fuse-sjhxk 1/1 Running 0 27s
|
||||
hbase-master-0 2/2 Running 0 62s
|
||||
hbase-worker-92cln 2/2 Running 0 27s
|
||||
hbase-worker-rlb5w 2/2 Running 0 27s
|
||||
```
|
||||
|
||||
**再次查看Dataset资源对象状态**
|
||||
```shell script
|
||||
$ kubectl get dataset hbase -o yaml
|
||||
...
|
||||
...
|
||||
status:
|
||||
cacheStates:
|
||||
cacheCapacity: 4GiB
|
||||
cached: 0B
|
||||
cachedPercentage: 0%
|
||||
conditions:
|
||||
- lastTransitionTime: "2020-07-29T08:23:44Z"
|
||||
lastUpdateTime: "2020-07-29T08:26:29Z"
|
||||
message: The ddc runtime is ready.
|
||||
reason: DatasetReady
|
||||
status: "True"
|
||||
type: Ready
|
||||
phase: Bound
|
||||
runtimes:
|
||||
- category: Accelerate
|
||||
name: hbase
|
||||
namespace: default
|
||||
type: alluxio
|
||||
ufsTotal: 443.5MiB
|
||||
```
|
||||
因为已经与一个成功启动的AlluxioRuntime绑定,该Dataset资源对象的`Status`得到了更新,从上述状态中可以获知有关资源对象的基本信息
|
||||
|
||||
**查看AlluxioRuntime状态**
|
||||
```shell script
|
||||
$ kubectl get alluxioruntime hbase -o yaml
|
||||
...
|
||||
...
|
||||
status:
|
||||
cacheStates:
|
||||
cacheCapacity: 4GiB
|
||||
cached: 0B
|
||||
cachedPercentage: 0%
|
||||
conditions:
|
||||
- lastProbeTime: "2020-07-29T08:23:05Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:05Z"
|
||||
message: The master is initialized.
|
||||
reason: Master is initialized
|
||||
status: "True"
|
||||
type: MasterInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:05Z"
|
||||
message: The master is ready.
|
||||
reason: Master is ready
|
||||
status: "True"
|
||||
type: MasterReady
|
||||
- lastProbeTime: "2020-07-29T08:23:20Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:20Z"
|
||||
message: The workers are initialized.
|
||||
reason: Workers are initialized
|
||||
status: "True"
|
||||
type: WorkersInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:20Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:20Z"
|
||||
message: The fuses are initialized.
|
||||
reason: Fuses are initialized
|
||||
status: "True"
|
||||
type: FusesInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:40Z"
|
||||
message: The workers are partially ready.
|
||||
reason: Workers are ready
|
||||
status: "True"
|
||||
type: WorkersReady
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:40Z"
|
||||
message: The fuses are ready.
|
||||
reason: Fuses are ready
|
||||
status: "True"
|
||||
type: FusesReady
|
||||
currentFuseNumberScheduled: 2
|
||||
currentMasterNumberScheduled: 1
|
||||
currentWorkerNumberScheduled: 2
|
||||
desiredFuseNumberScheduled: 2
|
||||
desiredMasterNumberScheduled: 1
|
||||
desiredWorkerNumberScheduled: 2
|
||||
fuseNumberAvailable: 2
|
||||
fuseNumberReady: 2
|
||||
fusePhase: Ready
|
||||
masterNumberReady: 1
|
||||
masterPhase: Ready
|
||||
valueFile: hbase-alluxio-values
|
||||
workerNumberAvailable: 2
|
||||
workerNumberReady: 2
|
||||
workerPhase: Ready
|
||||
```
|
||||
|
||||
**查看与远程文件关联的PersistentVolume以及PersistentVolumeClaim**
|
||||
```shell script
|
||||
$ kubectl get pv
|
||||
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
|
||||
hbase 100Gi RWX Retain Bound default/hbase 18m
|
||||
```
|
||||
|
||||
```shell script
|
||||
$ kubectl get pvc
|
||||
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
|
||||
hbase Bound hbase 100Gi RWX 18m
|
||||
```
|
||||
与远程文件关联的PV,PVC已经由Fluid生成,应用可以通过该PVC完成远程文件在Pod中的挂载,并通过挂载目录实现远程文件访问
|
||||
|
||||
## 远程文件访问
|
||||
|
||||
**启动应用进行远程文件访问**
|
||||
```shell script
|
||||
kubectl create -f samples/accelerate/nginx.yaml
|
||||
```
|
||||
|
||||
登录Nginx Pod:
|
||||
```shell script
|
||||
kubectl exec -it nginx -- bash
|
||||
```
|
||||
|
||||
查看远程文件挂载情况:
|
||||
```shell script
|
||||
# ls -1 /data/hbase
|
||||
CHANGES.md
|
||||
RELEASENOTES.md
|
||||
api_compare_2.2.5RC0_to_2.2.4.html
|
||||
hbase-2.2.5-bin.tar.gz
|
||||
hbase-2.2.5-client-bin.tar.gz
|
||||
hbase-2.2.5-src.tar.gz
|
||||
```
|
||||
|
||||
```shell script
|
||||
# du -sh /data/hbase/hbase-2.2.5-client-bin.tar.gz
|
||||
200M /data/hbase/hbase-2.2.5-client-bin.tar.gz
|
||||
```
|
||||
|
||||
## 远程文件访问加速
|
||||
|
||||
**启动测试作业**
|
||||
```shell script
|
||||
$ kubectl create -f samples/accelerate/test.yaml
|
||||
job.batch/fluid-test created
|
||||
```
|
||||
该测试程序会尝试读取一个远程文件(e.g. `hbase-2.2.5-client-bin.tar.gz`),并打印出此过程所耗费的时间:
|
||||
```shell script
|
||||
$ kubectl logs fluid-test-cqmwj
|
||||
real 1m 9.55s
|
||||
user 0m 0.00s
|
||||
sys 0m 0.64s
|
||||
```
|
||||
可见,第一次远程文件的读取耗费了接近70s的时间
|
||||
|
||||
**再次启动测试作业**
|
||||
```shell script
|
||||
kubectl delete -f samples/accelerate/test.yaml
|
||||
kubectl create -f samples/accelerate/test.yaml
|
||||
```
|
||||
由于远程文件已经被缓存,此次测试作业能够迅速完成:
|
||||
```shell script
|
||||
$ kubectl logs fluid-test-hpzqc
|
||||
real 0m 2.03s
|
||||
user 0m 0.00s
|
||||
sys 0m 0.63s
|
||||
```
|
||||
同样的文件访问操作仅耗费了2s
|
||||
|
||||
因为该文件已经在Alluxio中被缓存,因此访问的速度大大加快,可见,Fluid利用Alluxio实现了远程文件访问的加速
|
||||
|
||||
> 注意: 上述文件的访问速度与示例运行环境的网络条件有关,如果文件访问速度过慢,请更换更小的远程文件尝试
|
||||
|
||||
## 环境清理
|
||||
```shell script
|
||||
kubectl delete -f samples/accelerate
|
||||
```
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
# Fluid Documentation
|
||||
|
||||
<!-- markdownlint-disable MD007 -->
|
||||
<!-- markdownlint-disable MD032 -->
|
||||
|
||||
## TOC
|
||||
|
||||
+ Userguide
|
||||
- [Overview](userguide/overview.md)
|
||||
- [Get Started](userguide/get_started.md)
|
||||
- [Installation](userguide/install.md)
|
||||
- [Diagnose](userguide/diagnose.md)
|
||||
+ Samples
|
||||
- [Accelerate Data Accessing](samples/accelerate_data_accessing.md)
|
||||
- [Cache Co-locality](samples/data_co_locality.md)
|
||||
- [Machine Learning](samples/machinelearning.md)
|
||||
- [Warm up](samples/warmup.md)
|
||||
- [Dawnbench](samples/dawnbench.md)
|
||||
+ Developer Guide
|
||||
- [How to develop](dev/how_to_develop.md)
|
||||
- [API_Doc](dev/api_doc.md)
|
||||
|
File diff suppressed because it is too large
Load Diff
|
@ -0,0 +1,199 @@
|
|||
# Developer Guide
|
||||
|
||||
## Requirements
|
||||
|
||||
- git
|
||||
|
||||
- golang (version >= 1.13)
|
||||
- docker (version >= 19.03)
|
||||
- Kubernetes (version >= 1.14)
|
||||
- GNU Make
|
||||
|
||||
For installation of golang, please refer to [Install Golang](https://golang.org/dl/)
|
||||
|
||||
`make` is usually in a `build-essential` package in your distribution's package manager of choice. Make sure you have `make` on your machine.
|
||||
|
||||
There're great chances that you may want to run your implementation in a real Kubernetes cluster, so probably a Docker is needed for some necessary operations like building images.
|
||||
See [Install Docker](https://docs.docker.com/engine/install/) for more information.
|
||||
|
||||
## How to Build, Run and Debug
|
||||
|
||||
### Get Source Code
|
||||
|
||||
```shell
|
||||
$ mkdir -p $GOPATH/src/github.com/cloudnativefluid/
|
||||
$ cd $GOPATH/src/github.com/cloudnativefluid
|
||||
$ git clone https://github.com/fluid-cloudnative/fluid.git
|
||||
```
|
||||
|
||||
> **NOTE**: In this document, we build, run and debug under non-module environment.
|
||||
>
|
||||
> See [Go Modules](https://github.com/golang/go/wiki/Modules) for more information if some issue occurs to you.
|
||||
|
||||
### Build binary
|
||||
`Makefile` under project directory provides many tasks you may want to use including Test, Build, Debug, Deploy etc.
|
||||
|
||||
You can simply get a binary by running:
|
||||
```shell
|
||||
# build controller manager
|
||||
$ make manager
|
||||
|
||||
# build fluid CSI plugin
|
||||
$ make csi
|
||||
```
|
||||
By default, the binary would be put under `<fluid-path>/bin`.
|
||||
|
||||
### Build image
|
||||
1\. Set tags for images
|
||||
|
||||
```shell
|
||||
# image name for controller manager
|
||||
$ export IMG=<registry>/<namespace>/<img-repo>
|
||||
# image name for CSI plugin
|
||||
$ export CSI_IMG=<registry>/<namespace>/<csi-img-repo>
|
||||
```
|
||||
Image tag will be automatically injected with SHA1 value of current git revision.
|
||||
|
||||
2. Login to a image registry
|
||||
|
||||
Make sure you've login to a docker image registry that you'd like to push your image to:
|
||||
```shell
|
||||
$ sudo docker login <docker-registry>
|
||||
```
|
||||
|
||||
3. Build your image and push:
|
||||
```shell
|
||||
# build and push image for controller manager
|
||||
$ make docker-push
|
||||
# build and push image for CSI plugin
|
||||
$ make docker-push-csi
|
||||
```
|
||||
|
||||
Alternatively, it makes no difference that you build your images first and then manually push them:
|
||||
```shell
|
||||
$ make docker-build
|
||||
|
||||
$ make docker-build-csi
|
||||
|
||||
$ docker push <IMG>:<IMG_TAG>
|
||||
```
|
||||
|
||||
### Run your fluid on kubernetes cluster
|
||||
In the following steps, we assume you have properly configured `KUBECONFIG` environment variable or set up `~/.kube/config`. See [Kubeconfig docs](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) for more information.
|
||||
|
||||
1. Push your images to a image registry accessible to your Kubernetes cluster
|
||||
|
||||
If your images are pushed to some private repositories, make sure your Kubernetes cluster hold credentials for accessing those repositories.
|
||||
|
||||
2. Change image in the samples we provide:
|
||||
|
||||
```yaml
|
||||
# <fluid-path>/config/fluid/patches/image_in_manager.yaml
|
||||
...
|
||||
...
|
||||
containers:
|
||||
- name: manager
|
||||
image: <registry>/<namespace>/<img-repo>:<img-tag>
|
||||
```
|
||||
```yaml
|
||||
# <fluid-path>/config/fluid/patches/image_in_csi-plugin.yaml
|
||||
...
|
||||
...
|
||||
containers:
|
||||
- name: plugins
|
||||
image: <registry>/<namespace>/<csi-img-name>:<csi-img-tag>
|
||||
```
|
||||
|
||||
3. Install CRDs
|
||||
```shell
|
||||
$ kubectl apply -k config/crd
|
||||
```
|
||||
|
||||
Check CRD with:
|
||||
|
||||
```shell
|
||||
$ kubectl get crd | grep fluid
|
||||
alluxiodataloads.data.fluid.io 2020-08-22T03:53:46Z
|
||||
alluxioruntimes.data.fluid.io 2020-08-22T03:53:46Z
|
||||
datasets.data.fluid.io 2020-08-22T03:53:46Z
|
||||
```
|
||||
|
||||
4. Install your implementation
|
||||
```shelll
|
||||
$ kubectl apply -k config/fluid
|
||||
```
|
||||
|
||||
Check Fluid system with:
|
||||
|
||||
```shell
|
||||
$ kubectl get pod -n fluid-system
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
controller-manager-7fd6457ccf-p7j2x 1/1 Running 0 84s
|
||||
csi-nodeplugin-fluid-pj9tv 2/2 Running 0 84s
|
||||
csi-nodeplugin-fluid-t8ctj 2/2 Running 0 84s
|
||||
```
|
||||
|
||||
5. Run samples to verify your implementation
|
||||
|
||||
Here is a sample provided by us, you may want to rewrite it according to your implementation.
|
||||
```shell
|
||||
$ kubectl apply -k config/samples
|
||||
```
|
||||
|
||||
Check sample pods:
|
||||
|
||||
```shell
|
||||
$ kubectl get pod
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
cifar10-fuse-vb6l4 1/1 Running 0 6m15s
|
||||
cifar10-fuse-vtqpx 1/1 Running 0 6m15s
|
||||
cifar10-master-0 2/2 Running 0 8m24s
|
||||
cifar10-worker-729xz 2/2 Running 0 6m15s
|
||||
cifar10-worker-d6kmd 2/2 Running 0 6m15s
|
||||
nginx-0 1/1 Running 0 8m30s
|
||||
nginx-1 1/1 Running 0 8m30s
|
||||
```
|
||||
|
||||
6. Check logs to verify your implementation
|
||||
```shell
|
||||
$ kubectl logs -n fluid-system <CONTROLLER_MANAGER_NAME>
|
||||
```
|
||||
|
||||
7. Clean up
|
||||
```shell
|
||||
$ kubectl delete -k config/samples
|
||||
$ kubectl delete -k config/fluid
|
||||
$ kubectl delete -k config/crd
|
||||
```
|
||||
|
||||
### Debug
|
||||
You can debug your program in multiple ways, here is just a brief guide for how to debug with `go-delve`
|
||||
|
||||
**Prerequisites**
|
||||
|
||||
Make sure you have `go-delve` installed. See [go-delve installation guide](https://github.com/go-delve/delve/tree/master/Documentation/installation) for more information
|
||||
|
||||
**Debug locally**
|
||||
```shell
|
||||
# build & debug in one line
|
||||
$ dlv debug <fluid-path>/cmd/controller/main.go
|
||||
|
||||
# debug binary
|
||||
$ make manager
|
||||
$ dlv exec bin/manager
|
||||
```
|
||||
|
||||
**Debug remotely**
|
||||
|
||||
On remote host:
|
||||
```shell
|
||||
$ dlv debug --headless --listen ":12345" --log --api-version=2 cmd/controller/main.go
|
||||
```
|
||||
The command above will make `go-delve` start a debug service and listen for port 12345.
|
||||
|
||||
On local host, connect to the debug service:
|
||||
```shell
|
||||
$ dlv connect "<remote-address>:12345" --api-version=2
|
||||
```
|
||||
|
||||
> Note: To debug remotely, make sure the specified port is not occupied and the firewall has been properly configured.
|
|
@ -0,0 +1,370 @@
|
|||
# DEMO - Speed Up Accessing Remote Files
|
||||
Powered by [Alluxio](https://www.alluxio.io) and [Fuse](https://github.com/libfuse/libfuse), Fluid provides a simple way for users to access files stored in remote filesystems, just like accessing some ordinary file in local filesystems.
|
||||
What's more, with a powerful caching capability provided, users can enjoy a great speedup on accessing remote files especially for those that have a frequent access pattern.
|
||||
|
||||
This demo aims to show you an overview of all the features mentioned above.
|
||||
|
||||
## Prerequisites
|
||||
Before everything we are going to do, please refer to [Installation Guide](../userguide/install.md) to install Fluid on your Kubernetes Cluster, and make sure all the components used by Fluid are ready like this:
|
||||
```shell
|
||||
$ kubectl get pod -n fluid-system
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
controller-manager-7fd6457ccf-jnkvn 1/1 Running 0 60s
|
||||
csi-nodeplugin-fluid-6rhpt 2/2 Running 0 60s
|
||||
csi-nodeplugin-fluid-6zwgl 2/2 Running 0 60s
|
||||
```
|
||||
|
||||
Normally, you shall see a Pod named "controller-manager" and several Pods named "csi-nodeplugin".
|
||||
The num of "csi-nodeplugin" Pods depends on how many nodes your Kubernetes cluster have(e.g. 2 in this demo), so please make sure all "csi-nodeplugin" Pods are working properly.
|
||||
|
||||
## Set up workspace
|
||||
```shell
|
||||
$ mkdir <any-path>/accelerate
|
||||
$ cd <any-path>/accelerate
|
||||
```
|
||||
|
||||
## Install Resources to Kubernetes
|
||||
|
||||
**Check the `Dataset` object to be created**
|
||||
```shell
|
||||
$ cat<<EOF >dataset.yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
metadata:
|
||||
name: hbase
|
||||
spec:
|
||||
mounts:
|
||||
- mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.2.5/
|
||||
name: hbase
|
||||
EOF
|
||||
```
|
||||
Here, we'd like to create a resource object with kind `Dataset`. `Dataset` is a Custom Resource Definition(CRD) defined by Fluid and used to tell Fluid where to find all the data you'd like to access.
|
||||
Under the hood, Fluid uses Alluxio to do some mount operations, so `mountPoint` property can be any legal UFS path acknowledged by Alluxio. Here, we use [WebUFS](https://docs.alluxio.io/os/user/stable/en/ufs/WEB.html) for its simplicity.
|
||||
|
||||
For more information about UFS, please refer to [Alluxio Docs - Storage Integrations](https://docs.alluxio.io/os/user/stable/en/ufs/HDFS.html)
|
||||
|
||||
For more information about properties in `Dataset`, please refer to our [API doc](../dev/api_doc.md)
|
||||
|
||||
> We use hbase v2.2.5 on a mirror site of Apache downloads as an example of remote file. It's nothing special, you can change it to any remote file you like. But please note that, if you are going to use WebUFS like we do, files on Apache sites are highly recommended because you might need some advanced configurations due to current implementation of WebUFS.
|
||||
|
||||
**Create the `Dataset` object**
|
||||
```shell
|
||||
$ kubectl create -f dataset.yaml
|
||||
dataset.data.fluid.io/hbase created
|
||||
```
|
||||
|
||||
**Check status of the `Dataset` object**
|
||||
```shell
|
||||
$ kubectl get dataset hbase -o yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
...
|
||||
status:
|
||||
conditions: []
|
||||
phase: NotBound
|
||||
```
|
||||
|
||||
With a `NotBound` phase in status, the dataset is not ready cause there isn't any `AlluxioRuntime` object supporting it. We'll create one in the following steps.
|
||||
|
||||
**Check the `AlluxioRuntime` object to be created**
|
||||
```shell
|
||||
$ cat<<EOF >runtime.yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: AlluxioRuntime
|
||||
metadata:
|
||||
name: hbase
|
||||
spec:
|
||||
replicas: 2
|
||||
tieredstore:
|
||||
levels:
|
||||
- mediumtype: MEM
|
||||
path: /dev/shm
|
||||
quota: 2Gi
|
||||
high: "0.95"
|
||||
low: "0.7"
|
||||
storageType: Memory
|
||||
properties:
|
||||
alluxio.user.file.writetype.default: MUST_CACHE
|
||||
alluxio.master.journal.folder: /journal
|
||||
alluxio.master.journal.type: UFS
|
||||
alluxio.user.block.size.bytes.default: 256MB
|
||||
alluxio.user.streaming.reader.chunk.size.bytes: 256MB
|
||||
alluxio.user.local.reader.chunk.size.bytes: 256MB
|
||||
alluxio.worker.network.reader.buffer.size: 256MB
|
||||
alluxio.user.streaming.data.timeout: 300sec
|
||||
master:
|
||||
jvmOptions:
|
||||
- "-Xmx4G"
|
||||
worker:
|
||||
jvmOptions:
|
||||
- "-Xmx4G"
|
||||
fuse:
|
||||
jvmOptions:
|
||||
- "-Xmx4G "
|
||||
- "-Xms4G "
|
||||
# For now, only support local
|
||||
shortCircuitPolicy: local
|
||||
args:
|
||||
- fuse
|
||||
- --fuse-opts=direct_io,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
|
||||
EOF
|
||||
```
|
||||
|
||||
**Create a `AlluxioRuntime` object**
|
||||
```shell
|
||||
$ kubectl create -f runtime.yaml
|
||||
alluxioruntime.data.fluid.io/hbase created
|
||||
```
|
||||
`AlluxioRuntime` is another CRD defined by Fluid. An `AluxioRuntime` object describes specifications used to run an Alluxio instance.
|
||||
|
||||
Wait for a while, and make sure all components defined in the `AlluxioRuntime` object are ready. You shall see something like this:
|
||||
```shell
|
||||
$ kubectl get pod
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
hbase-fuse-hvxgh 1/1 Running 0 27s
|
||||
hbase-fuse-sjhxk 1/1 Running 0 27s
|
||||
hbase-master-0 2/2 Running 0 62s
|
||||
hbase-worker-92cln 2/2 Running 0 27s
|
||||
hbase-worker-rlb5w 2/2 Running 0 27s
|
||||
```
|
||||
|
||||
**Check status of the `Dataset` object again**
|
||||
```shell
|
||||
$ kubectl get dataset hbase -o yaml
|
||||
...
|
||||
...
|
||||
status:
|
||||
cacheStates:
|
||||
cacheCapacity: 4GiB
|
||||
cached: 0B
|
||||
cachedPercentage: 0%
|
||||
conditions:
|
||||
- lastTransitionTime: "2020-07-29T08:23:44Z"
|
||||
lastUpdateTime: "2020-07-29T08:26:29Z"
|
||||
message: The ddc runtime is ready.
|
||||
reason: DatasetReady
|
||||
status: "True"
|
||||
type: Ready
|
||||
phase: Bound
|
||||
runtimes:
|
||||
- category: Accelerate
|
||||
name: hbase
|
||||
namespace: default
|
||||
type: alluxio
|
||||
ufsTotal: 443.5MiB
|
||||
```
|
||||
Status of the `Dataset` object has been updated since a related Alluxio instance is ready and successfully bounded to the `Dataset` object. As you can see, basic information about runtime along with some other status info are provided in `status`.
|
||||
|
||||
**Check status of the `AlluxioRuntime` object**
|
||||
```shell
|
||||
$ kubectl get alluxioruntime hbase -o yaml
|
||||
...
|
||||
...
|
||||
status:
|
||||
cacheStates:
|
||||
cacheCapacity: 4GiB
|
||||
cached: 0B
|
||||
cachedPercentage: 0%
|
||||
conditions:
|
||||
- lastProbeTime: "2020-07-29T08:23:05Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:05Z"
|
||||
message: The master is initialized.
|
||||
reason: Master is initialized
|
||||
status: "True"
|
||||
type: MasterInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:05Z"
|
||||
message: The master is ready.
|
||||
reason: Master is ready
|
||||
status: "True"
|
||||
type: MasterReady
|
||||
- lastProbeTime: "2020-07-29T08:23:20Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:20Z"
|
||||
message: The workers are initialized.
|
||||
reason: Workers are initialized
|
||||
status: "True"
|
||||
type: WorkersInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:20Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:20Z"
|
||||
message: The fuses are initialized.
|
||||
reason: Fuses are initialized
|
||||
status: "True"
|
||||
type: FusesInitialized
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:40Z"
|
||||
message: The workers are partially ready.
|
||||
reason: Workers are ready
|
||||
status: "True"
|
||||
type: WorkersReady
|
||||
- lastProbeTime: "2020-07-29T08:23:40Z"
|
||||
lastTransitionTime: "2020-07-29T08:23:40Z"
|
||||
message: The fuses are ready.
|
||||
reason: Fuses are ready
|
||||
status: "True"
|
||||
type: FusesReady
|
||||
currentFuseNumberScheduled: 2
|
||||
currentMasterNumberScheduled: 1
|
||||
currentWorkerNumberScheduled: 2
|
||||
desiredFuseNumberScheduled: 2
|
||||
desiredMasterNumberScheduled: 1
|
||||
desiredWorkerNumberScheduled: 2
|
||||
fuseNumberAvailable: 2
|
||||
fuseNumberReady: 2
|
||||
fusePhase: Ready
|
||||
masterNumberReady: 1
|
||||
masterPhase: Ready
|
||||
valueFile: hbase-alluxio-values
|
||||
workerNumberAvailable: 2
|
||||
workerNumberReady: 2
|
||||
workerPhase: Ready
|
||||
```
|
||||
Detailed information about the Alluxio instance is provided here.
|
||||
|
||||
**Check related PersistentVolume and PersistentVolumeClaim**
|
||||
```shell
|
||||
$ kubectl get pv
|
||||
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
|
||||
hbase 100Gi RWX Retain Bound default/hbase 18m
|
||||
```
|
||||
|
||||
```shell
|
||||
$ kubectl get pvc
|
||||
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
|
||||
hbase Bound hbase 100Gi RWX 18m
|
||||
```
|
||||
|
||||
Related PV and PVC have been created by Fluid since the `Dataset` object is ready(bounded).
|
||||
Workloads are now able to access remote files by mounting PVC.
|
||||
|
||||
## Remote File Access
|
||||
|
||||
**Check the app to be created**
|
||||
|
||||
```shell
|
||||
$ cat<<EOF >nginx.yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: nginx
|
||||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx
|
||||
volumeMounts:
|
||||
- mountPath: /data
|
||||
name: hbase-vol
|
||||
volumes:
|
||||
- name: hbase-vol
|
||||
persistentVolumeClaim:
|
||||
claimName: hbase
|
||||
EOF
|
||||
```
|
||||
|
||||
**Run a demo app to access remote files**
|
||||
```shell
|
||||
$ kubectl create -f nginx.yaml
|
||||
```
|
||||
|
||||
Login to nginx Pod:
|
||||
```shell
|
||||
$ kubectl exec -it nginx -- bash
|
||||
```
|
||||
|
||||
Check file status
|
||||
```shell
|
||||
$ ls -1 /data/hbase
|
||||
CHANGES.md
|
||||
RELEASENOTES.md
|
||||
api_compare_2.2.5RC0_to_2.2.4.html
|
||||
hbase-2.2.5-bin.tar.gz
|
||||
hbase-2.2.5-client-bin.tar.gz
|
||||
hbase-2.2.5-src.tar.gz
|
||||
```
|
||||
|
||||
```shell
|
||||
$ du -h /data/hbase/*
|
||||
174K /data/hbase/CHANGES.md
|
||||
106K /data/hbase/RELEASENOTES.md
|
||||
115K /data/hbase/api_compare_2.2.5RC0_to_2.2.4.html
|
||||
211M /data/hbase/hbase-2.2.5-bin.tar.gz
|
||||
200M /data/hbase/hbase-2.2.5-client-bin.tar.gz
|
||||
34M /data/hbase/hbase-2.2.5-src.tar.gz
|
||||
```
|
||||
|
||||
Logout:
|
||||
```shell
|
||||
$ exit
|
||||
```
|
||||
|
||||
As you may have seen, all the files on the WebUFS(e.g. hbase-related files on Apache mirror in our case) appear no differences from any other file in the local filesystem of the nginx Pod.
|
||||
|
||||
## Speed Up Accessing Remote Files
|
||||
To demonstrate how great speedup you may enjoy when accessing remote files, here is a demo job:
|
||||
|
||||
**Check the test job to be launched**
|
||||
```shell
|
||||
$ cat<<EOF >app.yaml
|
||||
apiVersion: batch/v1
|
||||
kind: Job
|
||||
metadata:
|
||||
name: fluid-copy-test
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
restartPolicy: OnFailure
|
||||
containers:
|
||||
- name: busybox
|
||||
image: busybox
|
||||
command: ["/bin/sh"]
|
||||
args: ["-c", "set -x; time cp -r /data/hbase ./"]
|
||||
volumeMounts:
|
||||
- mountPath: /data
|
||||
name: hbase-vol
|
||||
volumes:
|
||||
- name: hbase-vol
|
||||
persistentVolumeClaim:
|
||||
claimName: hbase
|
||||
EOF
|
||||
```
|
||||
|
||||
**Launch a test job**
|
||||
```shell
|
||||
$ kubectl create -f app.yaml
|
||||
job.batch/fluid-test created
|
||||
```
|
||||
Under the hood, the test job executes a shell command `time cp -r /data/hbase ./` and prints its result.
|
||||
Wait for a while and make sure the job has completed. You can check its result by:
|
||||
|
||||
```shell
|
||||
$ kubectl logs fluid-copy-test-h59w9
|
||||
+ time cp -r /data/hbase ./
|
||||
real 1m 2.74s
|
||||
user 0m 0.00s
|
||||
sys 0m 1.35s
|
||||
```
|
||||
It's our first time to read such a file, and it takes us about 63s. It may be not as fast as you expected but:
|
||||
|
||||
**Re-Launch the test job**
|
||||
```shell
|
||||
$ kubectl delete -f app.yaml
|
||||
$ kubectl create -f app.yaml
|
||||
```
|
||||
|
||||
It'll finish very soon after creation this time:
|
||||
```shell
|
||||
$ kubectl logs fluid-copy-test-d9h2x
|
||||
+ time cp -r /data/hbase ./
|
||||
real 0m 2.94s
|
||||
user 0m 0.00s
|
||||
sys 0m 1.27s
|
||||
```
|
||||
The same read operation takes only 3s this time.
|
||||
|
||||
The great speedup attributes to the powerful caching capability provided by Alluxio. That means that once you access some remote file, it will be cached in Alluxio, and your next following operations will enjoy a local access instead of a remote one, and thus a great speedup.
|
||||
|
||||
> Note: Time spent for the test job depends on your network environment. If it takes too long for you to complete the job, changing a mirror or some smaller file might help.
|
||||
|
||||
## Clean Up
|
||||
```shell
|
||||
$ kubectl delete -f .
|
||||
```
|
|
@ -0,0 +1,284 @@
|
|||
# DEMO - Cache Co-locality for Workload Scheduling
|
||||
In Fluid, remote files specified in `Dataset` object are schedulable, which means you are able to control where to put your data in a k8s cluster,
|
||||
just like what you may have done to Pods. Also, Fluid is able to make cache co-locality scheduling decisions for workloads to minimize overhead costs.
|
||||
|
||||
This demo will show you an overview about features mentioned above.
|
||||
|
||||
## Prerequisites
|
||||
Before everything we are going to do, please refer to [Installation Guide](../userguide/install.md) to install Fluid on your Kubernetes Cluster, and make sure all the components used by Fluid are ready like this:
|
||||
```shell
|
||||
$ kubectl get pod -n fluid-system
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
controller-manager-7fd6457ccf-jnkvn 1/1 Running 0 60s
|
||||
csi-nodeplugin-fluid-6rhpt 2/2 Running 0 60s
|
||||
csi-nodeplugin-fluid-6zwgl 2/2 Running 0 60s
|
||||
```
|
||||
|
||||
Normally, you shall see a Pod named "controller-manager" and several Pods named "csi-nodeplugin".
|
||||
The num of "csi-nodeplugin" Pods depends on how many nodes your Kubernetes cluster have(e.g. 2 in this demo), so please make sure all "csi-nodeplugin" Pods are working properly.
|
||||
|
||||
## Set up workspace
|
||||
```shell
|
||||
$ mkdir <any-path>/co-locality
|
||||
$ cd <any-path>/co-locality
|
||||
```
|
||||
|
||||
## Install Resources to Kubernetes
|
||||
**Check all nodes in your Kubernetes cluster**
|
||||
```shell
|
||||
$ kubectl get nodes
|
||||
NAME STATUS ROLES AGE VERSION
|
||||
cn-beijing.192.168.1.146 Ready <none> 7d14h v1.16.9-aliyun.1
|
||||
cn-beijing.192.168.1.147 Ready <none> 7d14h v1.16.9-aliyun.1
|
||||
```
|
||||
|
||||
**Label one of the nodes**
|
||||
```shell
|
||||
$ kubectl label nodes cn-beijing.192.168.1.146 hbase-cache=true
|
||||
```
|
||||
Since we'll use `NodeSelector` to manage where to put our data, we mark the desired node by labeling it.
|
||||
|
||||
|
||||
**Check all nodes again**
|
||||
```shell
|
||||
$ kubectl get node -L hbase-cache
|
||||
NAME STATUS ROLES AGE VERSION HBASE-CACHE
|
||||
cn-beijing.192.168.1.146 Ready <none> 7d14h v1.16.9-aliyun.1 true
|
||||
cn-beijing.192.168.1.147 Ready <none> 7d14h v1.16.9-aliyun.1
|
||||
```
|
||||
Only one of the two nodes holds a label `hbase-cache=true`. In the following steps, we are going to make sure it's the only location the data cache can be put on.

**Check the `Dataset` object to be created**

```shell
$ cat <<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hbase
spec:
  mounts:
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.2.5/
      name: hbase
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: hbase-cache
              operator: In
              values:
                - "true"
EOF
```

We define a `nodeSelectorTerm` in the `Dataset` object's `spec` to ensure that only nodes with the label `hbase-cache=true` are considered available for the dataset.

**Create the dataset object**

```shell
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/hbase created
```

**Check the `AlluxioRuntime` object to be created**

```shell
$ cat <<EOF >runtime.yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: hbase
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
        storageType: Memory
  properties:
    alluxio.user.file.writetype.default: MUST_CACHE
    alluxio.master.journal.folder: /journal
    alluxio.master.journal.type: UFS
    alluxio.user.block.size.bytes.default: 256MB
    alluxio.user.streaming.reader.chunk.size.bytes: 256MB
    alluxio.user.local.reader.chunk.size.bytes: 256MB
    alluxio.worker.network.reader.buffer.size: 256MB
    alluxio.user.streaming.data.timeout: 300sec
  master:
    jvmOptions:
      - "-Xmx4G"
  worker:
    jvmOptions:
      - "-Xmx4G"
  fuse:
    jvmOptions:
      - "-Xmx4G"
      - "-Xms4G"
      - "-XX:+UseG1GC"
      - "-XX:MaxDirectMemorySize=4g"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:ActiveProcessorCount=8"
    # For now, only local is supported
    shortCircuitPolicy: local
    args:
      - fuse
      - --fuse-opts=direct_io,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
EOF
```

This YAML carries the specifications Fluid uses to launch an Alluxio instance. Creating this `AlluxioRuntime` object is expected to launch an Alluxio instance with one master and two workers.

**Create the `AlluxioRuntime` object**

```shell
$ kubectl create -f runtime.yaml
alluxioruntime.data.fluid.io/hbase created

$ kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE    IP              NODE                       NOMINATED NODE   READINESS GATES
hbase-fuse-42csf     1/1     Running   0          104s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-master-0       2/2     Running   0          3m3s   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
hbase-worker-l62m4   2/2     Running   0          104s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
```

While two running workers are expected, only one is running, and it sits on the node labeled `hbase-cache=true`. The `nodeSelectorTerm` stops the other worker from being deployed.

**Check the status of the `AlluxioRuntime` object**

```shell
$ kubectl get alluxioruntime hbase -o yaml
...
status:
  cacheStates:
    cacheCapacity: 2GiB
    cached: 0B
    cachedPercentage: 0%
  conditions:
  ...
  currentFuseNumberScheduled: 1
  currentMasterNumberScheduled: 1
  currentWorkerNumberScheduled: 1
  desiredFuseNumberScheduled: 2
  desiredMasterNumberScheduled: 1
  desiredWorkerNumberScheduled: 2
  fuseNumberAvailable: 1
  fuseNumberReady: 1
  fusePhase: PartialReady
  masterNumberReady: 1
  masterPhase: Ready
  valueFile: hbase-alluxio-values
  workerNumberAvailable: 1
  workerNumberReady: 1
  workerPhase: PartialReady
```

As expected, `workerPhase` is `PartialReady` and `currentWorkerNumberScheduled: 1` is less than `desiredWorkerNumberScheduled: 2`.
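The `cacheCapacity: 2GiB` above is consistent with this: cache capacity scales with the number of scheduled workers, each contributing its tieredstore quota (2Gi here). A quick back-of-the-envelope check:

```shell
# Cache capacity = scheduled workers x per-worker tieredstore quota.
scheduled_workers=1   # currentWorkerNumberScheduled
quota_gib=2           # quota: 2Gi in the runtime spec
echo "$(( scheduled_workers * quota_gib ))GiB"   # -> 2GiB
```

Once the second worker can be scheduled, the capacity should double accordingly.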

**Check the workload to be created**

A sample workload is provided to demonstrate how cache co-locality scheduling works. Let's check it out first:

```shell
$ cat <<EOF >app.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  serviceName: "nginx"
  podManagementPolicy: "Parallel"
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: nginx
  template: # define the pods specifications
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        # prevent two nginx Pods from being scheduled on the same node,
        # just for demonstrating the co-locality demo
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - mountPath: /data
              name: hbase-vol
      volumes:
        - name: hbase-vol
          persistentVolumeClaim:
            claimName: hbase
EOF
```

The `podAntiAffinity` property might be a little confusing, so here is the explanation: it ensures that the Pods created by the workload are distributed across different nodes, which gives a clearer view of how cache co-locality scheduling works. In short, it is only there for demonstration purposes, so you don't need to pay much attention to it. :)

**Run the workload**

```shell
$ kubectl create -f app.yaml
statefulset.apps/nginx created
```

**Check the status of the workload**

```shell
$ kubectl get pod -o wide -l app=nginx
NAME      READY   STATUS    RESTARTS   AGE    IP              NODE                       NOMINATED NODE   READINESS GATES
nginx-0   1/1     Running   0          2m5s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
nginx-1   0/1     Pending   0          2m5s   <none>          <none>                     <none>           <none>
```

Only one Pod is ready, and it is running on the only node that matches the `nodeSelectorTerm`.

**Check why the other Pod is still not ready**

```shell
$ kubectl describe pod nginx-1
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict.
```

As the events show, `podAntiAffinity` prevents the `nginx-1` Pod from being scheduled together with `nginx-0`, while at the same time only one node satisfies the volume's node affinity condition.

**Label another node**

```shell
$ kubectl label node cn-beijing.192.168.1.147 hbase-cache=true
```

Now both nodes hold the label `hbase-cache=true`. Re-check all the Pods:

```shell
$ kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP              NODE                       NOMINATED NODE   READINESS GATES
hbase-fuse-42csf     1/1     Running   0          44m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-fuse-kth4g     1/1     Running   0          10m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
hbase-master-0       2/2     Running   0          46m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
hbase-worker-l62m4   2/2     Running   0          44m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-worker-rvncl   2/2     Running   0          10m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
```

There are two running Alluxio workers now.

```shell
$ kubectl get pod -l app=nginx -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP              NODE                       NOMINATED NODE   READINESS GATES
nginx-0   1/1     Running   0          21m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
nginx-1   1/1     Running   0          21m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
```

The other nginx Pod is no longer pending either.

In conclusion, Fluid supports both schedulable data caches and cache co-locality scheduling for workloads. The two usually work together and offer a more flexible approach to users who need to manage data in Kubernetes.

## Clean Up

```shell
$ kubectl delete -f .

# unlabel the nodes
$ kubectl label node cn-beijing.192.168.1.146 hbase-cache-
$ kubectl label node cn-beijing.192.168.1.147 hbase-cache-
```

@ -0,0 +1 @@
# dawnbench_en

@ -0,0 +1,294 @@
# Accelerate Machine Learning Training with Fluid

This article describes how to deploy the [ImageNet](http://www.image-net.org/) dataset, stored on [Aliyun OSS](https://cn.aliyun.com/product/oss), to a Kubernetes cluster with Fluid, and how to train a ResNet-50 model on this dataset using [arena](https://github.com/kubeflow/arena). In this article, we run the training on 4 nodes, each with 8 GPU cards.

## Prerequisites

- [Fluid](https://github.com/fluid-cloudnative/fluid) (version >= 0.1.0)
- [arena](https://github.com/kubeflow/arena) (version >= 0.4.0)

> **NOTE**:
>
> 1. This document requires Fluid to be installed on your Kubernetes cluster. Please refer to the [Fluid Installation Guide](../userguide/install.md) to finish the installation before going to the next step.
>
> 2. Arena is a CLI that makes it convenient for data scientists to run and monitor machine learning tasks. See the [arena installation tutorial](https://github.com/kubeflow/arena/blob/master/docs/installation/INSTALL_FROM_BINARY.md) for more information.

## Deploy the Dataset on a Kubernetes Cluster with Fluid

### Create Dataset and Runtime

The following `dataset.yaml` file defines a `Dataset` and a `Runtime`, separated by `---`.

The dataset is stored on [Alibaba Cloud OSS](https://cn.aliyun.com/product/oss). To ensure that Alluxio can successfully mount it, please make sure that the configurations in `dataset.yaml` are correctly set, including `mountPoint`, `fs.oss.accessKeyId`, `fs.oss.accessKeySecret` and `fs.oss.endpoint`.
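Before applying the file, a small sanity check can catch placeholder values that were never filled in. This helper is not part of Fluid; it is just a hypothetical convenience:

```shell
# Hypothetical check: report whether a file still contains <OSS_...> placeholders.
check_placeholders() {
  if grep -q '<OSS_' "$1" 2>/dev/null; then
    echo "placeholders remain in $1"
    return 1
  fi
  echo "ok"
}
```

Running `check_placeholders dataset.yaml` should print `ok` once all four OSS settings carry real values.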

> See Alluxio's official document [Aliyun Object Storage Service](https://docs.alluxio.io/os/user/stable/en/ufs/OSS.html) for more examples of using OSS in Alluxio.

This document uses 4 machines for the training task, so `spec.replicas` is set to `4`. In addition, `dataset.yaml` sets many parameters based on our experience to optimize the I/O performance of Alluxio in machine learning tasks, covering the Alluxio, FUSE and JVM levels. You can adjust these parameters according to your test environment and task requirements.

```shell
$ cat <<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: imagenet
spec:
  mounts:
    - mountPoint: oss://<OSS_BUCKET>/<OSS_DIRECTORY>/
      name: imagenet
      options:
        fs.oss.accessKeyId: <OSS_ACCESS_KEY_ID>
        fs.oss.accessKeySecret: <OSS_ACCESS_KEY_SECRET>
        fs.oss.endpoint: <OSS_ENDPOINT>
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: imagenet
spec:
  replicas: 4
  data:
    replicas: 1
#  alluxioVersion:
#    image: registry.cn-huhehaote.aliyuncs.com/alluxio/alluxio
#    imageTag: "2.3.0-SNAPSHOT-bbce37a"
#    imagePullPolicy: Always
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /var/lib/docker/alluxio
        quota: 50Gi
        high: "0.99"
        low: "0.8"
  properties:
    # alluxio fuse
    alluxio.fuse.jnifuse.enabled: "true"
    alluxio.fuse.debug.enabled: "false"
    alluxio.fuse.cached.paths.max: "1000000"
    alluxio.fuse.logging.threshold: 1000ms
    # alluxio master
    alluxio.master.metastore: ROCKS
    alluxio.master.journal.folder: /journal
    alluxio.master.journal.type: UFS
    alluxio.master.metastore.inode.cache.max.size: "10000000"
    alluxio.master.journal.log.size.bytes.max: 500MB
    alluxio.master.metadata.sync.concurrency.level: "128"
    alluxio.master.metadata.sync.executor.pool.size: "128"
    alluxio.master.metadata.sync.ufs.prefetch.pool.size: "128"
    alluxio.master.rpc.executor.max.pool.size: "1024"
    alluxio.master.rpc.executor.core.pool.size: "128"
    # alluxio worker
    alluxio.worker.allocator.class: alluxio.worker.block.allocator.GreedyAllocator
    alluxio.worker.network.reader.buffer.size: 32MB
    alluxio.worker.file.buffer.size: 320MB
    alluxio.worker.block.master.client.pool.size: "1024"
    # alluxio user
    alluxio.user.block.worker.client.pool.min: "512"
    alluxio.user.file.writetype.default: MUST_CACHE
    alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
    alluxio.user.block.write.location.policy.class: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
    alluxio.user.block.size.bytes.default: 16MB
    alluxio.user.streaming.reader.chunk.size.bytes: 32MB
    alluxio.user.local.reader.chunk.size.bytes: 32MB
    alluxio.user.metrics.collection.enabled: "false"
    alluxio.user.update.file.accesstime.disabled: "true"
    alluxio.user.file.passive.cache.enabled: "false"
    alluxio.user.block.avoid.eviction.policy.reserved.size.bytes: 2GB
    alluxio.user.block.master.client.pool.gc.threshold: 2day
    alluxio.user.file.master.client.threads: "1024"
    alluxio.user.block.master.client.threads: "1024"
    alluxio.user.file.readtype.default: CACHE
    alluxio.user.metadata.cache.enabled: "true"
    alluxio.user.metadata.cache.expiration.time: 2day
    alluxio.user.metadata.cache.max.size: "1000000"
    alluxio.user.direct.memory.io.enabled: "true"
    alluxio.user.worker.list.refresh.interval: 2min
    alluxio.user.logging.threshold: 1000ms
    # other alluxio configurations
    alluxio.web.ui.enabled: "false"
    alluxio.security.stale.channel.purge.interval: 365d
    alluxio.job.worker.threadpool.size: "164"
  master:
    jvmOptions:
      - "-Xmx6G"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:ActiveProcessorCount=8"
  worker:
    jvmOptions:
      - "-Xmx12G"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:MaxDirectMemorySize=32g"
      - "-XX:ActiveProcessorCount=8"
    resources:
      limits:
        cpu: 8
  fuse:
#    image: registry.cn-huhehaote.aliyuncs.com/alluxio/alluxio-fuse
#    imageTag: "2.3.0-SNAPSHOT-bbce37a"
#    imagePullPolicy: Always
    env:
      MAX_IDLE_THREADS: "32"
    jvmOptions:
      - "-Xmx16G"
      - "-Xms16G"
      - "-XX:+UseG1GC"
      - "-XX:MaxDirectMemorySize=32g"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:ActiveProcessorCount=24"
    resources:
      limits:
        cpu: 16
    shortCircuitPolicy: local
    args:
      - fuse
      - --fuse-opts=kernel_cache,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
EOF
```

Create the Dataset and Alluxio Runtime with:

```shell
$ kubectl create -f dataset.yaml
```

Check the status of the Alluxio Runtime; there should be `1` master, `4` workers and `4` FUSE pods running:

```shell
$ kubectl describe alluxioruntime imagenet
Name:         imagenet
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  data.fluid.io/v1alpha1
Kind:         AlluxioRuntime
Metadata:
  # more metadata
Spec:
  # more spec
Status:
  Cache States:
    Cache Capacity:                  200GiB
    Cached:                          0B
    Cached Percentage:               0%
  Conditions:
    # more conditions
  Current Fuse Number Scheduled:     4
  Current Master Number Scheduled:   1
  Current Worker Number Scheduled:   4
  Desired Fuse Number Scheduled:     4
  Desired Master Number Scheduled:   1
  Desired Worker Number Scheduled:   4
  Fuse Number Available:             4
  Fuse Number Ready:                 4
  # more status
Events:                              <none>
```
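The `Cache Capacity: 200GiB` above follows directly from the runtime spec: 4 replicas, each contributing a 50Gi SSD tier. A quick arithmetic check:

```shell
replicas=4     # spec.replicas in the AlluxioRuntime
quota_gib=50   # tieredstore quota per worker: 50Gi
echo "$(( replicas * quota_gib ))GiB"   # -> 200GiB
```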

At the same time, the Dataset is bound to the Alluxio Runtime:

```shell
$ kubectl describe dataset
Name:         imagenet
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  data.fluid.io/v1alpha1
Kind:         Dataset
Metadata:
  # more metadata
Spec:
  # more spec
Status:
  Cache States:
    Cache Capacity:     200GiB
    Cached:             0B
    Cached Percentage:  0%
  Conditions:
    Last Transition Time:  2020-08-18T11:01:09Z
    Last Update Time:      2020-08-18T11:02:48Z
    Message:               The ddc runtime is ready.
    Reason:                DatasetReady
    Status:                True
    Type:                  Ready
  Phase:  Bound
  Runtimes:
    Category:   Accelerate
    Name:       imagenet
    Namespace:  default
    Type:       alluxio
  Ufs Total:  143.7GiB
Events:       <none>
```

A PV and a PVC named `imagenet` are created as well. So far, the dataset stored on the cloud has been successfully deployed to the Kubernetes cluster.

```shell
$ kubectl get pv,pvc
NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/imagenet   100Gi      RWX            Retain           Bound    default/imagenet                           7m11s

NAME                             STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/imagenet   Bound    imagenet   100Gi      RWX                           7m11s
```
## Example: Run Deep Learning Frameworks Using Arena

`arena` provides a convenient way for users to submit and monitor machine learning tasks. In this article, we use `arena` to simplify the deployment of the training task.

If you have installed `arena` and the dataset has been successfully deployed to the local cluster, you can start training a ResNet-50 model by simply executing the following command:

```shell
arena submit mpi \
    --name horovod-resnet50-v2-4x8-fluid \
    --gpus=8 \
    --workers=4 \
    --working-dir=/horovod-demo/tensorflow-demo/ \
    --data imagenet:/data \
    -e DATA_DIR=/data/imagenet \
    -e num_batch=1000 \
    -e datasets_num_private_threads=8 \
    --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod-benchmark-dawnbench-v2:0.18.1-tf1.14.0-torch1.2.0-mxnet1.5.0-py3.6 \
    ./launch-example.sh 4 8
```

Notes:

- `--name`: the name of the job, `horovod-resnet50-v2-4x8-fluid` in this example
- `--workers`: the number of nodes (workers) participating in the training
- `--gpus`: the number of GPUs used by each worker
- `--working-dir`: the working directory
- `--data`: tells the workers to mount the volume named `imagenet` to the directory `/data`
- `-e DATA_DIR`: the directory where the dataset is located
- `./launch-example.sh 4 8`: the shell script that launches the training process

Check whether the task is executing normally:

```shell
$ arena get horovod-resnet50-v2-4x8-fluid -e
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 16s

NAME                           STATUS   TRAINER  AGE  INSTANCE                                      NODE
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-launcher-czlfn  192.168.1.21
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-0        192.168.1.16
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-1        192.168.1.21
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-2        192.168.1.25
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-3        192.168.3.29
```

If you see that all `4` workers are in the `RUNNING` state, congratulations! You have successfully started the training.
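The worker count can also be checked programmatically. A sketch against the tabular `arena get` output above (column 2 is the status, column 5 the instance name; the helper itself is hypothetical, not part of arena):

```shell
# Count RUNNING worker instances in `arena get <job> -e` table output.
count_running_workers() {
  awk '$2 == "RUNNING" && $5 ~ /-worker-/ { n++ } END { print n + 0 }'
}

# With the five instance rows above (1 launcher + 4 workers) this prints 4:
printf '%s\n' \
  'job RUNNING MPIJOB 16s job-launcher-czlfn 192.168.1.21' \
  'job RUNNING MPIJOB 16s job-worker-0       192.168.1.16' \
  'job RUNNING MPIJOB 16s job-worker-1       192.168.1.21' \
  'job RUNNING MPIJOB 16s job-worker-2       192.168.1.25' \
  'job RUNNING MPIJOB 16s job-worker-3       192.168.3.29' \
  | count_running_workers   # -> 4
```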

If you want to know how far the training has progressed, check the arena log:

```shell
$ arena logs --tail 100 -f horovod-resnet50-v2-4x8-fluid
```

@ -0,0 +1 @@
# warmup_en

@ -0,0 +1 @@
# diagnose

@ -0,0 +1,182 @@
# Get Started with Fluid

This document describes how to create a Kubernetes cluster environment, install and deploy Fluid with Helm, and use Fluid to create a dataset and accelerate your application.

## Create a Kubernetes Cluster

A Kubernetes environment is a prerequisite for Fluid. Choose the solution that best matches your experience:

- If you already have a Kubernetes cluster, you can skip to [Deploy Fluid](#deploy-fluid).
- If you have not used Kubernetes before, you can use Minikube to create a Kubernetes cluster.
  [Minikube](https://kubernetes.io/docs/setup/minikube/) creates a Kubernetes cluster in a virtual machine and runs on macOS, Linux and Windows.

Please ensure that the following requirements are met:

- [Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/): version 1.0.0+
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl): version 1.14+

After installing Minikube, start it:
```shell
minikube start
```

If it starts successfully, you will see a prompt message like this:
```shell
minikube v1.12.1 on Darwin 10.14.5
```

Use `kubectl` to access the newly created Kubernetes cluster:
```shell
$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-558fc78868-kvjnf   1/1     Running   1          4d12h
nginx-deployment-558fc78868-kx9gt   1/1     Running   1          4d12h
```

## Deploy Fluid

Before the installation, make sure that the following requirements are met:

- You can successfully access the Kubernetes cluster with `kubectl`.
- [Helm](https://helm.sh/docs/intro/install/): Helm 3 is installed.
- Git is installed.

1. Download Fluid
   ```shell
   git clone https://github.com/fluid-cloudnative/fluid.git
   cd fluid/charts/fluid
   ```
2. Install Fluid with Helm
   ```shell
   helm install fluid fluid
   NAME: fluid
   LAST DEPLOYED: Tue Jul  7 11:22:07 2020
   NAMESPACE: default
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   ```
3. Check the installation result
   ```shell
   kubectl get po -n fluid-system
   NAME                                  READY   STATUS    RESTARTS   AGE
   controller-manager-6b864dfd4f-995gm   1/1     Running   0          32h
   csi-nodeplugin-fluid-c6pzj            2/2     Running   0          32h
   csi-nodeplugin-fluid-wczmq            2/2     Running   0          32h
   ```
## Create a Dataset

Fluid provides cloud-native data acceleration and management capabilities, using `Dataset` as a high-level abstraction to facilitate user management. Here we show how to create a dataset with Fluid.

1. Create a Dataset object through the CRD file, which describes the source of the dataset.
   ```yaml
   apiVersion: data.fluid.io/v1alpha1
   kind: Dataset
   metadata:
     name: demo
   spec:
     mounts:
       - mountPoint: https://mirror.bit.edu.cn/apache/spark/spark-3.0.0/
         name: spark
   ```
   Create the dataset with kubectl:

   ```shell
   kubectl create -f dataset.yaml
   ```
   After the dataset is created, it is in the `NotBound` state and needs to be bound to a runtime before it can be used.

2. We also create an AlluxioRuntime object based on the AlluxioRuntime CRD file, which enables the dataset.

   ```yaml
   apiVersion: data.fluid.io/v1alpha1
   kind: AlluxioRuntime
   metadata:
     name: demo
   spec:
     replicas: 1
     tieredstore:
       levels:
         - mediumtype: MEM
           path: /dev/shm
           quota: 2Gi
           high: "0.95"
           low: "0.7"
           storageType: Memory
     properties:
       alluxio.user.file.writetype.default: MUST_CACHE
       alluxio.master.journal.folder: /journal
       alluxio.master.journal.type: UFS
       alluxio.user.block.size.bytes.default: 256MB
       alluxio.user.streaming.reader.chunk.size.bytes: 256MB
       alluxio.user.local.reader.chunk.size.bytes: 256MB
       alluxio.worker.network.reader.buffer.size: 256MB
       alluxio.user.streaming.data.timeout: 300sec
     master:
       jvmOptions:
         - "-Xmx4G"
     worker:
       jvmOptions:
         - "-Xmx4G"
     fuse:
       jvmOptions:
         - "-Xmx4G"
         - "-Xms4G"
       # For now, only local is supported
       shortCircuitPolicy: local
       args:
         - fuse
         - --fuse-opts=direct_io,ro,max_read=131072
   ```

   Create the Alluxio Runtime with kubectl:

   ```shell
   kubectl create -f runtime.yaml
   ```

3. Next, we create an application to access this dataset. Here we access the same data multiple times and compare the time consumed by each access.

   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: demo-app
   spec:
     containers:
       - name: demo
         image: nginx
         volumeMounts:
           - mountPath: /data
             name: demo
     volumes:
       - name: demo
         persistentVolumeClaim:
           claimName: demo
   ```

   Create the Pod with kubectl:

   ```shell
   kubectl create -f app.yaml
   ```

4. Dive into the container to access the data; the first access takes longer.
   ```shell
   kubectl exec -it demo-app -- bash
   # du -sh /data/spark/spark-3.0.0-bin-without-hadoop.tgz
   150M	/data/spark/spark-3.0.0-bin-without-hadoop.tgz
   # time cp /data/spark/spark-3.0.0-bin-without-hadoop.tgz /dev/null
   real	0m13.171s
   user	0m0.002s
   sys	0m0.028s
   ```

5. To rule out the influence of other factors such as the page cache, we delete the previous container, create the same application again, and access the same file. Since the file has been cached by Alluxio by this time, the access takes significantly less time.
   ```shell
   kubectl delete -f app.yaml && kubectl create -f app.yaml
   ...
   # time cp /data/spark/spark-3.0.0-bin-without-hadoop.tgz /dev/null
   real	0m0.344s
   user	0m0.002s
   sys	0m0.020s
   ```
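From the two timings above (13.171s for the first, uncached read versus 0.344s once Alluxio holds the file), the speedup works out to roughly:

```shell
awk 'BEGIN { printf "%.0fx faster\n", 13.171 / 0.344 }'   # -> 38x faster
```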

At this point, we have successfully created a dataset and accelerated access to it. For further use and management of datasets, please refer to the two examples on [acceleration](../samples/accelerate_data_accessing.md) and [co-locality](../samples/data_co_locality.md).

@ -0,0 +1,87 @@
# Deploy Fluid on Your Kubernetes Cluster

## Prerequisites

- Git
- Kubernetes cluster (version >= 1.14) with CSI support
- kubectl (version >= 1.14)
- [Helm](https://helm.sh/) (version >= 3.0)

The following documents assume that you have installed all of the above requirements.

For the installation and configuration of kubectl, please refer to [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).

For the installation and configuration of Helm 3, please refer to [here](https://v3.helm.sh/docs/intro/install/).

## How to Deploy

### Download the Fluid Chart

You can execute the following command in any folder to clone the source code from the [Fluid repository](https://github.com/fluid-cloudnative/fluid):

```shell
$ git clone https://github.com/fluid-cloudnative/fluid.git
```

The [helm charts](https://github.com/fluid-cloudnative/fluid/tree/master/charts) used to deploy Fluid are included in the source code.

### Install Fluid with Helm

Enter the cloned local repository:

```shell
$ cd fluid
```

Create the namespace:

```shell
$ kubectl create ns fluid-system
```

Install Fluid with:

```shell
$ helm install fluid charts/fluid/fluid
NAME: fluid
LAST DEPLOYED: Fri Jul 24 16:10:18 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

> The general format of the `helm install` command is `helm install <RELEASE_NAME> <SOURCE>`. In the command above, `fluid` is the release name and `charts/fluid/fluid` is the path to the helm chart.

### Check the Status of the Components

**Check the CRDs used by Fluid:**

```shell
$ kubectl get crd | grep data.fluid.io
alluxiodataloads.data.fluid.io   2020-07-24T06:54:50Z
alluxioruntimes.data.fluid.io    2020-07-24T06:54:50Z
datasets.data.fluid.io           2020-07-24T06:54:50Z
```

**Check the status of the pods:**

```shell
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7f99c884dd-894g9   1/1     Running   0          5m28s
csi-nodeplugin-fluid-dm9b8            2/2     Running   0          5m28s
csi-nodeplugin-fluid-hwtvh            2/2     Running   0          5m28s
```

If the Pod status looks like the above, Fluid has been installed on your Kubernetes cluster successfully!

### Uninstall Fluid

```shell
$ helm delete fluid
$ kubectl delete -f charts/fluid/fluid/crds
```

> `fluid` here is the <RELEASE_NAME> used during installation.

@ -0,0 +1,15 @@
# Overview

[Fluid](https://github.com/fluid-cloudnative/fluid) is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data analysis and machine learning. It provides full life-cycle management for the data orchestration system (Alluxio), including deployment, scaling and configuration changes. With Fluid, end users can manage their data without touching the underlying data caching system.

> **Note:**
>
> You can only deploy Fluid in a Kubernetes cluster.

The correspondence between Fluid and Alluxio versions is as follows:

| Fluid version | Compatible Alluxio versions |
|:---|:---|
| v0.1 | [Alluxio JNI Fuse 2.3](https://github.com/Alluxio/alluxio/tree/branch-2.3-fuse)|

@ -1,65 +0,0 @@
## Install Fluid

This document assumes that you already have an accessible Kubernetes cluster.

### Requirements
- Kubernetes >= 1.16, kubectl >= 1.16
- Helm 3

For the installation and configuration of kubectl, please refer to [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/)

For the installation and configuration of Helm 3, please refer to [here](https://v3.helm.sh/docs/intro/install/)

### Steps
1\. Prepare the kubeconfig file via `export KUBECONFIG=<your-kubeconfig-path>` or by creating `~/.kube/config`

2\. Check that helm can manage the Kubernetes cluster normally
```shell script
$ helm list
$ echo $?
```

3\. Get the Fluid chart
```shell script
$ cd <some-dir>
$ wget http://kubeflow.oss-cn-beijing.aliyuncs.com/fluid-0.1.0.tgz
$ tar -xvf fluid-0.1.0.tgz
```

4\. Install Fluid with Helm
```shell script
$ helm install <release-name> fluid
NAME: <release-name>
LAST DEPLOYED: Fri Jul 24 16:10:18 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
`<release-name>` can be any name you like (e.g. `fluid-release`); it is used by Helm for release management

5\. Check the status of the components

**Check the CRDs used by Fluid:**
```shell script
$ kubectl get crd | grep data.fluid.io
alluxiodataloads.data.fluid.io   2020-07-24T06:54:50Z
alluxioruntimes.data.fluid.io    2020-07-24T06:54:50Z
datasets.data.fluid.io           2020-07-24T06:54:50Z
```

**Check the status of each Pod:**
```shell script
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7f99c884dd-894g9   1/1     Running   0          5m28s
csi-nodeplugin-fluid-dm9b8            2/2     Running   0          5m28s
csi-nodeplugin-fluid-hwtvh            2/2     Running   0          5m28s
```
If the Pod status is as shown above, Fluid is ready to use!

6\. Uninstall Fluid
```shell script
$ helm del <release-name>
```
`<release-name>` can be found via `helm list | grep fluid`

Binary file not shown (image, 7.1 KiB).

@ -0,0 +1,35 @@
#!/bin/bash

MAINFONT="WenQuanYi Micro Hei"
MONOFONT="WenQuanYi Micro Hei Mono"

# MAINFONT="Tsentsiu Sans HG"
# MONOFONT="Tsentsiu Sans Console HG"

#_version_tag="$(date '+%Y%m%d').$(git rev-parse --short HEAD)"
_version_tag="$(date '+%Y%m%d')"

# default version: `pandoc --latex-engine=xelatex doc.md -s -o output2.pdf`
# used to debug template setting errors
lang="en zh"

for d in ${lang}
do
    if [ $d = "en" ]; then
        docs_title=" Fluid Documentation"
    else
        docs_title=" Fluid 用户文档"
    fi
    pandoc -N --toc --smart --latex-engine=xelatex \
        --template=templates/template.tex \
        --columns=120 \
        --listings \
        -V title="$docs_title" \
        -V author="Fluid" \
        -V date="${_version_tag}" \
        -V CJKmainfont="${MAINFONT}" \
        -V fontsize=12pt \
        -V geometry:margin=1in \
        "$d/doc.md" -s -o "output_$d.pdf"
done

@ -0,0 +1,176 @@
#!/usr/bin/env python3
# coding: utf8
#
# Generate an all-in-one Markdown file for ``doc-cn``
# Tip: Chinese filenames are not supported
# If a Markdown file (or one of its sub-headings) is referenced multiple
# times in the TOC of readme.md, only the first occurrence is kept
# Each version generates its own PDF

from __future__ import print_function, unicode_literals

import re
import os
import sys

followups = []
in_toc = False
contents = []
lang = sys.argv[1]
# pattern: []()
hyper_link_pattern = re.compile(r'\[(.*?)\]\((.*?)(#.*?)?\)')
# pattern: - []() or + []()
toc_line_pattern = re.compile(r'([\-\+]+)\s\[(.*?)\]\((.*?)(#.*?)?\)')
# pattern: ![]()
image_link_pattern = re.compile(r'!\[(.*?)\]\((.*?)\)')
# pattern: leading -/+ list markers
level_pattern = re.compile(r'(\s*[\-\+]+)\s')
# match all headings
heading_pattern = re.compile(r'(^#+|\n#+)\s')

entry_file = lang + "TOC.md"

# stage 1, parse toc
with open(entry_file) as fp:
    level = 0
    current_level = ""
    for line in fp:
        if not in_toc and line.startswith("## "):
            in_toc = True
        elif in_toc and line.startswith('## '):
            in_toc = False
            # yes, toc processing done
            # contents.append(line[1:]) # skip 1 level TOC
            break
        # line.strip() avoids adding empty lines
        elif in_toc and not line.startswith('#') and line.strip():
            # get the level from the length of the leading whitespace
            level_space_str = level_pattern.findall(line)[0][:-1]
            # two spaces per indentation level
            level = len(level_space_str) // 2 + 1  # integer division

            matches = toc_line_pattern.findall(line)
            if matches:
                for match in matches:
                    fpath = match[2]
                    if fpath.startswith('http'):
                        # remove list format characters `- `, `+ `
                        followups.append(('TOC', level, line.strip()[2:]))
                    elif fpath.endswith('.md'):
                        key = ('FILE', level, fpath)
                        if key not in followups:
                            followups.append(key)
            else:
                name = line.strip().split(None, 1)[-1]
                key = ('TOC', level, name)
                if key not in followups:
                    followups.append(key)
        else:
            pass

    # overview part in README.md
    followups.insert(1, ("RAW", 0, fp.read()))

# stage 2, get file headings
file_link_name = {}
title_pattern = re.compile(r'(^#+)\s.*')
for tp, lv, f in followups:
    if tp != 'FILE':
        continue
    tag = ""
    try:
        for line in open(lang + f).readlines():
            if line.startswith("#"):
                tag = line.strip()
                break
    except Exception as e:
        print(e)
        tag = ""
    if tag.startswith('# '):
        tag = tag[2:]
    elif tag.startswith('## '):
        tag = tag[3:]
    file_link_name[f] = tag.lower().replace(' ', '-')


def replace_link_wrap(chapter, name):
    # Note: only hash matching is supported; headings with the same name
    # in different documents will collide
    # Supports ./ddd.md, xxx.md, xxx.md#xxx etc. inside a chapter

    def replace_link(match):
        full = match.group(0)
        link_name = match.group(1)
        link = match.group(2)
        frag = match.group(3)
        if link.startswith('https'):
            return '[{}]({})'.format(link_name, link)
        if link.endswith('.md') or '.md#' in link:
            if link.startswith('../'):
                link = link[3:]
            if not frag:
                for fpath in file_link_name:
                    if link == fpath:
                        frag = '#' + file_link_name[fpath]
            return '[%s](%s)' % (link_name, frag)
        elif link.endswith(('.png', '.jpeg', '.svg', '.gif', '.jpg')):
            # special handling for pictures
            img_link = re.sub(r'[\.\/]*media\/', './media/', link, count=0, flags=0)
            return '[%s](%s)' % (link_name, img_link)
        else:
            return full

    return hyper_link_pattern.sub(replace_link, chapter)


def replace_heading_func(diff_level=0):

    def replace_heading(match):
        if diff_level == 0:
            return match.group(0)
        else:
            return '\n' + '#' * (match.group(0).count('#') + diff_level) + ' '

    return replace_heading


def replace_img_link(match):
    full = match.group(0)
    link_name = match.group(1)
    link = match.group(2)

    if link.endswith('.png'):
        fname = os.path.basename(link)
        return '![%s](./media/%s)' % (link_name, fname)


# stage 3, concat files
for type_, level, name in followups:
    if type_ == 'TOC':
        contents.append("\n{} {}\n".format('#' * level, name))
    elif type_ == 'RAW':
        contents.append(name)
    elif type_ == 'FILE':
        ignore = 'api_doc.md'
        if ignore in name:
            contents.append('# api reference\n\n')
            continue
        try:
            with open(lang + name) as fp:
                chapter = fp.read()
                chapter = replace_link_wrap(chapter, name)
                # chapter = image_link_pattern.sub(replace_img_link, chapter)

                # fix heading level
                off = heading_pattern.findall(chapter)
                if len(off) != 0:
                    diff_level = level - off[0].count('#')

                    # print(name, type_, level, diff_level)
                    chapter = heading_pattern.sub(replace_heading_func(diff_level), chapter)
                contents.append(chapter)
                contents.append('')  # add an empty line
        except Exception as e:
            print(e)
            print("generate file error: ignore!")

# stage 4, generate the final doc.md
target_doc_file = lang + 'doc.md'
with open(target_doc_file, 'w') as fp:
    fp.write('\n'.join(contents))
contents = []

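The TOC-parsing regular expressions above can be exercised in isolation; the sketch below runs them against a single hypothetical TOC line (the file path and the two-space-per-level indentation are assumptions mirroring what the script itself expects):

```python
import re

# Regexes as defined in mergeByTOC.py
toc_line_pattern = re.compile(r'([\-\+]+)\s\[(.*?)\]\((.*?)(#.*?)?\)')
level_pattern = re.compile(r'(\s*[\-\+]+)\s')

# Hypothetical TOC line: an entry indented by four spaces (two levels deep)
line = "    - [Installation](userguide/install.md)"

# Indentation level: two spaces per level, plus one (as in stage 1)
level_space_str = level_pattern.findall(line)[0][:-1]
level = len(level_space_str) // 2 + 1

# Extract the link title and the target file path
match = toc_line_pattern.findall(line)[0]
print(level, match[1], match[2])  # 3 Installation userguide/install.md
```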
@ -0,0 +1,16 @@
import sys

import PyPDF2

lang = sys.argv[1]
offset = 0
merger = PyPDF2.PdfFileMerger()
target = []
target.append("output_{}.pdf".format(lang))
target.append("api.pdf")
output = "docs_{}.pdf".format(lang)

for pdf in target:
    merger.merge(offset, pdf)
    pn = PyPDF2.PdfFileReader(pdf).getNumPages()
    offset += pn
merger.write(output)

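The `offset` bookkeeping in the merge loop above is plain cumulative page counting: each PDF is inserted at the current offset, and the offset then advances by that PDF's page count. A standalone sketch of the same arithmetic (the page counts are made-up numbers, not taken from the real PDFs):

```python
# Hypothetical page counts standing in for PdfFileReader(...).getNumPages()
page_counts = {"output_en.pdf": 120, "api.pdf": 35}

offset = 0
insert_positions = {}
for pdf in ["output_en.pdf", "api.pdf"]:
    insert_positions[pdf] = offset  # where merger.merge(offset, pdf) would insert
    offset += page_counts[pdf]      # advance past the pages just merged

print(insert_positions)  # {'output_en.pdf': 0, 'api.pdf': 120}
print(offset)            # 155
```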
@ -0,0 +1,293 @@
\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$lang$,$endif$$if(papersize)$$papersize$,$endif$$for(classoption)$$classoption$$sep$,$endfor$]{$documentclass$}
$if(fontfamily)$
\usepackage{$fontfamily$}
$else$
\usepackage{lmodern}
$endif$
$if(linestretch)$
\usepackage{setspace}
\setstretch{$linestretch$}
$endif$
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
$if(euro)$
  \usepackage{eurosym}
$endif$
\else % if luatex or xelatex
  \ifxetex
    \usepackage{mathspec}
    \usepackage{xltxtra,xunicode}
$if(CJKmainfont)$
    \usepackage{xeCJK}
    \setCJKmainfont{$CJKmainfont$}
$endif$
  \else
    \usepackage{fontspec}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \newcommand{\euro}{€}
$if(mainfont)$
  \setmainfont{$mainfont$}
$endif$
$if(sansfont)$
  \setsansfont{$sansfont$}
$endif$
$if(monofont)$
  \setmonofont[Mapping=tex-ansi]{$monofont$}
$endif$
$if(mathfont)$
  \setmathfont(Digits,Latin,Greek){$mathfont$}
$endif$
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
$if(geometry)$
\usepackage[$for(geometry)$$geometry$$sep$,$endfor$]{geometry}
$endif$
$if(natbib)$
\usepackage{natbib}
\bibliographystyle{$if(biblio-style)$$biblio-style$$else$plainnat$endif$}
$endif$
$if(biblatex)$
\usepackage{biblatex}
$if(biblio-files)$
\bibliography{$biblio-files$}
$endif$
$endif$
$if(listings)$

\usepackage{xcolor}
\usepackage{listings}
\lstset{
    basicstyle=\ttfamily,
    keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries,
    stringstyle=\color[rgb]{0.31,0.60,0.02},
    commentstyle=\color[rgb]{0.56,0.35,0.01}\itshape,
    numberstyle=\footnotesize,
    frame=single,
    showspaces=false, % show spaces everywhere adding particular underscores; it overrides 'showstringspaces'
    showstringspaces=false, % underline spaces within strings only
    columns=flexible,
    breaklines=true,
    postbreak=\raisebox{0ex}[0ex][0ex]{\ensuremath{\color{gray}\hookrightarrow\space}}
}

$endif$
$if(lhs)$
\lstnewenvironment{code}{\lstset{columns=flexible,breaklines=true,language=Haskell,basicstyle=\small\ttfamily}}{}
$endif$
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(verbatim-in-note)$
\usepackage{fancyvrb}
$endif$
$if(tables)$
\usepackage{longtable,booktabs}
$if(beamer)$
\usepackage{caption}
% Make caption package work with longtable
\makeatletter
\def\fnum@table{\tablename~\thetable}
\makeatother
$else$
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
$endif$
$endif$
$if(graphics)$
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
$endif$
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={$author-meta$},
            pdftitle={$title-meta$},
            colorlinks=true,
            citecolor=$if(citecolor)$$citecolor$$else$blue$endif$,
            urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$,
            linkcolor=$if(linkcolor)$$linkcolor$$else$magenta$endif$,
            pdfborder={0 0 0}}
\urlstyle{same} % don't use monospace font for urls
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
$if(strikeout)$
\usepackage[normalem]{ulem}
% avoid problems with \sout in headers with hyperref:
\pdfstringdefDisableCommands{\renewcommand{\sout}{}}
$endif$
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
$if(numbersections)$
\setcounter{secnumdepth}{5}
$else$
\setcounter{secnumdepth}{0}
$endif$
$if(verbatim-in-note)$
\VerbatimFootnotes % allows verbatim text in footnotes
$endif$
$if(lang)$
\ifxetex
  \usepackage{polyglossia}
  \setmainlanguage{$mainlang$}
\else
  \usepackage[$lang$]{babel}
\fi
$endif$

$if(title)$
\title{$title$$if(subtitle)$\\\vspace{0.5em}{\large $subtitle$}$endif$}
$endif$
$if(author)$
\author{$for(author)$$author$$sep$ \and $endfor$}
$endif$
\date{$date$}
$for(header-includes)$
$header-includes$
$endfor$

% quote style
% http://tex.stackexchange.com/questions/179982/add-a-black-border-to-block-quotations
\usepackage{framed}
% \usepackage{xcolor}
\let\oldquote=\quote
\let\endoldquote=\endquote
\colorlet{shadecolor}{orange!15}
\renewenvironment{quote}{\begin{shaded*}\begin{oldquote}}{\end{oldquote}\end{shaded*}}

% https://www.zhihu.com/question/25082703/answer/30038248
% no cross chapter
\usepackage[section]{placeins}
% no float everywhere
\usepackage{float}
\floatplacement{figure}{H}

% indent the first paragraph of each section, Chinese typesetting style
\usepackage{indentfirst}
\setlength{\parindent}{2em}

\renewcommand{\contentsname}{Table of Contents}
\renewcommand\figurename{Figure}

% fix overlap of toc number and title
% https://blog.csdn.net/golden1314521/article/details/39926135
\usepackage{titlesec}
\usepackage{titletoc}
% \titlecontents{section name}[left margin]{title format}{label format}{format without label}{leader and page number}[below skip]
% fix overlap
\titlecontents{subsection}
[4em]
{}%
{\contentslabel{3em}}%
{}%
{\titlerule*[0.5pc]{$$\cdot$$}\contentspage\hspace*{0em}}%

\titlecontents{subsubsection}
[7em]
{}%
{\contentslabel{3.5em}}%
{}%
{\titlerule*[0.5pc]{$$\cdot$$}\contentspage\hspace*{0em}}%

\usepackage[all]{background}
% \backgroundsetup{contents=Fluid.,color=blue,opacity=0.2}
\backgroundsetup{contents=\includegraphics{media/logo},
placement=top,scale=0.2,hshift=1250pt,vshift=-20pt,
opacity=0.8,angle=0}

% avoid level-4, 5 headings being joined with the following content
% https://github.com/jgm/pandoc/issues/1658
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}

\begin{document}

% no bg at title page
\NoBgThispage
$if(title)$
\maketitle
$endif$
$if(abstract)$
\begin{abstract}
$abstract$
\end{abstract}
$endif$

$for(include-before)$
$include-before$

$endfor$
$if(toc)$
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{$toc-depth$}
\tableofcontents
}
$endif$

$if(lof)$
\listoffigures
$endif$

\newpage

$body$

$if(natbib)$
$if(biblio-files)$
$if(biblio-title)$
$if(book-class)$
\renewcommand\bibname{$biblio-title$}
$else$
\renewcommand\refname{$biblio-title$}
$endif$
$endif$
\bibliography{$biblio-files$}

$endif$
$endif$
$if(biblatex)$
\printbibliography$if(biblio-title)$[title=$biblio-title$]$endif$

$endif$
$for(include-after)$
$include-after$

$endfor$
\end{document}

@ -0,0 +1,22 @@
# Fluid Documentation

<!-- markdownlint-disable MD007 -->
<!-- markdownlint-disable MD032 -->

## TOC

+ Userguide
  - [Overview](userguide/overview.md)
  - [Get Started](userguide/get_started.md)
  - [Installation](userguide/install.md)
  - [Diagnose](userguide/diagnose.md)
+ Samples
  - [Accelerate Data Accessing](samples/accelerate_data_accessing.md)
  - [Cache Co-locality](samples/data_co_locality.md)
  - [Machine Learning](samples/machinelearning.md)
  - [Dawnbench](samples/dawnbench.md)
  - [Warm up](samples/warmup.md)
+ Developer Guide
  - [How to develop](dev/how_to_develop.md)
  - [API_Doc](dev/api_doc.md)

File diff suppressed because it is too large
@ -1,177 +1,184 @@
## Fluid Development Documentation

### Environment Requirements
- golang 1.13+
- docker 19.03+
- GNU Make

For the installation and configuration of golang, see [here](https://golang.org/dl/)

For the installation and configuration of docker, see [here](https://docs.docker.com/engine/install/)

Fluid uses `make` for project builds; install `make` with:

- Linux
  - `sudo apt-get install build-essential`

### Get the Fluid source code
Without Go module support:
```shell script
mkdir -p $GOPATH/src/github.com/cloudnativefluid/
cd $GOPATH/src/github.com/cloudnativefluid
git clone https://github.com/cheyang/fluid.git
```

With Go module support:
```shell script
cd <any-place-you-like>
git clone https://github.com/cheyang/fluid.git
```
> See the [official golang documentation](https://github.com/golang/go/wiki/Modules) for more information about Go modules

### Build
The `Makefile` in the project root already contains the basic build, packaging, and deployment logic used during development
```shell script
# Build the Controller Manager binary
make manager
# Build the CSI binary
make csi
```
The resulting binaries are located in the `./bin` directory

> Note: if you are developing with Go modules, you may need to change `GO111MODULE=off` to `GO111MODULE=on` for the relevant targets in the Makefile for the build to succeed

### Build images
```shell script
# Name the manager image
export IMG=<your-registry>/<your-namespace>/<img-name>
# Name the CSI plugin image
export CSI_IMG=<your-registry>/<your-namespace>/<csi-img-name>

# Build the manager image
make docker-build
# Build the CSI plugin image
make docker-build-csi
```

Before running Fluid, the built images need to be pushed to an accessible image registry

1\. Log in to the image registry:
```shell script
sudo docker login <docker-registry>
```

2\. Push the images:
```shell script
make docker-push

make docker-push-csi
```

### Run
The following assumes that an accessible Kubernetes cluster has been configured in the local environment, either via the `KUBECONFIG` environment variable or in the `~/.kube/config` file; you can quickly check the configuration with `kubectl cluster-info`. For more information about `kubeconfig`, see the
[official Kubernetes documentation](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/)

> The following uses `kustomize`; `kubectl 1.14+` has the `kustomize` tool built in. Developers using versions below `kubectl 1.14`, see [here](https://kustomize.io/) for more information about kustomize


0\. Push the built images to an image registry accessible from the Kubernetes cluster
> If the built and uploaded images are in a private registry, make sure `sudo docker login <docker-registry>` has been executed successfully on every node of the Kubernetes cluster


1\. Change the image names in the files under `config/fluid/patches`

```yaml
# config/fluid/patches/image_in_manager.yaml
...
...
containers:
  - name: manager
    image: <your-registry>/<your-namespace>/<img-name>:<img-tag>
```

```yaml
# config/fluid/patches/image_in_csi-plugin.yaml
...
...
containers:
  - name: plugins
    image: <your-registry>/<your-namespace>/<csi-img-name>:<img-tag>
```

2\. Create the CRDs
```shell script
kubectl apply -k config/crd
```

3\. Create the Fluid components
```shell script
kubectl apply -k config/fluid
```

4\. Write your own samples or use the provided ones
```shell script
kubectl apply -k config/samples
```

5\. Check that all components and sample resources are running properly
```shell script
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7fd6457ccf-p7j2x   1/1     Running   0          84s
csi-nodeplugin-fluid-pj9tv            2/2     Running   0          84s
csi-nodeplugin-fluid-t8ctj            2/2     Running   0          84s
```
```shell script
$ kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
cifar10-fuse-vb6l4     1/1     Running   0          6m15s
cifar10-fuse-vtqpx     1/1     Running   0          6m15s
cifar10-master-0       2/2     Running   0          8m24s
cifar10-worker-729xz   2/2     Running   0          6m15s
cifar10-worker-d6kmd   2/2     Running   0          6m15s
nginx-0                1/1     Running   0          8m30s
nginx-1                1/1     Running   0          8m30s
```
> Note: the output of the above commands may differ depending on your component implementations and the samples used

6\. Check via logs and similar means whether your components work properly (e.g. `kubectl logs -n fluid-system controller-manager`)

7\. Clean up
```shell script
kubectl delete -k config/samples

kubectl delete -k config/fluid

kubectl delete -k config/crd
```

### Debugging

**Prerequisites**

Make sure go-delve is installed in your environment; see the [go-delve installation guide](https://github.com/go-delve/delve/tree/master/Documentation/installation) for details

**Local debugging**
```shell script
# Let go-delve do the compilation
dlv debug cmd/controller/main.go
# Compile first, then debug
make manager
dlv exec bin/manager
```

**Remote debugging**
When developing Fluid, remote debugging is usually the more common approach; make sure go-delve is properly installed on both the local and the remote machine

On the remote host:
```shell script
dlv debug --headless --listen ":12345" --log --api-version=2 cmd/controller/main.go
```
This makes the debugger on the remote host listen on the specified port (e.g. 12345)

On the local host:
```shell script
dlv connect "<remote-addr>:12345" --api-version=2
```
# Fluid Development Documentation

## Environment Requirements

- git
- golang (version >= 1.13)
- docker (version >= 19.03)
- Kubernetes (version >= 1.14)
- GNU Make

For the installation and configuration of golang, see [here](https://golang.org/dl/).

For the installation and configuration of docker, see [here](https://docs.docker.com/engine/install/).

Fluid uses `make` for project builds; install `make` with:

- Linux
  - `sudo apt-get install build-essential`

## Build, Run, and Debug

### Get the Fluid source code

```shell
$ mkdir -p $GOPATH/src/github.com/cloudnativefluid/
$ cd $GOPATH/src/github.com/cloudnativefluid
$ git clone https://github.com/fluid-cloudnative/fluid.git
```

> **Note**: this document builds, runs, and debugs Fluid in non-Go-Module mode.
>
> See the [official golang documentation](https://github.com/golang/go/wiki/Modules) for more information about Go modules.

### Build
The `Makefile` in the project root already contains the basic build, packaging, and deployment logic used during development
```shell
# Build the Controller Manager binary
$ make manager

# Build the CSI binary
$ make csi
```
The resulting binaries are located in the `./bin` directory.

### Build images

1. Set the image names
```shell
# Name the manager image
$ export IMG=<your-registry>/<your-namespace>/<img-name>
# Name the CSI plugin image
$ export CSI_IMG=<your-registry>/<your-namespace>/<csi-img-name>
# Build the manager image
$ make docker-build
# Build the CSI plugin image
$ make docker-build-csi
```

Before running Fluid, the built images need to be pushed to an accessible image registry

2. Log in to the image registry:
```shell
$ sudo docker login <docker-registry>
```

3. Push the images:
```shell
$ make docker-push
$ make docker-push-csi
```

### Run
The following assumes that an accessible Kubernetes cluster has been configured in the local environment, either via the `KUBECONFIG` environment variable or in the `~/.kube/config` file; you can quickly check the configuration with `kubectl cluster-info`. For more information about `kubeconfig`, see the
[official Kubernetes documentation](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/)

> The following uses `kustomize`; `kubectl 1.14+` has the `kustomize` tool built in. Developers using versions below `kubectl 1.14`, see [here](https://kustomize.io/) for more information about kustomize


1. Push the built images to an image registry accessible from the Kubernetes cluster
> If the built and uploaded images are in a private registry, make sure `sudo docker login <docker-registry>` has been executed successfully on every node of the Kubernetes cluster


2. Change the image names in the files under `config/fluid/patches`
```yaml
# config/fluid/patches/image_in_manager.yaml
...
...
containers:
  - name: manager
    image: <your-registry>/<your-namespace>/<img-name>:<img-tag>
```

3. Create the CRDs
```shell
$ kubectl apply -k config/crd
```

Check the CRDs:

```shell
$ kubectl get crd | grep fluid
alluxiodataloads.data.fluid.io   2020-08-22T03:53:46Z
alluxioruntimes.data.fluid.io    2020-08-22T03:53:46Z
datasets.data.fluid.io           2020-08-22T03:53:46Z
```

4. Create the Fluid components
```shell
$ kubectl apply -k config/fluid
```

Check the Fluid components:

```shell
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7fd6457ccf-p7j2x   1/1     Running   0          84s
csi-nodeplugin-fluid-pj9tv            2/2     Running   0          84s
csi-nodeplugin-fluid-t8ctj            2/2     Running   0          84s
```

5. Write your own samples or use the provided ones
```shell
$ kubectl apply -k config/samples
```

Check the sample pods:

```shell
$ kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
cifar10-fuse-vb6l4     1/1     Running   0          6m15s
cifar10-fuse-vtqpx     1/1     Running   0          6m15s
cifar10-master-0       2/2     Running   0          8m24s
cifar10-worker-729xz   2/2     Running   0          6m15s
cifar10-worker-d6kmd   2/2     Running   0          6m15s
nginx-0                1/1     Running   0          8m30s
nginx-1                1/1     Running   0          8m30s
```
> Note: the output of the above commands may differ depending on your component implementations and the samples used.

6. Check via logs and similar means whether your components work properly (e.g. `kubectl logs -n fluid-system controller-manager`)

7. Clean up
```shell
$ kubectl delete -k config/samples

$ kubectl delete -k config/fluid

$ kubectl delete -k config/crd
```

### Debugging

**Prerequisites**

Make sure go-delve is installed in your environment; see the [go-delve installation guide](https://github.com/go-delve/delve/tree/master/Documentation/installation) for details

**Local debugging**
```shell
# Let go-delve do the compilation
$ dlv debug cmd/controller/main.go
# Compile first, then debug
$ make manager
$ dlv exec bin/manager
```

**Remote debugging**
When developing Fluid, remote debugging is usually the more common approach; make sure go-delve is properly installed on both the local and the remote machine

On the remote host:
```shell
$ dlv debug --headless --listen ":12345" --log --api-version=2 cmd/controller/main.go
```
This makes the debugger on the remote host listen on the specified port (e.g. 12345)

On the local host:
```shell
$ dlv connect "<remote-addr>:12345" --api-version=2
```
> Note: for remote debugging, make sure the specified port on the remote host is not occupied and the remote host's firewall has been configured appropriately

@ -0,0 +1,367 @@
|
|||
# 示例 - 远程文件访问加速
|
||||
通过[Alluxio](https://www.alluxio.io)和[Fuse](https://github.com/libfuse/libfuse),Fluid为用户提供了一种更为简单的文件访问接口,使得任意运行在Kubernetes集群上的程序能够像访问本地文件一样轻松访问存储在远程文件系统中的文件。更为重要的是,Fluid借助Alluxio提供了强大的文件缓存能力,这意味着用户在访问远程文件时,尤其是那些具有较高访问频率的远程文件时,用户可以享受到大幅度的文件访问速度的提升。
|
||||
|
||||
本文档通过一个简单的例子演示了上述功能特性
|
||||
|
||||
## 前提条件
|
||||
在运行该示例之前,请参考[安装文档](../userguide/install.md)完成安装,并检查Fluid各组件正常运行:
|
||||
```shell
|
||||
$ kubectl get pod -n fluid-system
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
controller-manager-7fd6457ccf-jnkvn 1/1 Running 0 60s
|
||||
csi-nodeplugin-fluid-6rhpt 2/2 Running 0 60s
|
||||
csi-nodeplugin-fluid-6zwgl 2/2 Running 0 60s
|
||||
```
|
||||
通常来说,你会看到一个名为“controller-manager”的Pod和多个名为“csi-nodeplugin”的Pod正在运行。其中,“csi-nodeplugin”这些Pod的数量取决于你的Kubernetes集群中结点的数量。
|
||||
|
||||
## 新建工作环境
|
||||
```shell
|
||||
$ mkdir <any-path>/accelerate
|
||||
$ cd <any-path>/accelerate
|
||||
```
|
||||
|
||||
## 运行示例
|
||||
|
||||
**查看待创建的Dataset资源对象**
|
||||
```shell
|
||||
$ cat<<EOF >dataset.yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
metadata:
|
||||
name: hbase
|
||||
spec:
|
||||
mounts:
|
||||
- mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.2.5/
|
||||
name: hbase
|
||||
EOF
|
||||
```
|
||||
|
||||
在这里,我们将要创建一个kind为`Dataset`的资源对象(Resource object)。`Dataset`是Fluid所定义的一个Custom Resource Definition(CRD),该CRD被用来告知Fluid在哪里可以找到你所需要的数据。Fluid将该CRD对象中定义的`mountPoint`属性挂载到Alluxio之上,因此该属性可以是任何合法的能够被Alluxio识别的UFS地址。在本示例中,为了简单,我们使用[WebUFS](https://docs.alluxio.io/os/user/stable/en/ufs/WEB.html)进行演示。
|
||||
|
||||
更多有关UFS的信息,请参考[Alluxio文档-底层存储系统](https://docs.alluxio.io/os/user/stable/cn/ufs/OSS.html)部分。
|
||||
|
||||
> 本示例将以Apache镜像站点上的Hbase v2.25相关资源作为演示中使用的远程文件。这个选择并没有任何特殊之处,你可以将这个远程文件修改为任意你喜欢的远程文件。但是,如果你想要和我们一样使用WebUFS进行操作的话,最好还是选择一个Apache镜像源站点( e.g. [清华镜像源](https://mirrors.tuna.tsinghua.edu.cn/apache) ),因为基于目前WebUFS的实现,如果你选择其他更加复杂的网页作为WebUFS,你可能需要进行更多更复杂的配置。
|
||||
|
||||
**创建Dataset资源对象**
|
||||
```shell
|
||||
$ kubectl create -f dataset.yaml
|
||||
dataset.data.fluid.io/hbase created
|
||||
```
|
||||
|
||||
**查看Dataset资源对象状态**
|
||||
```shell
|
||||
$ kubectl get dataset hbase -o yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
...
|
||||
status:
|
||||
conditions: []
|
||||
phase: NotBound
|
||||
```
|
||||
|
||||
如上所示,`status`中的`phase`属性值为`NotBound`,这意味着该`Dataset`资源对象目前还未与任何`AlluxioRuntime`资源对象绑定,接下来,我们将创建一个`AlluxioRuntime`资源对象。
|
||||
|
||||
**查看待创建的AlluxioRuntime资源对象**
|
||||
```shell
|
||||
$ cat<<EOF >runtime.yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: AlluxioRuntime
|
||||
metadata:
|
||||
name: hbase
|
||||
spec:
|
||||
replicas: 2
|
||||
tieredstore:
|
||||
levels:
|
||||
- mediumtype: MEM
|
||||
path: /dev/shm
|
||||
quota: 2Gi
|
||||
high: "0.95"
|
||||
low: "0.7"
|
||||
storageType: Memory
|
||||
properties:
|
||||
alluxio.user.file.writetype.default: MUST_CACHE
|
||||
alluxio.master.journal.folder: /journal
|
||||
alluxio.master.journal.type: UFS
|
||||
alluxio.user.block.size.bytes.default: 256MB
|
||||
alluxio.user.streaming.reader.chunk.size.bytes: 256MB
|
||||
alluxio.user.local.reader.chunk.size.bytes: 256MB
|
||||
alluxio.worker.network.reader.buffer.size: 256MB
|
||||
alluxio.user.streaming.data.timeout: 300sec
|
||||
master:
|
||||
jvmOptions:
|
||||
- "-Xmx4G"
|
||||
worker:
|
||||
jvmOptions:
|
||||
- "-Xmx4G"
|
||||
fuse:
|
||||
jvmOptions:
|
||||
- "-Xmx4G "
|
||||
- "-Xms4G "
|
||||
# For now, only support local
|
||||
shortCircuitPolicy: local
|
||||
args:
|
||||
- fuse
|
||||
- --fuse-opts=direct_io,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
|
||||
EOF
|
||||
```
|
||||
|
||||
**Create the AlluxioRuntime resource**
```shell
$ kubectl create -f runtime.yaml
alluxioruntime.data.fluid.io/hbase created
```

`AlluxioRuntime` is another CRD defined by Fluid. An `AlluxioRuntime` object describes the configuration required to run an Alluxio instance in a Kubernetes cluster.

Wait a moment for all the components of the AlluxioRuntime to start up; you will then see something like this:
```shell
$ kubectl get pod
NAME                 READY   STATUS    RESTARTS   AGE
hbase-fuse-hvxgh     1/1     Running   0          27s
hbase-fuse-sjhxk     1/1     Running   0          27s
hbase-master-0       2/2     Running   0          62s
hbase-worker-92cln   2/2     Running   0          27s
hbase-worker-rlb5w   2/2     Running   0          27s
```

**Check the Dataset's status again**
```shell
$ kubectl get dataset hbase -o yaml
...
status:
  cacheStates:
    cacheCapacity: 4GiB
    cached: 0B
    cachedPercentage: 0%
  conditions:
  - lastTransitionTime: "2020-07-29T08:23:44Z"
    lastUpdateTime: "2020-07-29T08:26:29Z"
    message: The ddc runtime is ready.
    reason: DatasetReady
    status: "True"
    type: Ready
  phase: Bound
  runtimes:
  - category: Accelerate
    name: hbase
    namespace: default
    type: alluxio
  ufsTotal: 443.5MiB
```
Because the Dataset is now bound to a successfully started AlluxioRuntime, its `status` has been updated and the `phase` has changed to `Bound`. The output above gives you the basic information about the resource.

**Check the AlluxioRuntime's status**
```shell
$ kubectl get alluxioruntime hbase -o yaml
...
status:
  cacheStates:
    cacheCapacity: 4GiB
    cached: 0B
    cachedPercentage: 0%
  conditions:
  - lastProbeTime: "2020-07-29T08:23:05Z"
    lastTransitionTime: "2020-07-29T08:23:05Z"
    message: The master is initialized.
    reason: Master is initialized
    status: "True"
    type: MasterInitialized
  - lastProbeTime: "2020-07-29T08:23:40Z"
    lastTransitionTime: "2020-07-29T08:23:05Z"
    message: The master is ready.
    reason: Master is ready
    status: "True"
    type: MasterReady
  - lastProbeTime: "2020-07-29T08:23:20Z"
    lastTransitionTime: "2020-07-29T08:23:20Z"
    message: The workers are initialized.
    reason: Workers are initialized
    status: "True"
    type: WorkersInitialized
  - lastProbeTime: "2020-07-29T08:23:20Z"
    lastTransitionTime: "2020-07-29T08:23:20Z"
    message: The fuses are initialized.
    reason: Fuses are initialized
    status: "True"
    type: FusesInitialized
  - lastProbeTime: "2020-07-29T08:23:40Z"
    lastTransitionTime: "2020-07-29T08:23:40Z"
    message: The workers are partially ready.
    reason: Workers are ready
    status: "True"
    type: WorkersReady
  - lastProbeTime: "2020-07-29T08:23:40Z"
    lastTransitionTime: "2020-07-29T08:23:40Z"
    message: The fuses are ready.
    reason: Fuses are ready
    status: "True"
    type: FusesReady
  currentFuseNumberScheduled: 2
  currentMasterNumberScheduled: 1
  currentWorkerNumberScheduled: 2
  desiredFuseNumberScheduled: 2
  desiredMasterNumberScheduled: 1
  desiredWorkerNumberScheduled: 2
  fuseNumberAvailable: 2
  fuseNumberReady: 2
  fusePhase: Ready
  masterNumberReady: 1
  masterPhase: Ready
  valueFile: hbase-alluxio-values
  workerNumberAvailable: 2
  workerNumberReady: 2
  workerPhase: Ready
```
The `status` of the `AlluxioRuntime` object contains more detailed information.
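
The `cacheCapacity: 4GiB` reported above is just the per-worker tieredstore quota multiplied by the number of worker replicas (2Gi × 2 in the `runtime.yaml` used here). A quick sanity check of that arithmetic (variable names are illustrative, not part of any Fluid API):

```shell
# Cache capacity = tieredstore quota per worker x number of worker replicas.
# The values below are taken from the runtime.yaml in this example.
replicas=2
quota_gib=2
echo "cacheCapacity: $((replicas * quota_gib))GiB"
# prints "cacheCapacity: 4GiB"
```

If you scale the worker replicas or change the tiered store quota, the reported capacity changes accordingly.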

**Check the PersistentVolume and PersistentVolumeClaim associated with the remote files**
```shell
$ kubectl get pv
NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS   REASON   AGE
hbase   100Gi      RWX            Retain           Bound    default/hbase                           18m
```

```shell
$ kubectl get pvc
NAME    STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hbase   Bound    hbase    100Gi      RWX                           18m
```
Once the `Dataset` is ready (i.e. bound to an Alluxio instance), Fluid has generated the associated PV and PVC. Applications can mount the remote files into a Pod through this PVC, and access them through the mount directory.

## Remote File Access

**Check the application to be created**
```shell
$ cat<<EOF >nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase
EOF
```

**Launch the application to access the remote files**
```shell
$ kubectl create -f nginx.yaml
```

Log in to the Nginx Pod:
```shell
$ kubectl exec -it nginx -- bash
```

Check how the remote files are mounted:
```shell
$ ls -1 /data/hbase
CHANGES.md
RELEASENOTES.md
api_compare_2.2.5RC0_to_2.2.4.html
hbase-2.2.5-bin.tar.gz
hbase-2.2.5-client-bin.tar.gz
hbase-2.2.5-src.tar.gz
```

```shell
$ du -h /data/hbase/*
174K    /data/hbase/CHANGES.md
106K    /data/hbase/RELEASENOTES.md
115K    /data/hbase/api_compare_2.2.5RC0_to_2.2.4.html
211M    /data/hbase/hbase-2.2.5-bin.tar.gz
200M    /data/hbase/hbase-2.2.5-client-bin.tar.gz
34M     /data/hbase/hbase-2.2.5-src.tar.gz
```

Log out of the Nginx Pod:
```shell
$ exit
```

As you can see, all the files stored on the WebUFS (that is, the HBase v2.2.5 artifacts) appear in the Pod exactly as if they were local files, and the Pod can access them with no difference at all.

## Accelerated Remote File Access

To demonstrate how much of a speedup you can get when accessing remote files, we provide a sample test job:

**Check the test job to be created**
```shell
$ cat<<EOF >app.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: fluid-copy-test
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: busybox
          image: busybox
          command: ["/bin/sh"]
          args: ["-c", "set -x; time cp -r /data/hbase ./"]
          volumeMounts:
            - mountPath: /data
              name: hbase-vol
      volumes:
        - name: hbase-vol
          persistentVolumeClaim:
            claimName: hbase
EOF
```

**Launch the test job**
```shell
$ kubectl create -f app.yaml
job.batch/fluid-copy-test created
```

The test program runs the shell command `time cp -r /data/hbase ./`, where `/data/hbase` is the location where the remote files are mounted in the Pod. When the command finishes, it prints how long it took:

```shell
$ kubectl logs fluid-copy-test-h59w9
+ time cp -r /data/hbase ./
real    1m 2.74s
user    0m 0.00s
sys     0m 1.35s
```
As you can see, the first read of the remote files took almost 63 seconds. That may be slower than you expected, but:

**Launch the test job again**
```shell
$ kubectl delete -f app.yaml
$ kubectl create -f app.yaml
```

Since the remote files are already cached, this run of the test job completes much faster:
```shell
$ kubectl logs fluid-copy-test-d9h2x
+ time cp -r /data/hbase ./
real    0m 2.94s
user    0m 0.00s
sys     0m 1.27s
```
The same file access took only about 3 seconds.

This dramatic speedup comes from Alluxio's powerful caching: once you access a remote file, it is cached in Alluxio, so all your subsequent accesses no longer read from the remote source but fetch the data directly from Alluxio instead.

> Note: the access speed above depends on the network conditions of your environment. If file access is too slow, try a smaller remote file instead.
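
As a quick back-of-the-envelope check, the two `time` results above amount to roughly a 21x speedup (figures copied from this particular run; your numbers will differ):

```shell
# Cold (first, uncached) vs. warm (cached) copy times, in seconds,
# copied from the two test runs shown above.
cold=62.74
warm=2.94
awk -v c="$cold" -v w="$warm" 'BEGIN { printf "speedup: %.1fx\n", c / w }'
# prints "speedup: 21.3x"
```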

## Cleanup
```shell
$ kubectl delete -f .
```

# Demo - Data Cache Affinity Scheduling

In Fluid, the remote files defined in a `Dataset` object are schedulable, which means you can manage where the cache of those remote files is placed in the Kubernetes cluster, just like you manage the placement of your Pods. Fluid also supports cache-affinity scheduling for applications: workloads (e.g. data analytics jobs, machine learning jobs) are co-located with the data cache they need, so as to minimize extra overhead.

This document gives a brief demonstration of the features above.

## Prerequisites

Before running this example, please refer to the [installation guide](../userguide/install.md) to complete the installation, and make sure all Fluid components are running:
```shell
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7fd6457ccf-jnkvn   1/1     Running   0          60s
csi-nodeplugin-fluid-6rhpt            2/2     Running   0          60s
csi-nodeplugin-fluid-6zwgl            2/2     Running   0          60s
```

Typically you will see one Pod named "controller-manager" and several Pods named "csi-nodeplugin". The number of "csi-nodeplugin" Pods depends on the number of nodes in your Kubernetes cluster.

## Set Up a Workspace
```shell
$ mkdir <any-path>/co-locality
$ cd <any-path>/co-locality
```

## Run the Example

**Check all nodes**
```shell
$ kubectl get nodes
NAME                       STATUS   ROLES    AGE     VERSION
cn-beijing.192.168.1.146   Ready    <none>   7d14h   v1.16.9-aliyun.1
cn-beijing.192.168.1.147   Ready    <none>   7d14h   v1.16.9-aliyun.1
```

**Label one of the nodes**
```shell
$ kubectl label nodes cn-beijing.192.168.1.146 hbase-cache=true
```
In the following steps we will use a `NodeSelector` to manage where the data is cached in the cluster, so we label the desired node here.

**Check the nodes again**
```shell
$ kubectl get node -L hbase-cache
NAME                       STATUS   ROLES    AGE     VERSION            HBASE-CACHE
cn-beijing.192.168.1.146   Ready    <none>   7d14h   v1.16.9-aliyun.1   true
cn-beijing.192.168.1.147   Ready    <none>   7d14h   v1.16.9-aliyun.1
```
For now, only one of the two nodes carries the `hbase-cache=true` label. Next, we expect the data cache to be placed on that node only.

**Check the Dataset resource to be created**
```shell
$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hbase
spec:
  mounts:
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.2.5/
      name: hbase
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: hbase-cache
              operator: In
              values:
                - "true"
EOF
```
In the `spec` of this `Dataset`, we define a `nodeSelectorTerm` sub-property that requires the data cache to be placed on nodes carrying the `hbase-cache=true` label.
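
Conceptually, the effect of this term on the scheduler side is a plain label filter over the node list. A minimal sketch of that filtering, using a hypothetical two-column node/label listing that mirrors the `kubectl get node -L hbase-cache` output above:

```shell
# Hypothetical "node label" listing (second column is the hbase-cache label value);
# keep only the nodes whose label equals "true", as the nodeSelectorTerm requires.
printf '%s\n' \
  'cn-beijing.192.168.1.146 true' \
  'cn-beijing.192.168.1.147 -' |
awk '$2 == "true" { print $1 }'
# prints "cn-beijing.192.168.1.146"
```

Only the labeled node survives the filter, which is why the cache components below end up on it.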

**Create the Dataset resource**
```shell
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/hbase created
```

**Check the AlluxioRuntime resource to be created**
```shell
$ cat<<EOF >runtime.yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: hbase
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
        storageType: Memory
  properties:
    alluxio.user.file.writetype.default: MUST_CACHE
    alluxio.master.journal.folder: /journal
    alluxio.master.journal.type: UFS
    alluxio.user.block.size.bytes.default: 256MB
    alluxio.user.streaming.reader.chunk.size.bytes: 256MB
    alluxio.user.local.reader.chunk.size.bytes: 256MB
    alluxio.worker.network.reader.buffer.size: 256MB
    alluxio.user.streaming.data.timeout: 300sec
  master:
    replicas: 1
    jvmOptions:
      - "-Xmx4G"
  worker:
    jvmOptions:
      - "-Xmx4G"
  fuse:
    image: alluxio/alluxio-fuse
    imageTag: "2.3.0-SNAPSHOT"
    imagePullPolicy: Always
    jvmOptions:
      - "-Xmx4G "
      - "-Xms4G "
      - "-XX:+UseG1GC "
      - "-XX:MaxDirectMemorySize=4g "
      - "-XX:+UnlockExperimentalVMOptions "
      - "-XX:ActiveProcessorCount=8 "
    # For now, only support local
    shortCircuitPolicy: local
    args:
      - fuse
      - --fuse-opts=direct_io,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
EOF
```
This configuration contains the Alluxio-related settings Fluid uses to launch an Alluxio instance. By creating this `AlluxioRuntime` object, Fluid will start an Alluxio instance with 1 Alluxio Master and 2 Alluxio Workers.

**Create the AlluxioRuntime resource and check its status**
```shell
$ kubectl create -f runtime.yaml
alluxioruntime.data.fluid.io/hbase created

$ kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE    IP              NODE                       NOMINATED NODE   READINESS GATES
hbase-fuse-42csf     1/1     Running   0          104s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-master-0       2/2     Running   0          3m3s   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
hbase-worker-l62m4   2/2     Running   0          104s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
```
Here you can see that although we expected two Alluxio Workers, only one Worker started successfully, and it runs on the node carrying the specified label (i.e. `hbase-cache=true`).

**Check the AlluxioRuntime's status**
```shell
$ kubectl get alluxioruntime hbase -o yaml
...
status:
  ...
  workerNumberReady: 1
  workerPhase: PartialReady
```
As expected, `workerPhase` is now `PartialReady`, and `currentWorkerNumberScheduled: 1` is less than `desiredWorkerNumberScheduled: 2`.

**Check the application to be created**

We provide a sample application to demonstrate how Fluid performs cache-affinity scheduling. First, check the application:

```shell
$ cat<<EOF >app.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  serviceName: "nginx"
  podManagementPolicy: "Parallel"
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: nginx
  template: # define the pods specifications
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        # prevent two Nginx Pod from being scheduled at the same Node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - mountPath: /data
              name: hbase-vol
      volumes:
        - name: hbase-vol
          persistentVolumeClaim:
            claimName: hbase
EOF
```
The `podAntiAffinity` property may look confusing, so here is a short explanation: it ensures that Pods belonging to the same application are spread across different nodes, which lets us observe more clearly how Fluid's cache-affinity scheduling works. In short, it is a demo-only property; you don't need to pay much attention to it.

**Run the application**

```shell
$ kubectl create -f app.yaml
statefulset.apps/nginx created
```

**Check the application's status**
```shell
$ kubectl get pod -o wide -l app=nginx
NAME      READY   STATUS    RESTARTS   AGE    IP              NODE                       NOMINATED NODE   READINESS GATES
nginx-0   1/1     Running   0          2m5s   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
nginx-1   0/1     Pending   0          2m5s   <none>          <none>                     <none>           <none>
```
Only one Nginx Pod started successfully, and it runs on a node that satisfies the `nodeSelectorTerm`.

**Check why the other Pod failed to start**
```shell
$ kubectl describe pod nginx-1
...
Events:
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict.
```
As shown above, on the one hand, the `podAntiAffinity` requirement prevents the two Nginx Pods from being scheduled to the same node. On the other hand, only one node currently satisfies the affinity requirement of the Dataset, so only one Nginx Pod was scheduled successfully.

**Label the other node**
```shell
$ kubectl label node cn-beijing.192.168.1.147 hbase-cache=true
```
Now both nodes carry the same label. Re-check the status of all the components:
```shell
$ kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP              NODE                       NOMINATED NODE   READINESS GATES
hbase-fuse-42csf     1/1     Running   0          44m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-master-0       2/2     Running   0          46m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
hbase-worker-l62m4   2/2     Running   0          44m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
hbase-worker-rvncl   2/2     Running   0          10m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
```
Both Alluxio Workers are now running, one on each node.

```shell
$ kubectl get pod -l app=nginx -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP              NODE                       NOMINATED NODE   READINESS GATES
nginx-0   1/1     Running   0          21m   192.168.1.146   cn-beijing.192.168.1.146   <none>           <none>
nginx-1   1/1     Running   0          21m   192.168.1.147   cn-beijing.192.168.1.147   <none>           <none>
```
The other Nginx Pod is no longer `Pending`; it has started successfully and runs on the other node.

As you can see, Fluid supports scheduling policies for the data cache, and these policies give users more flexible cache management capabilities.

## Cleanup
```shell
$ kubectl delete -f .

$ kubectl label node cn-beijing.192.168.1.146 hbase-cache-
$ kubectl label node cn-beijing.192.168.1.147 hbase-cache-
```

arena is a CLI that makes it convenient for data scientists to run and monitor machine learning jobs.

### Deploy Fluid

Please refer to the [Fluid installation guide](../userguide/install.md) to install Fluid on your Kubernetes cluster.

### Create a dataset

# Accelerate Machine Learning Training with Fluid

This document shows how to use Fluid to deploy the cloud-hosted [ImageNet](http://www.image-net.org/) dataset from [Alibaba Cloud OSS](https://cn.aliyun.com/product/oss) into a Kubernetes cluster, and how to use [arena](https://github.com/kubeflow/arena) to train a ResNet-50 model on that dataset. This document uses a 4-node, 8-GPU-per-node test environment as the example.

## Prerequisites

- [Fluid](https://github.com/fluid-cloudnative/fluid) (version >= 0.1.0)
- [arena](https://github.com/kubeflow/arena) (version >= 0.4.0)

> **Note**:
>
> 1. This document assumes Fluid is already installed in your Kubernetes cluster. If you have not deployed Fluid yet, please refer to the [Fluid installation guide](../userguide/install.md).
>
> 2. `arena` is a CLI that makes it convenient for data scientists to run and monitor machine learning jobs. This document uses `arena` to submit the training job; see the [arena installation guide](https://github.com/kubeflow/arena/blob/master/docs/installation/INSTALL_FROM_BINARY.md) for how to install it.

## Deploy the Cloud Dataset with Fluid

### Create the Dataset and Runtime

The `dataset.yaml` file below defines both a `Dataset` and a `Runtime`, with their definitions separated by the `---` symbol.

The dataset is stored in [Alibaba Cloud OSS](https://cn.aliyun.com/product/oss). To make sure Alluxio can mount the dataset successfully, set the correct `mountPoint`, `fs.oss.accessKeyId`, `fs.oss.accessKeySecret` and `fs.oss.endpoint` in `dataset.yaml`.

> You can refer to the Alluxio documentation on [Aliyun Object Storage Service](https://docs.alluxio.io/os/user/stable/en/ufs/OSS.html) for more examples of using OSS with Alluxio.

Since this document uses 4 nodes with 8 GPUs each, `spec.replicas` is set to `4` in `dataset.yaml`. In addition, based on our test experience, the file sets many parameters to optimize Alluxio's I/O performance (at the Alluxio, FUSE and JVM levels); you can tune them for your own machine configuration and workload.

```shell
$ cat << EOF > dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: imagenet
spec:
  mounts:
    - mountPoint: oss://<OSS_BUCKET>/<OSS_DIRECTORY>/
      name: imagenet
      options:
        fs.oss.accessKeyId: <OSS_ACCESS_KEY_ID>
        fs.oss.accessKeySecret: <OSS_ACCESS_KEY_SECRET>
        fs.oss.endpoint: <OSS_ENDPOINT>
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: imagenet
spec:
  replicas: 4
  data:
    replicas: 1
  # alluxioVersion:
  #   image: registry.cn-huhehaote.aliyuncs.com/alluxio/alluxio
  #   imageTag: "2.3.0-SNAPSHOT-bbce37a"
  #   imagePullPolicy: Always
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /var/lib/docker/alluxio
        quota: 50Gi
        high: "0.99"
        low: "0.8"
  properties:
    # alluxio fuse
    alluxio.fuse.jnifuse.enabled: "true"
    alluxio.fuse.debug.enabled: "false"
    alluxio.fuse.cached.paths.max: "1000000"
    alluxio.fuse.logging.threshold: 1000ms
    # alluxio master
    alluxio.master.metastore: ROCKS
    alluxio.master.journal.folder: /journal
    alluxio.master.journal.type: UFS
    alluxio.master.metastore.inode.cache.max.size: "10000000"
    alluxio.master.journal.log.size.bytes.max: 500MB
    alluxio.master.metadata.sync.concurrency.level: "128"
    alluxio.master.metadata.sync.executor.pool.size: "128"
    alluxio.master.metadata.sync.ufs.prefetch.pool.size: "128"
    alluxio.master.rpc.executor.max.pool.size: "1024"
    alluxio.master.rpc.executor.core.pool.size: "128"
    # alluxio worker
    alluxio.worker.allocator.class: alluxio.worker.block.allocator.GreedyAllocator
    alluxio.worker.network.reader.buffer.size: 32MB
    alluxio.worker.file.buffer.size: 320MB
    alluxio.worker.block.master.client.pool.size: "1024"
    # alluxio user
    alluxio.user.block.worker.client.pool.min: "512"
    alluxio.user.file.writetype.default: MUST_CACHE
    alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
    alluxio.user.block.write.location.policy.class: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
    alluxio.user.block.size.bytes.default: 16MB
    alluxio.user.streaming.reader.chunk.size.bytes: 32MB
    alluxio.user.local.reader.chunk.size.bytes: 32MB
    alluxio.user.metrics.collection.enabled: "false"
    alluxio.user.update.file.accesstime.disabled: "true"
    alluxio.user.file.passive.cache.enabled: "false"
    alluxio.user.block.avoid.eviction.policy.reserved.size.bytes: 2GB
    alluxio.user.block.master.client.pool.gc.threshold: 2day
    alluxio.user.file.master.client.threads: "1024"
    alluxio.user.block.master.client.threads: "1024"
    alluxio.user.file.readtype.default: CACHE
    alluxio.user.metadata.cache.enabled: "true"
    alluxio.user.metadata.cache.expiration.time: 2day
    alluxio.user.metadata.cache.max.size: "1000000"
    alluxio.user.direct.memory.io.enabled: "true"
    alluxio.user.worker.list.refresh.interval: 2min
    alluxio.user.logging.threshold: 1000ms
    # other alluxio configurations
    alluxio.web.ui.enabled: "false"
    alluxio.security.stale.channel.purge.interval: 365d
    alluxio.job.worker.threadpool.size: "164"
  master:
    jvmOptions:
      - "-Xmx6G"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:ActiveProcessorCount=8"
  worker:
    jvmOptions:
      - "-Xmx12G"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:MaxDirectMemorySize=32g"
      - "-XX:ActiveProcessorCount=8"
    resources:
      limits:
        cpu: 8
  fuse:
    # image: registry.cn-huhehaote.aliyuncs.com/alluxio/alluxio-fuse
    # imageTag: "2.3.0-SNAPSHOT-bbce37a"
    # imagePullPolicy: Always
    env:
      MAX_IDLE_THREADS: "32"
    jvmOptions:
      - "-Xmx16G"
      - "-Xms16G"
      - "-XX:+UseG1GC"
      - "-XX:MaxDirectMemorySize=32g"
      - "-XX:+UnlockExperimentalVMOptions"
      - "-XX:ActiveProcessorCount=24"
    resources:
      limits:
        cpu: 16
    shortCircuitPolicy: local
    args:
      - fuse
      - --fuse-opts=kernel_cache,ro,max_read=131072,attr_timeout=7200,entry_timeout=7200,nonempty
EOF
```

Create the Dataset and Runtime:

```shell
$ kubectl create -f dataset.yaml
```

Check the Alluxio Runtime; you can see that `1` Master, `4` Workers and `4` Fuses have been deployed successfully:

```shell
$ kubectl describe alluxioruntime imagenet
Name:         imagenet
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  data.fluid.io/v1alpha1
Kind:         AlluxioRuntime
Metadata:
  # more metadata
Spec:
  # more spec
Status:
  Cache States:
    Cache Capacity:                  200GiB
    Cached:                          0B
    Cached Percentage:               0%
  Conditions:
    # more conditions
  Current Fuse Number Scheduled:     4
  Current Master Number Scheduled:   1
  Current Worker Number Scheduled:   4
  Desired Fuse Number Scheduled:     4
  Desired Master Number Scheduled:   1
  Desired Worker Number Scheduled:   4
  Fuse Number Available:             4
  # more status fields
  Phase:                             Bound
  Runtimes:
    Category:   Accelerate
    Name:       imagenet
    Namespace:  default
    Type:       alluxio
  Ufs Total:    143.7GiB
Events:         <none>
```
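
The `Cache Capacity: 200GiB` above is again the per-worker tieredstore quota times the number of worker replicas (50Gi × 4 in the `dataset.yaml` here); a quick check of that arithmetic (variable names are illustrative):

```shell
# Cache capacity = per-worker tieredstore quota x worker replicas,
# with values taken from the dataset.yaml above.
replicas=4
quota_gib=50
echo "Cache Capacity: $((replicas * quota_gib))GiB"
# prints "Cache Capacity: 200GiB"
```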

Meanwhile, you can check that the Dataset has been bound to the Alluxio Runtime:

```shell
$ kubectl describe dataset
Name:         imagenet
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  data.fluid.io/v1alpha1
Kind:         Dataset
Metadata:
  # more metadata
Spec:
  # more spec
Status:
  Cache States:
    Cache Capacity:     200GiB
    Cached:             0B
    Cached Percentage:  0%
  Conditions:
    Last Transition Time:  2020-08-18T11:01:09Z
    Last Update Time:      2020-08-18T11:02:48Z
    Message:               The ddc runtime is ready.
    Reason:                DatasetReady
    Status:                True
    Type:                  Ready
  Phase:                   Bound
  Runtimes:
    Category:   Accelerate
    Name:       imagenet
    Namespace:  default
    Type:       alluxio
  Ufs Total:    143.7GiB
Events:         <none>
```

Check the PV and PVC; a PV and a PVC named imagenet have been created successfully:

```shell
$ kubectl get pv,pvc
NAME                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/imagenet   100Gi      RWX            Retain           Bound    default/imagenet                           7m11s

NAME                             STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/imagenet   Bound    imagenet   100Gi      RWX                           7m11s
```

At this point, the OSS cloud dataset has been successfully deployed into the Kubernetes cluster.

## Example: Submit a Deep Learning Job with arena

`arena` provides a convenient way to submit and monitor machine learning jobs. In this document we use `arena` to simplify the deployment of the training job.

If you have installed `arena` and the cloud dataset has been deployed into your local cluster, a single command submits the 4-node, 8-GPU-per-node ResNet-50 training job:

```shell
arena submit mpi \
    --name horovod-resnet50-v2-4x8-fluid \
    --gpus=8 \
    --workers=4 \
    --working-dir=/horovod-demo/tensorflow-demo/ \
    --data imagenet:/data \
    -e DATA_DIR=/data/imagenet \
    -e num_batch=1000 \
    -e datasets_num_private_threads=8 \
    --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/horovod-benchmark-dawnbench-v2:0.18.1-tf1.14.0-torch1.2.0-mxnet1.5.0-py3.6 \
    ./launch-example.sh 4 8
```

arena parameters:

- `--name`: the job name
- `--workers`: the number of worker nodes used for training
- `--gpus`: the number of GPUs used by each worker
- `--working-dir`: the working directory
- `--data`: mount the Volume `imagenet` to the worker's `/data` directory
- `-e DATA_DIR`: the dataset location
- `./launch-example.sh 4 8`: run the script that launches the 4-node, 8-GPU test
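
Together, `--workers=4` and `--gpus=8` mean the job uses 32 GPUs in total; the same arithmetic as a quick shell check (variable names are illustrative, not arena flags):

```shell
# Total GPUs = workers x GPUs per worker, matching --workers=4 --gpus=8 above.
workers=4
gpus_per_worker=8
echo "total GPUs: $((workers * gpus_per_worker))"
# prints "total GPUs: 32"
```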

Check that the job is running:

```shell
$ arena get horovod-resnet50-v2-4x8-fluid -e
STATUS: RUNNING
NAMESPACE: default
PRIORITY: N/A
TRAINING DURATION: 16s

NAME                           STATUS   TRAINER  AGE  INSTANCE                                      NODE
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-launcher-czlfn  192.168.1.21
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-0        192.168.1.16
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-1        192.168.1.21
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-2        192.168.1.25
horovod-resnet50-v2-4x8-fluid  RUNNING  MPIJOB   16s  horovod-resnet50-v2-4x8-fluid-worker-3        192.168.3.29
```

If you see `4` workers in the `RUNNING` state, the training job has started successfully.

To see how far training has progressed, check the arena logs:

```shell
$ arena logs --tail 100 -f horovod-resnet50-v2-4x8-fluid
```

# warm up

## Script Overview

Fluid provides the shell script [diagnose-fluid.sh](../../../tools/diagnose-fluid.sh) to help users quickly collect logs from the Fluid system and Runtime containers.

## How to Use

# Fluid Quick Start
This document describes how to create a Kubernetes cluster environment, install and deploy Fluid with Helm, and create a dataset with Fluid.

## Create a Kubernetes Cluster
Fluid requires a Kubernetes environment. Choose whichever option suits your experience best:

- You already have a Kubernetes environment with Kubernetes version >= 1.14: go straight to [Deploy Fluid](#deploy-fluid).
- You have not used Kubernetes before: create a Kubernetes cluster with Minikube.
  [Minikube](https://kubernetes.io/docs/setup/minikube/) creates a Kubernetes cluster inside a virtual machine and runs on macOS, Linux and Windows.

Make sure the following requirements are met:
- [Minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/): version 1.0.0+
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl): version 1.14+

After installing Minikube:
```shell
minikube start
```
If it starts successfully, you will see a message similar to:
```shell
minikube v1.12.1 on Darwin 10.14.5
```
Use `kubectl` to access the newly created Kubernetes cluster:
```shell
$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-558fc78868-kvjnf   1/1     Running   1          4d12h
nginx-deployment-558fc78868-kx9gt   1/1     Running   1          4d12h
```

## Deploy Fluid
Before you begin, make sure the following requirements are met:

- The Kubernetes cluster can be reached successfully with `kubectl`
- [Helm](https://helm.sh/docs/intro/install/): Helm 3 is installed

1. Get Fluid
```shell
git clone https://github.com/fluid-cloudnative/fluid.git
```
2. Install Fluid with Helm
```shell
helm install fluid fluid
NAME: fluid
LAST DEPLOYED: Tue Jul  7 11:22:07 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
3. Check the installation
```shell
kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-6b864dfd4f-995gm   1/1     Running   0          32h
csi-nodeplugin-fluid-c6pzj            2/2     Running   0          32h
csi-nodeplugin-fluid-wczmq            2/2     Running   0          32h
```

## Create a dataset
|
||||
fluid提供了云原生的数据加速和管理能力,并抽象出了`数据集`概念方便用户管理,接下来将演示如何用 fluid 创建一个数据集。
|
||||
|
||||
1. 通过CRD文件创建一个Dataset对象,其中描述了数据集的来源。
|
||||
```yaml
|
||||
apiVersion: data.fluid.io/v1alpha1
|
||||
kind: Dataset
|
||||
metadata:
|
||||
name: demo
|
||||
spec:
|
||||
mounts:
|
||||
- mountPoint: https://mirror.bit.edu.cn/apache/spark/spark-3.0.0/
|
||||
name: spark
|
||||
```
|
||||
执行安装
|
||||
|
||||
```
|
||||
kubectl create -f dataset.yaml
|
||||
```
|
||||
dataset创建以后处于 `not bound` 状态,需要绑定 runtime 才能使用。

2. Similarly, create an AlluxioRuntime object from the AlluxioRuntime CRD file to describe the runtime backing this dataset. The runtime deliberately shares the dataset's name (`demo`): Fluid binds a dataset to the runtime with the same name and namespace.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
        storageType: Memory
  properties:
    alluxio.user.file.writetype.default: MUST_CACHE
    alluxio.master.journal.folder: /journal
    alluxio.master.journal.type: UFS
    alluxio.user.block.size.bytes.default: 256MB
    alluxio.user.streaming.reader.chunk.size.bytes: 256MB
    alluxio.user.local.reader.chunk.size.bytes: 256MB
    alluxio.worker.network.reader.buffer.size: 256MB
    alluxio.user.streaming.data.timeout: 300sec
  master:
    jvmOptions:
      - "-Xmx4G"
  worker:
    jvmOptions:
      - "-Xmx4G"
  fuse:
    jvmOptions:
      - "-Xmx4G"
      - "-Xms4G"
    # For now, only "local" is supported
    shortCircuitPolicy: local
    args:
      - fuse
      - --fuse-opts=direct_io,ro,max_read=131072
```

Create it with kubectl:

```shell
kubectl create -f runtime.yaml
```
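
Once both objects exist, the dataset should move from `NotBound` to `Bound`. Below is a small sketch for extracting the phase (the helper name is ours; on a live cluster a jsonpath query such as `kubectl get dataset demo -o jsonpath='{.status.phase}'` does the same thing):

```shell
# Illustrative helper: pull ".status.phase" out of "kubectl get dataset ... -o yaml"
# output supplied on stdin.
dataset_phase() {
  awk '/^status:/ { in_status = 1 }
       in_status && $1 == "phase:" { print $2; exit }'
}

# Example (assumes a running cluster):
#   kubectl get dataset demo -o yaml | dataset_phase
```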

3. Next, create an application pod that uses the dataset. We will access the same data several times and compare the access times to show Fluid's acceleration. Note that the pod references a PersistentVolumeClaim named `demo`: Fluid creates the PV/PVC pair for a bound dataset automatically, so no claim needs to be written by hand.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: demo
  volumes:
    - name: demo
      persistentVolumeClaim:
        claimName: demo
```

4. Log in to the application container and access the data; the first access takes longer:

```shell
kubectl exec -it demo-app -- bash
# du -sh /data/spark/spark-3.0.0-bin-without-hadoop.tgz
150M    /data/spark/spark-3.0.0-bin-without-hadoop.tgz
# time cp /data/spark/spark-3.0.0-bin-without-hadoop.tgz /dev/null
real    0m13.171s
user    0m0.002s
sys     0m0.028s
```

5. To keep other factors (such as the page cache) from skewing the result, delete the pod, recreate the identical application, and access the same file again. Because the file is now cached by Alluxio, the second access takes far less time than the first:

```shell
kubectl delete -f app.yaml && kubectl create -f app.yaml
...
# time cp /data/spark/spark-3.0.0-bin-without-hadoop.tgz /dev/null
real    0m0.344s
user    0m0.002s
sys     0m0.020s
```
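
The two `real` timings can be compared directly. Using the sample numbers from this walkthrough (13.171 s for the cold read, 0.344 s for the cached read):

```shell
# Compute the speedup of the cached read over the cold read, using the
# sample timings measured above.
cold=13.171
cached=0.344
speedup=$(awk -v a="$cold" -v b="$cached" 'BEGIN { printf "%.1f", a / b }')
echo "cached read is ${speedup}x faster"
```

With these numbers the cached read comes out roughly 38x faster; your own timings will vary with network bandwidth and cache sizing.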

At this point we have created a dataset and accelerated access to it. For further use and management of datasets, see the [accelerate](../user/accelerate_data_accessing.md) and [co-locality](../user/data_co_locality.md) examples.
# Deploy Fluid on a Kubernetes Cluster

## Prerequisites

- git
- A Kubernetes cluster (version >= 1.14) with CSI support
- kubectl (version >= 1.14)
- Helm (version >= 3.0)

The rest of this document assumes that all of the above are installed and configured.

For installing and configuring kubectl, see [here](https://kubernetes.io/docs/tasks/tools/install-kubectl/).

For installing and configuring Helm 3, see [here](https://v3.helm.sh/docs/intro/install/).

## Installation Steps

### Get the Fluid Chart

Run the following command in any directory to clone the source code from the [Fluid repository](https://github.com/fluid-cloudnative/fluid):

```shell
$ git clone https://github.com/fluid-cloudnative/fluid.git
```

The Fluid source code includes the [helm charts](https://github.com/fluid-cloudnative/fluid/tree/master/charts) needed to deploy Fluid.
### Install Fluid with Helm

Enter the repository you just cloned:

```shell
$ cd fluid
```

Create the namespace:

```shell
$ kubectl create ns fluid-system
```

Install Fluid:

```shell
$ helm install fluid charts/fluid/fluid
NAME: fluid
LAST DEPLOYED: Fri Jul 24 16:10:18 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

> The general form of the `helm install` command is `helm install <RELEASE_NAME> <SOURCE>`. In the command above, `fluid` is the release name (you may change it), and `charts/fluid/fluid` is the path to the helm chart.

### Check Component Status

**Check the CRDs used by Fluid:**

```shell
$ kubectl get crd | grep data.fluid.io
alluxiodataloads.data.fluid.io          2020-07-24T06:54:50Z
alluxioruntimes.data.fluid.io           2020-07-24T06:54:50Z
datasets.data.fluid.io                  2020-07-24T06:54:50Z
```
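
This check can also be scripted. The helper below is an illustrative sketch (the CRD names are taken from the listing above); it reads `kubectl get crd` output from stdin and fails if any expected Fluid CRD is missing:

```shell
# Illustrative helper: verify that every expected Fluid CRD appears in a
# "kubectl get crd" listing supplied on stdin.
crds_present() {
  local listing name
  listing=$(cat)
  for name in \
      alluxiodataloads.data.fluid.io \
      alluxioruntimes.data.fluid.io \
      datasets.data.fluid.io; do
    echo "$listing" | grep -q "^$name" || return 1
  done
}

# Example (assumes a running cluster):
#   kubectl get crd | crds_present && echo "all Fluid CRDs installed"
```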

**Check the status of each Pod:**

```shell
$ kubectl get pod -n fluid-system
NAME                                  READY   STATUS    RESTARTS   AGE
controller-manager-7f99c884dd-894g9   1/1     Running   0          5m28s
csi-nodeplugin-fluid-dm9b8            2/2     Running   0          5m28s
csi-nodeplugin-fluid-hwtvh            2/2     Running   0          5m28s
```

If the Pods look like the above, Fluid is ready to use!

### Uninstall Fluid

```shell
$ helm delete fluid
$ kubectl delete -f charts/fluid/fluid/crds
```

> Here `fluid` is the `<RELEASE_NAME>` specified at installation time.
# Overview

[Fluid](https://github.com/fluid-cloudnative/fluid) is an open-source, Kubernetes-native distributed dataset orchestrator and accelerator for data analysis and machine learning. It manages the full lifecycle of the data orchestration system (Alluxio), including deployment, scaling, and configuration changes. With Fluid, end users can manage their data without touching the underlying data caching system.

> **Note:**
>
> You can only deploy Fluid in a Kubernetes cluster.

The correspondence between Fluid and Alluxio versions is as follows:

| Fluid version | Compatible Alluxio versions |
|:---|:---|
| v0.1 | [Alluxio JNI Fuse 2.3](https://github.com/Alluxio/alluxio/tree/branch-2.3-fuse) |