Parquet S3 FDW 构建指南

构建 Parquet S3 FDW 扩展,及其依赖 libarrow, libparquet, 以及 libaws-cpp 的过程

构建parquet_s3_fdw

扩展 parquet_s3_fdw 有两个主要依赖:arrowawssdk


构建arrow

克隆 arrow 仓库并使用 cmake 构建:

cd ~ ; git clone git@github.com:apache/arrow.git;
mkdir -p ~/arrow/cpp/release; cd ~/arrow/cpp/release;
cmake .. -DARROW_PARQUET=ON -DARROW_S3=ON; make -j8
sudo make install

构建libaws

libaws-cpp 里有许多服务的驱动,但我们只需要两个:cores3

# 安装依赖
sudo yum install libcurl-devel openssl-devel libuuid-devel pulseaudio-libs-devel
# sudo apt-get install libcurl4-openssl-dev libssl-dev uuid-dev libpulse-dev # debian/ubuntu

# 克隆 libaws 仓库 (很大!)
cd ~; git clone --recurse-submodules git@github.com:aws/aws-sdk-cpp.git

mkdir -p ~/aws-sdk-cpp/release; cd ~/aws-sdk-cpp/release;
cmake .. -DBUILD_ONLY="s3"; make -j20
sudo make install

构建libarrow-s3

收集生成的 .so 文件,然后将其打包为一个 RPM / DEB 包:

mkdir -p ~/libarrow-s3
cp -d ~/arrow/cpp/release/release/libarrow.so*                                     ~/libarrow-s3/
cp -d ~/arrow/cpp/release/release/libparquet.so*                                   ~/libarrow-s3/
cp -f ~/aws-sdk-cpp/release/generated/src/aws-cpp-sdk-s3/libaws-cpp-sdk-s3.so      ~/libarrow-s3/
cp -f ~/aws-sdk-cpp/release/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so            ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/lib/libaws-c-event-stream.so*                          ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/lib/libs2n.so*                                         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/libaws-crt-cpp.so                        ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-common/libaws-c-common.so*     ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-checksums/libaws-checksums.so*   ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-io/libaws-c-io.so*             ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-mqtt/libaws-c-mqtt.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-cal/libaws-c-cal.so*           ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-checksums/libaws-checksums.so*   ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-s3/libaws-c-s3.so*             ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-common/libaws-c-common.so*     ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-http/libaws-c-http.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-sdkutils/libaws-c-sdkutils.so* ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-auth/libaws-c-auth.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-compression/libaws-c-compression.so* ~/libarrow-s3/

构建生成的 so 文件包含空 RPATH,使用 patchelf 去除 (EL系统):

cd ~/libarrow-s3/
patchelf --remove-rpath libarrow.so.1800.0.0
patchelf --remove-rpath libparquet.so.1800.0.0
patchelf --remove-rpath libaws-cpp-sdk-core.so
patchelf --remove-rpath libaws-cpp-sdk-s3.so

将这些 so 文件整体打包为一个 libarrow-s3 软件包:

cd ~/rpmbuild/SPECS
rpmbuild -ba ~/rpmbuild/SPECS/libarrow-s3.spec
sudo rpm -ivh ~/rpmbuild/RPMS/x86_64/libarrow-s3-17.0.0-1PIGSTY.*

Last modified 2024-08-04: add extensions (25d8a2b6)