Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon

August 4, 2021

Slide overview

Presentation slides from ApacheCon Asia 2021 (https://www.apachecon.com/acasia2021/index.html), held on August 8, 2021.

The official Yahoo! JAPAN account for engineers and designers. We share information about Yahoo! JAPAN technology and design, including events, talks, and blog posts.

Text of each slide
1.

Big Data: Technical tips for secure Apache Hadoop cluster
Akira Ajisaka, Kei Kori — Yahoo Japan Corporation

2.

Akira Ajisaka (@ajis_ka)
• Software Engineer in the Hadoop team @ Yahoo! JAPAN
  – Upgraded HDFS to 3.3.0 and enabled RBF
  – R&D for a Hadoop cluster more secure than just enabling Kerberos auth
• Apache Hadoop committer/PMC
  – ~800 commits across various components over 6 years
  – Handled and announced several CVEs
  – Maintains the build and QA environment

3.

Kei KORI (@2k0ri)
• Data Platform Engineer in the Hadoop team @ Yahoo! JAPAN
  – Built the upgrade to, and continuous delivery for, HDFS 3.3.0
  – Researches operations for a more secure Hadoop cluster
• Kubernetes admin for the Hadoop client environment
  – Migrates users from VMs/bare metal to a cloud-native stack
  – Integrates ML/DL workloads with the Hadoop ecosystem

4.

Session Overview

5.

Session Overview
Prerequisites:
• Hadoop is not secure by default
• Kerberos authentication is required
This talk introduces further details in practice:
• Wire encryption in the Hadoop ecosystem
• HDFS transparent data encryption at rest
• Other considerations

6.

Wire encryption in the Hadoop ecosystem

7.

Background
Making the Hadoop ecosystem more secure than perimeter security alone:
• Not only authenticate but also encrypt communications
• Protection and mitigation against internal threats such as packet sniffing
• Part of security compliance such as NIST SP 800-171

8.

Overview: wire encryption types between components
• HTTP encryption
  – HDFS, YARN, MapReduce, KMS, HttpFS, Spark, Hive, Oozie, Livy
• RPC encryption
  – HDFS, YARN, MapReduce, KMS, Spark, Hive, Oozie, ZooKeeper
• Block data transfer encryption
  – HDFS
• Shuffle encryption
  – MapReduce, Spark, Tez

9.

HTTP encryption for Hadoop
• Enable TLS on WebUI/REST API endpoints
  – dfs.http.policy: HTTPS_ONLY in hdfs-site, yarn.http.policy: HTTPS_ONLY in yarn-site, mapreduce.jobhistory.http.policy: HTTPS_ONLY in mapred-site, etc.
  – Use HTTP_AND_HTTPS while rolling-updating the endpoints
• Point History/Timeline Server endpoints at HTTPS
  – yarn.timeline-service.webapp.https.address in yarn-site, mapreduce.jobhistory.webapp.https.address in mapred-site
• Store certs and passphrases with the Hadoop Credential Provider via hadoop.security.credential.provider.path
  – Separates permissions from configs
  – Prevents exposure outside of hadoop.security.sensitive-config-keys filtering
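For illustration, a minimal sketch of the hdfs-site side of this (the yarn-site and mapred-site policies follow the same pattern):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.http.policy</name>
    <!-- HTTP_AND_HTTPS while rolling-updating, then HTTPS_ONLY -->
    <value>HTTPS_ONLY</value>
  </property>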

10.

RPC encryption for Hadoop
• hadoop.rpc.protection: privacy in core-site
  – Encrypts RPC, including Kerberos authentication, at the SASL layer
  – Propagates to hadoop.security.saslproperties.resolver.class, dfs.data.transfer.saslproperties.resolver.class, and dfs.data.transfer.protection
• hadoop.rpc.protection: privacy,authentication while rolling-updating all Hadoop servers/clients
  – Accepts falling back to unencrypted RPC
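A minimal core-site sketch; the comma-separated value is the transitional setting for a rolling update:

  <!-- core-site.xml -->
  <property>
    <name>hadoop.rpc.protection</name>
    <!-- "privacy,authentication" while rolling-updating, then "privacy" -->
    <value>privacy</value>
  </property>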

11.

Block data transfer encryption for Hadoop
• dfs.encrypt.data.transfer: true and dfs.encrypt.data.transfer.cipher.suites: AES/CTR/NoPadding in hdfs-site
  – Only encrypts the payload between HDFS clients and DataNodes
• Rolling update is not supported by configs alone
  – Needs a managed list of encrypted nodes, or your own extension/implementation of dfs.trustedchannel.resolver.class
  – Nodes trusted by dfs.trustedchannel.resolver.class are forced to transfer without encryption regardless of their encryption setting
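As a sketch, the hdfs-site settings named above would look like:

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>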

12.

Encryption for Spark
In spark-defaults:
• HTTP encryption
  – spark.ssl.sparkHistory.enabled true
    • Switches the protocol on one port; does not support HTTP_AND_HTTPS
  – spark.yarn.historyServer.address https://...
• RPC encryption
  – spark.authenticate true
    • Also in yarn-site
  – spark.authenticate.enableSaslEncryption true
  – spark.network.sasl.serverAlwaysEncrypt true
    • After all Spark components recognize enableSaslEncryption
• Shuffle encryption
  – spark.network.crypto.enabled true
  – spark.io.encryption.enabled true
    • Encrypts spilled caches and RDDs on local disks
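A partial spark-defaults.conf sketch covering the RPC and shuffle/IO settings above (the History Server TLS entries are omitted since they depend on your certificate setup):

  spark.authenticate                       true
  spark.authenticate.enableSaslEncryption  true
  spark.network.sasl.serverAlwaysEncrypt   true
  spark.network.crypto.enabled             true
  spark.io.encryption.enabled              true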

13.

Encryption for Hive
• hive.server2.thrift.sasl.qop: auth-conf in hive-site
  – Encrypts JDBC between clients and HiveServer2 in binary mode
  – And Thrift between clients and the Hive Metastore
• hive.server2.use.SSL: true in hive-site
  – Only for HS2 http mode
  – HS2 binary mode cannot enable both TLS and SASL
• Encryption for JDBC between HS2/Hive Metastore and a remote RDBMS
• Shuffle encryption
  – MapReduce: mapreduce.ssl.enabled: true and mapreduce.shuffle.ssl.enabled: true in mapred-site
  – Tez: tez.runtime.shuffle.ssl.enable: true and tez.runtime.shuffle.keep-alive.enabled: true in tez-site
  – Requires server certs on all NodeManagers
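For example, the HiveServer2/Metastore Thrift encryption above maps to this hive-site entry:

  <!-- hive-site.xml -->
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth-conf</value>
  </property>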

14.

Challenges in HTTP encryption: ApplicationMaster / Spark Driver
• Server certs for the ApplicationMaster / Spark Driver need to be readable by the user who submitted the application
  – The ApplicationMaster and Spark Driver run as that user
  – The WebApplicationProxy between the ResourceManager and the ApplicationMaster relies on this encryption
• Applications support TLS and can bundle certs since:
  – Spark 3.0.0: SPARK-24621
  – MapReduce 3.3.0: MAPREDUCE-4669
  – Tez: not supported yet

15.

Encryption for ZooKeeper server
• Authenticate with SASL, encrypt with TLS
  – ZooKeeper does not respect the SASL QOP
• Requires ZooKeeper 3.5.6 or above for servers/quorums
  – serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
  – sslQuorum=true
  – ssl.clientAuth=NONE
  – ssl.quorum.clientAuth=NONE
• Needs ZOOKEEPER-4276 to follow "Upgrading existing non-TLS cluster with no downtime"
  – Lets ZK serve with only secureClientPort
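A zoo.cfg sketch combining the settings above (the port is a conventional choice and the store paths are placeholders; the matching ssl.keyStore.password / ssl.trustStore.password entries are omitted):

  serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
  secureClientPort=2281
  sslQuorum=true
  ssl.clientAuth=NONE
  ssl.quorum.clientAuth=NONE
  ssl.keyStore.location=/path/to/keystore.jks
  ssl.trustStore.location=/path/to/truststore.jks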

16.

Encryption for ZooKeeper client
• Also requires ZooKeeper 3.5.6 or above for clients
  – -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty in the client JVM args
  – HADOOP_OPTS environment variable
  – mapreduce.admin.map.child.java.opts and mapreduce.admin.reduce.child.java.opts in mapred-site for Oozie Coordinator MapReduce jobs
• Needs replacing and updating ZooKeeper jars in all components that communicate with ZooKeeper
  – ZKFC, ResourceManager, Hive clients incl. HS2, Oozie, and Livy
  – Apache Curator must also be updated to 4.2.0, and Netty from 4.0 to 4.1
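For example, via HADOOP_OPTS (the truststore path is a placeholder):

  export HADOOP_OPTS="$HADOOP_OPTS \
    -Dzookeeper.client.secure=true \
    -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
    -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks"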

17.

Enforcing Kerberos AuthN/Z for ZooKeeper
• Requires ZooKeeper 3.6.0 or above for servers
  – 3.6.0+: zookeeper.sessionRequireClientSASLAuth=true
  – 3.7.0+: enforce.auth.enabled=true and enforce.auth.schemes=sasl
• The Oozie Hive action will not work when ZK SASL is enforced
  – It fails when acquiring the lock for the Hive Metastore
  – Oozie has no mechanism to delegate authentication or impersonation for ZooKeeper
  – Using HiveServer2 / the Oozie Hive2 action solves it
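A sketch of both enforcement knobs; as far as we know the 3.6 option is set as a server-side JVM system property, while the 3.7+ options go in zoo.cfg:

  # ZooKeeper 3.6.x: server JVM flag
  -Dzookeeper.sessionRequireClientSASLAuth=true

  # ZooKeeper 3.7+: zoo.cfg
  enforce.auth.enabled=true
  enforce.auth.schemes=sasl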

18.

HDFS transparent data encryption (TDE) at rest

19.

Background
HDFS blocks are written to the local filesystem of the DataNodes:
• The data is not encrypted by default
• Encryption is required in several use cases
Encryption can be done at several layers:
• Application: most secure, but hardest to do
• Database: most databases have this, but it may incur performance penalties
• Filesystem: high performance and transparent, but may not be flexible
• Disk: only really protects against physical theft
HDFS TDE fits between the database and filesystem levels

20.

Overview: encryption/decryption is transparent to the clients
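For example, once an admin creates a key and an encryption zone, clients read and write files in the zone exactly as they would anywhere else (the key name and path here are placeholders):

  # One-time setup by an admin
  hadoop key create keya
  hdfs dfs -mkdir /secure
  hdfs crypto -createZone -keyName keya -path /secure

  # Clients are unchanged; data is encrypted/decrypted transparently
  hdfs dfs -put report.csv /secure/
  hdfs dfs -cat /secure/report.csv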

21.

KeyProvider: Where the KEK is saved
Implementations of the KeyProvider API:
• Hadoop KMS: JavaKeyStoreProvider
  – JCEKS files in Hadoop-compatible filesystems (localFS, HDFS, cloud storage)
  – Not recommended
• Apache Ranger KMS: RangerKeyStoreProvider
  – RDBMS
  – The master key can be stored in a Luna HSM (optional)
  – An HSM is required in some use cases: PCI-DSS, FIPS 140-2
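For reference, the KMS selects its backing KeyProvider via kms-site; a JavaKeyStoreProvider sketch (the path is a placeholder):

  <!-- kms-site.xml -->
  <property>
    <name>hadoop.kms.key.provider.uri</name>
    <value>jceks://file@/etc/hadoop/kms.jceks</value>
  </property>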

22.

Extending the KeyProvider API is not difficult
• Mandatory methods for HDFS TDE
  – getKeyVersion, getCurrentKey, getMetadata
• Optional methods (nice to have for operation)
  – getKeys, getKeysMetadata, getKeyVersions, createKey, deleteKey, rollNewVersion
  – If not implemented, you need to create/delete/list/roll keys in some other way
• Use cases:
  – LinkedIn integrated with its own key management service, LiKMS: https://engineering.linkedin.com/blog/2021/the-exabyte-club-linkedin-s-journey-of-scaling-the-hadoop-distr
  – Yahoo! JAPAN also integrated with our own credential store in only ~500 LOC (including test code)

23.

KeyProvider is actually stable and can be used safely
• KeyProvider is @Public and @Unstable
  – @Unstable in Hadoop means "incompatible changes are allowed at any time"
• In practice, the API is very stable
  – No incompatible changes
  – Ranger has used it since 2015: RANGER-247
• Provided a patch to mark it stable
  – HADOOP-17544

24.
Hadoop KMS: Where the KEK is cached and authorization is performed
• KMS interacts with HDFS clients, NameNodes, and the KeyProvider
• KMS has its own ACLs, separate from HDFS ACLs
  – An attacker cannot decrypt data even if HDFS ACLs are compromised
  – If 'usera' reads/writes data in the encryption zone with 'keya', the configuration in kms-acls.xml will be:

    <property>
      <name>key.acl.keya.DECRYPT_EEK</name>
      <value>usera</value>
    </property>

  – The configuration is hot-reloaded
• For HA and scalability, multiple KMS instances are supported

25.

How to deploy multiple KMS instances
Two approaches:
1. Behind a load balancer or VIP
2. Using LoadBalancingKMSClientProvider
   – Implicitly used when multiple URIs are specified in hadoop.security.key.provider.path
If you have an LB or VIP, use it:
• No configuration change to scale out or decommission
• The LB saves clients' retry cost
  – LoadBalancingKMSClientProvider first tries to connect to one KMS and, if that fails, connects to another
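A sketch of the client-side setting that implicitly enables LoadBalancingKMSClientProvider (hostnames and port are hypothetical):

  <!-- core-site.xml on clients -->
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://https@kms1.example.com;kms2.example.com:9600/kms</value>
  </property>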

26.

How to configure multiple KMS instances
• Delegation tokens must be synchronized
  – Use ZKDelegationTokenSecretManager
  – An example configuration is documented: HADOOP-17794
• hadoop.security.token.service.use_ip
  – If true (the default), SSL certificate validation fails in multihomed environments
  – Documented: HADOOP-12665
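A kms-site sketch for synchronizing delegation tokens through ZooKeeper (the connection string is a placeholder; see HADOOP-17794 for a full example):

  <!-- kms-site.xml -->
  <property>
    <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>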

27.

Tuning Hadoop KMS
• Documented and discussed in HADOOP-15743
  – Reduce the SSL session cache size and TTL
  – Tune the HTTPS idle timeout
  – Increase max file descriptors
  – etc.
• This tuning is effective for HttpFS as well
  – Both KMS and HttpFS use Jetty via HttpServer2

28.

Recap: HDFS TDE
• Careful configuration is required
  – How to save the KEK
  – Running multiple KMS instances
  – KMS tuning
  – Where to create encryption zones
  – ACLs (including key ACLs and impersonation)
• These are not straightforward, despite the long time since the feature was developed

29.

Other considerations

30.

Updating SSL certificates
• Hadoop >= 3.3.1 allows updating SSL certificates without downtime: HADOOP-16524
  – Uses the hot-reload feature in Jetty
  – Except the DataNode, since DNs don't rely on Jetty
• Especially useful for the NameNode, because it takes > 30 minutes to restart in a large cluster

31.

Other considerations
• It is important to be ready to upgrade at any time
  – Sometimes CVEs are published and vendors warn users to upgrade
• Security requirements may increase later, so be prepared for that early
• Operational considerations are also necessary
  – Not only the cluster configuration but also the operations will change

32.

Conclusion & Future work
We introduced many technical tips for a secure Hadoop cluster:
• However, they might change in the future
• We need to keep catching up with the OSS community
Future work:
• How to enable SSL/TLS in ApplicationMaster & Spark Driver Web UIs
• Impersonation does not work correctly in KMSClientProvider: HDFS-13697

33.

THANK YOU QUESTIONS? @aajisaka @2k0ri