Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon


August 04, 21


Presentation slides from ApacheCon Asia 2021, held on August 8, 2021.







Big Data
Technical tips for secure Apache Hadoop cluster
Akira Ajisaka, Kei Kori
Yahoo Japan Corporation


Akira Ajisaka (@ajis_ka)
• Software Engineer in Hadoop team @ Yahoo! JAPAN
  – Upgraded HDFS to 3.3.0 and enabled RBF
  – R&D for more secure Hadoop cluster than just enabling Kerberos auth
• Apache Hadoop committer/PMC
  – ~800 commits in various components in 6 years
  – Handled and announced several CVEs
  – Manages build and QA environment


Kei KORI (@2k0ri)
• Data Platform Engineer in Hadoop team @ Yahoo! JAPAN
  – Built the upgrade to, and continuous delivery for, HDFS 3.3.0
  – Researches operations for more secure Hadoop cluster
• Kubernetes admin for Hadoop client environment
  – Migrates users from VM/BM to cloud native way
  – Integrates ML/DL workloads with Hadoop ecosystem


Session Overview


Prerequisites:
• Hadoop is not secure by default
• Kerberos authentication is required
This talk introduces further details in practice:
• Wire encryption in Hadoop ecosystem
• HDFS transparent data encryption at rest
• Other considerations


Wire encryption in Hadoop ecosystem


Background
To make the Hadoop ecosystem more secure than perimeter security alone:
• Not only authenticate but also encrypt communications
• Protect against and mitigate internal threats such as packet sniffing
• Meet security compliance requirements such as NIST SP 800-171


Overview: wire encryption types between components
• HTTP encryption
  – HDFS, YARN, MapReduce, KMS, HttpFS, Spark, Hive, Oozie, Livy
• RPC encryption
  – HDFS, YARN, MapReduce, KMS, Spark, Hive, Oozie, ZooKeeper
• Block data transfer encryption
  – HDFS
• Shuffle encryption
  – MapReduce, Spark, Tez


HTTP encryption for Hadoop
• dfs.http.policy: HTTPS_ONLY in hdfs-site, yarn.http.policy: HTTPS_ONLY in yarn-site, mapreduce.jobhistory.http.policy: HTTPS_ONLY in mapred-site etc.
  – Enable TLS on WebUI/REST API endpoints
  – HTTP_AND_HTTPS while rolling update endpoints
• yarn.timeline-service.webapp.https.address in yarn-site, mapreduce.jobhistory.webapp.https.address in mapred-site
  – Set History/Timeline Server endpoints with HTTPS
• Store certs and passphrases using Hadoop Credential Provider
  – Separates permissions from configs
  – Prevents exposure outside of filtering
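As a minimal sketch, the three policy settings named on this slide could be written as the following Hadoop `<property>` fragments. The property names and values are taken from the slide; only the layout is illustrative. For a rolling update, the slide suggests using HTTP_AND_HTTPS as the value first and switching to HTTPS_ONLY afterwards.

```xml
<!-- hdfs-site.xml: serve HDFS web UIs and REST endpoints over TLS only -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- yarn-site.xml: same policy for YARN web endpoints -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- mapred-site.xml: same policy for the MapReduce JobHistory Server -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
```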


RPC encryption for Hadoop
• privacy in core-site
  – Encrypts RPC incl. Kerberos authentication on SASL layer
  – Propagates to, and
• privacy,authentication while rolling update whole Hadoop servers/clients
  – Accepts falling back to non-encrypted RPC
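The property name is missing from the slide text above. Assuming it refers to `hadoop.rpc.protection`, the standard Hadoop setting for the SASL quality of protection on RPC, a sketch of the core-site entry might look like this:

```xml
<!-- core-site.xml -->
<!-- Assumption: the unnamed setting on the slide is hadoop.rpc.protection -->
<property>
  <name>hadoop.rpc.protection</name>
  <!-- While rolling the change out, the slide uses: privacy,authentication -->
  <value>privacy</value>
</property>
```

With the comma-separated value, peers that have not been switched yet can still negotiate non-encrypted RPC, which is why the slide treats it as a transitional setting only.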


Block data transfer encryption for Hadoop
• true, AES/CTR/NoPadding in hdfs-site
  – Only encrypts payload between HDFS client and DataNodes
• Rolling update is not supported within configs
  – Requires managing a list of encrypted nodes, or extending/implementing your own dfs.trustedchannel.resolver.class
  – Nodes trusted by dfs.trustedchannel.resolver.class are forced to transfer without encryption regardless of their encryption status
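The slide text keeps only the values (`true` and `AES/CTR/NoPadding`), not the property names. Assuming they belong to the standard HDFS settings `dfs.encrypt.data.transfer` and `dfs.encrypt.data.transfer.cipher.suites`, the hdfs-site entries could be sketched as:

```xml
<!-- hdfs-site.xml -->
<!-- Assumption: these two property names match the values shown on the slide -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.suites</name>
  <value>AES/CTR/NoPadding</value>
</property>
```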


Encryption for Spark
In spark-defaults:
• HTTP encryption
  – spark.ssl.sparkHistory.enabled true
    • Switches protocol on 1 port, does not support HTTP_AND_HTTPS
  – spark.yarn.historyServer.address https://...
• RPC encryption
  – spark.authenticate: true
    • Also in yarn-site
  – spark.authenticate.enableSaslEncryption true
  – true
    • After all Spark components recognized enableSaslEncryption
• Shuffle encryption
  – true
  – true
    • Encrypts spilled caches and RDDs on local disks


Encryption for Hive
• hive.server2.thrift.sasl.qop: auth-conf in hive-site
  – Encrypts JDBC between client and HiveServer2 binary mode
  – And Thrift between clients and Hive Metastore
• hive.server2.use.SSL: true in hive-site
  – Only for HS2 http mode
  – HS2 binary mode cannot enable both TLS and SASL
• Encryption for JDBC between HS2/Hive Metastore and remote RDBMS
• Shuffle encryption
  – Tez: tez.runtime.shuffle.ssl.enable: true, tez.runtime.shuffle.keep-alive.enabled: true in tez-site
  – MapReduce: mapreduce.ssl.enabled: true, mapreduce.shuffle.ssl.enabled: true in mapred-site
  – Requires server certs for all NodeManagers
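A sketch of how the Hive-related settings from this slide might be laid out across the site files. All property names and values are the ones listed on the slide; the grouping by file follows the slide's "in hive-site / tez-site / mapred-site" notes.

```xml
<!-- hive-site.xml: SASL confidentiality for HS2 binary mode and Metastore Thrift -->
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <value>auth-conf</value>
</property>

<!-- tez-site.xml: TLS for Tez shuffle -->
<property>
  <name>tez.runtime.shuffle.ssl.enable</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value>
</property>

<!-- mapred-site.xml: TLS for MapReduce shuffle -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```

`hive.server2.use.SSL` is left out here because, per the slide, it applies only to HS2 http mode and cannot be combined with SASL in binary mode.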


Challenges in HTTP encryption: for Application Master / Spark Driver
• Server certs for ApplicationMaster / SparkDriver need to be readable by the user who submitted the application
  – ApplicationMaster and SparkDriver run as the user
  – WebApplicationProxy between ResourceManager and ApplicationMaster relies on this encryption
• Applications support TLS and can bundle certs as of:
  – Spark 3.0.0: SPARK-24621
  – MapReduce 3.3.0: MAPREDUCE-4669
  – Tez: not supported yet


Encryption for ZooKeeper server
• Authenticate with SASL, encrypt with TLS
  – ZooKeeper does not respect SASL QOP
• Requires ZooKeeper 3.5.6 or above for servers/quorums
  – serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
  – sslQuorum=true
  – ssl.clientAuth=NONE
  – ssl.quorum.clientAuth=NONE
• Needs ZOOKEEPER-4276 to follow "Upgrading existing non-TLS cluster with no downtime"
  – Allows ZK to serve with only secureClientPort


Encryption for ZooKeeper client
• Also requires ZooKeeper 3.5.6 or above for clients
• -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty in client JVM args
  – HADOOP_OPTS environment variable
  – in mapred-site for Oozie Coordinator MapReduce jobs
• Needs to replace and update ZooKeeper jars in all components which communicate with ZooKeeper
  – ZKFC, ResourceManager, Hive clients incl. HS2, Oozie and Livy
  – Apache Curator must also be updated to 4.2.0, and Netty from 4.0 to 4.1


Enforcing Kerberos AuthN/Z for ZooKeeper
• Requires ZooKeeper 3.6.0 or above for servers
  – 3.6.0+: zookeeper.sessionRequireClientSASLAuth=true
  – 3.7.0+: enforce.auth.enabled=true enforce.auth.schemes=sasl
• Oozie Hive action will not work with forcing ZK SASL
  – when acquiring the lock for Hive Metastore
  – Has no mechanisms to delegate authentication or impersonation for ZooKeeper
  – Using HiveServer2 / Oozie Hive2 action solves it


HDFS transparent data encryption (TDE) at rest


Background
HDFS blocks are written to the local filesystem of the DataNodes
• The data is not encrypted by default
• Encryption is required in several use cases
Encryption can be done at several layers:
• Application: most secure, but hardest to do
• Database: most databases have this, but may incur performance penalties
• Filesystem: high performance, transparent, but may not be flexible
• Disk: only really protects against physical theft
HDFS TDE fits between database and filesystem level


Overview: encryption/decryption is transparent to the clients


KeyProvider: Where KEK is saved
Implementations of KeyProvider API
• Hadoop KMS: JavaKeyStoreProvider
  – JCEKS files in Hadoop compatible filesystems (localFS, HDFS, cloud storage)
  – Not recommended
• Apache Ranger KMS: RangerKeyStoreProvider
  – RDBMS
  – Master key can be stored in Luna HSM (optional)
  – HSM is required in some use cases
    • PCI-DSS, FIPS 140-2


Extending KeyProvider API is not difficult
• Mandatory methods for HDFS TDE
  – getKeyVersion, getCurrentKey, getMetadata
• Optional methods (nice to have for operation)
  – getKeys, getKeysMetadata, getKeyVersions, createKey, deleteKey, rollNewVersion
  – If not implemented, you need to create/delete/list/roll keys in some way
• Use cases:
  – LinkedIn integrated with its own key management service, LiKMS
  – Yahoo! JAPAN also integrated with our own credential store by only ~500 LOC (including test code)


KeyProvider is actually stable, can be used safely
• KeyProvider is @Public and @Unstable
  – @Unstable in Hadoop means "incompatible changes are allowed at any time"
• Actually, the API is very stable
  – No incompatible changes
  – Ranger has used it since 2015: RANGER-247
• Provided a patch to mark it stable
  – HADOOP-17544

Hadoop KMS: Where KEK is cached and performs
• KMS interacts with HDFS clients, NameNodes, and KeyProvider
• KMS has its own ACLs, separate from HDFS ACLs
  – An attacker cannot decrypt data even if HDFS ACLs are compromised
  – If 'usera' reads/writes data in the encryption zone with 'keya', access for 'usera' is granted per key in kms-acls.xml
  – The configuration is hot-reloaded
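The slide's own kms-acls.xml snippet did not survive in this transcript. As a hedged sketch, assuming the standard per-key ACL naming `key.acl.<key-name>.<operation>` and that a client reading or writing files in the zone needs the `DECRYPT_EEK` operation on the zone key, the entry for 'usera' and 'keya' could look like this:

```xml
<!-- kms-acls.xml -->
<!-- Assumption: reconstructed example, not the snippet from the original slide -->
<property>
  <name>key.acl.keya.DECRYPT_EEK</name>
  <!-- users allowed to decrypt encrypted data encryption keys under 'keya' -->
  <value>usera</value>
</property>
```

Because this ACL lives in KMS rather than in HDFS, granting 'usera' HDFS permissions on the files is not enough on its own to read the plaintext.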


For HA and scalability, multiple KMS instances are supported


How to deploy multiple KMS instances
Two approaches:
1. Behind a load-balancer or VIP
2. Using LoadBalancingKMSClientProvider
  – Implicitly used when multiple URIs are specified
If you have a LB or VIP, use it
• No configuration change to scale-out/decommission
• LB saves clients' retry cost
  – LoadBalancingKMSClientProvider first tries to connect to one KMS and, if that fails, connects to another KMS
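For the second approach, the slide text does not preserve which property holds the URIs. Assuming it is the client-side `hadoop.security.key.provider.path`, a sketch with two KMS instances might look like this. The host names are hypothetical, and 9600 is the default KMS port in Hadoop 3.

```xml
<!-- core-site.xml on the client side -->
<!-- Assumption: property name; kms01/kms02.example.com are placeholder hosts -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <!-- multiple hosts separated by ';' share one scheme, port and path -->
  <value>kms://https@kms01.example.com;kms02.example.com:9600/kms</value>
</property>
```

With a load balancer or VIP instead, the value would name a single host, so adding or removing KMS instances would not require touching client configuration.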


How to configure multiple KMS instances
• Delegation Token must be synchronized
  – Use ZKDelegationTokenSecretManager
  – Documented an example configuration: HADOOP-17794
•
  – If true (default), fails to validate SSL certificates in multihomed environment
  – Documented: HADOOP-12665
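A rough sketch of enabling the ZooKeeper-backed delegation token secret manager in KMS. The property names are given as I understand them from the Hadoop KMS documentation and should be checked against the example added in HADOOP-17794; the ZooKeeper host names are hypothetical, and the authentication-related settings for the ZooKeeper connection are omitted.

```xml
<!-- kms-site.xml on every KMS instance -->
<!-- Assumption: zk01..zk03.example.com are placeholder ZooKeeper hosts -->
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
  <value>zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181</value>
</property>
```

Sharing token state through ZooKeeper is what lets a delegation token issued by one KMS instance be validated by another.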


Tuning Hadoop KMS
• Documented and discussed in HADOOP-15743
  – Reduce SSL session cache size and TTL
  – Tune HTTPS idle timeout
  – Increase max file descriptors
  – etc.
• This tuning is effective in HttpFS as well
  – Both KMS/HttpFS use Jetty via HttpServer2


Recap: HDFS TDE
• Careful configuration required
  – How to save KEK
  – Running multiple KMS instances
  – KMS Tuning
  – Where to create encryption zones
  – ACLs (including key ACLs and impersonation)
• They are not straightforward despite the long time since the feature was developed


Other considerations


Updating SSL certificates
• Hadoop >= 3.3.1 allows updating SSL certificates without downtime: HADOOP-16524
  – Uses the hot-reload feature in Jetty
  – Except DataNode, since DN doesn't rely on Jetty
• Especially useful for NameNode because it takes > 30 minutes to restart in a large cluster


Other considerations
• It is important to be ready to upgrade at any time
  – Sometimes CVEs are published and the vendors warn users to upgrade
• Security requirements may increase later, so prepare for that early
• Operational considerations are also necessary
  – Not only the cluster configuration but also the operations will change


Conclusion & Future work
We introduced many technical tips for secure Hadoop cluster
• However, they might change in the future
• Need to catch up with the OSS community
Future work
• How to enable SSL/TLS in ApplicationMaster & Spark Driver Web UIs
• Impersonation does not work correctly in KMSClientProvider: HDFS-13697


THANK YOU QUESTIONS? @aajisaka @2k0ri