EMR Serverless Spark executor timeout
2022-09-28
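I have an EMR Serverless application that, for some reason, keeps hitting executor timeouts. I have tested all the S3 connections and they work. The problem occurs while a query over Spark tables is executing.
The EMR release is emr-6.7.0. The same job runs fine on Spark 3.1.1 in k8s, so the problem may be version-related.
My Spark session settings: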
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .config("spark.hadoop.fs.s3a.fast.upload", True)
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
    .config("spark.sql.autoBroadcastJoinThreshold", -1)
    .config("spark.sql.shuffle.partitions", "1000")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "268435456")
    # Setting spark.jars.packages twice keeps only the last value, so both
    # Maven coordinates go into a single comma-separated string.
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.2.0,mysql:mysql-connector-java:8.0.17")
    .enableHiveSupport()
    .getOrCreate()
)
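The driver log for this job warns (via SQLConf) that `spark.sql.legacy.parquet.datetimeRebaseModeInWrite` is deprecated in Spark 3.2 in favor of `spark.sql.parquet.datetimeRebaseModeInWrite`. A minimal sketch, assuming the session settings are kept in a plain dict before being handed to the builder; `modernize_conf` and `DEPRECATED_KEYS` are hypothetical names, and the mapping holds only the one rename that the log actually reports.

```python
from typing import Dict

# Deprecated key -> replacement key, taken from the SQLConf warning in the log.
# Hypothetical helper table; extend it only with renames you have confirmed.
DEPRECATED_KEYS: Dict[str, str] = {
    "spark.sql.legacy.parquet.datetimeRebaseModeInWrite":
        "spark.sql.parquet.datetimeRebaseModeInWrite",
}


def modernize_conf(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `conf` with deprecated keys renamed.

    If both the deprecated and the replacement key are present, the value
    set under the replacement key wins, regardless of insertion order.
    """
    result: Dict[str, str] = {}
    for key, value in conf.items():
        new_key = DEPRECATED_KEYS.get(key, key)
        if new_key != key and new_key in result:
            # An explicit value for the new key already exists; keep it.
            continue
        result[new_key] = value
    return result
```

The result can be applied with a loop such as `for k, v in conf.items(): builder = builder.config(k, v)` on `SparkSession.builder`. This only addresses the deprecation warnings, not the timeout itself.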
Driver log:
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b943cb44-441b-41b2-8ea1-c44496d2e550;1.0
confs: [default]
found org.apache.hadoop#hadoop-aws;3.2.0 in central
found com.amazonaws#aws-java-sdk-bundle;1.11.375 in central
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar ...
[SUCCESSFUL ] org.apache.hadoop#hadoop-aws;3.2.0!hadoop-aws.jar (20ms)
downloading https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar ...
[SUCCESSFUL ] com.amazonaws#aws-java-sdk-bundle;1.11.375!aws-java-sdk-bundle.jar (960ms)
:: resolution report :: resolve 823ms :: artifacts dl 984ms
:: modules in use:
com.amazonaws#aws-java-sdk-bundle;1.11.375 from central in [default]
org.apache.hadoop#hadoop-aws;3.2.0 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 2 | 2 | 2 | 0 || 2 | 2 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-b943cb44-441b-41b2-8ea1-c44496d2e550
confs: [default]
2 artifacts copied, 0 already retrieved (96887kB/80ms)
22/09/28 12:28:25 INFO SparkContext: Running Spark version 3.2.1-amzn-0
22/09/28 12:28:25 INFO ResourceUtils: ==============================================================
22/09/28 12:28:25 INFO ResourceUtils: No custom resources configured for spark.driver.
22/09/28 12:28:25 INFO ResourceUtils: ==============================================================
22/09/28 12:28:25 INFO SparkContext: Submitted application: spark_segmentacao_caminhoneiros.py
22/09/28 12:28:25 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 14336, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/09/28 12:28:25 INFO ResourceProfile: Limiting resource is cpus at 4 tasks per executor
22/09/28 12:28:25 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/09/28 12:28:25 INFO SecurityManager: Changing view acls to: hadoop
22/09/28 12:28:25 INFO SecurityManager: Changing modify acls to: hadoop
22/09/28 12:28:25 INFO SecurityManager: Changing view acls groups to:
22/09/28 12:28:25 INFO SecurityManager: Changing modify acls groups to:
22/09/28 12:28:25 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
22/09/28 12:28:26 INFO Utils: Successfully started service 'sparkDriver' on port 33303.
22/09/28 12:28:26 INFO SparkEnv: Registering MapOutputTracker
22/09/28 12:28:26 INFO SparkEnv: Registering BlockManagerMaster
22/09/28 12:28:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/28 12:28:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/28 12:28:26 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/09/28 12:28:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-22f8e599-8bdc-4d65-a5c1-f9ab0bf5f01c
22/09/28 12:28:26 INFO MemoryStore: MemoryStore started with capacity 7.3 GiB
22/09/28 12:28:26 INFO SparkEnv: Registering OutputCommitCoordinator
22/09/28 12:28:26 INFO SubResultCacheManager: Sub-result caches are disabled.
22/09/28 12:28:26 INFO log: Logging initialized @8694ms to org.sparkproject.jetty.util.log.Slf4jLog
22/09/28 12:28:26 INFO Server: jetty-9.4.43.v20210629; built: 2021-06-30T11:07:22.254Z; git: 526006ecfa3af7f1a27ef3a288e2bef7ea9dd7e8; jvm 1.8.0_342-b07
22/09/28 12:28:26 INFO Server: Started @8797ms
22/09/28 12:28:26 INFO AbstractConnector: Started ServerConnector@2f0a0570{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
22/09/28 12:28:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1afe3ab7{/jobs,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4318eaf1{/jobs/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31a7233b{/jobs/job,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2db507b1{/jobs/job/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69ab6402{/stages,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@34f8396d{/stages/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@e3b0050{/stages/stage,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1e50a487{/stages/stage/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@58d7e2db{/stages/pool,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@73151501{/stages/pool/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1a9ef059{/storage,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7e3d6570{/storage/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2e69b2c6{/storage/rdd,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@40447208{/storage/rdd/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2431cfcb{/environment,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2e719959{/environment/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2ec050df{/executors,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@72d76b63{/executors/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7670552a{/executors/threadDump,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1b0420c5{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@60bcb746{/static,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6aea04c3{/,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@476f8bfa{/api,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@652b8b55{/jobs/job/kill,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@4d0d77ae{/stages/stage/kill,null,AVAILABLE,@Spark}
22/09/28 12:28:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:4040
22/09/28 12:28:26 INFO SparkContext: Added JAR file:/tmp/spark-bc069368-d1ab-4d24-a4e3-f7a8634a3d52/uber-jars-1.0-SNAPSHOT.jar at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/uber-jars-1.0-SNAPSHOT.jar with timestamp 1664368105728
22/09/28 12:28:26 INFO SparkContext: Added JAR file:///home/hadoop/.ivy2/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar with timestamp 1664368105728
22/09/28 12:28:26 INFO SparkContext: Added JAR file:///home/hadoop/.ivy2/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar with timestamp 1664368105728
22/09/28 12:28:26 INFO SparkContext: Added file file:/tmp/spark-bc069368-d1ab-4d24-a4e3-f7a8634a3d52/varname.zip at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/varname.zip with timestamp 1664368105728
22/09/28 12:28:26 INFO Utils: Copying /tmp/spark-bc069368-d1ab-4d24-a4e3-f7a8634a3d52/varname.zip to /tmp/spark-8a3a402d-55f0-4a4f-a4d1-ce318ac97655/userFiles-5a356bf3-38d4-424a-a72e-036ab107a80c/varname.zip
22/09/28 12:28:26 INFO SparkContext: Added file file:///home/hadoop/.ivy2/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/org.apache.hadoop_hadoop-aws-3.2.0.jar with timestamp 1664368105728
22/09/28 12:28:26 INFO Utils: Copying /home/hadoop/.ivy2/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar to /tmp/spark-8a3a402d-55f0-4a4f-a4d1-ce318ac97655/userFiles-5a356bf3-38d4-424a-a72e-036ab107a80c/org.apache.hadoop_hadoop-aws-3.2.0.jar
22/09/28 12:28:26 INFO SparkContext: Added file file:///home/hadoop/.ivy2/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar at spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar with timestamp 1664368105728
22/09/28 12:28:26 INFO Utils: Copying /home/hadoop/.ivy2/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar to /tmp/spark-8a3a402d-55f0-4a4f-a4d1-ce318ac97655/userFiles-5a356bf3-38d4-424a-a72e-036ab107a80c/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar
22/09/28 12:28:27 INFO Utils: Using initial executors = 3, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
22/09/28 12:28:27 INFO ExecutorContainerAllocator: Set total expected execs to {0=3}
22/09/28 12:28:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34635.
22/09/28 12:28:27 INFO NettyBlockTransferService: Server created on [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:34635
22/09/28 12:28:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/09/28 12:28:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9], 34635, None)
22/09/28 12:28:27 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:34635 with 7.3 GiB RAM, BlockManagerId(driver, [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9], 34635, None)
22/09/28 12:28:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9], 34635, None)
22/09/28 12:28:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, [2600:1f18:1837:bf02:a556:ccd:86d7:a6c9], 34635, None)
22/09/28 12:28:27 INFO ExecutorContainerAllocator: Going to request 3 executors for ResourceProfile Id: 0, target: 3 already provisioned: 0.
22/09/28 12:28:27 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@74d28ce8{/metrics/json,null,AVAILABLE,@Spark}
22/09/28 12:28:27 INFO DefaultEmrServerlessRMClient: Creating containers with container role SPARK_EXECUTOR and keys: Set(1, 2, 3)
22/09/28 12:28:27 INFO SingleEventLogFileWriter: Logging events to file:/var/log/spark/apps/00f4ck9kasg9e001.inprogress
22/09/28 12:28:27 INFO Utils: Using initial executors = 3, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
22/09/28 12:28:27 WARN ExecutorAllocationManager: Dynamic allocation without a shuffle service is an experimental feature.
22/09/28 12:28:27 INFO ExecutorContainerAllocator: Set total expected execs to {0=3}
22/09/28 12:28:27 INFO DefaultEmrServerlessRMClient: Containers created with container role SPARK_EXECUTOR. key to container id map: Map(2 -> b6c1c208-d6ae-f116-456c-a70e62753a3e, 1 -> eec1c208-d6a4-a06f-416f-d3542eb67229, 3 -> 20c1c208-d6b9-a01b-3d7d-4e5d1ab9d5ee)
22/09/28 12:28:32 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (2600:1f18:1837:bf02:751c:4b79:c015:1299:36790) with ID 2, ResourceProfileId 0
22/09/28 12:28:32 INFO ExecutorMonitor: New executor 2 has registered (new total is 1)
22/09/28 12:28:32 INFO EmrServerlessClusterSchedulerBackend$EmrServerlessDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (2600:1f18:1837:bf02:5500:4064:5306:1a1b:54690) with ID 3, ResourceProfileId 0
22/09/28 12:28:32 INFO ExecutorMonitor: New executor 3 has registered (new total is 2)
22/09/28 12:28:33 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:1837:bf02:751c:4b79:c015:1299]:37079 with 7.9 GiB RAM, BlockManagerId(2, [2600:1f18:1837:bf02:751c:4b79:c015:1299], 37079, None)
22/09/28 12:28:33 INFO BlockManagerMasterEndpoint: Registering block manager [2600:1f18:1837:bf02:5500:4064:5306:1a1b]:40287 with 7.9 GiB RAM, BlockManagerId(3, [2600:1f18:1837:bf02:5500:4064:5306:1a1b], 40287, None)
22/09/28 12:28:57 INFO EmrServerlessClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
22/09/28 12:28:57 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/09/28 12:28:57 INFO SharedState: Warehouse path is 'file:/home/hadoop/spark-warehouse'.
22/09/28 12:28:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7ab37b5f{/SQL,null,AVAILABLE,@Spark}
22/09/28 12:28:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@3223cfe1{/SQL/json,null,AVAILABLE,@Spark}
22/09/28 12:28:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@76ac5ba2{/SQL/execution,null,AVAILABLE,@Spark}
22/09/28 12:28:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@105afb2a{/SQL/execution/json,null,AVAILABLE,@Spark}
22/09/28 12:28:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7877cc29{/static/sql,null,AVAILABLE,@Spark}
22/09/28 12:28:57 WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
22/09/28 12:28:57 WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
22/09/28 12:28:58 WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
22/09/28 12:28:58 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
22/09/28 12:29:08 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
22/09/28 12:29:11 WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.
22/09/28 12:29:12 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/09/28 12:29:12 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
22/09/28 12:29:16 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
22/09/28 12:29:16 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore [email protected]
22/09/28 12:29:16 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
22/09/28 12:29:17 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
22/09/28 12:36:26 WARN HeartbeatReceiver: Removing executor 26 with no recent heartbeats: 176265 ms exceeds timeout 120000 ms
22/09/28 12:39:26 WARN HeartbeatReceiver: Removing executor 3 with no recent heartbeats: 161128 ms exceeds timeout 120000 ms
22/09/28 12:39:26 ERROR TaskSchedulerImpl: Lost executor 3 on [2600:1f18:1837:bf02:5500:4064:5306:1a1b]: Executor heartbeat timed out after 161128 ms
22/09/28 12:39:26 WARN TaskSetManager: Lost task 0.0 in stage 26.0 (TID 1321) ([2600:1f18:1837:bf02:5500:4064:5306:1a1b] executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 161128 ms
Executor 26 log:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/09/28 12:32:25 INFO CoarseGrainedExecutorBackend: Started daemon with process name: [email protected]
22/09/28 12:32:25 INFO SignalUtils: Registering signal handler for TERM
22/09/28 12:32:25 INFO SignalUtils: Registering signal handler for HUP
22/09/28 12:32:25 INFO SignalUtils: Registering signal handler for INT
22/09/28 12:32:25 INFO SecurityManager: Changing view acls to: hadoop
22/09/28 12:32:25 INFO SecurityManager: Changing modify acls to: hadoop
22/09/28 12:32:25 INFO SecurityManager: Changing view acls groups to:
22/09/28 12:32:25 INFO SecurityManager: Changing modify acls groups to:
22/09/28 12:32:25 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
22/09/28 12:32:26 INFO TransportClientFactory: Successfully created connection to /2600:1f18:1837:bf02:a556:ccd:86d7:a6c9:33303 after 134 ms (55 ms spent in bootstraps)
22/09/28 12:32:26 INFO SecurityManager: Changing view acls to: hadoop
22/09/28 12:32:26 INFO SecurityManager: Changing modify acls to: hadoop
22/09/28 12:32:26 INFO SecurityManager: Changing view acls groups to:
22/09/28 12:32:26 INFO SecurityManager: Changing modify acls groups to:
22/09/28 12:32:26 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
22/09/28 12:32:26 INFO TransportClientFactory: Successfully created connection to /2600:1f18:1837:bf02:a556:ccd:86d7:a6c9:33303 after 5 ms (3 ms spent in bootstraps)
22/09/28 12:32:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37e26761-99b2-4b65-94b7-4df6bf9905ea
22/09/28 12:32:26 INFO MemoryStore: MemoryStore started with capacity 7.9 GiB
22/09/28 12:32:26 INFO SubResultCacheManager: Sub-result caches are disabled.
22/09/28 12:32:26 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303
22/09/28 12:32:26 INFO ResourceUtils: ==============================================================
22/09/28 12:32:26 INFO ResourceUtils: No custom resources configured for spark.executor.
22/09/28 12:32:26 INFO ResourceUtils: ==============================================================
22/09/28 12:32:26 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
22/09/28 12:32:26 INFO Executor: Starting executor ID 26 on host [2600:1f18:1837:bf02:4600:e58e:ddf0:59df]
22/09/28 12:32:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43365.
22/09/28 12:32:26 INFO NettyBlockTransferService: Server created on [2600:1f18:1837:bf02:4600:e58e:ddf0:59df]:43365
22/09/28 12:32:26 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/09/28 12:32:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(26, [2600:1f18:1837:bf02:4600:e58e:ddf0:59df], 43365, None)
22/09/28 12:32:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(26, [2600:1f18:1837:bf02:4600:e58e:ddf0:59df], 43365, None)
22/09/28 12:32:26 INFO BlockManager: Initialized BlockManager: BlockManagerId(26, [2600:1f18:1837:bf02:4600:e58e:ddf0:59df], 43365, None)
22/09/28 12:32:26 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar with timestamp 1664368105728
22/09/28 12:32:26 INFO TransportClientFactory: Successfully created connection to /2600:1f18:1837:bf02:a556:ccd:86d7:a6c9:33303 after 5 ms (3 ms spent in bootstraps)
22/09/28 12:32:26 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp1966826708488121221.tmp
22/09/28 12:32:27 INFO PlatformInfo: Unable to read clusterId from http://localhost:8321/configuration, trying extra instance data file: /var/lib/instance-controller/extraInstanceData.json
22/09/28 12:32:27 INFO PlatformInfo: Unable to read clusterId from /var/lib/instance-controller/extraInstanceData.json, trying EMR job-flow data file: /var/lib/info/job-flow.json
22/09/28 12:32:27 INFO PlatformInfo: Unable to read clusterId from /var/lib/info/job-flow.json, out of places to look
22/09/28 12:32:27 INFO DefaultAWSCredentialsProviderFactory: Unable to create provider using constructor: DefaultAWSCredentialsProviderChain(java.net.URI, org.apache.hadoop.conf.Configuration)
22/09/28 12:32:27 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms.
22/09/28 12:32:27 INFO CoarseGrainedExecutorBackend: eagerFSInit: Eagerly initialized FileSystem at s3://does/not/exist in 1165 ms
22/09/28 12:32:29 INFO Utils: Copying /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/-1178519531664368105728_cache to /home/hadoop/./com.amazonaws_aws-java-sdk-bundle-1.11.375.jar
22/09/28 12:32:29 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/varname.zip with timestamp 1664368105728
22/09/28 12:32:29 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/varname.zip to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp6614333905641289896.tmp
22/09/28 12:32:29 INFO Utils: Copying /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/-9167713411664368105728_cache to /home/hadoop/./varname.zip
22/09/28 12:32:29 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/org.apache.hadoop_hadoop-aws-3.2.0.jar with timestamp 1664368105728
22/09/28 12:32:29 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/files/org.apache.hadoop_hadoop-aws-3.2.0.jar to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp4499682470385493011.tmp
22/09/28 12:32:29 INFO Utils: Copying /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/8736416361664368105728_cache to /home/hadoop/./org.apache.hadoop_hadoop-aws-3.2.0.jar
22/09/28 12:32:29 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/uber-jars-1.0-SNAPSHOT.jar with timestamp 1664368105728
22/09/28 12:32:29 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/uber-jars-1.0-SNAPSHOT.jar to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp8998725698456889956.tmp
22/09/28 12:32:32 INFO Utils: Copying /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/-13557803421664368105728_cache to /home/hadoop/./uber-jars-1.0-SNAPSHOT.jar
22/09/28 12:32:32 INFO Executor: Adding file:/home/hadoop/./uber-jars-1.0-SNAPSHOT.jar to class loader
22/09/28 12:32:32 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar with timestamp 1664368105728
22/09/28 12:32:32 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/com.amazonaws_aws-java-sdk-bundle-1.11.375.jar to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp2300668029112739372.tmp
22/09/28 12:32:35 INFO Utils: /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/-1974999101664368105728_cache has been previously copied to /home/hadoop/./com.amazonaws_aws-java-sdk-bundle-1.11.375.jar
22/09/28 12:32:35 INFO Executor: Adding file:/home/hadoop/./com.amazonaws_aws-java-sdk-bundle-1.11.375.jar to class loader
22/09/28 12:32:35 INFO Executor: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar with timestamp 1664368105728
22/09/28 12:32:35 INFO Utils: Fetching spark://[2600:1f18:1837:bf02:a556:ccd:86d7:a6c9]:33303/jars/org.apache.hadoop_hadoop-aws-3.2.0.jar to /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/fetchFileTemp8882461784948851806.tmp
22/09/28 12:32:35 INFO Utils: /tmp/spark-91625202-5612-49eb-b355-3f637abe1934/14299127831664368105728_cache has been previously copied to /home/hadoop/./org.apache.hadoop_hadoop-aws-3.2.0.jar
22/09/28 12:32:35 INFO Executor: Adding file:/home/hadoop/./org.apache.hadoop_hadoop-aws-3.2.0.jar to class loader
22/09/28 12:33:37 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
22/09/28 12:33:37 INFO MemoryStore: MemoryStore cleared
22/09/28 12:33:37 INFO BlockManager: BlockManager stopped
22/09/28 12:33:37 INFO ShutdownHookManager: Shutdown hook called
22/09/28 12:33:37 INFO ShutdownHookManager: Deleting directory /tmp/spark-91625202-5612-49eb-b355-3f637abe1934
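When comparing runs, it can help to pull the heartbeat removals out of the driver log mechanically instead of scanning it by eye. A minimal sketch in plain Python, assuming the log lines keep the `HeartbeatReceiver` format shown in the driver log above; `find_heartbeat_losses` and `HeartbeatLoss` are hypothetical names. The 120000 ms timeout in those lines matches Spark's default `spark.network.timeout` of 120s.

```python
import re
from typing import List, NamedTuple


class HeartbeatLoss(NamedTuple):
    executor_id: str
    elapsed_ms: int
    timeout_ms: int

    @property
    def overshoot_ms(self) -> int:
        """How far past the timeout the missing heartbeat ran."""
        return self.elapsed_ms - self.timeout_ms


# Matches lines such as:
# "Removing executor 26 with no recent heartbeats: 176265 ms exceeds timeout 120000 ms"
_HEARTBEAT_RE = re.compile(
    r"Removing executor (\S+) with no recent heartbeats: "
    r"(\d+) ms exceeds timeout (\d+) ms"
)


def find_heartbeat_losses(log_text: str) -> List[HeartbeatLoss]:
    """Return every executor the driver removed for missing heartbeats, in log order."""
    return [
        HeartbeatLoss(m.group(1), int(m.group(2)), int(m.group(3)))
        for m in _HEARTBEAT_RE.finditer(log_text)
    ]
```

Pairing each removed executor ID with its own executor log (as with executor 26 here) shows whether it was still starting up or already running tasks when it went silent.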
2 answers
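Judging from the logs you provided, did you perhaps not configure the executor memory when you created the EMR application?
ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
This error only means that your executor was killed; it does not say why. Since your logs show the executor being killed in the middle of fetching data, rather than at the start of fetching data or during executor initialization, I suspect it was killed for running out of memory (OOM). Try increasing the executor memory when you create the EMR application. You can also check whether your job has any data skew, since that can trigger an OOM too.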
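One quick signal for the data skew mentioned above is how unevenly rows are spread across partitions. A minimal sketch in plain Python, assuming you have already collected per-partition row counts; `skew_ratio` is a hypothetical helper, and there is no standard threshold for what ratio counts as "skewed".

```python
from statistics import median
from typing import Sequence


def skew_ratio(partition_sizes: Sequence[int]) -> float:
    """Ratio of the largest partition to the median partition size.

    A value far above 1 means a few partitions hold most of the rows, so the
    executors processing them need much more memory than the rest.
    """
    if not partition_sizes:
        raise ValueError("need at least one partition size")
    mid = median(partition_sizes)
    biggest = max(partition_sizes)
    if mid == 0:
        # Most partitions are empty: any non-empty one is unbounded skew.
        return float("inf") if biggest > 0 else 1.0
    return biggest / mid
```

In PySpark, per-partition counts can be gathered with `df.rdd.mapPartitions(lambda rows: [sum(1 for _ in rows)]).collect()`, where `df` stands for the DataFrame feeding the slow stage. For skew on a join or group-by key, `df.groupBy("key").count().orderBy("count", ascending=False).show()` (with `"key"` as a placeholder column name) lists the heaviest keys directly.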
Jonathan
2022-09-28
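If you are running on EMR Serverless (skip the Hudi bits if you don't use Hudi), make sure you set the following in your configuration for faster processing with more shuffle parallelism:
# Hudi write options, e.g. passed via df.write.format("hudi").options(**hudi_options)
hudi_options = {
    "hoodie.index.type": "GLOBAL_BLOOM",  # Ensure unique by record key
    "hoodie.insert.shuffle.parallelism": "100",
    "hoodie.upsert.shuffle.parallelism": "100",
    "hoodie.delete.shuffle.parallelism": "100",
    "hoodie.bulkinsert.shuffle.parallelism": "100",
    "hoodie.memory.merge.max.size": "2004857600000",
}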
And in the Spark submit parameters:
# jobDriver for the EMR Serverless StartJobRun API; `config` is your own
# settings dict holding the bucket name.
job_driver = {
    "sparkSubmit": {
        "entryPoint": f"s3://{config['bucket_name']}/scripts/my_spark_script.py",
        "entryPointArguments": [],
        # Adjacent string literals are joined into one parameter string.
        "sparkSubmitParameters": (
            "--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar "
            "--conf spark.serializer=org.apache.spark.serializer.KryoSerializer "
            "--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory "
            "--conf spark.emr-serverless.driver.disk=100G "
            "--conf spark.emr-serverless.executor.disk=100G "
            "--conf spark.executor.memory=26G "
            "--conf spark.driver.memory=26G"
        ),
    }
}
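Long `--conf` chains are easy to break when editing by hand (a missing trailing space silently merges two options). A minimal sketch in plain Python, assuming you keep the Spark properties in a dict and render them into the `sparkSubmitParameters` string at submit time; `build_spark_submit_parameters` and `build_job_driver` are hypothetical helpers, and the 26G values in the test are simply the answer's numbers, not sizing advice.

```python
from typing import Dict


def build_spark_submit_parameters(conf: Dict[str, str]) -> str:
    """Render Spark properties as space-separated '--conf key=value' pairs.

    Assumes keys and values contain no spaces; values with spaces would need
    shell-style quoting, which this sketch does not attempt.
    """
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


def build_job_driver(entry_point: str, conf: Dict[str, str]) -> dict:
    """Assemble the sparkSubmit jobDriver structure for StartJobRun."""
    return {
        "sparkSubmit": {
            "entryPoint": entry_point,
            "entryPointArguments": [],
            "sparkSubmitParameters": build_spark_submit_parameters(conf),
        }
    }
```

The resulting dict can be passed as the `jobDriver` argument of `boto3.client("emr-serverless").start_job_run(...)`, together with `applicationId` and `executionRoleArn`.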
Elena Manole
2023-06-20