
Is it possible to get the current spark context settings in PySpark?

I'm trying to get the path to spark.worker.dir for the current SparkContext.

If I explicitly set it as a config param, I can read it back out of SparkConf, but is there any way to access the complete config (including all defaults) using PySpark?

No - you can get the conf object, but not the things you're looking for. Defaults are not available through SparkConf (they're hardcoded in the sources). And spark.worker.dir sounds like a configuration for the Worker daemon, not something your app would see.
My answer directly addresses your question: please provide feedback.
Landed here trying to find out the value for spark.default.parallelism. It is at sc.defaultParallelism. One can do dir(sc) in PySpark to see what's available in sc.
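
A small sketch illustrating both comments above (assuming sc is the active SparkContext):

conf = sc.getConf()
print(conf.contains('spark.default.parallelism'))   # often False: defaults never set explicitly don't appear in SparkConf
print(sc.defaultParallelism)                         # the effective value is still exposed on the context
print([name for name in dir(sc) if not name.startswith('_')])   # browse what sc offers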

Kevad

Spark 2.1+

spark.sparkContext.getConf().getAll() where spark is your SparkSession (gives you a list of (key, value) tuples with all configured settings)
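
If you prefer a dict, a small sketch (getAll returns a list of (key, value) tuples, as noted in the comments below):

conf_dict = dict(spark.sparkContext.getConf().getAll())
print(conf_dict.get('spark.app.name'))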


@hhantyal no. When the question was asked there was no Spark 2.1. The top answer works for all versions of Spark, especially old ones.
for spark 2.4.0, it returns a list of tuples instead of a dict
@Kevad we are using Spark 2.4, so can you please throw some light on the following code: spark.sparkContext.getConf().getAll()? spark - the SparkSession; sparkContext - (as we already have the SparkSession from version 2.0+, what does this sparkContext imply?) Could you please help me get a deeper insight on this?
returns tuples not dict
I don't think this statement also returns all the Hadoop configuration.
Sairam Krish

Yes: sc.getConf().getAll()

Which uses the method:

SparkConf.getAll()

as accessed through the SparkContext instance:

sc.getConf()

And it does work:

In [4]: sc.getConf().getAll()
Out[4]:
[(u'spark.master', u'local'),
 (u'spark.rdd.compress', u'True'),
 (u'spark.serializer.objectStreamReset', u'100'),
 (u'spark.app.name', u'PySparkShell')]

also, note that the underscore means that the package developers think that accessing this data element isn't a great idea.
"Note that only values explicitly specified through spark-defaults.conf, SparkConf, or the command line will appear. For all other configuration properties, you can assume the default value is used." (see spark.apache.org/docs/latest/…)
@asmaier any idea how I can get these non-appearing ones to appear in python without having to go to a web page? E.g. how do I get the value of "spark.default.parallelism"?
error: variable _conf in class SparkContext cannot be accessed in org.apache.spark.SparkContext - that's what spark-shell answers in Spark 2.4. Has this variable gone private since the answer?
This answer was edited to use .getConf instead of ._conf, which makes the part about "Note the Underscore..." not make sense anymore.
ecesena

Spark 1.6+

sc.getConf.getAll.foreach(println)

1.6.3: >>> sc.getConf.getAll.foreach(println) AttributeError: 'SparkContext' object has no attribute 'getConf'
@dovka - I used the same sc.getConf.getAll.foreach(println) as suggested by @ecesena and it worked fine for me (in Scala) - Perhaps the syntax is not for Python?
Not in pyspark 1.6.0 as you can see here: spark.apache.org/docs/1.6.0/api/python/…
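
For PySpark, a rough equivalent of that Scala one-liner (a sketch, assuming sc is the SparkContext; on old versions the conf is reached through the _conf attribute mentioned elsewhere on this page):

for key, value in sc._conf.getAll():
    print(key, value)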
bob

Update configuration in Spark 2.3.1

To change the default spark configurations you can follow these steps:

Import the required classes

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

Get the default configurations

spark.sparkContext._conf.getAll()

Update the default configurations

conf = spark.sparkContext._conf.setAll([('spark.executor.memory', '4g'), ('spark.app.name', 'Spark Updated Conf'), ('spark.executor.cores', '4'), ('spark.cores.max', '4'), ('spark.driver.memory','4g')])

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session

spark = SparkSession.builder.config(conf=conf).getOrCreate()
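
Putting the steps together, a minimal end-to-end sketch (the memory and core values are only examples):

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# take the current SparkConf and overwrite the settings we care about
conf = spark.sparkContext._conf.setAll([
    ('spark.executor.memory', '4g'),
    ('spark.app.name', 'Spark Updated Conf'),
])

# restart the session so the updated conf takes effect
spark.sparkContext.stop()
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print(spark.sparkContext.getConf().get('spark.executor.memory'))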

Hello Bob, I got a question about this. If you get the config via spark.sparkContext._conf.getAll(), how can you then use that result to update the config to include new settings? I think this would be a nice addition to your answer.
@PaulVelthuis: to include new settings you need to restart the Spark context with your updated conf. It's there in the answer: after updating the conf, we stopped the context and started it again with the new conf.
asmaier

For a complete overview of your Spark environment and configuration I found the following code snippets useful:

SparkContext:

for item in sorted(sc._conf.getAll()): print(item)

Hadoop Configuration:

hadoopConf = {}
iterator = sc._jsc.hadoopConfiguration().iterator()
while iterator.hasNext():
    prop = iterator.next()
    hadoopConf[prop.getKey()] = prop.getValue()
for item in sorted(hadoopConf.items()): print(item)

Environment variables:

import os
for item in sorted(os.environ.items()): print(item)

David C.

Simply running

sc.getConf().getAll()

should give you a list with all settings.


DGrady

Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf), set through the SparkConf object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:

The Spark application’s web UI, usually at http://<driver>:4040, has an “Environment” tab with a property value table.

The SparkContext keeps a hidden reference to its configuration in PySpark, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().

Spark SQL provides the SET command that will return a table of property values: spark.sql("SET").toPandas(). You can also use SET -v to include a column with the property’s description.

(These three methods all return the same data on my cluster.)
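
As a quick illustration of the second and third options (a sketch; the SET output's column names may differ slightly across versions, and toPandas() requires pandas):

# hidden SparkConf reference on the SparkContext
for key, value in spark.sparkContext._conf.getAll():
    print(key, value)

# Spark SQL SET command as a DataFrame
props = spark.sql("SET").toPandas()
print(props.head())

# add the description column
props_verbose = spark.sql("SET -v").toPandas()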


xuanyue

For Spark 2+, when using Scala, you can also use:

spark.conf.getAll  // spark is the SparkSession
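
A PySpark counterpart, as a sketch (spark.conf.get reads a single key through the session-level RuntimeConfig; for everything explicitly set, fall back to the SparkContext's conf):

print(spark.conf.get("spark.app.name"))
for key, value in spark.sparkContext.getConf().getAll():
    print(key, value)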

Mehdi LAMRANI

You can use (in Scala, where sc here is the SparkSession):

sc.sparkContext.getConf.getAll

For example, I often have the following at the top of my Spark programs:

logger.info(sc.sparkContext.getConf.getAll.mkString("\n"))

Aydin K.

Just for the record, the analogous Java version:

import scala.Tuple2;

// sparkConf is an existing org.apache.spark.SparkConf instance
Tuple2<String, String>[] pairs = sparkConf.getAll();
for (Tuple2<String, String> pair : pairs) {
    System.out.println(pair);
}

whisperstream

Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:

from pyspark import SparkFiles
print(SparkFiles.getRootDirectory())

Subash

Suppose I want to increase the driver memory at runtime using the SparkSession:

s2 = SparkSession.builder.config("spark.driver.memory", "29g").getOrCreate()

Now I want to view the updated settings:

s2.conf.get("spark.driver.memory")

To get all the settings, you can make use of spark.sparkContext._conf.getAll()

https://i.stack.imgur.com/eDGx1.jpg

Hope this helps


Code run

If you want to see the configuration in Databricks, use the command below:

spark.sparkContext._conf.getAll()

Amir Maleki

I would suggest you try the method below in order to get the current Spark context settings:

SparkConf.getAll()

as accessed through the SparkContext instance:

sc._conf

Get the current configurations (Spark 2.1+)

spark.sparkContext.getConf().getAll() 

Stop the current Spark Session

spark.sparkContext.stop()

Create a Spark Session (where conf is a SparkConf you built or updated beforehand, as in the earlier answer)

spark = SparkSession.builder.config(conf=conf).getOrCreate()