I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()
The column appears truncated:
scala> results.show();
+--------------------+
| col|
+--------------------+
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-06 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
+--------------------+
How do I show the full content of the column?
results.show(20, false) will not truncate. Check the source: 20 is the default number of rows displayed when show() is called without any arguments. If you call results.show(false), the results will not be truncated; false applies here, too. Note that in Python the call is results.show(20, False); the lowercase false you have mentioned will raise an error there. In Scala, both options are valid: results.show(false) and results.show(20, false).
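To see where the cut happens: Spark's showString formats each cell and truncates it to 20 characters by default. Below is a minimal plain-Python sketch of that rule (a toy reconstruction based on the Dataset source, not Spark itself; truncate_cell is a made-up name for illustration):

```python
def truncate_cell(value, truncate=20):
    """Sketch of the per-cell truncation rule show() applies:
    cells longer than `truncate` characters are cut and suffixed
    with "..."; truncate <= 0 disables truncation entirely."""
    s = str(value)
    if truncate > 0 and len(s) > truncate:
        if truncate < 4:
            return s[:truncate]          # too short to fit the "..." suffix
        return s[:truncate - 3] + "..."  # keep truncate-3 chars plus "..."
    return s

truncate_cell("2015-11-16 07:15:00.0")              # -> "2015-11-16 07:15:..."
truncate_cell("2015-11-16 07:15:00.0", truncate=0)  # -> full value, no cut
```

This reproduces exactly the `2015-11-16 07:15:...` cells in the question's output, and explains why passing false (i.e. no truncation) restores the full value.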
The code below helps view all rows without truncation in any column:
df.show(df.count(), False)
Doesn't this cause df to be collected twice? You can pass any number larger than the row count in place of df.count() in order to save on that extra action. For example, if the row count of df is 1000, you could do df.show(1000000, false) and it will work. Tried the following and it worked:
scala> println(df.count)
res2: Long = 987
scala> df.show(990)
The other solutions are good. If these are your goals:
No truncation of columns, No loss of rows, Fast and Efficient
These two lines are useful ...
df.persist
df.show(df.count, false) // in Scala or 'False' in Python
By persisting, the two executor actions, count and show, are faster and more efficient, because persist or cache maintains the interim underlying dataframe structure within the executors. See more about persist and cache.
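The benefit can be illustrated with a toy model in plain Python (not Spark; ToyFrame and its computations counter are invented purely for illustration): without a cache, each action re-evaluates the source, while persist() evaluates it once up front.

```python
class ToyFrame:
    """Toy stand-in for a dataframe: each action recomputes the
    source unless persist() has cached the rows."""
    def __init__(self, source):
        self._source = source    # zero-arg callable producing the rows
        self._cache = None
        self.computations = 0    # how many times the source was evaluated

    def _rows(self):
        if self._cache is not None:
            return self._cache
        self.computations += 1
        return self._source()

    def persist(self):
        self.computations += 1
        self._cache = self._source()   # evaluate once, keep the result
        return self

    def count(self):
        return len(self._rows())

    def show(self, n=20):
        for row in self._rows()[:n]:
            print(row)

df = ToyFrame(lambda: ["a", "b", "c"])
df.count()
df.show()
print(df.computations)    # 2: count and show each recomputed the source

cached = ToyFrame(lambda: ["a", "b", "c"]).persist()
cached.count()
cached.show()
print(cached.computations)    # 1: computed once at persist()
```

Real Spark jobs behave analogously: two actions on an unpersisted dataframe trigger the lineage twice.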
results.show(20, False) or results.show(20, false), depending on whether you are running it in Python or in Java/Scala.
In PySpark we can use:
df.show(truncate=False) displays the full content of the columns without truncation.
df.show(5, truncate=False) displays the full content of the first five rows.
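For reference, here is a small plain-Python sketch of how PySpark interprets show()'s truncate argument (based on the observed behaviour of pyspark's DataFrame.show; resolve_truncate is a hypothetical helper name, not a pyspark function):

```python
def resolve_truncate(truncate=True):
    """Sketch: True means 'cut cells to 20 characters', False (or 0)
    means no truncation, and any other integer is used as the width."""
    if isinstance(truncate, bool):
        return 20 if truncate else 0
    return int(truncate)

resolve_truncate(True)    # -> 20 (the default width behind the "..." cells)
resolve_truncate(False)   # -> 0  (full column content)
resolve_truncate(30)      # -> 30 (cells cut at 30 characters)
```

This is why both truncate=False and truncate=0 display the full content.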
The following answer applies to a Spark Streaming application.
By setting the "truncate" option to false, you can tell the output sink to display the full column.
val query = out.writeStream
.outputMode(OutputMode.Update())
.format("console")
.option("truncate", false)
.trigger(Trigger.ProcessingTime("5 seconds"))
.start()
In C#, Option("truncate", false) prevents the data from being truncated in the output.
StreamingQuery query = spark
.Sql("SELECT * FROM Messages")
.WriteStream()
.OutputMode("append")
.Format("console")
.Option("truncate", false)
.Start();
results.show(false)
will show you the full column content.
The show method limits the output to 20 rows by default; adding a number before false will show more rows.
Within Databricks you can visualize the dataframe in a tabular format. With the command:
display(results)
It will look like
https://i.stack.imgur.com/g0Svi.png
results.show(20,false)
did the trick for me in Scala.
Try df.show(20, False). Notice that if you do not specify the number of rows you want to show, it will show 20 rows but will execute your entire dataframe, which will take more time!
Try this command:
df.show(df.count())
Doesn't that cause df to be collected twice?
Tried this in PySpark:
df.show(truncate=0)
In the Spark Pythonic way, remember:
If you have to display data from a dataframe, use the show(truncate=False) method.
Else if you have to display data from a streaming dataframe view (Structured Streaming), use the writeStream.format("console").option("truncate", False).start() methods with the option.
Hope it helps someone.
I use a Chrome extension that works pretty well:
https://userstyles.org/styles/157357/jupyter-notebook-wide
Try this in scala:
df.show(df.count.toInt, false)
The show method accepts an integer and a Boolean value, but df.count returns a Long, so type casting is required.
PYSPARK
In the code below, df is the name of the dataframe. The first parameter shows all rows in the dataframe dynamically rather than hardcoding a numeric value. The second parameter takes care of displaying full column contents, since its value is set to False.
df.show(df.count(), False)
https://i.stack.imgur.com/Lu6Dv.png
SCALA
In the code below, df is the name of the dataframe. The first parameter shows all rows in the dataframe dynamically rather than hardcoding a numeric value. The second parameter takes care of displaying full column contents, since its value is set to false.
df.show(df.count().toInt, false)
https://i.stack.imgur.com/Bv9Kw.png
dataFrame.writeStream.outputMode("append").format("console").option("truncate", "false").start()