I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on Stack Overflow, and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame in Spark.
You can also sort by a column by importing the Spark SQL functions:
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
Or
import org.apache.spark.sql.functions._
df.sort(desc("col1"))
Alternatively, import sqlContext.implicits._ and use the $ column syntax:
import sqlContext.implicits._
df.orderBy($"col1".desc)
Or
import sqlContext.implicits._
df.sort($"col1".desc)
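For reference, here is a minimal, self-contained sketch of the variants above. It assumes Spark 2.x or later, where a SparkSession (here named spark) is the entry point and spark.implicits._ plays the role of sqlContext.implicits._; the object name and the toy data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc}

object SortDescendingSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .appName("sort-desc-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"..." column syntax

    // Hypothetical toy data; col1 is the column we sort on.
    val df = Seq((1, "a"), (3, "c"), (2, "b")).toDF("col1", "col2")

    // Three equivalent ways to sort by col1 in descending order.
    val byFunction = df.orderBy(desc("col1"))
    val byDollar   = df.sort($"col1".desc)
    val byCol      = df.sort(col("col1").desc)

    // Each variant yields col1 values in the order 3, 2, 1.
    println(byFunction.collect().map(_.getInt(0)).mkString(", "))
    println(byDollar.collect().map(_.getInt(0)).mkString(", "))
    println(byCol.collect().map(_.getInt(0)).mkString(", "))

    spark.stop()
  }
}
```

orderBy and sort are interchangeable here, so which one to use is a matter of taste.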
It's in org.apache.spark.sql.DataFrame for the sort method:
df.sort($"col1", $"col2".desc)
Note the $ and .desc inside sort for the column to sort the results by.
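To make the mixed ordering concrete, here is a small sketch that sorts by col1 ascending and breaks ties by col2 descending. The session setup, object name, and sample rows are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

object MixedSortSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session.
    val spark = SparkSession.builder()
      .appName("mixed-sort-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows: two share col1 = 1, so col2 decides their order.
    val df = Seq((1, 10), (1, 20), (2, 5)).toDF("col1", "col2")

    // col1 ascending (the default), then col2 descending within equal col1.
    // Resulting row order: (1, 20), (1, 10), (2, 5).
    df.sort($"col1", $"col2".desc).show()

    spark.stop()
  }
}
```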
import org.apache.spark.sql.functions._ and import sqlContext.implicits._ also get you a lot of nice functionality.
df.sort($"Time1", $"Time2".desc) gives "SyntaxError: invalid syntax" at the $ symbol.
PySpark only
I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:
df.orderBy("col1", ascending=False).show(10)
Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"),desc("columnname2"),asc("columnname3"))
df.sort($"ColumnName".desc).show()
In the case of Java:
If we use DataFrames, while applying joins (here an inner join), we can sort (in ascending order) after selecting distinct elements in each DataFrame as:
Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");
where e_id is the column on which the join is applied, and the result is sorted by salary in ascending order.
Also, we can use Spark SQL as:
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
where spark is the SparkSession and salary is a global temporary view.
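The same SQL route can be sketched in Scala, including the step that registers the view. This assumes Spark 2.2 or later (for createOrReplaceGlobalTempView); the object name and the sample rows are hypothetical, while the view name salary and the query mirror the answer above.

```scala
import org.apache.spark.sql.SparkSession

object SqlSortSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .appName("sql-sort-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical employee data with a salary column.
    val salaries = Seq(("e1", 3000), ("e2", 5000), ("e3", 4000))
      .toDF("e_id", "salary")

    // Register the DataFrame as a global temporary view named "salary";
    // global temp views live in the global_temp database.
    salaries.createOrReplaceGlobalTempView("salary")

    // Sort in descending order with plain SQL.
    spark.sql("select * from global_temp.salary order by salary desc").show()

    spark.stop()
  }
}
```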
The asc keyword is not necessary: ..orderBy("col1", "col2").