Introducing DataFrames in Spark for Large Scale Data Science


Spark introduces a new DataFrame API designed to make big data processing even easier for a wider audience.

As Spark continues to grow, we want to enable wider audiences beyond “Big Data” engineers to leverage the power of distributed processing. The new DataFrames API was created with this goal in mind. This API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data science applications. As an extension to the existing RDD API, DataFrames feature:
  • Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster
  • Support for a wide array of data formats and storage systems
  • State-of-the-art optimization and code generation through the Spark SQL Catalyst optimizer
  • Seamless integration with all big data tooling and infrastructure via Spark
  • APIs for Python, Java, Scala, and R (in development via SparkR)
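To make the list above concrete, here is a minimal sketch of the API in Python (PySpark). The sketch uses SparkSession, the entry point introduced later in Spark 2.0 (the 1.3 release this post describes used SQLContext), and the people.json file with its name and age columns is hypothetical example data:

```python
from pyspark.sql import SparkSession

# Entry point for the DataFrame API (SQLContext in the Spark 1.3 era).
spark = SparkSession.builder.appName("dataframe-intro").getOrCreate()

# Load a DataFrame from a JSON file; the schema is inferred automatically.
# "people.json" and its columns (name, age) are hypothetical example data.
people = spark.read.json("people.json")
people.printSchema()

# Relational-style operations: filter, project, aggregate.
adults = people.filter(people.age >= 21).select("name", "age")
adults.groupBy("age").count().show()

spark.stop()
```

Because these operations run through the Spark SQL Catalyst optimizer, the filter and aggregation above are compiled into an optimized physical plan rather than executed verbatim, which is how the same code scales from a laptop to a large cluster.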
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html