Using Spark Scala APIs

  1. Create a SnappySession
    SnappySession extends SparkSession and adds SnappyData capabilities such as mutable tables (insert, update, and delete) and faster query execution.

    scala> val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
    // Import the session's implicit conversions (needed for .toDS() below)
    scala> import snappy.implicits._
    
  2. Create a dataset using the Spark APIs

    scala> val ds = Seq((1,"a"), (2, "b"), (3, "c")).toDS()
    
  3. Define a schema for the table

    scala>  import org.apache.spark.sql.types._
    scala>  val tableSchema = StructType(Array(StructField("CustKey", IntegerType, false),
              StructField("CustName", StringType, false)))
    
  4. Create a "column" table with the schema [Int, String] defined above and default options
    For details on the available options, refer to the Row and Column Tables section.

    // Column tables store data in columnar form and offer superior performance for analytical queries.
    scala>  snappy.createTable(tableName = "colTable",
              provider = "column", // Create a SnappyData Column table
              schema = tableSchema,
              options = Map.empty[String, String], // Map for options
              allowExisting = false)
    

    Because SnappySession extends SparkSession, you can use all of Spark's APIs with it.

  5. Insert the created Dataset into the column table "colTable"

    // insertInto matches columns by position: _1 goes to CustKey, _2 goes to CustName
    scala>  ds.write.insertInto("colTable")
    // Check the total row count.
    scala>  snappy.table("colTable").count
    
  6. Create a row object using Spark's API and insert the row into the table
    Unlike Spark DataFrames, SnappyData column tables are mutable: you can insert new rows into a column table.

    // Insert a new record
    scala>  import org.apache.spark.sql.Row
    scala>  snappy.insert("colTable", Row(10, "f"))
    // Check the total row count after inserting the row
    scala>  snappy.table("colTable").count
    
  7. Create a "row" table with the schema [Int, String] defined above and default options
    For details on the available options, refer to the Row and Column Tables section.

    // Row tables are better suited when data changes frequently or access is selective (for example, lookups by key)
    scala>  snappy.createTable(tableName = "rowTable",
              provider = "row", // Create a SnappyData row table
              schema = tableSchema,
              options = Map.empty[String, String],
              allowExisting = false)
    
  8. Insert the created Dataset into the row table "rowTable"

    scala>  ds.write.insertInto("rowTable")
    // Check the row count
    scala>  snappy.table("rowTable").count
    
  9. Insert a new record

    scala>  snappy.insert("rowTable", Row(4, "d"))
    // Check the row count again
    scala>  snappy.table("rowTable").count
    
  10. Update and delete data in the row table, then drop both tables

    // Set CustName to "d" for the customer with CustKey = 1
    scala>  snappy.update(tableName = "rowTable", filterExpr = "CUSTKEY=1",
              newColumnValues = Row("d"), updateColumns = "CUSTNAME")
    
    scala>  snappy.table("rowTable").orderBy("CUSTKEY").show
    
    // Delete the row for customer with custKey = 1
    scala>  snappy.delete(tableName = "rowTable", filterExpr = "CUSTKEY=1")
    
    // Drop the existing tables
    scala>  snappy.dropTable("rowTable", ifExists = true)
    scala>  snappy.dropTable("colTable", ifExists = true)
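
The steps above recommend column tables for analytical queries and row tables for frequently changing data with selective, key-based access. The following toy model in plain Scala (no Spark or SnappyData dependency) sketches why, under the simplifying assumption that a row table behaves like a map keyed by CustKey and a column table like one buffer per column. It is not how SnappyData actually stores data; `RowVsColumnSketch`, `RowStore`, `ColumnStore`, and `Customer` are hypothetical names used only for illustration.

```scala
import scala.collection.mutable

// Hypothetical toy model, NOT SnappyData's storage engine: it only contrasts
// a row layout with a columnar layout for the CustKey/CustName schema.
object RowVsColumnSketch {
  final case class Customer(custKey: Int, custName: String)

  // Row layout: each record is kept whole, keyed by CustKey, so selective
  // access, updates, and deletes are single keyed operations.
  final class RowStore {
    private val rows = mutable.LinkedHashMap.empty[Int, Customer]

    def insert(c: Customer): Unit = rows.update(c.custKey, c)
    def lookup(key: Int): Option[Customer] = rows.get(key)

    // Replace CustName for one key; returns false if the key is absent.
    def update(key: Int, newName: String): Boolean =
      rows.get(key) match {
        case Some(c) =>
          rows.update(key, c.copy(custName = newName))
          true
        case None => false
      }

    def delete(key: Int): Boolean = rows.remove(key).isDefined
    def count: Int = rows.size
  }

  // Column layout: each column lives in its own buffer, so an aggregate
  // over CustKey scans only that buffer and never reads CustName.
  final class ColumnStore {
    private val custKeys  = mutable.ArrayBuffer.empty[Int]
    private val custNames = mutable.ArrayBuffer.empty[String]

    def insert(key: Int, name: String): Unit = {
      custKeys += key
      custNames += name
    }
    def count: Int = custKeys.size
    def sumKeys: Long = custKeys.foldLeft(0L)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1 -> "a", 2 -> "b", 3 -> "c")

    // Mirrors steps 5 and 6: load the dataset, then append one record.
    val col = new ColumnStore
    data.foreach { case (k, n) => col.insert(k, n) }
    col.insert(10, "f")
    println(s"column count=${col.count} sumKeys=${col.sumKeys}")

    // Mirrors steps 8 to 10: load, append, update, then delete by key.
    val row = new RowStore
    data.foreach { case (k, n) => row.insert(Customer(k, n)) }
    row.insert(Customer(4, "d"))
    row.update(1, "d")
    println(s"after update: ${row.lookup(1)}")
    row.delete(1)
    println(s"row count=${row.count}")
  }
}
```

Running `RowVsColumnSketch.main` replays the data from steps 5 to 10: the column store computes an aggregate over CustKey without touching CustName, while the row store updates and deletes customer 1 through direct keyed operations rather than scans.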