What is Diyotta’s run-time architecture for data processing on Hadoop using Spark?

Advanced User Asked on July 15, 2019 in Architecture.
1 Answer(s)

Diyotta’s Spark run-time architecture on the Hadoop platform is outlined below:

  • Extract the source data as a flat file on the controller or agent file system
  • Transfer the file to HDFS using the HDFS FileSystem API (a minimal staging sketch follows this list)
  • Obtain a Spark session with Hive support enabled for the session
  • Get the Spark context from the Spark session
  • Load the source data from HDFS into a Spark RDD
  • Apply row formatting with the schema supplied from the data objects in Diyotta and create a DataFrame in the SQL context
  • Register the DataFrame as a Spark temporary table/view in the SQL context
  • Apply transformations, if any, through SQL on the temporary table data and store the transformed data in another Spark temporary table/view in the SQL context
  • Insert into the Hive target table by selecting from the SQL context temporary table
  • If the target is HDFS, persist the transformed data in the SQL context table to an HDFS file (the Spark processing sketch after this list walks through these steps)
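The staging step above (flat file to HDFS) can be done with the Hadoop FileSystem API. The following is a minimal sketch, not Diyotta's actual code; the local extract path and HDFS staging directory are illustrative placeholders.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsStage {
      def main(args: Array[String]): Unit = {
        // Hadoop configuration is picked up from core-site.xml / hdfs-site.xml on the classpath
        val conf = new Configuration()
        val fs   = FileSystem.get(conf)

        // Local flat file extracted by the controller/agent, and its HDFS staging location
        // (both paths are hypothetical examples)
        val localFile = new Path("/opt/diyotta/agent/extract/orders.dat")
        val hdfsDir   = new Path("/user/diyotta/staging/orders/")

        // Copy the extracted flat file into HDFS so Spark can read it
        fs.copyFromLocalFile(localFile, hdfsDir)
        fs.close()
      }
    }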

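The remaining steps (Spark session with Hive support, RDD load, schema application, temporary views, SQL transformation, and the Hive or HDFS target) can be sketched as below. This is an illustrative example of the flow described above, not Diyotta-generated code; the table, column, and path names are assumptions.

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

    object SparkRuntimeFlow {
      def main(args: Array[String]): Unit = {
        // Obtain a Spark session with Hive support enabled
        val spark = SparkSession.builder()
          .appName("diyotta-runtime-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Get the Spark context from the Spark session
        val sc = spark.sparkContext

        // Load the staged HDFS file into an RDD (path and delimiter are illustrative)
        val rawRdd = sc.textFile("/user/diyotta/staging/orders/orders.dat")

        // Apply row formatting with the schema supplied from the data object
        val schema = StructType(Seq(
          StructField("order_id", StringType),
          StructField("customer", StringType),
          StructField("amount",   DoubleType)
        ))
        val rowRdd   = rawRdd.map(_.split("\\|")).map(f => Row(f(0), f(1), f(2).toDouble))
        val sourceDf = spark.createDataFrame(rowRdd, schema)

        // Register the DataFrame as a Spark temporary table/view
        sourceDf.createOrReplaceTempView("src_orders")

        // Apply transformations through SQL and register the result as another view
        val transformedDf = spark.sql(
          "SELECT customer, SUM(amount) AS total_amount FROM src_orders GROUP BY customer")
        transformedDf.createOrReplaceTempView("xfm_orders")

        // If the target is a Hive table, insert by selecting from the temporary view
        spark.sql("INSERT INTO TABLE sales.customer_totals SELECT * FROM xfm_orders")

        // If the target is HDFS, persist the transformed data as files instead
        // transformedDf.write.mode("overwrite").csv("/user/diyotta/target/customer_totals")

        spark.stop()
      }
    }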

Expert Answered on July 15, 2019.
