Using R for Big Data with Spark – Training DVD

Number of Videos: 2.5 hours – 20 lessons
Ships on: DVD-ROM
User Level: Intermediate

Data analysts familiar with R will learn to leverage Spark, distributed computing, and cloud storage in this course, which shows how to use R skills in a big data environment. You’ll learn to create Spark clusters on the Amazon Web Services (AWS) platform; perform cluster-based data modeling using Gaussian generalized linear models, binomial generalized linear models, Naive Bayes, and K-means; access data from S3, Spark DataFrames, and other formats such as CSV, JSON, and HDFS; and perform cluster-based data manipulation with tools like SparkR and SparkSQL. By course end, you’ll be able to work with data sets too large to handle on a single computer. This hands-on class requires each learner to set up their own extremely low-cost, easily terminated AWS account.

  • Discover how to use your R skills in a distributed, cloud-based big data cluster environment
  • Gain hands-on experience setting up Spark clusters on Amazon’s AWS cloud services platform
  • Understand how to control a cloud instance on AWS using SSH or PuTTY
  • Explore basic distributed modeling techniques like GLM, Naive Bayes, and K-means
  • Learn to do cloud-based data manipulation and processing using SparkR and SparkSQL
  • Understand how to access data from CSV, JSON, HDFS, and S3 sources

Manuel Amunategui is a data science practitioner, consultant, teacher, and author with 16+ years of data science experience. A former quantitative analyst for a Wall Street brokerage firm, he now serves as the lead data scientist for Providence Health & Services in Portland, Oregon. In his free time, Manuel does competitive data modeling on Kaggle.com, CrowdANALYTIX.com, Datascience.net, and DrivenData.org.

Product Features

  • Learn Using R for Big Data with Spark from a professional trainer, at your own desk.
  • Visual training method, offering users increased retention and accelerated learning
  • Breaks even the most complex applications down into simple steps.
  • Easy to follow step-by-step lessons, ideal for all
  • Comes with Extensive Working Files!