Prashanth Babu

Ramblings on Hadoop Ecosystem, Java, etc.

DataStax Cassandra Online Courses

I recently attended a few of DataStax's online courses on Cassandra. This post is a review of these courses and my experience with them.

Late in February, I came across this tweet in my timeline.

Tim Berglund ‏@tlberglund: When I first started at @datastax, I never dreamed we’d get here. But we did! All online training is now free: http://datastax.com/training

Out of curiosity I checked the link and was kinda shocked to see so many wonderful courses being offered online for free. I was just starting to work on Cassandra then, and this was a great chance for me to learn more and hone my Cassandra skills, both for my regular Big Data work and for an overall understanding of one of the best NoSQL databases.

As I am a Big Data and Hadoop Ecosystem guy, I am most interested in finding my way around Cassandra, using Cassandra in data pipelines and, of course, installing, tuning and tweaking Cassandra for throughput. So, I enrolled in the following 3 courses:

  1. DS201: Cassandra Core Concepts Skills and Tools.
  2. DS210: DataStax Enterprise Operations and Performance.
  3. DS320: DataStax Enterprise Analytics: Spark and Cassandra.

Below is a quick review of the objectives of the courses I attended, my experience with them, and a bit of feedback for the DataStax Training team.

All the courses were held online over WebEx, which lets people attend from anywhere without any custom software or tools other than a browser.

DS201: Cassandra Core Concepts Skills and Tools

  • Course details: DataStax course page.
  • Instructor: Ron Cohen.
  • Date: 02nd March to 05th March, 2015.
  • Duration: 3 hours per day.
  • Time: 10:00 AM to 01:00 PM GMT.

This course, as the name indicates, covers the basics and core concepts of Cassandra. The course material and the way Ron built up the course were really great.

The Training team sent the participants a link for the CassandraVM 2 days before the class. We were also given a download link for the slides and workbooks before the course started. The slides are the handouts of the course, while the workbooks are the exercises that you do at the end of each day, at your convenience, on the CassandraVM.

This course rightly set the tone for the rest of the courses I wanted to attend. And the exercises helped me too.

Ron was very understanding and he spent quite a good amount of time clarifying a number of questions during the course.

DS210: DataStax Enterprise Operations and Performance

  • Course details: DataStax course page.
  • Instructor: Kiyu Gabriel.
  • Date: 16th March to 19th March, 2015.
  • Duration: 3 hours per day for 4 days.
  • Time: 05:00 PM to 08:00 PM GMT.

The course details and objectives can be found on the DataStax course page for this course.

This course goes into more depth and talks a lot about tuning and tweaking a Cassandra installation for better performance. It covers extensively what to tweak in Cassandra and how to tweak it, the replication factor, and a lot more. The most fascinating topic, on the last day, was “Understanding Performance Tuning”. The day before the training, I received a link to the course decks and exercises.

It's unfortunate that I missed a few of the sessions due to work conflicts, and I could not do the exercises either. So, I am planning to attend this course again in the near future to cover the sessions I missed.

DS320: DataStax Enterprise Analytics: Spark and Cassandra

  • Course details: DataStax course page.
  • Instructor: Artem Chebotko.
  • Date: 23rd March to 26th March, 2015.
  • Duration: 3 hours per day for 4 days.
  • Time: 03:00 PM to 06:00 PM GMT.

We received the course decks and the exercises a few hours before the course started. And from then on, it was really a ride. The way the course is structured and the way Artem guided the participants through it were really wonderful. This is probably, by far, the best online course I have ever attended.

At the end of the first day, each course participant also received access to a decently configured 3-node DataStax Enterprise cluster hosted on AWS EC2, for which DataStax must have paid Amazon a lot of money. This was really a great thing, since the cluster had DataStax Cassandra and Spark installed and configured for our use. And it was all ours for 4 days. The config of each machine was 4 GB RAM, a 30 GB HDD and Ubuntu 14.04 64-bit Server Edition, which was more than decent for all our data pipelines. These machines also came with a folder containing the exercises, the source code for the data generators, and the data.

Needless to say, I tried all the course code and also the exercises for each of the days. I played quite a bit with the code snippets from the class. But I had issues getting my code right for the exercises on Spark PairRDDs, Spark Streaming and Spark SQL. I somehow managed, as in I followed the solutions to these exercises and tried to understand the code. But I would have liked to spend more time developing my own code for them. Someday, once I understand how these work, I would like to revisit these exercises with a fresh pair of eyes.

Feedback to DataStax training team

After each of the courses, DataStax sends all the participants a SurveyMonkey survey to collect feedback and improve the courses.

Though all the courses were really good, there are a few areas to improve for them to become perfect, and for the participants to get more value out of the courses and out of the effort and money spent by DataStax.

First and foremost, I thought too much content is crammed into 4 days of training [at 3 hours per day], which makes it not only difficult to cover but also tough to digest. On top of that, the participants usually have a lot of questions, which also consume a lot of time. I think DataStax could extend each course by a day and also make it 3.5 or 4 hours per day. That way the instructor gets a bit more time to cover topics in depth instead of just skimming them. After all, these courses run for 3 days in person, effectively ~24 hours. Going by that metric, extending the current 12-hour online courses to at least 15 or 20 hours would be really helpful.
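To put rough numbers on this suggestion, here is a minimal Python sketch comparing the formats mentioned above. The 8-hour in-person day is my assumption (3 days adding up to ~24 hours), and the function and variable names are purely illustrative.

```python
def total_hours(days, hours_per_day):
    """Total contact hours for a course format."""
    return days * hours_per_day


# Assumption: an in-person day is about 8 hours (3 days, ~24 hours in total).
IN_PERSON_HOURS = total_hours(3, 8)

# The current online format and the extended formats suggested above.
formats = {
    "current online (4 days x 3 h)": total_hours(4, 3),
    "extended (5 days x 3 h)": total_hours(5, 3),
    "extended (5 days x 3.5 h)": total_hours(5, 3.5),
    "extended (5 days x 4 h)": total_hours(5, 4),
}


def share_of_in_person(hours):
    """Contact hours as a percentage of the in-person course."""
    return 100.0 * hours / IN_PERSON_HOURS


if __name__ == "__main__":
    for name, hours in formats.items():
        print(f"{name}: {hours} h, {share_of_in_person(hours):.0f}% of in-person time")
```

Under that assumption, the current online format gives half the contact time of the in-person course, while a fifth day at 4 hours per day would bring it to 20 of the ~24 hours.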

My other gripe was about the course slide decks. The content is absolutely brilliant. But DataStax shares only the handouts, which do not help much, since the slides have few notes and the content gets so small that it can't be read without zooming in a lot. An even bigger issue is that the handouts do not allow copying and pasting the code for practice either, because their contents are saved as images and not as text. I have given this feedback in all the post-course surveys. Hopefully they will soon start sharing the complete PDFs of the course slide decks instead of just the handouts.

Also, the sessions are not recorded. I missed a few of the days, and there was no way for me to go back over what was covered on those days. So it would have been helpful to have access to recorded sessions. Even otherwise, I feel it would be really great to be able to refer back to the course videos from time to time.

That said, it was a great effort by the DataStax Training team, and I absolutely loved all the courses as well as the course content, which I will be referring to from now on for all my Cassandra-related work.

Thank you DataStax Training team! You guys are doing an amazing job.