Planet Big Data logo

Planet Big Data is an aggregator of blogs about big data, Hadoop, and related topics. We include posts by bloggers worldwide. Email us to have your blog included.


June 22, 2018

Revolution Analytics

Because it's Friday: The lioness sleeps tonight

Handlers for the lion enclosure at San Diego Zoo have developed a novel way to provide stimulation for their big cats: let them play tug-of-war with people outside. People plural that is — it turns...

Big Data University

Read and Write CSV Files in Python Directly From the Cloud

Every data scientist I know spends a lot of time handling data that originates in CSV files. You can quickly end up with a mess of CSV files located in your Documents, Downloads, Desktop, and other random folders on your hard drive.

I greatly simplified my workflow the moment I started organizing all my CSV files in my Cloud account. Now I always know where my files are and I can read them directly from the Cloud using JupyterLab (the new Jupyter UI) or my Python scripts.

This article will teach you how to read your CSV files hosted on the Cloud in Python as well as how to write files to that same Cloud account.

I’ll use IBM Cloud Object Storage, an affordable, reliable, and secure Cloud storage solution. (Since I work at IBM, I’ll also let you in on a secret of how to get 10 Terabytes for a whole year, entirely for free.) This article will help you get started with IBM Cloud Object Storage and make the most of this offer. It is composed of three parts:

  1. How to use IBM Cloud Object Storage to store your files;
  2. Reading CSV files in Python from Object Storage;
  3. Writing CSV files to Object Storage (also in Python of course).

The best way to follow along with this article is to go through the accompanying Jupyter notebook, either on Cognitive Class Labs (our free JupyterLab Cloud environment) or by downloading the notebook from GitHub and running it yourself. If you opt for Cognitive Class Labs, once you sign in, you will be able to select the IBM Cloud Object Storage Tutorial as shown in the image below.

IBM Cloud Object Storage Tutorial


What is Object Storage and why should you use it?

The “Storage” part of object storage is pretty straightforward, but what exactly is an object and why would you want to store one? An object is basically any conceivable data. It could be a text file, a song, or a picture. For the purposes of this tutorial, our objects will all be CSV files.

Unlike a typical filesystem (like the one used by the device you’re reading this article on) where files are grouped in hierarchies of directories/folders, object storage has a flat structure. All objects are stored in groups called buckets. This structure allows for better performance, massive scalability, and cost-effectiveness.

By the end of this article, you will know how to store your files on IBM Cloud Object Storage and easily access them using Python.


Provisioning an Object Storage Instance on IBM Cloud

Visit the IBM Cloud Catalog and search for “object storage”. Click the Object Storage option that pops up. Here you’ll be able to choose your pricing plan. Feel free to use the Lite plan, which is free and allows you to store up to 25 GB per month.

Object Storage on the IBM Cloud

Sign up (it’s free) or log in with your IBM Cloud account, and then click the Create button to provision your Object Storage instance. You can customize the Service Name if you wish, or just leave it as the default. You can also leave the resource group to the default. Resource groups are useful to organize your resources on IBM Cloud, particularly when you have many of them running.

Creating an Object Storage instance

Working with Buckets

Since you just created the instance, you’ll now be presented with options to create a bucket. You can always find your Object Storage instance by selecting it from your IBM Cloud Dashboard.

There’s a limit of 100 buckets per Object Storage instance, but each bucket can hold billions of objects. In practice, how many buckets you need will be dictated by your availability and resilience needs.

For the purposes of this tutorial, a single bucket will do just fine.

Creating your First Bucket

Click the Create Bucket button and you’ll be shown a window like the one below, where you can customize some details of your Bucket. All these options may seem overwhelming at the moment, but don’t worry, we’ll explain them in a moment. They are part of what makes this service so customizable, should you have the need later on.

Creating an Object Storage bucket

If you don’t care about the nuances of bucket configuration, you can type in any unique name you like and press the Create button, leaving all other options to their defaults. You can then skip to the Putting Objects in Buckets section below. If you would like to learn about what these options mean, read on.

Configuring your bucket

Resiliency Options

Resiliency Option | Description
Cross Region | Your data is stored across three geographic regions within your selected location (high availability and very high durability)
Regional | Your data is stored across three different data centers within a single geographic region (high availability and durability, very low latency for regional users)
Single Data Center | Your data is stored across multiple devices within a single data center (data locality)

Storage Class Options

Frequency of Data Access | IBM Cloud Object Storage Class
Weekly or monthly | Vault
Less than once a month | Cold Vault

Feel free to experiment with different configurations, but I recommend choosing “Standard” for your storage class for this tutorial’s purposes. Any resilience option will do.


Putting Objects in Buckets

After you’ve created your bucket, store the name of the bucket into the Python variable below (replace cc-tutorial with the name of your bucket) either in your Jupyter notebook or a Python script.

There are many ways to add objects to your bucket, but we’ll start with something simple. Add a CSV file of your choice to your newly created bucket, either by clicking the Add objects button, or dragging and dropping your CSV file into the IBM Cloud window.

If you don’t have an interesting CSV file handy, I recommend downloading FiveThirtyEight’s 2018 World Cup predictions.

Whatever CSV file you decide to add to your bucket, assign the name of the file to the variable filename below so that you can easily refer to it later.
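The two variables described above might look like the following. The bucket name cc-tutorial comes from the text; the file name is a hypothetical placeholder, so substitute the name of the CSV file you actually uploaded.

```python
# Name of the bucket created earlier; replace cc-tutorial with your own bucket name.
bucket_name = "cc-tutorial"

# Hypothetical file name: use the name of the CSV file you added to the bucket.
filename = "my-data.csv"
```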

We’ve placed our first object in our first bucket; now let’s see how we can access it. To access your IBM Cloud Object Storage instance from anywhere other than the web interface, you will need to create credentials. Click the New credential button under the Service credentials section to get started.

In the next window, you can leave all fields as their defaults and click the Add button to continue. You’ll now be able to click on View credentials to obtain the JSON object containing the credentials you just created. You’ll want to store everything you see in a credentials variable like the one below (obviously, replace the placeholder values with your own).

Note: If you’re following along within a notebook be careful not to share this notebook after adding your credentials!
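As a rough sketch, the credentials variable could be a plain dictionary, or it could be loaded from a JSON file kept outside the notebook so the notebook stays safe to share. The key names apikey and resource_instance_id are assumptions about the layout of the credentials JSON, and load_credentials is a hypothetical helper, not part of any library.

```python
import json

# Placeholder values only; replace them with the JSON shown under View credentials.
credentials = {
    "apikey": "YOUR_API_KEY",
    "resource_instance_id": "YOUR_RESOURCE_INSTANCE_ID",
}

# Keys the later examples rely on (an assumption about the JSON layout).
REQUIRED_KEYS = ("apikey", "resource_instance_id")


def load_credentials(path):
    """Read the credentials JSON from a file instead of pasting it inline."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in loaded]
    if missing:
        raise KeyError("credentials file is missing: " + ", ".join(missing))
    return loaded
```

With this approach the notebook only contains a call such as load_credentials("cos_credentials.json"), where the file name is again just an example.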

Reading CSV files from Object Storage using Python

The recommended way to access IBM Cloud Object Storage with Python is to use the ibm_boto3 library, which we’ll import below.

The primary way to interact with IBM Cloud Object Storage through ibm_boto3 is by using an ibm_boto3.resource object. This resource-based interface abstracts away the low-level REST interface between you and your Object Storage instance.

Run the cell below to create a resource Python object using the IBM Cloud Object Storage credentials you filled in above.
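A minimal sketch of creating the resource object follows. It assumes the ibm-cos-sdk package is installed and that the credentials dictionary has apikey and resource_instance_id keys. The two endpoint values are deliberately left as parameters: copy the service endpoint and the IAM token endpoint that apply to your own instance. The helper names resource_kwargs and create_resource are hypothetical.

```python
def resource_kwargs(credentials, endpoint_url, auth_endpoint):
    """Collect the keyword arguments passed to ibm_boto3.resource."""
    return {
        "ibm_api_key_id": credentials["apikey"],
        "ibm_service_instance_id": credentials["resource_instance_id"],
        "ibm_auth_endpoint": auth_endpoint,
        "endpoint_url": endpoint_url,
    }


def create_resource(credentials, endpoint_url, auth_endpoint):
    """Create the resource object; the imports are deferred until it is called."""
    import ibm_boto3
    from ibm_botocore.client import Config

    return ibm_boto3.resource(
        "s3",
        config=Config(signature_version="oauth"),
        **resource_kwargs(credentials, endpoint_url, auth_endpoint),
    )


# Example call (placeholders, not real endpoints):
# resource = create_resource(credentials,
#                            endpoint_url="https://<your-service-endpoint>",
#                            auth_endpoint="https://<your-iam-token-endpoint>")
```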

After creating a resource object, we can easily access any of our Cloud objects by specifying a bucket name and a key (in our case the key is a filename) to our resource.Object method and calling the get method on the result. In order to get the object into a useful format, we’ll do some processing to turn it into a pandas dataframe.


We’ll make this into a function so we can easily use it later:
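One possible shape for such a function is sketched below. To stay runnable without extra dependencies it parses the CSV with the standard library's csv module rather than pandas, which is a substitution on my part; the function names are hypothetical. It only assumes that the resource passed in offers the Object(bucket_name, key).get() call described above, whose result has a readable Body.

```python
import csv
import io


def get_bytes(resource, bucket_name, key):
    """Download one object and return its raw bytes."""
    response = resource.Object(bucket_name, key).get()
    return response["Body"].read()


def get_rows(resource, bucket_name, key, encoding="utf-8"):
    """Download a CSV object and parse it into a list of dictionaries."""
    text = get_bytes(resource, bucket_name, key).decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

If pandas is installed, passing io.BytesIO(get_bytes(resource, bucket_name, key)) to pandas.read_csv should give the dataframe the text describes.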

Adding files to IBM Cloud Object Storage with Python

IBM Cloud Object Storage’s web interface makes it easy to add new objects to your buckets, but at some point you will probably want to handle creating objects through Python programmatically. The put_object method allows you to do this.

In order to use it you will need:

  1. The name of the bucket you want to add the object to;
  2. A unique name (Key) for the new object;
  3. A bytes-like object, which you can get from:
    • urllib’s request.urlopen(...).read() method;
    • Python’s built-in open function in binary mode, e.g.
      open('myfile.csv', 'rb')

To demonstrate, let’s add another CSV file to our bucket. This time we’ll use FiveThirtyEight’s airline safety dataset.
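The upload step could be sketched as follows, assuming a resource object whose Bucket(name).put_object(Key=..., Body=...) call behaves as described above. The helper names are hypothetical, and the URL of the dataset is left for you to supply.

```python
import urllib.request


def put_bytes(resource, bucket_name, key, data):
    """Store a bytes-like object in the bucket under the given key."""
    return resource.Bucket(bucket_name).put_object(Key=key, Body=data)


def put_from_url(resource, bucket_name, key, url):
    """Download a file over HTTP(S) and store its contents as an object."""
    with urllib.request.urlopen(url) as response:
        return put_bytes(resource, bucket_name, key, response.read())


def put_from_file(resource, bucket_name, key, path):
    """Read a local file in binary mode and store its contents as an object."""
    with open(path, "rb") as handle:
        return put_bytes(resource, bucket_name, key, handle.read())
```

Reading the whole file into memory keeps the sketch short; for very large files a streaming or multipart upload would be a better fit.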

You can now easily access your newly created object using the function we defined above in the Reading CSV files from Object Storage using Python section.

Get 10 Terabytes of IBM Cloud Object Storage for free

You now know how to read from and write to IBM Cloud Object Storage using Python! Well done. The ability to programmatically read and write files to the Cloud will be quite handy when working from scripts and Jupyter notebooks.

If you build applications or do data science, we also have a great offer for you. You can apply to become an IBM Partner at no cost to you and receive 10 Terabytes of space to play and build applications with.

You can sign up by simply filling out the embedded form below. If you are unable to fill out the form, you can click here to open the form in a new window.

Just make sure that you apply with a business email (even your own domain name if you are a freelancer) as free email accounts like Gmail, Hotmail, and Yahoo are automatically rejected.

Revolution Analytics

A guide to working with character data in R

R is primarily a language for working with numbers, but we often need to work with text as well. Whether it's formatting text for reports, or analyzing natural language data, R provides a number of...


June 21, 2018


DataOps: New Term, Similar Concepts

Componentization, containers and the cloud have all converged to usher in a new era focused on “Ops.” It started with DevOps, which according to Wikipedia is defined as: “a software engineering...


Revolution Analytics

AI, Machine Learning and Data Science Roundup: June 2018

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications I've...


June 20, 2018

Revolution Analytics

PYPL Language Rankings: Python ranks #1, R at #7 in popularity

The new PYPL Popularity of Programming Languages (June 2018) index ranks Python at #1 and R at #7. Like the similar TIOBE language index, the PYPL index uses Google search activity to rank language...


Curt Monash

Brittleness, Murphy’s Law, and single-impetus failures

In my initial post on brittleness I suggested that a typical process is: Build something brittle. Strengthen it over time. In many engineering scenarios, a fuller description could be: Design...


Curt Monash

Brittleness and incremental improvement

Every system — computer or otherwise — needs to deal with possibilities of damage or error. If it does this well, it may be regarded as “robust”, “mature(d),...

Making Data Meaningful

MicroStrategy – Have you looked at THEM lately?

If you haven’t done so lately, it’s time to take another look at MicroStrategy. They have done a great job of updating their offerings to match what is currently hot in the marketplace. They couple form and function elegantly to please both customers and technicians. You are sure to find interesting and thought-provoking content within their product offerings, and you might even find things that have personal as well as business application.

Here are the latest focus points in their strategy:

MicroStrategy Cloud Intelligence

One of the pillars MicroStrategy is focusing on is Cloud Intelligence. The details about the structure and function of the cloud are easy to understand, and it fits seamlessly into the MicroStrategy BI environment. After an initial perusal of the offering, it is easy to see the power, flexibility, and security of cloud computing, and how it can be used to drive business decisions and adoption within organizations. MicroStrategy has gone to great lengths to highlight the advantages of the MicroStrategy Cloud and the steps necessary to set it up, deploy it, and benefit from it. I believe this is a special niche for those who can visualize how to ramp up business intelligence projects without much of the normal overhead of software and hardware procurement. One is left to focus on what they do best, and to leverage an optimized platform as part of the overall deliverable.

MicroStrategy Mobile Intelligence

MicroStrategy has bet the business on mobile intelligence. They believe that it will overtake the traditional web-browser-based intelligence that is prevalent today. MicroStrategy focuses on educating the business and developer community about the value of the MicroStrategy Mobile platform. It is easy to learn how to use the platform to design, build out, maintain, support, and customize visually enticing apps for multiple output devices (iPad, iPhone), while leveraging the enterprise-caliber features of the MicroStrategy BI platform. This is achieved by implementing the metadata layer that governs all content. Highlighting such functionality helps to show that MicroStrategy is clearly the market leader within this pillar, and there were customer stories to back up this claim.

MicroStrategy Social Intelligence

This is the most unusual and interesting pillar because of its cutting-edge nature. MicroStrategy’s Social Intelligence solutions are designed for both commercial customers and consumers in the marketplace. MicroStrategy has built a bridge between the two that is compelling, and an opportunity for those who have the courage to leverage it. MicroStrategy’s latest offerings enable in-depth analysis of the Facebook fan base. They also focus on how to apply the wealth of information available at Facebook to deliver very effective marketing campaigns, which basically makes the older style of CRM system obsolete. MicroStrategy walked through the steps and the products that help make this happen, turning the promise of social media content into real business opportunities. Once engaged, a loyal fan base turns into revenue, and companies that understand this, and who those customers are, will achieve a competitive advantage.

MicroStrategy Big Data

Big Data is here now. MicroStrategy has methods and technology that help clients deal with extreme data volumes. The point was made that companies need the ability to use very large databases and data sets to make intelligent business decisions, drive growth, and gain competitive advantages. Often interesting information is lurking in the details, and MicroStrategy provides a method to make sense of it. MicroStrategy also offers features such as improved self-service that reduce the reliance on IT when it comes to navigating the business intelligence architecture. There is even the possibility to connect MicroStrategy to Hadoop and begin to analyze web logs in a very easy-to-consume fashion. In addition, MicroStrategy focused on high performance across the entire platform to eliminate latency issues and meet performance goals.

The time is now to take a fresh look at MicroStrategy. They are a big-time player in the tools space for enabling Business Intelligence. You won’t regret it.

The post MicroStrategy – Have you looked at THEM lately? appeared first on Making Data Meaningful.


June 19, 2018

Revolution Analytics

In case you missed it: May 2018 roundup

In case you missed them, here are some articles from May of particular interest to R users. The R Consortium has announced a new round of grants for projects proposed by the R community. A look back...


June 18, 2018

Solaimurugan V.

Top list of Artificial Intelligence in Indian Agriculture - research ideas


June 16, 2018

Simplified Analytics

Digital Transformation in Recruitment

A few years ago, the impact of digitization was only established in top industries like Banking, Insurance, and Retail. Now times have changed – the recruitment industry is also adopting digital...


June 15, 2018

Revolution Analytics

Because it's Friday: Olive Garden Bot

Comedy writer Keaton Patti claims this commercial script for a US Italian restaurant chain was generated by a bot: I forced a bot to watch over 1,000 hours of Olive Garden commercials and then asked...


Revolution Analytics

Interpreting machine learning models with the lime package for R

Many types of machine learning classifiers, not least commonly-used techniques like ensemble models and neural networks, are notoriously difficult to interpret. If the model produces a surprising...

Solaimurugan V.

Big Data / Data Analytics Jobs

Data Analytics job in UK. Big data analytics Jobs

June 14, 2018

Revolution Analytics

Detecting unconscious bias in models, with R

There's growing awareness that the data we collect, and in particular the variables we include as factors in our predictive models, can lead to unwanted bias in outcomes: from loan applications, to...


June 13, 2018

Revolution Analytics

Hotfix for Microsoft R Open 3.5.0 on Linux

On Monday, we learned about a serious issue with the installer for Microsoft R Open on Linux-based systems. (Thanks to Norbert Preining for reporting the problem.) The issue was that the installation...



Regulatory Compliance as a Strategic Weapon – conclusion

Last week we looked at The Ever Evolving Regulatory Environment and the need for a regulatory compliance architecture. If you didn’t see the first blog post on the topic, read it here. Part 4:...


June 12, 2018

InData Labs

6 Data Collection Rules for Your Future Perfect Machine Learning Dataset

Modern companies produce gigantic amounts of data. Later it becomes a part of their machine learning datasets. Those are further used to build models that aim to solve various problems a business may face, and make it more profitable, customer-oriented and, of course, data-driven. Machine Learning depends heavily on data, which makes algorithm training possible. Regardless...

The post 6 Data Collection Rules for Your Future Perfect Machine Learning Dataset appeared first on InData Labs.

Ronald van Loon

Everything Data Scientists Should Know About Organizing Data Lakes

Learn how to turn data lakes into organized & manageable data because businesses need the means to obtain real value from their data lakes. And discover how data lakes fit into the ecosystem of your organization as well as establish end to end data management practices supporting data & analytics innovation. Join Ronald van Loon and Anand Narayanan with SimpliLearn for a Fireside Chat on June 14th, 8:00 AM PDT or 17.00 CET.

Register for the webinar here:


Ronald helps data-driven companies generate business value with best-of-breed solutions and a hands-on approach. He has been recognized as one of the top 10 global influencers by DataConomy for predictive analytics, and by Klout for Data Science, Big Data, Business Intelligence and Data Mining. He is a guest author on leading Big Data sites, a speaker, chairman, and panel member at national and international webinars and events, and runs a successful series of webinars on Big Data and on Digital Transformation. He has been active in the data (process) management domain for more than 18 years, has founded multiple companies, and is now director at a data consultancy company, a leader in Big Data and data process management solutions. His interests span big data, data science, predictive analytics, business intelligence, customer experience and data mining. Feel free to connect on Twitter or LinkedIn to stay up to date on success stories.


The post Everything Data Scientists Should Know About Organizing Data Lakes appeared first on Ronald van Loons.


June 08, 2018

Revolution Analytics

Because it's Friday: Sealese

If you've ever wondered what the animals are saying in those cute animal videos, fear not: YouTube has now apparently added captions for animal vocalizations, which makes everything clear: That's all...


Revolution Analytics

Microsoft R Open 3.5.0 now available

Microsoft R Open 3.5.0 is now available for download for Windows, Mac and Linux. This update includes the open-source R 3.5.0 engine, which is a major update with many new capabilities and...



Announces SoftBase Db2 Tools (SDT) Availability

SoftBase, a Fresche Solutions brand, is proud to announce the general availability of the SoftBase Db2 Tools suite 1.1.0 (SDT110). The SoftBase Db2 Tools suite allows developers, DBAs, and...


June 07, 2018


Regulatory Compliance as a Strategic Weapon

Guest post by Philip Gilligan, Sand Hill East LLP We have seen an extraordinary array of regulations in the last 8 to 10 years, globally as well as specifically in the US and EU. Starting...


June 06, 2018

Revolution Analytics

What's new in Azure for Machine Learning and AI

There were a lot of big announcements at last month's Build conference, and many of them were related to machine learning and artificial intelligence. With my colleague Tim Heuer, we summarized some...

Making Data Meaningful

The Management Revolution Infographic

Management Revolution


During a recent stroll through the Internet catching up on industry news and trends, I stumbled upon the infographic shown above.  That image presents a great deal of information, but the piece that stood out to me was the definition near the top:

Management – The organization and coordination of the activities of a business in order to achieve defined objectives.

It’s those last two words that stood out to me:  “defined objectives”.  What does that mean?  As a company you might have a defined objective to “be the preferred provider of [enter your product or service offering here] in the nation.”  If so, what does that mean?  Does it mean that everyone buys your product?  Does it mean that at least half of the people with a need for your product choose you?  This is where the Business Intelligence (BI) Analyst in me takes over.  If you are going to set an objective or a goal for your organization you need a number.  You need a metric.  You need a deadline.  Otherwise, how do you know whether or not you have been successful?

In the infographic’s comparison of Traditional Management Style versus New Management Style there are multiple items that refer to the changing role of the employee.  Employees today want and expect to be treated as integral to the success of an enterprise.  To that end, it is critical that the goals of the organization be shared with everyone.  It is extremely difficult to reach a destination if you don’t have everyone steering in the same direction.

At LÛCRUM, clear goals around financials and customer satisfaction, along with a timeline to achieve them, have been developed.  These metrics have been shared with the entire organization.  In fact, each employee has been provided with a small card that concisely states these goals along with our company’s purpose, values, and vision.  Everyone is encouraged to carry this card with them everywhere they go as a reminder.  Once the specified date has been reached we, as an organization, will have no question as to whether or not we met our goal.

What are your organization’s “defined objectives?”  Do you have measurable goals to determine your success?  If so, are they closely guarded secrets of the management or is everyone invested in achieving them?  Can you approach any individual at your company and reasonably expect them to be able to provide you with key success measures of the organization?

The post The Management Revolution Infographic appeared first on Making Data Meaningful.

Making Data Meaningful

Data Vault: Hubs, Links, and Satellites With Associated Loading Patterns

So my business sponsors and senior architects have decided to build a data vault. We have already recognized and considered the benefits of changing course for our enterprise. We spent a lot of time considering the business benefits that a different approach to business intelligence would provide. Some of these business related benefits that we identified are:

    • Supports functional areas of business
    • Integrates business keys that cross functional areas
    • Deep historical tracking of information as it changes over time
    • Need to load 100% of the data 100% of the time
    • Conceptual and logical models of the business are natural representations in data vault (DV)  structure

We also took the time to evaluate potential technical benefits. Some of the most important benefits to us were the following:

    • Apply business rules on the way out to data marts
    • Run ETL processes in parallel
    • Flexible and adaptable to change in business requirements over time
    • Auditable back to the source system
    • Compliance
    • Supports agile development approach
    • Simple ETL load patterns allow for code generation

Now that my company has made the decision to move forward with this new Data Vault Methodology approach to Business Intelligence, where do I begin? Well, let’s start with the basic building blocks of a data vault. A data vault can be as simple as a hub and a satellite, but in practice there are generally many of each type.

Remember: a Hub is a collection of business keys. A Link tracks the relationship between hubs, or potentially with other relationships (links). A Satellite is the time-sensitive collection of attributes related to exactly one hub or link.

Here is a sample data model with the end in mind. Notice the Hubs, Links, and Satellites are all here and are appropriately related to each other.

data vault model

So let’s dig a little deeper into the purpose of each and how to model and load them effectively.


Hubs are the containers for business keys. They are the most important facets of the data vault methodology. The more successfully one is able to identify business keys, the less refining of the model will follow. Business keys can be identified using a multitude of strategies: sometimes by interviewing business users, sometimes by reviewing data models (primary keys or unique keys), and sometimes from metadata systems that have already identified key information, among other sources.

The basic structure and treatment of the Hub table is as follows:

Mandatory Columns

    • Hub Sequence Identifier (generally a number generated from a database)
    • “Business Key” Value (generally a string to handle any data type)
    • Load Date (generally a date and time)
    • Record Source (generally a string)

Loading Pattern

    • Select Distinct list of business Keys
    • Add timestamp and record source
    • Insert into Hub if the value does not exist
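The hub loading pattern above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the original listing: the table and column names (stage_customer, h_customer, customer_number and so on) are hypothetical, and SQLite stands in for whatever database hosts the data vault.

```python
import sqlite3

HUB_LOAD_SQL = """
INSERT INTO h_customer (customer_number, load_date, record_source)
SELECT DISTINCT stg.customer_number, ?, ?      -- distinct keys + load metadata
FROM stage_customer stg
WHERE NOT EXISTS (                             -- skip keys already in the hub
    SELECT 1 FROM h_customer dv
    WHERE dv.customer_number = stg.customer_number
)
"""


def load_hub(conn, load_date, record_source):
    """Insert business keys from staging that the hub does not hold yet."""
    conn.execute(HUB_LOAD_SQL, (load_date, record_source))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_customer (customer_number TEXT, name TEXT);
CREATE TABLE h_customer (
    h_customer_sid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- hub sequence identifier
    customer_number TEXT NOT NULL UNIQUE,               -- business key
    load_date       TEXT NOT NULL,
    record_source   TEXT NOT NULL
);
INSERT INTO stage_customer VALUES ('C1', 'Ann'), ('C1', 'Ann'), ('C2', 'Bob');
""")

load_hub(conn, "2018-06-06T00:00:00", "stage.customer")
load_hub(conn, "2018-06-07T00:00:00", "stage.customer")  # re-running adds nothing
hub_keys = [row[0] for row in conn.execute(
    "SELECT customer_number FROM h_customer ORDER BY customer_number")]
```

Because the insert only adds keys that are missing, the load can be repeated safely, which is what makes this pattern easy to generate and run in parallel.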

Code Sample

 ...
 FROM stage.Customer stg
 ...
 FROM dv.H_Customer dv
 ...

Links store the intersection of business keys (hubs). Links can be considered the glue that holds the data vault model together. These tables allow for the data model to elegantly change over time because they can come and go as required by the business. Links also allow for the model to be created quickly without worrying about whether the relationship is one to many or many to many. In addition, the flexible nature of link tables provides the option to add or drop link tables as requirements change throughout the maintenance lifecycle of the data warehouse or as part of a data mining exercise.

The basic structure and treatment of the Link table is as follows:

Mandatory Columns

    • Link Sequence Identifier (a database number)
    • Load Date and Time (generally a date field)
    • Record Source  (generally a string)
    • At least two Sequence Identifiers (either from Hubs or other Links and are numbers)

Loading Pattern

    • Select Distinct list of business Key combinations from source
    • Add timestamp and record source
    • Lookup data vault identifier from either Hub or Link
    • Insert into Link if the value does not exist
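A comparable sketch for the link loading pattern is shown below, again using sqlite3 with hypothetical table and column names. The inner query picks the distinct key combinations from staging and looks up the hub identifiers; the outer query inserts only combinations the link does not already contain.

```python
import sqlite3

LINK_LOAD_SQL = """
INSERT INTO l_customer_order (h_customer_sid, h_order_sid, load_date, record_source)
SELECT stg.h_customer_sid, stg.h_order_sid, ?, ?
FROM (
    SELECT DISTINCT hc.h_customer_sid, ho.h_order_sid   -- look up hub identifiers
    FROM stage_order src
    JOIN h_customer hc ON hc.customer_number = src.customer_number
    JOIN h_order ho ON ho.order_number = src.order_number
) stg
WHERE NOT EXISTS (                                       -- skip existing links
    SELECT 1 FROM l_customer_order dv
    WHERE dv.h_customer_sid = stg.h_customer_sid
      AND dv.h_order_sid = stg.h_order_sid
)
"""


def load_link(conn, load_date, record_source):
    """Insert customer/order relationships that the link does not hold yet."""
    conn.execute(LINK_LOAD_SQL, (load_date, record_source))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_order (order_number TEXT, customer_number TEXT);
CREATE TABLE h_customer (h_customer_sid INTEGER PRIMARY KEY, customer_number TEXT UNIQUE);
CREATE TABLE h_order (h_order_sid INTEGER PRIMARY KEY, order_number TEXT UNIQUE);
CREATE TABLE l_customer_order (
    l_customer_order_sid INTEGER PRIMARY KEY,   -- link sequence identifier
    h_customer_sid INTEGER NOT NULL,
    h_order_sid    INTEGER NOT NULL,
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
INSERT INTO h_customer (customer_number) VALUES ('C1'), ('C2');
INSERT INTO h_order (order_number) VALUES ('O1'), ('O2');
INSERT INTO stage_order VALUES ('O1', 'C1'), ('O1', 'C1'), ('O2', 'C2');
""")

load_link(conn, "2018-06-06T00:00:00", "stage.order")
load_link(conn, "2018-06-07T00:00:00", "stage.order")  # re-running adds nothing
link_rows = conn.execute(
    "SELECT h_customer_sid, h_order_sid FROM l_customer_order "
    "ORDER BY h_customer_sid").fetchall()
```

The sketch assumes the hubs were loaded first, so every staged business key can be resolved to a hub identifier.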

Code Sample

 ...
    FROM stage."Order" src
) stg
 ...
 FROM DV.L_Customer_Order dv
 ...

Satellites add all the color and description to the business keys (hubs) and relationships (links) in the data vault environment. Satellites contain all the descriptive information, tracking change by start and end dates over time, to let one know the information in effect at any point in time. In the purest sense, satellites are time aware and therefore track change over time as their main function. Satellites are always directly related and subordinate to a hub or a link. They provide context and definition to business key(s). A satellite record is added when a change is detected in the processing. In some cases, there may be multiple satellites pointing to one hub or one link. The reasons for doing this could be multiple sources, rate of change, or data type.

The basic structure and treatment of the Satellite table is as follows:

Mandatory Columns

    • Hub or Link Sequence Identifier
    • Load Date
    • Load Date End
    • Record Source

Optional Columns

    • Attributes (may be only one, but usually a lot more strings, numbers, or dates)

Loading Pattern

    • Select list of attributes from the source
    • Add timestamp and record source
    • Compare to the existing applicable set of satellite records and insert when a change has been detected
    • Lookup and use the applicable Hub identifier or the Link identifier

Note: a two-step process is generally employed when using a Load End Date to set the time effective properly for satellites
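One way to sketch the satellite pattern, including the two-step handling of the end date, is shown below with sqlite3. The schema and names are hypothetical, staging is assumed to hold one row per business key, and the ordering used here (end-date the changed rows first, then insert the new current rows) is just one possible arrangement of the two steps.

```python
import sqlite3

# Step 1: end-date the current satellite row when any attribute has changed.
CLOSE_CHANGED_SQL = """
UPDATE s_customer
SET load_date_end = ?
WHERE load_date_end IS NULL
  AND EXISTS (
      SELECT 1
      FROM stage_customer src
      JOIN h_customer hub ON hub.customer_number = src.customer_number
      WHERE hub.h_customer_sid = s_customer.h_customer_sid
        AND (src.name IS NOT s_customer.name OR src.city IS NOT s_customer.city)
  )
"""

# Step 2: insert a current row for every hub key that has none (new or just closed).
INSERT_CURRENT_SQL = """
INSERT INTO s_customer (h_customer_sid, load_date, load_date_end, record_source, name, city)
SELECT hub.h_customer_sid, ?, NULL, ?, src.name, src.city
FROM stage_customer src
JOIN h_customer hub ON hub.customer_number = src.customer_number
WHERE NOT EXISTS (
    SELECT 1 FROM s_customer cur
    WHERE cur.h_customer_sid = hub.h_customer_sid
      AND cur.load_date_end IS NULL
)
"""


def load_satellite(conn, load_date, record_source):
    """Apply both steps so each hub key has exactly one open satellite row."""
    conn.execute(CLOSE_CHANGED_SQL, (load_date,))
    conn.execute(INSERT_CURRENT_SQL, (load_date, record_source))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_customer (customer_number TEXT, name TEXT, city TEXT);
CREATE TABLE h_customer (h_customer_sid INTEGER PRIMARY KEY, customer_number TEXT UNIQUE);
CREATE TABLE s_customer (
    h_customer_sid INTEGER NOT NULL,
    load_date      TEXT NOT NULL,
    load_date_end  TEXT,
    record_source  TEXT NOT NULL,
    name TEXT,
    city TEXT,
    PRIMARY KEY (h_customer_sid, load_date)
);
INSERT INTO h_customer (customer_number) VALUES ('C1');
INSERT INTO stage_customer VALUES ('C1', 'Ann', 'Paris');
""")

load_satellite(conn, "2018-06-01", "stage.customer")   # first version
load_satellite(conn, "2018-06-02", "stage.customer")   # nothing changed
conn.execute("UPDATE stage_customer SET city = 'Rome'")
load_satellite(conn, "2018-06-03", "stage.customer")   # change detected
history = conn.execute(
    "SELECT load_date, load_date_end, city FROM s_customer ORDER BY load_date"
).fetchall()
```

The IS NOT comparison is SQLite's null-safe inequality; other databases would need their own equivalent so that a NULL attribute is still treated as a detectable change.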

Code Sample

 ...
 FROM DV.H_Product dv
 ...
 FROM stage.[Order] src
) stg
 ...

With this quick overview of the basics of the data vault model, I hope you can see the simplicity in the design as well as the pattern-based loading process. As you can see, whether you have 1 or 10 hubs or links, they should all look structurally similar and load in a similar fashion. This drives down overall development and support costs when the Enterprise Data Warehouse is supported by a data vault. Also, designers and developers who are new to the concepts can generally be up and productive in short order.

So if you are…

  • Currently engaging a data warehouse environment that is becoming harder and harder to support and maintain over time
  • Needing to address performance problems
  • Hoping to get your data governance problems addressed
  • Wanting more of a rapid and agile development process
  • Concerned about the current ETL processes having become rigid and difficult to support
  • Suffering from the lack of Business Rules maintenance and management
  • Embarking on a new Business Intelligence endeavor and would like to increase likelihood of success

…then the data vault methodology may be the answer for you.

Along with the loading patterns and models outlined here, there are many other benefits to applying this architecture and process to your Business Intelligence needs.

The post Data Vault: Hubs, Links, and Satellites With Associated Loading Patterns appeared first on Making Data Meaningful.

Making Data Meaningful

Which is Better: Faster or Slower?

I must admit I do enjoy Beck Bennett’s series of commercials for AT&T where he poses the question, “Which is better: faster or slower?”  I find his deadpan approach to a variety of co-actors and situations very humorous. The question “Which is better: faster or slower?” has interesting application in today’s information and analytics environment. Faster has always been better, correct? The scenario holds true in every industry. If you can make better decisions at a faster pace than your competitor or adversary, then you will always hold an advantage over them. However, the key isn’t just faster, but better decisions faster!

An interesting event occurred a few years ago that made the point that faster is not always better. A short-lived Twitter hoax briefly erased $200 billion of value from the US stock market. False reports of explosions in the White House triggered a set of algorithms monitoring news feeds into a two-minute selling spree. In this case, untethered analytics only increased the pace at which we can make mistakes and caused the Dow to drop 145 points. The error was quickly identified and the Dow bounced back, but who knows what losses were incurred by algorithms reacting to the news feed and potentially to other algorithms reacting to those algorithms.

I am fortunate to be in the information and analytics industry and am continuously astounded by the algorithms and analytics that I see people put together. However, this event continues to remind me that even the best algorithms need good data and solid IT development principles such as building in a failsafe. Perhaps we need to teach these algorithms to check their sources before taking action.

The post Which is Better: Faster or Slower? appeared first on Making Data Meaningful.

Making Data Meaningful

Testing a BI Application

In order to deliver a high quality application, testing is a necessary component of the deliverable portfolio. Often this step is overlooked, underappreciated, or worse, rushed and hurried to meet a deadline. The best solution would be to integrate testing throughout the development process.

The way to approach testing a Business Intelligence (BI) system is to get the business to take ownership and buy in early and often. The business users should write the test cases and be responsible for executing them from a business perspective, which also trains them on the content in the system. The technical people should be ready to assist with query development or anything else needed to complete the testing.

There should be some validation that is part of the design of the Extract, Transform, and Load (ETL) process itself. Some of this is to make sure that, mechanically, things happened as they should and that there are appropriate logs when they don’t. In addition, the ETL developers should perform some kind of UFI (Unit Function) testing, as well as a code review or peer review, prior to moving to a TEST environment. Depending on the complexity of the ETL process, one generally doesn’t test each component of the process because of the details involved, but focuses on the net result of the test (i.e., all rows were inserted with no errors and all columns contain values; what happened in between is less important to test because the load was successful).

In addition, the technician should take the next step of developing quality controls that make sure what is in the final table structures is what was expected. For example, have a report from the Operational Data Store (ODS) area that groups and sums some key metrics by business key and compares them to the results from the new implementation area, highlighting only variances. This report should be sent to a data governance team every morning. As long as it is clean, the BI team can be confident that, mechanically, things are working well.
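
The ODS comparison report described above could look something like the following sketch. It is only an illustration: the function names, the `(business_key, amount)` row shape, and the `tolerance` parameter are assumptions, and in practice the rows would come from queries against the ODS and the new implementation area.

```python
from collections import defaultdict


def sum_by_key(rows):
    """Group rows by business key and sum the metric."""
    totals = defaultdict(float)
    for business_key, amount in rows:
        totals[business_key] += amount
    return dict(totals)


def variance_report(ods_rows, warehouse_rows, tolerance=0.005):
    """Return only the business keys whose totals differ between the two areas.

    Each result is (business_key, ods_total, warehouse_total, difference).
    A key missing on one side is treated as a total of 0.0 on that side.
    """
    ods_totals = sum_by_key(ods_rows)
    warehouse_totals = sum_by_key(warehouse_rows)
    report = []
    for key in sorted(set(ods_totals) | set(warehouse_totals)):
        ods_total = ods_totals.get(key, 0.0)
        warehouse_total = warehouse_totals.get(key, 0.0)
        difference = warehouse_total - ods_total
        if abs(difference) > tolerance:
            report.append((key, ods_total, warehouse_total, difference))
    return report
```

An empty report means the two areas agree within the tolerance, which is the "clean" result the data governance team would hope to see each morning.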

Depending on some of the business rules implemented, one may need to have reports that highlight “Unknown” values and other things that need to be dealt with by the business. Some of these scenarios should become test cases. The business should use the Ad-hoc environment to write reports and queries that test the results. Ultimately, these reports should be reviewed by the data stewards as part of the data governance initiative.

For the most thorough results as well as the highest quality BI environment, everywhere there was a business rule implemented, there should be a test case that verifies the rule was implemented correctly. Depending on the volume and complexity, one may need to prioritize them and tackle the most important ones first.

For the documentation, it can be as simple as keeping a spreadsheet with the following items:
• Test Case #
• Test Case Description
• Tester
• Date Tested
• Expected Outcome
• Actual Outcome
• Pass/Fail

It is critical, as originally stated, to get the business users involved in the testing of the deliverables. There have been cases where the business thought what they were using to compare balances with was correct, but were eventually convinced the BI application was correct and they had a broken business process instead. This is difficult because it must be handled case by case, and it usually becomes the biggest hurdle to overcome in order to be perceived as successful. Ultimately the business must provide you with the information to know whether “the values put into the Data Warehouse or BI dashboard are correct”. You are completely dependent on the business rules they gave you being correct (a lot of times they aren’t in version 1), and the risk is even greater if there is no data governance process in place.

A word of caution: if you don’t get the business buy-in on testing, they will certainly blame you when things aren’t correct in production (especially for things that were overlooked in testing). It is wise to have a step where the business has to sign off on testing and confirm that they are comfortable with what is moving into production; this is very helpful when issues arise. When the business is involved in the process and it is not mostly IT doing the testing, finger pointing is kept to a minimum. In addition, shared success and teamwork are fostered, bridging the gap between business users and Information Technology (IT) groups that sadly exists in a lot of organizations.

The post Testing a BI Application appeared first on Making Data Meaningful.

Making Data Meaningful

Mining Data Vault Loading Views

I first heard about Data Vault (DV) in late 2011. As I learned about DV, I quickly realized its benefits in the realm of Data Warehousing (DW); however, it was not until early 2012, after I had the opportunity to attend Dan Linstedt’s 3-day DV “Boot camp” class, that I realized its true power. DV is a flexible, scalable, auditable, pattern-based data warehousing methodology that by design solves a myriad of “traditional” DW issues. More information on DV is available here as well as on

In a recent Data Vault (DV) implementation, we utilized views to encapsulate the logic for the core DV loading. The main benefit of this implementation was that the processing shifted from the Extract Transform Load (ETL) tool / engine to the Database (DB) engine. In addition, it simplified ETL job design and development. Another benefit I “discovered” was that, when coupled with the DB Metadata schema (and provided that DB & ETL naming conventions were utilized consistently), the DV Load View implementation enables extracting a great deal of useful data about the DV relations, dependencies, Source-To-Target (STT) mapping, the DV model, and even the related ETL jobs.

In the DV implementation described above, the DB engine was ORACLE, therefore I used ORACLE’s Metadata objects to extract the following data elements:

  • DV Load View Name
  • Staging Table Name (source)
  • DV Table Name (target)
  • Lookup Table Name(s)
  • ETL Job Name (derived)
  • DV Load View SQL

My initial goal was to be able to generate STT mapping documentation / specification to give to the ETL developers and later on to pass on as part of the Knowledge Transfer (KT) documentation to the Production Support team. Once I started looking at the data my query returned, I realized that there are even more applications for this data than I initially thought of. To name a few:

  • Source-To-Target Mapping
  • Finding Dependent Tables for a given Load View
  • Validating DB Naming Conventions were properly used
  • Validating Lookup Tables were properly referenced
  • Finding which ETL Job(s) loads a given DV Target Table
  • Finding which DV Target Tables are loaded by a given Source (Staging Table)

Having such a query built can be beneficial when joining a new project where little or no documentation about the DV implementation is available. With a few small tweaks, the same logic can be used to “reverse engineer” the whole DV model’s STT Mapping. Another application for this query would be to assist ETL code generation (in conjunction with ETL code templates), as it already has the majority of the ETL job related data elements (source name, target name, job name, etc.) needed to generate a new job.
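
To make the idea concrete, here is a small Python sketch of how view dependency rows could be turned into a source-to-target mapping using naming conventions alone. It is not the query from the project described above. The prefixes `V_LOAD_`, `STG_`, and `JOB_`, the function name, and the sample object names are all hypothetical; the input is assumed to be `(view_name, referenced_object_name)` pairs of the kind a database metadata view of dependencies can return.

```python
VIEW_PREFIX = "V_LOAD_"            # assumed convention for DV load views
STAGING_PREFIX = "STG_"            # assumed convention for staging tables
DV_PREFIXES = ("H_", "L_", "S_")   # assumed prefixes: hubs, links, satellites
JOB_PREFIX = "JOB_"                # assumed convention for ETL job names


def build_stt_mapping(dependency_rows):
    """Derive one mapping entry per load view from (view, referenced_object) pairs."""
    references_by_view = {}
    for view_name, referenced_name in dependency_rows:
        if not view_name.startswith(VIEW_PREFIX):
            continue  # ignore views that do not follow the load-view convention
        references_by_view.setdefault(view_name, []).append(referenced_name)

    mapping = []
    for view_name in sorted(references_by_view):
        references = references_by_view[view_name]
        # The target table name is derived from the view name itself.
        target = view_name[len(VIEW_PREFIX):]
        mapping.append({
            "view": view_name,
            "sources": sorted(r for r in references if r.startswith(STAGING_PREFIX)),
            "target": target,
            # Any other DV table the view reads is treated as a lookup.
            "lookups": sorted(
                r for r in references if r.startswith(DV_PREFIXES) and r != target
            ),
            "etl_job": JOB_PREFIX + target,
        })
    return mapping
```

Because everything is derived from names, the output is only as reliable as the naming conventions, which is also why the same data can be used to validate that those conventions were followed.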

The post Mining Data Vault Loading Views appeared first on Making Data Meaningful.


June 05, 2018

Solaimurugan V.

#AIforAll - AI in India : Use case & scenario explained by NITI AAYOG


June 04, 2018

Forrester Blogs

When Piloting Co-location, Measure Customer Impact

When Piloting Co-location, Measure Impact To Location, Customers, and Operations It’s been interesting to see creative real estate co-location ideas such as the recently announced pilot for Aldi in...

Solaimurugan V.

#aiforall National Strategy for Artificial Intelligence from NITI AAYOG


Forrester Blogs

Risk Tech, Reg Tech — All The 2018 Tech

The Forrester Tech Tide™: Risk And Compliance Management, Q2 2018 We recently published our Tech Tide™ report outlining 14 key risk and compliance technologies to track in 2018. One of the...


June 03, 2018

Making Data Meaningful

When is BI not BI?


Google Trends shows the term “Business Intelligence”, as a web headline topic, has declined since 2004. In the past few years it has been surpassed by the term “Big Data”.

“Business Analytics” is emerging as the term some industry thought leaders, such as Gartner and IDC, are using as the catch-all term for software solutions that use data analysis to guide business decisions.

Despite the essential inclusiveness of all three terms, there is no shortage of discussion on the differences among these and a number of other contenders. Are the old terms so limited that they cannot contain the huge new advances in the field? Or have there been so many disappointments with attempts to deliver “Business Intelligence” that we need new, exciting, and “untainted” terms?

It is important that we do not get distracted by new umbrella terms that cover the same mission, the same systems, and the same activities.  It is like arguing over whether a Prius is an automobile or a car. The important thing is that there are exciting new technologies that can be applied to achieve the objectives of Analytics, Business Analytics, Business Intelligence or Big Data.

It really does not matter which term is used. Let’s face it: when is BI not BI?  If a term refers to ways of making data meaningful and profitable, it’s all BI.

The post When is BI not BI? appeared first on Making Data Meaningful.

Forrester Blogs

Telecom Operators Deliver Insights Services — With Network And Subscriber Data

A few years ago, I met with an incumbent telecom provider in Europe, and they came across as being from “the old country.” I asked about their data strategy, and they were horrified at the thought of...


June 01, 2018

Revolution Analytics

StatCheck the Game

If you don't get enough joy from publishing scientific papers in your day job, or simply want to experience what it's like to be in a publish-or-perish environment where the P-value is the only...


Forrester Blogs

First Forrester New Wave™ On ABM Platforms Sets The Bar For A Dynamic And Growing Market

Well, that was interesting! We just published our “The Forrester New Wave™: ABM Platforms, Q2 2018” report, which profiles the 14 most significant vendors in this market: 6sense,...

Making Data Meaningful

What is Hadoop?


Let’s start with a little quiz:

Hadoop is

a)     Twitter shortcut for “I HAD it, but OOPS I lost it”?

b)     The latest dance song craze (Macarena, Gangnam, Hadoop)?

c)      A stuffed toy elephant?

d)     A software solution for distributed computing of large datasets?

The correct answers are actually c) and d).  You see, Hadoop is a software solution developed as part of the Apache project sponsored by the Apache Software Foundation, and it was named after a stuffed elephant owned by the son of the framework’s founder, Doug Cutting.

But what exactly is Hadoop and how does it work?

Per the Apache website, “Apache Hadoop is a framework for running applications on large cluster built of commodity hardware.” This open source software framework enables the developer/user to manage large amounts of data (Big Data) using a distributed file system.  The power of Hadoop lies in its ability to leverage distributed clusters of computing hardware.  It does this by leveraging two key technologies.

The first is the Hadoop Distributed File System (HDFS), a distributed, scalable, and portable file system.  It is written in Java specifically for the Hadoop framework.  A key component of HDFS is the name-node.  This is a single server that tracks all the other nodes in the distributed client/server cluster.  In other words, the name-node is the directory of all the distributed clients and of which files each contains.  As clients and files are added to the cluster, commands update the links to these new nodes in the name-node.


The second key technology leveraged within a Hadoop implementation is that of MapReduce.  MapReduce is a programming model for processing large datasets.  It works by enabling a master node (the node assigned the processing request) to break apart the work request into smaller sub-tasks, and send the sub-tasks out to worker nodes.  This is the “Map” aspect as the master node is mapping out the workload to the worker nodes.  As each worker node completes the assigned sub-task, it ships the results back to the master node.  The master node then takes all the worker node results and combines them into one result set; thereby completing the assigned request.  This is the “Reduce” aspect.
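
The flow described above can be imitated in a single Python process with a word count. This is only a toy sketch of the idea as the paragraph presents it and does not use any Hadoop API: the function names and the round-robin split are assumptions, and the "worker nodes" are ordinary function calls.

```python
from collections import Counter
from functools import reduce


def split_work(lines, n_workers):
    """Master node: break the request into one sub-task per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def worker_task(chunk):
    """Worker node: count the words in the assigned lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine_results(partial_counts):
    """Master node: merge the partial results into one result set."""
    return reduce(lambda left, right: left + right, partial_counts, Counter())


def word_count(lines, n_workers=3):
    sub_tasks = split_work(lines, n_workers)
    partial_counts = [worker_task(chunk) for chunk in sub_tasks]
    return combine_results(partial_counts)
```

In a real cluster the sub-tasks would run in parallel on separate machines; here they run one after another, which is enough to show how the work is divided and recombined.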


It is important to note that for very large or complex requests, worker nodes can also MapReduce their assigned tasks into smaller sub-tasks for their worker nodes.  You could refer to this as Big Data outsourcing.  As each node determines another node is better equipped to handle a portion of an assigned request, it relegates the work to a more efficient worker node, while retaining responsibility for getting the completed assignment back to the master node.




Hadoop Wiki

To learn more about Hadoop solutions contact us today.

The post What is Hadoop? appeared first on Making Data Meaningful.


May 31, 2018

Revolution Analytics

New round of R Consortium grants announced

The R Consortium has just announced its latest round of project grants. After reviewing the proposals submitted by the R community, the Infrastructure Steering Committee has elected to fund the...


Forrester Blogs

How USAA Differentiates In Mobile Banking

Take another not-so-random walk with me, this time as my colleague Peter Wannemacher and I explore what makes USAA a standout performer in our Mobile Banking Benchmark assessment. As we did before...


Forrester Blogs

The US Federal Government Still Ranks Near The Bottom Of Forrester’s Customer Experience Index

The White House requires federal agencies to “provide a modern, streamlined, and responsive customer experience across government, comparable to leading private sector organizations.” Unfortunately,...


Forrester Blogs

Just Released: Top Retail Tech Investments For 2018 — Overview

In today’s increasingly crowded digital commerce market and with increasingly tighter budgets, retailers have to make smart choices about which technologies to invest in and which they should...


May 30, 2018

Forrester Blogs

Why Looking For The Perfect CX Metric Is Futile — And How To Try Anyway

What is the right top-line CX metric? I lead the CX measurement research at Forrester and get this question a lot. Usually clients ask whether Net Promoter Score (NPS)* is best, or whether customer...


Forrester Blogs

Customer Experience Q&A: Etsy’s Abby Covert, Information Architect

If you care about CX, you have to admire Etsy. It was the top digital retailer in Forrester’s Customer Experience Index™ in 2017. It excels at differentiating itself with a unique set of products and...



Healthcare and Artificial Intelligence: Saving Lives and Costs

Last week we looked at how the healthcare industry (and vendors) would enjoy financial advantages of using Artificial Intelligence (AI), computer vision, IoT and more. There’s no doubt AI is changing...


Revolution Analytics

Because it's Friday: Buildings shake

In 1978, a 59-story skyscraper in New York City was at risk of collapse. An engineering flaw, serendipitously discovered by an architecture PhD candidate studying the Citigroup Center as a thesis...


May 29, 2018

Forrester Blogs

Can CMOs Trust Consultancies For Programmatic Media?

Last week, Accenture Interactive launched a new media service called Accenture Interactive Programmatic Services. The new service will include programmatic consulting services for in-housing media,...


Forrester Blogs

Platform Economy Myth #2: There Are Only 2 Or 3 Platform Business Models

We’ve been analyzing the sacred myths of the platform economy and revealing the real practices that platform businesses have mastered. This is work from our report, “Earn Your Place In The Platform Economy.” Myth #2: There are only 2 or 3 platform business models. Reality: There are as many platform business models as there are actual business […]

May 28, 2018

Ronald van Loon

Seamless Customer Experience for Telecoms: A Practical Approach

In this age of data and convenience, customers across the globe are getting used to great customer experience from numerous companies. Big names such as Google, Apple, Amazon, and many others lead the way when it comes to ensuring a seamless customer experience. While these names lead the front, Telcos lag behind when it comes to their perception of great customer experience.

With that gap in mind, I recently talked to Thomas Kinnman from Ericsson. We discussed the factors that matter most for customer experience from a Telco’s point of view, and what should be done about them.

Telcos Lagging Perception of Great CX

There are numerous negative customer experiences that often go unnoticed by Telcos. Telcos fail to deliver action at the right time, and often end up losing the customer value that they would have wanted to provide. It is necessary for Telcos to understand what constitutes a negative experience and what should be done to cater to their customers. It is also extremely important that Telcos understand that not all customers have the same expectations, and that, indeed, expectations can widely vary from person to person.

While talking this over, Thomas Kinnman mentioned the rise in silent churners. “Silent churners” is a term for people who leave a Telco without complaining, and Kinnman was quick to point out that this group is generally growing.

Reason for Negative Customer Experience

There are numerous reasons for negative customer experience when it comes to a Telco. Some of these reasons are:

  • A complex network can be hard for Telcos to understand, which is why they often cannot find plausible reasons for a minor incident such as a dropped call.
  • Many legacy networks in place that need to be updated.
  • Limited understanding of the end user experience and the behavior patterns of the end users.
  • These behavior patterns are often tricky to understand.

While negative customer experiences can be gauged through different metrics and numbers, it is a bit difficult to know the actual cause of a negative experience. The complications that arise while Telcos look for the causes behind a negative customer experience are:

  • Too many OSS and BSS systems spread across the organization that work in silos
  • Lacking the skills and time to correlate the customer data with network data in real-time.
  • Creating big data lakes that integrate different data sets, and starting to build their own algorithms and use-cases, when they have limited time, experience and data to play with.
  • 85 percent of all data lakes are used for innovation, and not for running hardened use-cases 24×7, and certainly not use-cases that create actionable insights to preempt negative experiences.
  • Customer grievances are hard to scale with respect to the data related to them.

Considering the complications that can arise while understanding the causes behind negative customer experiences, it is necessary that Telcos recognize what is required during this hour.

What is required for Telcos

What Telcos basically need to do right now is to understand the customer journey and all the intricate details involved in that process. These intricate details include the purchasing, billing, service usage and updating of network services. All these touchpoints are extremely necessary for operators to understand and harness for their own good.

Kinnman pointed out how Telcos stuck in this conundrum could achieve better customer insights. Instead of trying to gather data and extract the insights themselves, they can use ready-made algorithms crafted through the expertise of thousands of domain experts and data scientists working with the firm.

These data models and algorithms can work with any data set to provide actionable insights, to link network data with consumer data, and to find the root causes behind any negative experience. The prepackaged data algorithms are fed with reference data to help them give insights to different stakeholders across the operators, e.g. network planning, operations, customer care, marketing etc. The data correlation engine between data insights and customer problems can help solve problems on the go.

Understanding the Data Correlation Engine

The data correlation engine takes customer events in real-time and detects any experience problems and ensures plausible action to resolve the problem. For example, if there is a dropped call, the engine has a real-time correlated call record including the possible reasons behind that dropped call, and what could have been done to avoid that. Once the problem has emerged, the network data will set out to find out the major cause behind the dropped call. The cause could either be a network problem, device problem, subscription problem or OTT problem – something outside the operator’s control.

Once the possible causes have been outlined, the data correlation engine and the dynamic rules engine will outline the most probable causes as well as provide insights into why this cause was the reason behind the problem. Once the most probable cause has been outlined, the system also suggests a next best action so that the Telco can start to put an end to these causes.

With the most probable causes and next best actions outlined, the operations team can fix problems before they impact customers while the customer care representatives could contact already impacted customers with the reason behind the problem. For example, if the drop in call quality occurred because of a radio coverage problem, then the call representative should outline that the operators are looking into bettering the customer experience in the specific region through network upgrades. The issue should then be forwarded to the network planning team for further contemplation.
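
The step from a correlated call record to a most probable cause and a next best action can be illustrated with a tiny rules table. Everything in this sketch is hypothetical and is not Ericsson's engine: the record fields (`subscription_active`, `device_fault`, `signal_dbm`, `ott_app`), the -110 dBm threshold, and the wording of the actions are assumptions made only to show the shape of such a rule set.

```python
# Each rule is (predicate, most probable cause, next best action).
# Rules are checked in order; the first match wins.
RULES = [
    (lambda rec: not rec.get("subscription_active", True),
     "subscription problem",
     "have customer care review the subscription"),
    (lambda rec: rec.get("device_fault", False),
     "device problem",
     "have customer care advise on the device"),
    (lambda rec: rec.get("signal_dbm", 0) < -110,  # illustrative threshold only
     "network problem (radio coverage)",
     "forward to the network planning team"),
    (lambda rec: rec.get("ott_app") is not None,
     "OTT problem",
     "inform the customer that the cause is outside the operator's control"),
]


def diagnose(record):
    """Return (most probable cause, next best action) for one correlated record."""
    for predicate, cause, action in RULES:
        if predicate(record):
            return cause, action
    return "unknown", "collect more data before acting"
```

A dropped-call record such as `{"event": "dropped_call", "signal_dbm": -118}` would be classified here as a radio coverage problem and routed to network planning, mirroring the example in the text.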

Moving to Automation

Telcos eventually have to move towards better automation, so that repetitive work is handled in a more efficient manner. After automation has been implemented, everyone within a Telco should be able to focus on work that adds higher value. It is necessary that the work people do and the processes in place change for the better, because that is how proper automation is achieved. Rather than having people think that automation is a threat, help them see that the machines will do the boring, repetitive work, while they will be the ones adding value, or in simpler terms putting in the human element.

Ericsson is putting a lot of research and development efforts into the area of Automation, including knowledge management and machine intelligence that enable automation. In order to make people trust a system to do the work for them you need to train the system properly and validate the system insights, most probable causes and the recommended actions that are to be automated.

Source Ericsson: 6 phases of automation maturity


Results of Transformation

Kinnman talked about the results of the transformation to automation, and how it will work for them. To put more weight to what he said, Kinnman referred to the very transformation that they had within Ericsson. The internal transformation at Ericsson has increased the presence of the company all around the world. The company has transformed its operations into more service centric operations, to benefit the end consumer.

With brilliant success in their own transformation, Ericsson also helps transform numerous Telcos around the world to become more service and customer centric. Below you find some example results from different transformations:

  • 60% improvement in average handling time (AHT) and first call resolution (FCR) observed in operations and customer care
  • 25% reduction in network detractors over a three-year period
  • 20% increase in upsell/cross-sell rate thanks to better understanding of customer satisfaction, behavior and needs

If you want to learn how to practically transform your telecom service organization into a customer centric organization, then you should join the webinar being hosted by Ericsson. You can ask Thomas Kinnman questions, and can provide insights of your own as well.


Ronald helps data-driven companies generate business value with best-of-breed solutions and a hands-on approach. He has been recognized as one of the top 10 global influencers by DataConomy for predictive analytics, and by Klout for Data Science, Big Data, Business Intelligence and Data Mining. He is a guest author on leading Big Data sites, a speaker/chairman/panel member at national and international webinars and events, and runs a successful series of webinars on Big Data and on Digital Transformation. He has been active in the data (process) management domain for more than 18 years, has founded multiple companies and is now director at a Data Consultancy company, a leader in Big Data & data process management solutions. He has a broad interest in big data, data science, predictive analytics, business intelligence, customer experience and data mining. Feel free to connect on Twitter or LinkedIn to stay up to date on success stories.


The post Seamless Customer Experience for Telecoms: A Practical Approach appeared first on Ronald van Loons.


May 25, 2018

Revolution Analytics

Because it's Friday: Bad road

Sometimes I think the potholes in the roads in Chicago are bad, but then a road like this puts things into perspective: (Thanks to TH for the link.) Don't miss the shots looking back near the end to...


Revolution Analytics

Reflections on the ROpenSci Unconference

I had an amazing time this week participating in the 2018 ROpenSci Unconference, the sixth annual ROpenSci hackathon bringing together people to advance the tools and community for scientific...


May 24, 2018


AMPLYFI- Data and Beyond

Amplyfi is one of BrightPlanet’s Data-as-a-Service partners leveraging large-scale, open-source data from the Surface Web and Deep Web to build business intelligence for its clients. Our business and technology relationship has spanned years, and we are excited to see its market growth as a leader in artificial intelligence. NatWest Business Hub Article On May 10, […] The post AMPLYFI- Data and Beyond appeared first on BrightPlanet.


May 23, 2018


AI, Analytics and IoT, Oh My! What big growth potential you have!

The healthcare industry is expected to undergo tremendous change and growth due to the overwhelming amount of data available today to help organizations make better, more informed and timely...


May 22, 2018

Revolution Analytics

AI, Machine Learning and Data Science Roundup: May 2018

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications I've...


May 21, 2018

Revolution Analytics

Video: speeding up R with parallel programming in the cloud

I had a great time in Budapest last week for the eRum 2018 conference. The organizers have already made all of the videos available online. Here's my presentation: Speeding up R with Parallel...