Planet Big Data logo

Planet Big Data is an aggregator of blogs about big data, Hadoop, and related topics. We include posts by bloggers worldwide. Email us to have your blog included.


January 18, 2019

Forrester Blogs

My Top NRF 2019 Takeaways (AKA, News Vs. Nonsense)

I say this as both an analyst and a recovering retail professional: NRF is like drinking Red Bull . . . from a 50′ beer bong . . . for three days straight . . . on Jupiter. You just know...


Forrester Blogs

Gillette’s Biggest Mistake — And How To Fix It

This video gives a 2-minute summary of the biggest mistake Gillette has made with the “We Believe” campaign and what the brand can do now to fix it. If you want more comprehensive...


Forrester Blogs

Netflix Has Become HBO

In a 2013 article in GQ, Netflix Chief Content Officer Ted Sarandos laid out the then-nascent streaming company’s challenge: “The goal is to become HBO faster than HBO can become...


Forrester Blogs

The Future Of Enterprise Computing: It’s Time To Develop A Long-Term Vision Of Devices And Apps

Computing has changed drastically over the past decade. In our report, “The Future Of Enterprise Computing,” we discuss these major changes and examine how computing will evolve in the...


Forrester Blogs

NRF 2019: It’s A Visionary Retail Revival

NRF 2019 attracted more than 800 exhibitors, who demonstrated their wares to more than 37,000 attendees. If last year’s theme was (Alexa’s) voice, this year’s show was all about computer vision driving:...


Forrester Blogs

Gillette’s Path Forward With “We Believe”: Own The Controversy

In Part 1 of this post, I criticized Gillette for its statement, “We weren’t trying to court controversy. We were just trying to upgrade the selling line that we’ve held for 30 years,” as lacking...


Forrester Blogs

A 21st-Century Refresh Revitalizes Sales Performance Management Solutions

Gold-Standard Products Get Updated For The 21st Century What’s old is new again. Corduroy, fanny packs, and overalls are all the rage. Star Wars, Jurassic Park, and Jumanji are at the top of the box...


Forrester Blogs

Keep Experimenting To Apply The Best Digital Technology For Your Store Environment

Retailers are slowly moving beyond omnichannel fulfillment capabilities in stores to invest in digital store technologies that empower store associates, improve customers’ engagement, and enhance...


Forrester Blogs

Luxury Retail 2018: What A Difference A Year Can Make

Luxury brands have gone from relative inertia over digital strategies to a flurry of activity, including technology upgrades, partnerships, and even acquisitions. Back in the summer of 2017, we...


Forrester Blogs

The Watershed Of Digital-First Consumer Behavior

Zoom back 10 or even five years ago, and consumers would tell Forrester, “Yes, I use digital channels, but more often than not, I discover, research, or purchase a product offline.” You...


January 17, 2019

Forrester Blogs

It’s Time To Mainstream ABM

When account-based marketing (ABM) burst onto the scene in 2015, it was positioned by a number of vendors as the death of traditional marketing. And while that controversial approach might have been...


Forrester Blogs

Gillette Just Admitted That It Has No Values

I’ve held off blogging about Gillette’s “We Believe” campaign, as there are plenty of pundits praising and trashing its strategy, execution, and use of the term “toxic...


Forrester Blogs

Facebook: The Myth Of The Monopoly

After a tumultuous 2018, Facebook’s fate looks bleak. While the characteristic hope of the New Year lingers, you might look at the glimmers of positive news around Facebook with optimism: Advertisers...


Forrester Blogs

Gillette’s Close Shave: Its Latest Ad Is A Masterful Emotional Play Let Down By Its Execution

Brands are jumping into the messy arena of polarizing issues with greater fervor — and Gillette just raised the stakes. After 30 years of sitting on the sidelines, Gillette is passing its razor like...


Forrester Blogs

Forrester Online Survey: Organize To Deliver And Manage Application Transformation

Every company faces a central dilemma: how to resolve the standardization and differentiation of its business apps to win, serve, and retain customers. That’s why we are still getting a lot of...

Ronald van Loon

3 Ways How AI Will Augment the Human Workforce

The question in the AI market is no longer about whether AI can affect the workplace and the human workforce. Instead, the raging curiosity in the market revolves around a series of interlinked questions: When will the AI Wave happen? Will robots replace the whole human workforce? What would the end result look like?

The answers to these questions? Well, AI is happening right now in front of our eyes. The solutions this technology provides vary from self-driven complex processes to the insights and recommendations you get while scrolling through social media. With the impact already visible, one can definitely say that AI will be part of the workforce, but not replace it.

The common fear that robots will soon take over and steal countless jobs is an example of the concept of singularity. We can’t blame people for thinking this way, since the media have pushed these speculative scenarios hard, but the merger of AI and the workplace will be based more on the concept of multiplicity than singularity. Workplaces built on multiplicity bring diverse groups of humans and robots together to increase efficiency and achieve results that neither group could have achieved working alone.

Multiplicity is therefore the single concept that best explains the entry of Artificial Intelligence into the market. The growing interest in Multiplicity among business owners is justified, as it enables humans and machines to collaborate and innovate together to solve problems.

Tata Communications has worked on the concept of Multiplicity, and recently revealed a study titled ‘AI will diversify human thinking and not replace it.’ As a proud partner of Tata Communications, I have been granted access to this information and have summarised key points from the study below.

Key Findings in the Study

The recent study by Tata Communications is based on specific and thoughtful input from over 120 business leaders, who are both projecting the potential impact of AI in the workplace and already using it for numerous purposes. Breaking with the dystopian views currently held by a few loud voices, the report introduces a new set of opportunities arising from the merger of technology and humans in the workplace. In short, AI can diversify the way humans think rather than replace it.

Almost 90 percent of the business leaders surveyed believed that cognitive diversity in the workplace is extremely important for running a successful organization. Managers in the contemporary workplace want employees to think differently and move beyond their typical ways of problem solving. While such cognitive diversity was difficult to achieve in the past, the role AI can play in the workforce means that organizations can expect greater rewards in the future. AI mechanisms will help augment human efforts in the workplace and stimulate cognitive diversification that benefits the organization.

The study also revealed that 75 percent of respondents expected AI to create new roles for employees. This is a clear indication that AI is not going to replace human jobs; instead, it will increase efficiency, shift humans’ roles, and even create new positions offering meaningful work better suited to human strengths.

Interestingly, 93 percent of respondents believed that AI will enhance an organization’s decision-making. There is no doubt that AI holds immense potential for delivering insights in the heat of the moment, when they matter most. With the right insights presented at the right time, decision makers will be able to make more informed and forward-thinking decisions.

Three Ways AI Will Augment the Human Workforce

Having discussed the findings from the Tata Communications research and the massive potential for humans and AI to coexist and achieve increased efficiency, we now turn to the three ways in which AI will augment the skills of the human workforce.

Potential to Help Individuals Become More Curious, Agile and Nimble

AI’s increasing presence in the workplace will alter the skills that businesses expect from their employees. The way people think will be more important than ever, which is why the recruitment process will likely change for the better.

This people-centric view champions creativity, curiosity, and experimentation, the traits at which humans excel. Rather than handling routine execution, humans will devote more time to what they are good at: adding value, generating new business ideas, solving problems creatively, and building meaningful relationships with stakeholders and clients.

AI Can Enhance Human Collaboration

Rather than simply substituting Artificial Intelligence into different business processes, organizations would benefit more from thinking about how humans and machines can work together to create a partnership greater than the sum of its parts.

The beginning of this partnership can signal great things for organizations and may even help them develop their business offerings to deliver significant gains in performance.

Collaboration between humans is also likely to increase with the use of AI, as the technology has the capacity to limit silos and other barriers to cross-organizational collaboration, ultimately making the organization a better-oiled machine for the future. AI can also compensate for human error, further improving collaboration in the process.

AI Has the Potential to Enhance Cognitive Diversity within Groups

Perhaps the biggest benefit of AI comes with its ability to enhance the intellectual diversity and collective intelligence within a workplace. Both humans and machines will complement each other’s strengths to come up with diverse ideas that show the strength of AI’s implementation.

While humans bring ideas, creative solutions, and action plans, AI systems can play devil’s advocate and challenge inherent assumptions. These systems will counteract creative flaws such as vampire creativity and groupthink, among others.

Impact on Businesses

Vinod Kumar, CEO and Managing Director at Tata Communications, believes that, “AI is now being viewed as a new category of intelligence that can complement existing categories of emotional, social, spatial, and creative intelligence. What is transformational about Multiplicity is that it can enhance cognitive diversity, combining categories of intelligence in new ways to benefit all workers and businesses”.

This change will impact businesses in the following ways:


A Changing Structure of Work

The structure of work will change, and there will be a need for greater flexibility and agility. Organizations will want employees who can provide the agility and flexibility required to leverage AI systems to their fullest potential.

Replacement of Existing Jobs

Humans will need to expand their skill sets to adapt to changing organizational settings, structures, and tools. Some existing jobs will be replaced, and humans will lean towards analytics more than anything else.

Lifelong Learning

One thing that AI promises to do is set the tone for a period of lifelong learning. AI is not a fad you can follow for a month or two and then abandon. With the rapid pace of technological evolution, organizations today identify it as a lifestyle shift that requires constant learning.

You can view the full report by Tata Communications by downloading it from here.


Ronald helps data-driven companies generate business value with best-of-breed solutions and a hands-on approach. He has been recognized as one of the top 10 global influencers for predictive analytics by DataConomy, and by Klout for Data Science, Big Data, Business Intelligence, and Data Mining. He is a guest author on leading Big Data sites, a speaker, chairman, and panel member at national and international webinars and events, and runs a successful series of webinars on Big Data and Digital Transformation. He has been active in the data (process) management domain for more than 18 years, has founded multiple companies, and is now director at a data consultancy company that is a leader in Big Data and data process management solutions. His interests span big data, data science, predictive analytics, business intelligence, customer experience, and data mining. Feel free to connect on Twitter or LinkedIn to stay up to date on success stories.

The post 3 Ways How AI Will Augment the Human Workforce appeared first on Ronald van Loons.


January 16, 2019

Revolution Analytics

AI, Machine Learning and Data Science Roundup: January 2019

A monthly roundup of news about Artificial Intelligence, Machine Learning and Data Science. This is an eclectic collection of interesting blog posts, software announcements and data applications from...


Forrester Blogs

Location, Location, Location!

There are three things that matter in real estate: location, location, location. It is the real estate agent’s mantra and is largely believed to be the most important factor to consider when...


Forrester Blogs

What I See Coming For The Channel In 2019

We have seen more disruption in the channel in the past 18 months than we saw in the past 37 years combined. As a review, here were my 2018 predictions.   1. Private equity will continue...


Forrester Blogs

Forrester’s Global Financial Services Architecture Online Survey

We’ve Kicked Off Some New Research, And We Need Your Help.  An important part of Forrester’s research process is gathering input from financial services companies so that we can advise our clients on...

InData Labs

5 Predictions for Artificial Intelligence in 2019: Analytics, Industries, Approaches, Ethics, Job Creation

Humanity has long been obsessed with the potential of computerization and robots. The concept of an artificial mind didn’t appear out of nowhere. Previously, artificial intelligence (AI) was vigorously discussed in the domain of science fiction. But those days can rightly be seen as the cradle of AI and of approaches to employing it in the way...

The post 5 Predictions for Artificial Intelligence in 2019: Analytics, Industries, Approaches, Ethics, Job Creation appeared first on InData Labs.

Forrester Blogs

How Do You Talk To Your Board About Cybersecurity? An Old Problem In A New World

Talking to our firm’s board of directors about security isn’t a new responsibility for most security leaders; it’s been on our collective agendas for years. But many security leaders still...


January 15, 2019

Revolution Analytics

Use foreach with HPC schedulers thanks to the future package

The future package is a powerful and elegant cross-platform framework for orchestrating asynchronous computations in R. It's ideal for working with computations that take a long time to complete;...


Forrester Blogs

Nineteen In ’19: New Year’s Resolutions For The Data-Driven Marketer

Every New Year, I find myself making lofty goals: eat better, exercise more, spend less time on social media. By mid-February, those goals are a distant blip in my rearview mirror; the reality of...

Big Data University

Why every Data Scientist should know SQL

Still waiting…it’s been over an hour and still nothing. I watch the clock, get some tea, ruminate on the structure of dark matter….

I’m trying to work with course enrollment data in a relatively large database and format it into a nice dashboard, but processing this data takes far too long. Perhaps dark matter is to blame.

Let me back up.

Last year, I was tinkering with a Jupyter notebook to summarize course enrollment and completion stats for some of our database courses.

In fact, I started with a notebook that someone had originally created for another set of courses involving the same database. Why reinvent the wheel when a perfectly good notebook that does something similar already exists? After all, data science is a team sport.

I had made relatively minor updates to the notebook – just switched the course numbers that I wanted summaries for – and clicked Run All to execute all cells in the notebook.

I hadn’t really looked carefully at the code in the notebook before running it. But once the summarized results failed to materialize after a couple of hours, I knew I couldn’t blame things on dark matter anymore and would need to get my hands dirty with code.

So I grabbed another warm beverage and got ready to dig into the code in the notebook. But it only took scrolling to the cells that performed the database queries to recognize the problem.

SELECT * FROM ENROLLMENTS. I read it and then read it again. Aloud, the second time. It was a eureka moment.

I was pleased that I was able to debug the problem so quickly, but was not too happy with the prospect of having to spend some time hacking the notebook to make it run faster. A lot faster.

I have over 25 years of experience working with databases so I knew fixing the database queries would be relatively quick. But much of the data analysis logic in the notebook involved Pandas dataframes.

I had only recently picked up some data science skills, and most of my data science capability involved the R programming language. All the data scientists I had been talking to recently were using Python. [So this was also a good opportunity for me to pick up some skills in Python and Data Analysis with Python.]

But let me not digress further and get back to the problem with SELECT * FROM ENROLLMENTS.

Imagine you want to buy one item from an online retailer. Would you order all the millions of items in the retailer’s warehouse to get just the one you want and then discard or return the rest of the items? Can you imagine how long it would take to have the entire inventory shipped to you? Even if all of the contents managed to reach you somehow, would you even have enough capacity and resources in your house to receive and process the entire inventory?

But apparently that is exactly what a lot of data scientists actually do. They “order” all of the items in the data warehouse and then use tools like Pandas dataframes to sift through the data they need and discard the rest.

And that is exactly what the SQL query: SELECT * FROM ENROLLMENTS in my example above does. The database I was accessing had millions of rows for course enrollment and completion data, and getting all the data into a notebook would take considerable time. And with constrained resources on my laptop, processing those millions of rows with Pandas dataframes would take even longer.

Shortly after this issue, I met with a Database Administrator (DBA) at one of the big banks. Their CEO was sold on the idea that data science could help transform the company, and data science teams had been cropping up all over the company in recent months – but that’s when his job had started to become “hell”.

DBAs run a tight ship. They tune the system and queries to the umpteenth degree so the database can hum along fine responding to predictable queries efficiently.

And then along comes a hotshot data scientist who runs a huge query like “SELECT * FROM ENROLLMENTS” against an operational database. The database slows to a crawl, the company’s clients start seeing database errors and timeouts on the website, and the DBA responsible for the database gets called to the boss’s office.

I may have exaggerated a bit and fictionalized parts of the narrative but unfortunately this sort of a thing is quite common. But data scientists are not entirely to blame. Data Science itself has been evolving.

Data Science traditionally has been done on very small data sets. In fact, according to one consulting firm, over 80% of data science work is done on a laptop.

Small data sets are easy and fast to manipulate in memory, and Pandas is great for that. Data scientists traditionally worked with CSV files (text files with comma-separated values) and did not have a connection to a database. A DBA would do a one-time database dump into a CSV, and that was it.

We are in the age of Big Data, and working with CSV files is simply not practical. Repeatedly generating CSV file extracts with more up-to-date data is even less practical. This means that Data Scientists need to learn to work with big data repositories like relational Data Warehouses, Hadoop, Spark, Cloud Object Storage, etc.

The language of relational databases is SQL. And because of SQL’s ease of use, it is increasingly being adopted by other big data repositories.

In the case of my query – “SELECT * FROM ENROLLMENTS” – all I had to do was add a WHERE clause to filter the results to just the courses I was interested in, so the result set would include only a small subset of the millions of rows in the table.
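The difference is easy to demonstrate with Python’s built-in sqlite3 module standing in for the production database. The ENROLLMENTS columns below are assumptions for illustration, since the real schema isn’t shown here:

```python
import sqlite3

# Build a toy ENROLLMENTS table (columns are illustrative assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ENROLLMENTS (student_id INTEGER, course_id TEXT, completed INTEGER)"
)
rows = [(i, "DB0101" if i % 100 == 0 else "OTHER", i % 2) for i in range(10_000)]
conn.executemany("INSERT INTO ENROLLMENTS VALUES (?, ?, ?)", rows)

# The slow approach: pull every row and filter client-side in a dataframe.
everything = conn.execute("SELECT * FROM ENROLLMENTS").fetchall()

# The fast approach: let the database do the filtering with a WHERE clause.
subset = conn.execute(
    "SELECT * FROM ENROLLMENTS WHERE course_id = ?", ("DB0101",)
).fetchall()

print(len(everything), len(subset))  # prints: 10000 100
conn.close()
```

On a real table with millions of rows the gap is far more dramatic, because only the filtered rows travel over the network and into the notebook’s memory.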

So that is one reason I feel the knowledge of SQL is essential for today’s Data Scientists. Perhaps modern data scientists only need to learn a subset of SQL. They don’t need to learn transaction processing but things like simple filtering and aggregation are a must.

The impact of adding filtering to my SQL query in the Jupyter notebook was dramatic. The results were rendered in a couple of minutes instead of a couple of hours. And I don’t consider myself to be a genius.

And if I could tweak SQL in my data science experiment by so little and have such a huge impact on performance, I could surely help other Data Scientists (and some of those DBAs who are frustrated with newly minted data science yahoos like myself) work more efficiently with databases and SQL.

So shortly after these episodes, working with my colleagues Hima Vasudevan and Raul Chong, we launched the course Databases and SQL for Data Science on Coursera. It is an online self-study course that you can complete at your own pace.

This course introduces relational database concepts and helps you learn and apply knowledge of the SQL language. It also shows you how to perform SQL access in a data science environment like Jupyter notebooks.

The course requires no prior knowledge of databases, SQL, Python, or programming. It has four modules and each requires 2 – 4 hours of effort to complete. Topics covered include:

Module 1:
– Introduction to Databases
– How to Create a Database Instance on Cloud
– CREATE Table Statement
– SELECT Statement
– INSERT Statement
– UPDATE and DELETE Statements
– Optional: Relational Model Concepts

Module 2:
– Using String Patterns, Ranges
– Sorting Result Sets
– Grouping Result Sets
– Built-in Functions, Dates, Timestamps
– Sub-Queries and Nested Selects
– Working with Multiple Tables
– Optional: Relational Model Constraints

Module 3:
– How to access databases using Python
– Writing code by Using DB-API
– Connecting to a Database by Using ibm_db API
– Creating Tables, Loading Data, and Querying Data from Jupyter Notebooks
– Analyzing Data with SQL and Python

Module 4:
– Working with Real-world Data Sets
– Assignment: Analyzing Chicago Data Sets using SQL and Python

The emphasis in this course is hands-on and practical learning. As such, you will work with real databases, real data science tools, and real-world datasets. You will create a database instance in the cloud. Through a series of hands-on labs, you will practice building and running SQL queries using cloud based tools. You will also learn how to access databases from Jupyter notebooks by using SQL and Python.
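As a rough sketch of that workflow, here is the DB-API pattern of connecting, aggregating in SQL, and fetching results into Python. The built-in sqlite3 module stands in for the ibm_db connection used in the labs, and the two-column ENROLLMENTS table is an assumed example:

```python
import sqlite3

# sqlite3 stands in for the DB-API connection used in the course labs;
# the ENROLLMENTS columns here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENROLLMENTS (course_id TEXT, completed INTEGER)")
conn.executemany(
    "INSERT INTO ENROLLMENTS VALUES (?, ?)",
    [("DB0101", 1), ("DB0101", 0), ("PY0101", 1), ("PY0101", 1)],
)

# Aggregate in the database rather than in a dataframe: one row per course.
cur = conn.execute(
    """SELECT course_id,
              COUNT(*)       AS enrolled,
              SUM(completed) AS completions
       FROM ENROLLMENTS
       GROUP BY course_id
       ORDER BY course_id"""
)
summary = cur.fetchall()
print(summary)  # prints: [('DB0101', 2, 1), ('PY0101', 2, 2)]
conn.close()
```

The same GROUP BY query pushed to a production database returns a handful of summary rows instead of millions of raw ones, which is exactly the filtering-and-aggregation skill the course emphasizes.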

Anyone can audit this course at no charge. If you want a certificate and access to graded components of the course, there is currently a limited-time price of $39 USD. And if you are looking for a Professional Certificate in Data Science, this course is one of the 9 courses in the IBM Data Science Professional Certificate.

So if you are interested in learning SQL for Data Science, you can enroll now and audit for free.

NOTE: Portions of this post have been updated from the original version. In the process of publishing this blog post, I may have inadvertently hurt the emotions of a few Data Scientists and perhaps some DBAs, but certainly none were physically harmed. But seriously, it is not my intent to offend or stereotype any Data Scientist or DBA. So my sincere apologies to anyone who may have taken offence. The intent of this blog post is to highlight a real problem in data science, one that can be easily rectified with some knowledge of SQL, and I would be a lousy marketeer if I didn’t include a shameless plug for the IBM Data Science Professional Certificate on Coursera.

The post Why every Data Scientist should know SQL appeared first on Cognitive Class.


January 14, 2019

Forrester Blogs

New Tech Spotlight: Security Technology Takes Center Stage

Venture capital and private equity spending on security technology reached an all-time high in 2018. Currently, this emerging technology market sector stands at about $31 billion in total funding....


Forrester Blogs

IBM’s Quantum Announcement Is A Big Step In A 1,000-Mile Journey

IBM unveiled a complete “quantum computing system,” IBM Q System One, last week. What’s more, it chose to do it at CES in Las Vegas. Should you take IBM’s claim of a “commercial” quantum...


Forrester Blogs

Hulu Signals An Inflection Point For OTT

You’ve likely read the facts: Hulu announced that it has 25 million subscribers and collected $1.5 billion in advertising revenue in 2018. (For more information and analysis, check out this...


January 13, 2019

Cloud Avenue Hadoop Tips

Developing with AWS Workshop - CGC, Landran

Completed a 5-day workshop, "Developing with AWS," for Engineering and MCA students at CGC, Landran. Nice to see a good bunch of happy students towards the end of the workshop.


January 12, 2019

Forrester Blogs

CES 2019 Delivers Dazzling Tech But Disappointing Experiences

I spent this week in Las Vegas at CES to check out the latest and greatest technology wonders, ranging from 3D printing, AI, and cryptocurrency to drones, autonomous vehicles, dancing robots, and...


January 11, 2019

Forrester Blogs

Channel Data Is A Competitive Differentiator

Leverage Channel Data To Give Partners A Better Experience: Winning In The Channel Requires Data-Driven Program Innovation. Brands that provide an enhanced partner experience grow faster...

Knoyd Blog

On hackathons: lessons learned, experience, advice

Although our daily job is to help companies with setting up their analytics processes, building machine learning solutions, and hiring data scientists, we never say no to a hackathon invitation. Recently, we participated in one exciting hackathon in Slovakia, so we decided to share our experience while it's still fresh in our memories.

Corporate-startup cooperation

With the speed of data growth in the last couple of years, there is a constant need to analyze and leverage the data that are collected. Companies have started to analyze the data, build machine learning models, and make data-driven business decisions. However, many are still lagging behind in this area and trying to improve. If you belong among them, you have several options:

  1. Cooperation with a consultancy firm. These are companies that will help you set up the processes, build your first machine learning solutions, and train your current employees. Such cooperation is a great starting point – the consultants will advise you on which data you can leverage and how to collect them, and will know from their previous experience which techniques are most suitable for you.

  2. Hiring the first (=lead) Data Scientist. At one point, you might realize that cooperation with a consultancy firm does not make sense for you anymore. This might be because leveraging the data is at the core of your business, because the maintenance of the existing infrastructure built by an external company suddenly takes enough time to keep a full-time employee busy, or simply because you want more control over the process. This can be a tricky task, since hiring your first 'data guru' is not the same as hiring for any other position in the company. If you need some advice, you might find one of our past articles on this topic useful.

  3. Looking for inspiration and/or potential partners by organizing hackathons. Although you cannot expect a ready-to-deploy solution from a hackathon, you will certainly get many fresh ideas, various approaches to a given problem (or a definition of a new challenge), and a look from several different angles.


Nowadays, there are many smart, data-oriented people and small startups (like us ;)) constantly looking for ways to open doors into bigger companies and show them what they can do. This is one of the reasons we love hackathons. A hackathon is not only fun and a great learning experience; it also gives small companies a chance to get to know the competition and potential future partners, and above all to meet the big, well-established players (usually the organizers of the event), who can otherwise be really hard to reach. We see hackathons as a win-win for both sides – the organizers get a number of solutions and good ideas, while the participants gain experience and knowledge about the specific industry.

Our first hackathon experience

The first time we participated in a hackathon was 4 years ago, at an event organized by Daimler in Germany. The task was to build a model that would check for outliers and data drift in different datasets. We decided to approach the problem in a robust way: we built an ensemble of different statistical tests whose weights adjusted based on user feedback, so that the ensemble drew the most information from the test best suited to the specific dataset. This way, even people without math skills would be able to run these tests.
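A minimal sketch of that feedback-weighted ensemble idea might look as follows. The two detectors (a z-score test and Tukey’s IQR fences), the voting rule, and the weight-update rule are illustrative assumptions, not the actual hackathon code:

```python
import statistics

def zscore_flags(xs, threshold=2.0):
    """Flag indices whose z-score exceeds the threshold."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return {i for i, x in enumerate(xs) if sd and abs(x - mu) / sd > threshold}

def iqr_flags(xs, k=1.5):
    """Flag indices outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {i for i, x in enumerate(xs) if x < lo or x > hi}

def ensemble_outliers(xs, weights):
    """Flag a point when the weighted vote of the detectors exceeds 0.5."""
    votes = [zscore_flags(xs), iqr_flags(xs)]
    total = sum(weights)
    flagged = {
        i for i in range(len(xs))
        if sum(w for w, v in zip(weights, votes) if i in v) / total > 0.5
    }
    return flagged, votes

def update_weights(weights, votes, confirmed):
    """User feedback: boost detectors whose flags were all confirmed outliers."""
    return [w * 1.5 if confirmed >= v else w for w, v in zip(weights, votes)]

data = [10, 11, 9, 10, 12, 11, 10, 300]   # index 7 is the planted outlier
flags, votes = ensemble_outliers(data, [1.0, 1.0])
weights = update_weights([1.0, 1.0], votes, confirmed=flags)
print(flags, weights)  # prints: {7} [1.5, 1.5]
```

Over repeated runs, the feedback loop shifts weight toward whichever test keeps agreeing with the user’s confirmed outliers for that particular dataset.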

Before the final presentations, our hopes were high, because our solution was capable of identifying outliers and data drift in different types of data, numerical or categorical. But reality struck us during the final presentations – while our solution returned a JSON with the row IDs of outliers, other teams had functional interactive dashboards and a lot of other fancy features. The fact that we didn't win came as no surprise. Beginner's luck wasn't really applicable this time; rather, luck favored the prepared.


Our most important takeaway from this hackathon was that it was an excellent learning opportunity. I had a lot of theoretical knowledge from university. However, since I studied Mathematics, I lacked coding skills. We had some statistical courses where we used R, but at that time, I hadn't been smart enough to realize R was something I would use in the future :). During the hackathon, we used Python, and for me personally, it was 40 hours of constant googling and searching Stack Overflow for Python-specific errors. There is hardly a way you can learn more in two days than by participating in a hackathon. Although you will need to invest some extra time afterward in learning best practices for coding, because you will almost certainly end up with an ugly piece of code :).

Four years and four hackathons later

Since we liked the experience, we participated in other hackathons later on, such as a travel hackathon in St. Moritz and a hackathon organized by Andritz in Graz. Each time we arrived a bit smarter and better prepared, and left even smarter and even better prepared for the next one.

Our team at the Andritz hackathon in Graz. Full focus mode on.


Recently, we participated in another great hackathon, organized by ZSE, a Slovak electric utility based in Bratislava, in cooperation with ImpactHUB. ZSE gave us a chance to see real data from the energy sector and how they use it; in return, we got a chance to show them what we could do with that data to innovate their business.

The task

Teams were given two tasks: first, to build a dashboard showing how much energy is consumed and produced at each grid supply point; second, to come up with an interesting model for trading electricity among grid supply points to minimize costs.
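To make the trading task concrete, here is a toy sketch (our own illustration, not ZSE's model or ours from the event; all prices and the greedy matching rule are invented for the example) that matches surplus points with deficit points before falling back to the external grid:

```python
def match_trades(balance, internal_price=0.05, grid_buy=0.15, grid_sell=0.03):
    """Greedily match surplus grid supply points with deficit ones.

    `balance` maps point id -> net energy in kWh (positive = surplus
    production, negative = deficit).  Returns (trades, total_cost),
    where trades is a list of (seller, buyer, kwh) tuples.  Unmatched
    deficits buy from the grid at `grid_buy`; unmatched surplus is
    sold back at `grid_sell` (all prices are hypothetical EUR/kWh).
    """
    sellers = sorted((p for p in balance if balance[p] > 0),
                     key=balance.get, reverse=True)
    buyers = sorted((p for p in balance if balance[p] < 0), key=balance.get)
    remaining = dict(balance)
    trades, cost = [], 0.0
    for b in buyers:
        for s in sellers:
            if remaining[b] == 0:
                break
            qty = min(remaining[s], -remaining[b])
            if qty <= 0:
                continue
            trades.append((s, b, qty))
            remaining[s] -= qty
            remaining[b] += qty
            cost += qty * internal_price
    # Settle whatever is left with the external grid.
    for p, r in remaining.items():
        if r > 0:
            cost -= r * grid_sell   # sell leftover surplus to the grid
        elif r < 0:
            cost += -r * grid_buy   # buy the shortfall from the grid
    return trades, cost
```

Because the internal price sits between the grid buy and sell prices, every matched kWh is cheaper than routing it through the grid, which is the whole point of trading among the points.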

Our approach

Each hackathon is a bit different. This one was unusual in that we knew the tasks and had access to the data two weeks beforehand. However, we didn't use this much to our advantage: we were quite busy with our clients, and on top of everything, we traveled to the Web Summit conference the week before the competition. We expected the other teams to arrive with ready-made solutions. Fortunately, that didn't happen, so we still had a chance to compete :). Because of a delayed flight, one of our team members arrived five hours late, and we had work to do to catch up with the other teams, so we approached the challenge in true hackathon style (no sleep and a lot of Coke). We ended up being the only ones who stayed up all night.

While building the solution, we focused mostly on how to differentiate ourselves from the other teams. We knew that everyone would be able to come up with some kind of dashboard, so we decided to use open-source technologies and build ours from scratch using Python and Dash.

This is what our dashboard looked like.


This approach allowed us to easily add extra insights from custom machine learning models. In the end, users of the dashboard would see not only current consumption but also future predictions, and they could compare each grid supply point with points showing similar consumer behavior, using our own clustering model.
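Our actual clustering model isn't public, but grouping grid supply points by similar consumption profiles can be sketched with a tiny k-means (a generic illustration, assuming each point is described by a fixed-length consumption vector):

```python
import numpy as np

def kmeans(profiles, k, iters=50, seed=0):
    """Tiny k-means: cluster consumption profiles (rows) into k groups."""
    rng = np.random.default_rng(seed)
    X = np.asarray(profiles, dtype=float)
    # Start from k distinct profiles chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each profile to its nearest center (squared distance).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned profiles.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

With the cluster labels in hand, "similar" points are simply those sharing a label, which is what a dashboard comparison view needs.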

Income vs Spending for one specific grid supply point.


And the winner is…

In sharp contrast to the final presentations at the hackathon four years earlier, this time we could see that our solution was objectively better than the others: a working demo, an interactive dashboard, and more. And it was we who walked off with the first prize!

It feels good to be the one walking away with the big check :).



It is usually very hard for a small company to get access to real data from a company such as ZSE, so hackathons are also a great way to get at datasets from different industries. We encourage all data enthusiasts, beginners and advanced alike, to take part in these competitions from time to time to learn new things, meet new companies and people, and have some fun!

If you would like to learn more about our winning solution, feel free to reach out to me at


January 10, 2019

Forrester Blogs

16 Retail Trailblazers To Learn From In 2019

On our morning news feeds, we see headlines chock-full of emerging tech that is supposedly changing the game for next-generation retail experiences. Somehow, those hoverboarding delivery robots,...



January 08, 2019

Revolution Analytics

AzureR packages now on CRAN

The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in...


Forrester Blogs

Blockchain And GDPR: Not Mutually Exclusive But Can Be A Toxic Blend

Depending on who you listen to, the combination of GDPR and distributed ledger technology (DLT, AKA blockchain) is either a poisonous cocktail or a magic potion. As you’d expect, the reality is...


January 07, 2019

Forrester Blogs

Akamai Purchases Janrain

Today, Akamai announced that it has acquired Portland, Oregon-based Janrain. Although the financial terms were not disclosed, Forrester estimates the purchase price to be in the $250M–$275M range,...


Forrester Blogs

China Takes The Moonshot Advantage

The origin of the term “moonshot” — used to describe the most advanced, disruptive innovations — has returned again as an innovation focus of its own. Last week, the China National Space...


January 06, 2019

M. Kinde

Data Points: January 2019

A roundup of random thoughts on data, information and design for the New Year. Diversity in the 116th U.S. Congress There has been a lot of discussion this week about the racial, ethnic and gender...


January 04, 2019

Revolution Analytics

Because it's Friday: A timeline of the elements

A few chemical elements: copper, iron, sulphur, and a few others have been known since the dawn of time. This animated timeline, created by Dr Jamie Gallagher, shows the year of discovery (or in some...


Revolution Analytics

Because it's Friday: Synthetic faces, styled to your specifications

If you need someone's face to use in an application or some marketing materials, you might search one of the stock photography vendors for people of a given gender, skin tone, hairstyle, etc. Or, you...


Revolution Analytics

Who is the greatest finisher in soccer?

It's relatively easy to find the player who has scored the most goals in the last 12 years (hello, Lionel Messi). But which professional football (soccer) player is the best finisher, i.e. which...


Revolution Analytics

In case you missed it: December 2018 roundup

In case you missed them, here are some articles from December of particular interest to R users. R 3.5.2 is now available. Roundup of AI, Machine Learning and Data Science news from December 2018....


January 03, 2019

Revolution Analytics

Notebooks from the Practical AI Workshop

Last month, I delivered the one-day workshop Practical AI for the Working Software Engineer at the Artificial Intelligence Live conference in Orlando. As the title suggests, the workshop was aimed at...


Forrester Blogs

Forrester + SiriusDecisions

Today Forrester closed the deal to acquire SiriusDecisions. You can find our press release here. SiriusDecisions helps business-to-business companies align the functions of sales, marketing, and...


January 02, 2019

Forrester Blogs

You’ve Still Got Mail

It has been more than two decades since AOL popularized email with the catchy “You’ve got mail” greeting. So ubiquitous was it in its heyday that it was the title of a rom-com starring Tom Hanks and...

InData Labs

How OCR Can Help Employees Fight Through Most Mundane Tasks

These days, office employees need an AI hero. Can you imagine the number of hours wasted on handling a paper-based workflow? Isn’t it time to save employees from piles of paper? No one is saying it will be easy to eliminate paper documents promptly. For instance, in the legal sphere where the cost of a...

The post How OCR Can Help Employees Fight Through Most Mundane Tasks appeared first on InData Labs.


December 27, 2018

Jean Francois Puget

Looking At The Stars: PLAsTiCC competition



Looking at the sky at night is one of the most ancient human habits. I did it too, of course, but I never thought I would work on helping astronomers make sense of it. Yet I just did: our team finished 5th, with a gold medal, in the Kaggle PLAsTiCC challenge. As Kaggle puts it, the challenge was to:

Help some of the world's leading astronomers grasp the deepest properties of the universe.

The human eye has been the arbiter for the classification of astronomical sources in the night sky for hundreds of years. But a new facility -- the Large Synoptic Survey Telescope (LSST) -- is about to revolutionize the field, discovering 10 to 100 times more astronomical sources that vary in the night sky than we've ever known. Some of these sources will be completely unprecedented!

The Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) asks Kagglers to help prepare to classify the data from this new survey. Competitors will classify astronomical sources that vary with time into different classes, scaling from a small training set to a very large test set of the type the LSST will discover.

It was a very challenging competition because we had to classify unevenly spaced time series. All the time series problems I had worked on before involved regularly sampled series; sure, some values could be missing, but nothing like what we had here. Moreover, it was an open classification problem, with more classes in the test data than in the training data. Another challenge was the size of the test dataset: I had to leverage parallelism to compute features in a reasonable amount of time, and even so, it took about 2 weeks on a 20-core machine to compute the features we used.
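The parallel setup can be sketched roughly as follows (a generic illustration only; the features below are simple stand-ins, not the actual competition features):

```python
from multiprocessing import Pool
import statistics

def light_curve_features(series):
    """Summary statistics for one (possibly unevenly sampled) flux series."""
    times, fluxes = series
    gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return {
        "mean_flux": statistics.fmean(fluxes),
        "flux_std": statistics.pstdev(fluxes),
        "amplitude": max(fluxes) - min(fluxes),
        "mean_gap": statistics.fmean(gaps),  # uneven sampling shows up here
    }

def extract_all(series_list, workers=4):
    """Fan the per-object feature computation out over worker processes."""
    with Pool(workers) as pool:
        return pool.map(light_curve_features, series_list)
```

Since each astronomical object is independent, the work splits cleanly across cores, which is why throwing 20 cores at it helps almost linearly.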

My part of the solution, mostly feature engineering and lightgbm models, is described here, and some of my code is on GitHub. A teammate, Kun Hao Yeh, described his part, mostly RNNs, here.

I learned a lot about astronomy in general, and supernovae in particular, in this challenge. I also learned new techniques for curve fitting, especially Gaussian process modeling. If you are like me, i.e., willing to give Gaussian process modeling a try but with no clue about it, then you may want to read this series of introductory posts:

The bible is this book, available online:

There is also this paper:

Several comments in the competition forum said GPs were nice but too slow, because the initial step requires a matrix inversion that takes O(n³) operations for a time series with n points. Well, that is not true if you use the celerite package: celerite inverts the matrix in O(n) operations!
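To see where the cubic cost comes from, here is a toy numpy version of GP regression (this is not celerite's API, just a naive illustration with an RBF kernel; the hyperparameters are arbitrary). The linear solve on the dense kernel matrix is exactly the O(n³) step that celerite's specialised kernels reduce to O(n):

```python
import numpy as np

def gp_predict(x_train, y_train, x_test, length=1.0, noise=1e-6):
    """Posterior mean of naive GP regression with an RBF kernel."""
    def rbf(a, b):
        # Squared-exponential kernel between two sets of 1-D inputs.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)   # the O(n^3) bottleneck
    return rbf(x_test, x_train) @ alpha   # posterior mean at x_test
```

For a light curve with thousands of observations, that dense solve dominates the runtime, which is why an O(n) solver matters so much at PLAsTiCC scale.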

Unfortunately for me, it took too much time to master the celerite package, and I could only use GPs on the last day of the competition. That is a bit of a pity, given that the winner of the competition based his entire solution on GPs! Anyway, you can find my celerite code on GitHub, and I am showing here some of the curves it helped compute.


This competition completes a good year for me on Kaggle: 4 gold medals, including 2 solo golds, and 3 silver medals across 7 competitions. My worst result is top 4%, and my best is 5th.





December 26, 2018

Forrester Blogs

The Race To Innovate CX In Brokerage And Wealth Management

Recently, Forrester published our CX predictions for 2019. Our first prediction was that stagnating CX quality will cause short, destructive price wars. We cited Fidelity’s two new zero-fee...


December 24, 2018

Forrester Blogs

Everything I Learned About Training A Machine-Learning Model That I Learned From My Kids

“You who are on the road must have a code that you can live by . . . teach your children well . . . and feed them on your dreams.” Graham Nash was spot-on — about machine-learning. Training a...


December 21, 2018

Revolution Analytics

Because it's Friday: Happy Holidays

🎵 He'd better watch out // he'd better comply 🎵: This is my favourite festive GDPR gag of Christmas 2018 so far. — Charlie King (@charlietheking) November 7, 2018 With...


December 20, 2018

Revolution Analytics

R 3.5.2 now available

R 3.5.2, the latest version of the R language for statistical computation and graphics from the R Foundation, was released today. (This release is codenamed "Eggshell Igloo", likely in reference to...


Forrester Blogs

Phishing: The Simple Attack That Shreds The Defenses Of Sensitive Networks

Diplomatic networks carry some of the world’s most sensitive information: communications between world leaders, key technical intellectual property, trade strategies, and military plans. A recent...


Forrester Blogs

Your Corporate Intranet Is Dead

Forrester clients across industries report that their existing intranet projects are struggling to find relevance with employees. New approaches put a focus on collaboration, knowledge sharing,...


Forrester Blogs

Augmented Intelligence Is The Key To Driving Rapid Business Value With AI

Leveraging AI to augment human intelligence leverages the superhuman advantages of AI and the super-AI advantages of humans to drive large, tangible business value quickly. Thanks to easier...


Forrester Blogs

Leverage Forrester’s Business Strategy Workshops To Accelerate Digital Transformation

Looking for help with rethinking your current business model? Watch the video below to learn how Forrester is leveraging design thinking in our business strategy workshops to help executives unlock...


Forrester Blogs

Understand The EAMS Vendor Landscape To Choose The Right Tool

Forrester defines enterprise architecture management suites (EAMSes) as: A foundation for capturing, managing, and reporting on a firm’s strategic and operational assets, defining the relationships...


Forrester Blogs

What Is Forrester’s Definition Of Enterprise AI?

There are two types of AI. Watch the video below to learn which one you should forget about and which one you should go full steam ahead with. Have additional questions about enterprise AI? Schedule...