Planet Big Data

Planet Big Data is an aggregator of blogs about big data, Hadoop, and related topics. We include posts by bloggers worldwide. Email us to have your blog included.

 

September 28, 2016

Data Digest

Will machines help humans make big decisions in future? Chief Analytics Officers weigh in


In a report recently released by PwC entitled PwC’s Data and Analytics Survey 2016: Big Decisions™, it was revealed that “we’re at an inflection point where artificial intelligence can help business make better and faster decisions.” The report “shows that most executives say their next big decision will rely mostly on human judgment, minds more than machines. However, with the emergence of artificial intelligence, we see a great opportunity for executives to supplement their human judgment with data-driven insights to fundamentally change the way they make decisions.”

In our discussions with notable thought leaders in this space for the upcoming Chief Analytics Officer Forum Fall on 5-7 October in New York, we got deeper insight into how this trend is felt and viewed on the ground. For example, John France, Head of Sales Operations & Analytics at Valeant Pharmaceuticals, sees the opportunities as limitless. He said that “if there was a machine that could scan you at home and provide an instant reading on your health (heart, blood pressure, cholesterol, diet, etc.) and then deliver an action plan to correct it, such as what to eat that day, a workout regime, what meds to take, etc., that could really help save lives.”

On the other hand, John Lodmell, VP, Credit & Data Analytics at Advance America, believes that the increasing collection of geographic information from cell phones and fitness trackers will open up a lot of big data opportunities around movement and traffic patterns. “If you think back to vehicle traffic studies putting ‘car counters’ across the road to track every time a car crossed a certain point, we now have much richer data and collection techniques that are not fixed to tracking crossing a single point, but overall movement. I remember hearing years ago that a large retail store was tracking traffic patterns within the stores, seeing where customers went. It provided them tons of useful information around where to place promotional items and how to make things more convenient for customers to find. I can only imagine that type of information being used for urban planning or store marketing,” he said.

Read our full interviews by clicking on the links below:

Andrea L. Cardozo - Pandora 
Ash Dhupar - PCH 
Cameron J. Davies - NBC Universal 
Christina Hoy - WSIB 
Dipti Patel-Misra - CEP America 
Eric Daly - Sony Pictures 
Inkyu Son - Nexlend 
Jason Cooper - Horizon 
John France -  Valeant 
John Lodmell - Advance America 
Nikhil Aggarwal - Standard Chartered     


To learn more about Chief Analytics Officer Forum Fall, visit www.chiefanalyticsofficerforum.com 
 

September 27, 2016


BrightPlanet

How to Find the ‘Signal’ in the Noise of Open Source Data

Finding ‘signal’ through the noise of open source information is ultimately what will drive value to your organization. Whether this is to support sales, investment decisions, or the fight against fraud, corruption, IP theft, or terrorism, it all depends on identifying the ‘signal’ in the data. This article is intended to start a conversation about […] The post How to Find the ‘Signal’ in the Noise of Open Source Data appeared first on BrightPlanet.

Read more »

Revolution Analytics

The Financial Times uses R for Quantitative Journalism

At the 2016 EARL London conference, senior data-visualisation journalist John Burn-Murdoch described how the Financial Times uses R to produce high-quality, striking data visualisations. Until...

...
Data Digest

The vibrant Big Data market in Latin America


Predictions about the growth of the data analytics and Big Data market have been quite positive for Latin America in recent years. Gartner, Frost & Sullivan and many other leaders in the field have estimated growth of close to 40% in the acquisition of solutions and the implementation of advanced data analytics tools over the next four years.

The financial and insurance sectors are certainly leading this phenomenon, whether out of the need to comply with regulatory agencies regarding the use and handling of information or as part of a clear drive to monetise their data assets. Nevertheless, other local sectors are already well advanced in modernising and implementing corporate strategies to exploit their data.

For example, industries such as marketing, telecommunications, retail and consumer goods manufacturing are showing growing interest in the medium- and long-term economic possibilities that advanced analytics and Big Data offer them. This trend is directly reflected, on the one hand, in the appearance of a new role within the leadership structures of these companies – the Chief Data Officer or Chief Analytics Officer – and, on the other, in the implementation of technology modernisation processes to improve the capture and analysis of data in real time.


Despite the importance of these initiatives, and after more than 50 direct conversations with senior executives in the region, it is evident that there is still a good way to go before the commercial benefits are really visible in any of these sectors. The main reasons range between (1) technology modernisation projects that are still in their initial stages; (2) the consolidation of private and public professional development programmes; and (3) the divergence between local needs and the solutions available in the market.

Interestingly, the first two issues have been quickly recognised, and several public and private players have joined efforts to accelerate results. Projects to modernise technology infrastructure and build skills are already at a very advanced stage of development. The third, however, is still far from what it could be. It is clear that the local and international solutions market is still adjusting to this organisational change, and vendors are often surprised by how much the new CDOs and CAOs know about Big Data and data analytics.

It seems that few companies recognise that these regional leaders are the right, direct counterparts with whom they can establish potential commercial alliances. Likewise, few have taken the time to understand what their real needs are, what the best forms of joint collaboration might be, and what the shortest and most economical path could be to implementing lasting data exploitation programmes.

For this reason, Corinium Global Intelligence has opened the first forum for direct discussion in the region, so that both sides have the opportunity to meet and discuss the best solutions to close this gap. On 24-25 January 2017, the first Chief Data & Analytics Officer, Central America will take place at the Marquis Reforma in Mexico City, bringing together more than 100 specialists to discuss the operational and strategic aspects of implementing programmes around the commercial exploitation of data. We hope to see all the regional experts next year in Mexico!

By Alejandro Becerra:

Alejandro Becerra is the Content Director for LATAM/USA for the CDAO and CCO Forum. Alejandro is an experienced Social Scientist who enjoys exploring and debating with senior executives about the opportunities and key challenges for enterprise data leadership, to create interactive discussion-led platforms to bring people together to address those issues and more. For enquiries email: alejandro.becerra@coriniumintelligence.com
Jean Francois Puget

CPLEX Is Free For Students (And Academics)


 

As part of our effort to make optimization pervasive, we made our mathematical optimization products free for academic use six years ago. Four years ago we removed license files, enabling the use of CPLEX offline for teachers, researchers and university staff. We are now going a step further by allowing any student to use CPLEX for free.

 

We have also streamlined the registration process on the new Academic Initiative site. If you use an email address provided by your institution, then registration on the new site should be instantaneous in most cases. Mail and phone support for registration is available as well if needed.
To download CPLEX, please follow the instructions on these pages:


We hope this will foster the use of state of the art mathematical optimization for both operations research projects and data science projects.
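If you want a quick feel for what a CPLEX model looks like from Python, here is a minimal sketch using the docplex modeling package (the product names are real, but the variables, data and constraints below are invented purely for illustration):

# A tiny, made-up production-planning LP solved with CPLEX via docplex.
# Requires a CPLEX installation plus the docplex package.
from docplex.mp.model import Model

mdl = Model(name="toy_production")

# Decision variables: units of two hypothetical products.
a = mdl.continuous_var(name="product_a", lb=0)
b = mdl.continuous_var(name="product_b", lb=0)

# Illustrative capacity and demand constraints.
mdl.add_constraint(2 * a + 3 * b <= 120, "machine_hours")
mdl.add_constraint(a <= 40, "demand_a")

# Objective: maximize total profit.
mdl.maximize(25 * a + 30 * b)

solution = mdl.solve()
if solution:
    mdl.print_solution()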

Thank you for using CPLEX.

Teradata ANZ

Understanding Your Customer’s Journey

Many of my recent discussions on the role of analytics in achieving business outcomes have been focussed on the customer and how companies can use analytics to better understand the unique set of interactions and transactions that make up the customer journey.

Designer Shoe Warehouse (DSW), a leading shoe retailer in the US, connects with customers and inspires front-line employees through analytics. In a case study video, Kelly Cook, Vice President of Customer Strategy and Engagement, talks about how DSW is benefiting from analytics: “… Customers have no problem telling you everything you could do better. Knowing what you need to do for customers allows you to understand what data is needed, so you’ll know what to attack first.”

Accenture has published its Retail Consumer Research 2016, and I love the cover page of the Executive Summary, which simply states, “Retail customers are shouting – are you adapting?”

In an earlier blog, I talked about The Connected Consumer and the fact is all consumers today leave breadcrumbs across all channels during their journeys with you. There are three sets of capabilities required to be able to listen, adapt and interact with your customers:

Connected Data – Enable insights from all types of data sources, offline, online and real time, by providing the ability to build complete customer profiles that power data-driven decisions.

Connected Analytics – Right time analytics and decisioning capabilities at all channels. Provide the ability to see and understand potential and actual outcomes.


Connected Interactions – Provide the ability to manage a consistent customer experience across all channels by continuing to enhance the integration between all online and offline customer interactions.

It is important to note that while the target capabilities are the same, each company will have different starting points and existing capabilities, as well as its own priorities and requirements, which must be factored into the design of the overall business solution.

bonprix, one of Germany’s largest clothing merchants, is able to micro-segment groups of customers based on shopping behaviour data such as reactions to coupon offers, rankings and website filters. Analysing the interactions helps bonprix determine how to tailor future digital and traditional marketing campaigns. Along with a significant uplift in key markets from the 1.5 billion targeted emails sent each year, bonprix has gained the ability to leverage existing data for improved forecasting, a deeper understanding of product return rates, fraud reduction, budget optimisation and other key business needs. As a result, analytics is providing a benefit back to the company – and to its customers.

As it should be, understanding the customer journey comes down to being connected with your customers. Connecting data, analytics and interactions helps you maintain and grow your relationships with your customers. In today’s world there are many more ways for your customers to transact and interact with you. Have you adapted how you listen and respond?

Learn more about Teradata’s Customer Journey Analytic Solution, a complete set of capabilities for discerning the behavioural paths of each individual customer, determining the next best interaction and delivering a consistent, personalised brand experience through every channel and touch point.

The post Understanding Your Customer’s Journey appeared first on International Blog.


Simplified Analytics

The Good, The Bad & The Ugly of Internet of Things

The greatest advantage we have today is our ability to communicate with one another. The Internet of Things, also known as IoT, allows machines, computers, mobile or other smart devices to...

...
 

September 26, 2016

Big Data University

Introducing Two New SystemT Information Extraction Courses

This article on information extraction is authored by Laura Chiticariu and Yunyao Li.

We are all hungry to extract more insight from data. Unfortunately, most of the world’s data is not stored in neat rows and columns. Much of the world’s information is hidden in plain sight in text. As humans, we can read and understand the text. The challenge is to teach machines how to understand text and further draw insights from the wealth of information present in text. This problem is known as Text Analytics.

An important component of Text Analytics is Information Extraction. Information extraction (IE) refers to the task of extracting structured information from unstructured or semi-structured machine-readable documents. It has been a well-known task in the Natural Language Processing (NLP) community for a few decades.

Two New Information Extraction Courses

We just released two courses on Big Data University that get you up and running with Information Extraction in no time.

The first one, Text Analytics – Getting Results with System T, introduces the field of Information Extraction and how to use a specific system, SystemT, to solve your Information Extraction problem. At the end of this class, you will know how to write your own extractor using the SystemT visual development environment.

The second one, Advanced Text Analytics – Getting Results with System T, goes into detail about the SystemT optimizer and how it addresses the limitations of previous IE technologies. For a brief introduction to how SystemT will solve your Information Extraction problems, read on.

Common Applications of Information Extraction

The recent rise of Big Data analytics has reignited interest in IE, a foundational technology for a wide range of emerging enterprise applications. Here are a few examples.

Financial Analytics. For regulatory compliance, companies submit periodic reports about their quarterly and yearly accounting and financial metrics to regulatory authorities such as the Securities and Exchange Commission. Unfortunately, the reports are in textual format, with most of the data reported in tables with complex structures. To automate the task of analyzing the financial health of companies and whether they comply with regulations, Information Extraction is used to extract the relevant financial metrics from the textual reports and make them available in structured form to downstream analytics.

Data-Driven Customer Relationship Management (CRM). The ubiquity of user-created content, particularly on social media, has opened up new possibilities for a wide range of CRM applications. IE over such content, in combination with internal enterprise data (such as product catalogs and customer call logs), enables enterprises to understand their customers to an extent never possible before.

Besides demographic information about individual customers, IE can extract important information from user-created content, allowing enterprises to build detailed customer profiles: their opinions towards a brand/product/service, their product interests (e.g. “Buying a new car tomorrow!” indicates intent to buy a car), and their travel plans (“Looking forward to our vacation in Hawaii” implies intent to travel), among many other things.

Such comprehensive customer profiles allow the enterprise to manage customer relationships tailored to different demographics at fine granularity, and even to individual customers. For example, a credit card company can offer special incentives to customers who have indicated plans to travel abroad in the near future and encourage them to use credit cards offered by the company while overseas.
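As a rough, hypothetical illustration of this kind of intent extraction (a toy keyword matcher, not the SystemT approach described later), social posts can be tagged against a small dictionary of trigger phrases:

# Toy intent tagger over social posts; trigger phrases and labels are invented.
# Real IE systems use far richer linguistic primitives than plain regexes.
import re

INTENT_PATTERNS = {
    "purchase_intent": re.compile(r"\b(buying|looking to buy|shopping for)\b", re.I),
    "travel_intent": re.compile(r"\b(vacation|trip|flight) (in|to)\b", re.I),
}

def tag_intents(post):
    """Return the intent labels whose trigger pattern matches the post."""
    return [label for label, pattern in INTENT_PATTERNS.items() if pattern.search(post)]

posts = [
    "Buying a new car tomorrow!",
    "Looking forward to our vacation in Hawaii",
]
for post in posts:
    print(post, "->", tag_intents(post))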

Machine Data Analytics. Modern production facilities consist of many computerized machines performing specialized tasks. All of these machines produce a constant stream of system log data. Using IE over the machine-generated log data, it is possible to automatically extract individual pieces of information from each log record and piece them together into information about individual production sessions. Such session information permits advanced analytics over machine data, such as root cause analysis and machine failure prediction.
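A very small sketch of that idea is shown below; the log format, field names and events are hypothetical, and a production system would of course use a much richer extractor:

# Toy machine-log sessionization: parse log records and group them by session id.
import re
from collections import defaultdict

LOG_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) session=(?P<session>\S+) event=(?P<event>\S+)"
)

lines = [
    "2016-09-26 10:00:01 session=42 event=start",
    "2016-09-26 10:00:09 session=42 event=temperature_warning",
    "2016-09-26 10:00:15 session=42 event=stop",
]

sessions = defaultdict(list)
for line in lines:
    match = LOG_RE.match(line)
    if match:
        sessions[match.group("session")].append((match.group("ts"), match.group("event")))

# Each session is now a time-ordered list of events, ready to feed root cause
# analysis or failure-prediction features.
for session_id, events in sessions.items():
    print(session_id, events)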

A Brief Introduction to SystemT

SystemT is a state-of-the-art Information Extraction system. It allows a wide variety of information extraction algorithms to be expressed, and automatically optimizes them for efficient runtime execution. SystemT started as a research project at IBM Research – Almaden in 2006 and is now commercially available as IBM BigInsights Text Analytics.

At a high level, SystemT consists of the following three major parts:

1. Language for expressing NLP algorithms. AQL (Annotation Query Language) is a declarative language that provides the powerful primitives needed in IE tasks, including:

  • Morphological Processing including tokenization, part of speech detection, and finding matches of dictionaries of terms;
  • Other Core primitives such as finding matches of regular expressions, performing span operations (e.g., checking if a span is followed by another span) and relational operations (unioning, subtracting, filtering sets of extraction results);
  • Semantic Role Labeling primitives providing sentence-level information about who did what to whom, where, and in what manner;
  • Machine Learning Primitives to embed a machine learning algorithm for training and scoring.

2. Development Environment. The development environment provides facilities for users to construct and refine information extraction programs (i.e., extractors). The development environment supports two kinds of users:

  • Data scientists who may not wish to learn how to code can develop their extractors in a visual drag-and-drop environment loaded with a variety of prebuilt extractors that they can adapt to a new domain and build on top of. The visual extractor is converted behind the scenes into AQL code.

(Image: Information Extraction in the visual development environment)

  • NLP engineers can write extractors directly using AQL. A simple example statement in AQL is shown below. The language itself looks a lot like SQL, the language for querying relational databases, and many software developers’ familiarity with SQL helps them learn and use AQL.

(Image: an example AQL Information Extraction statement)

3. Optimizer and Runtime Environment. AQL is a declarative language: the developer declares the semantics of the extractor in AQL in a logical way, without specifying how the AQL program should be executed. During compilation, the SystemT Optimizer analyzes the AQL program and breaks it down into specialized individual operations that are necessary to produce the output.

The Optimizer then enumerates many different plans, or ways in which individual operators can be combined together to compute the output, estimates the cost of these plans, and chooses one plan that looks most efficient.

This process is very similar to how SQL queries are optimized in relational database systems, but the optimizations are geared towards text operations, which are CPU-intensive, as opposed to the I/O-intensive operations of relational databases. This improves developer productivity, since developers only need to focus on “what” to extract and can leave the question of “how” to do it efficiently to the Optimizer.

Given a compiled extractor, the Runtime Environment instantiates and executes the corresponding physical operators. The runtime engine is highly optimized and memory efficient, allowing it to be easily embedded inside the processing pipeline of a larger application. The Runtime has a document-at-a-time execution model: it receives a continuous stream of documents, annotates each document, and outputs the annotations for further application-specific processing. The source of the document stream depends on the overall application.

Advantages of SystemT

SystemT gracefully handles the requirements dictated by modern applications such as the ones described above. Specifically:

  • Scalability. The SystemT Optimizer and Runtime engine ensure high-performance execution of the extractors over individual documents. In our tests with many different scenarios, SystemT extractors run extremely fast on a variety of documents, ranging from very small documents such as Twitter messages of 140 bytes to very large documents of tens of megabytes.
  • Expressivity. AQL enables developers to write extractors in a compact manner, and provides a rich set of primitives to handle both natural language text (in many different languages) and other kinds of text, such as machine-generated data or tables. A few AQL statements may be able to express complex extraction semantics that would require hundreds or thousands of lines of code. Furthermore, functionality not yet available natively in AQL can be implemented via User Defined Functions (UDFs). For instance, developers can leverage AQL to extract complex features for statistical machine learning algorithms, and in turn embed the learned models back into AQL.
  • Transparency. As a declarative language, AQL allows developers to focus on what to extract rather than how to extract when developing extractors. It enables developers to write extractors in a much more compact manner, with better readability and maintainability. Since all operations are declared explicitly, it is possible to trace a particular result and understand exactly why and how it is produced, and thus to correct a mistake at its source. Thus, AQL extractors are easy to comprehend, debug and adapt to a new domain.

If you’d like to learn more about how SystemT handles these requirements and how to create your own extractors, enroll today in Text Analytics – Getting Results with System T and then Advanced Text Analytics – Getting Results with System T.

The post Introducing Two New SystemT Information Extraction Courses appeared first on Big Data University.


Revolution Analytics

Deep Learning Part 3: Combining Deep Convolutional Neural Network with Recurrent Neural Network

by Anusua Trivedi, Microsoft Data Scientist This is part 3 of my series on Deep Learning, where I describe my experiences and go deep into the reasons behind my choices. In Part 1, I discussed the...

...
 

September 25, 2016


Simplified Analytics

Why Data Scientist is top job in Digital Transformation

Digital Transformation has become a burning question for all the businesses and the foundation to ride on the wave is being data driven. DJ Patil & Thomas Davenport mentioned in 2012 HBR article,...

...
 

September 23, 2016


Revolution Analytics

Because it's Friday: Illusions at the Periphery

This fantastic optical illusion has been doing the rounds recently, after being tweeted by Will Kerslake: There are twelve black dots in that image, but I bet you can only see one or two of them at a...

...

Revolution Analytics

Microsoft R at the EARL Conference

Slides have now been posted for many of the talks given at the recent Effective Applications of the R Language (London) conference, and I thought I'd highlight a few that featured Microsoft R. Chris...

...
 

September 22, 2016

Silicon Valley Data Science

Noteworthy Links: September 22 2016

We’re at Enterprise Dataversity this week in Chicago, and next week we’ll be in NYC for Strata + Hadoop World. In the midst of this busy September, here are some articles we’ve come across and enjoyed.

MoMA Exhibition and Staff Histories—This open data set contains all exhibitions at MoMA from 1929–1989 (1,788 in all).

Agate: A Data Analysis Library for Journalists—Agate is a new Python library that claims to optimize “for the performance of the human who is using it.” Let us know if you’ve tried it out.

GitHub’s Project Management Tool—GitHub has a new, Trello-esque tool called Projects. We love new tools, and are intrigued by this one. Are you going to switch to Projects?

Tube Heartbeat—This visualization shows the “pulse” of London’s Underground, which is strangely relaxing to watch.

Creativity and Data Visualization—A group of artists is turning data into art through an exhibition called Visualizing the Invisible.

Want to keep up with what we’re up to? Sign up for our newsletter to get updates on blog posts, conferences, and more.

The post Noteworthy Links: September 22 2016 appeared first on Silicon Valley Data Science.

 

September 21, 2016

The Data Lab

Thomas Blyth, Business Development Executive

Thomas has over 16 years of experience in business development in the oil & gas industry, working with both small and large multinational technology companies in the UK and internationally.

Principa

Making the move from Predictive Modelling to Machine Learning

Everyone wants to learn more about how Machine Learning can be used in their business. What’s interesting, though, is that many companies may already be using Machine Learning to some extent without really realising it. The lines between predictive analytics and Machine Learning are actually quite blurred. Many companies will have built up some Machine Learning capabilities using predictive analytics in some area of their business. So if you use static predictive models in your business, then you are already using Machine Learning, albeit of the static variety.

The move from Predictive Modelling to Machine Learning can be easier than you think. However, before making that move you need to keep two key considerations in mind to ensure that you benefit from all that machine learning has to offer and that your predictive analytics system remains a trustworthy tool that lifts your business rather than harming it: Retraining Frequency and the Consequence of Failure.

 

September 20, 2016


Revolution Analytics

Welcome to the Tidyverse

Hadley Wickham, co-author (with Garrett Grolemund) of R for Data Science and RStudio's Chief Scientist, has focused much of his R package development on the un-sexy but critically important part of...

...

Revolution Analytics

Linux Data Science Virtual Machine: new and upgraded tools

The Linux edition of the Data Science Virtual Machine on Microsoft Azure was recently upgraded. The Linux DSVM includes Microsoft R, Anaconda Python, Jupyter, CNTK and many other data science and...

...

Rob D Thomas

The End of Tech Companies

“If you aren’t genuinely pained by the risk involved in your strategic choices, it’s not much of a strategy.” — Reed Hastings Enterprise software companies are facing unprecedented market pressure....

...

Revolution Analytics

How to choose the right tool for your data science project

by Brandon Rohrer, Principal Data Scientist, Microsoft R or Python? Torch or TensorFlow? (or MXNet or CNTK)? Spark or map-reduce? When we're getting started on a project, the mountain of tools to...

...

BrightPlanet

ACFE Fraud Conference Canada Recap: OSINT to Strengthen Risk Management

We promoted Tyson’s presentation at last week’s ACFE Fraud Conference in Montreal on our blog and now, we gathered some of his thoughts coming out of the event. From Tyson: The ACFE (Association of Certified Fraud Examiners) did an amazing job hosting and the venue was spectacular. If you have never been to Montreal, you need […] The post ACFE Fraud Conference Canada Recap: OSINT to Strengthen Risk Management appeared first on BrightPlanet.

Read more »
Big Data University

This Week in Data Science (September 20, 2016)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

Upcoming Data Science Events

New in Big Data University

  • Data Science Fundamentals Learning Path – When a butterfly flaps its wings, what happens? Does it fly away and move on to another flower, or is there a spike in the rotation of wind turbines in the British Isles? Come be exposed to the world of data science, where we are working to create order out of chaos that will blow you away!

The post This Week in Data Science (September 20, 2016) appeared first on Big Data University.

 

September 19, 2016


Revolution Analytics

YaRrr! The Pirate's Guide to R

Today is Talk Like A Pirate Day, the perfect day to learn R, the programming language of pirates (arrr, matey!). If you have two-and-a-bit hours to spare, Nathaniel Phillips has created a video...

...
Teradata ANZ

Why segment customers in a Big Data world?

Gary Comer, founder of mail order clothing retailer Lands’ End, once said “Think one customer at a time and take care of each one the best way you can”. The only way to implement this in the early 1960s, in the days of limited data and computing power, was to segment consumers into subsets with common needs, interests or priorities and then target them appropriately.

Fifty years on, there is an abundance of behavioral data about each customer: both transactional data indicating past responses to campaigns and interaction data. Customer values and opinions are also shared on social media networks. Most importantly, there are now scalable supervised learning technologies that can link and analyse all of this granular data to create accurate predictive models. These changes have given marketers the ability to understand each customer’s unique needs and priorities, enabling accurate targeting of a single individual rather than segments. Yet, as Rexer’s 2007 data miner survey shows, 4 out of 5 data miners still conduct segmentation analyses, i.e., unsupervised learning on data with sufficient information to perform supervised learning. And this is more frequently the case for those working with CRM/Marketing data, in other words, for targeting customers.

So why are most marketers still persisting with targeting segments rather than an individual?

There are a number of reasons for this, including:
1. Choice of analytic platforms,
2. Campaign funding structures and
3. The fact that simplistic segmentation models are easier to sell to senior management.

1. Choice of analytic platforms for CRM

The figure below shows the results of the 2015 KDnuggets poll on computing resources for analytics and data mining. A whopping 85% of all data miners still use a PC or laptop for their analysis (even though they may also use other platforms).

Now, treating each person as a unique individual requires understanding their preferences, needs and current priorities. A person is targeted if and only if all of their behavior, social conversations and current events of relevance indicate that the campaign would be of interest. Such focused targeting requires building specific predictive models for each marketing campaign, taking into account all of the customer’s transactions and interactions in all channels as well as any events of relevance.

Clearly, analysing such vast amounts of data in a timely manner is difficult or impossible on a PC or laptop, the analytic platform used by most CRM analysts. Hence the use of less resource-intensive approaches to support their campaigns, namely segments based on small demographic, economic and lifestyle datasets.

2. Campaign funding structures

Predictive models for campaigns enable targeting of a very small population to achieve a high response rate (RR). The accompanying gain chart is typical of what is possible with models. Targeting 0.5% of the population in 200 such campaigns results in a contact rate of 100% (0.5% X 200) and an average RR of 40% (10% of 2% X 200).

Contacting the same number of people in fewer campaigns reduces the overall effectiveness. Yet, the minimum volumes needed to justify funding for campaigns mean that marketers have to contact more people in each campaign. This reduces the effectiveness of predictive models to the same level as that of simplistic segmentation models.

3. Segmentation is easier to sell 

When there are only five to seven non-overlapping segments, they can be explained to senior management with compelling visuals. Catchy segment titles such as “Indulgent traveler”, “Fashionista professional” and “Senior sippers” evoke images in our minds and it is then possible to mobilise funding for an entire program around that segmentation.

In comparison, predictive models that crunch thousands of variables and then spit out likelihood scores are distinctly unexciting. Further, the crunching process requires CRM automation and investment in data science, which is not the province of the marketing staff who control the funding. Marketing would much rather spend it on a marketing program based on a different, subjective segmentation strategy.

So, how do we move away from segmentation-only CRM to embracing the most effective technique? The answer is to examine the data available to solve the business problem. Use segmentation if there is insufficient historical information to learn a predictive model and the only option is undirected learning.

Thus, segmentation would be the technique of choice when:
• Launching products that are completely different from anything previously sold,
• Exploring new markets with very different geo-demographics and
• Designing new products by understanding gaps in current offerings in current markets through a market research done across the whole population, not just the customer base.

For all other campaigns, predictive models would be the choice. This combination of supervised learning for regular campaigns and segmentation for new ones will ensure long-term viable CRM.
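As a hedged, minimal sketch of the difference between the two approaches (the features, data and thresholds are entirely made up), compare an unsupervised k-means segmentation with a supervised propensity model trained on past campaign responses:

# Contrast unsupervised segmentation (k-means) with a supervised propensity model.
# Features, labels and sizes are synthetic and purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                            # behavioural features per customer
y = (X[:, 0] + rng.normal(size=1000) > 1).astype(int)     # past campaign responses

# Segmentation: no response labels needed, yields a handful of broad groups.
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Propensity model: learns from historical responses and scores each individual.
model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

# Target the top 0.5% of individuals by predicted response, not whole segments.
top_half_percent = np.argsort(scores)[-5:]
print("segment sizes:", np.bincount(segments))
print("top-scoring customers:", top_half_percent, scores[top_half_percent].round(2))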

The post Why segment customers in a Big Data world? appeared first on International Blog.

Data Digest

Driving the CX Agenda: Who’s Behind the Wheel?


“Have a very good reason for everything you do” – Laurence Olivier

How does your customer experience look under the glare of your customers' expectations? Olivier’s sentiment cries out for justification, to put the thought process behind every business decision impacting the customer up in neon lights for reflection. What would be revealed?

According to PwC’s 2016 Global CEO Survey ‘Customers remain the top priority, with 90% of CEOs indicating they have a high or very high impact on their business strategy’. But how is that translating into existing CX strategy? The Survey states that ‘customer behaviour, in particular, has become more complicated as values and buying preferences evolve.’

Undoubtedly this rapidly evolving environment makes customer centricity a cornerstone, but who has stepped out from the shadows to ensure it stays firmly in the spotlight?

Customer advocates, Chief Customer Officers, are active in the boardroom, championing the cause of the customer and putting in place the strategy to promote change, inter-discipline collaboration, organisational alignment and customer-centric decision making.

NAB announced in July 2016 that they are creating not one but three Chief Customer Officer roles.

However, the role of a customer advocate will look very different across organisations, and backgrounds vary significantly between individuals. When we take a closer look at how Chief Customer Officers have arrived at their destination, we get a better flavour of the complex nature and diverse remit of the CCO role.

For example, Julie Batch was appointed as the Chief Analytics Officer at Insurance Australia Group (IAG) in July 2014. By December 2015, Ms Batch was heading up IAG’s Customer Labs as Chief Customer Officer, responsible for developing customer propositions and marketing strategies. For IAG, customer experience strategy is intrinsically linked to driving product innovation through data and insights. A natural progression for a CAO.

At Carsales.com.au, Chief Customer Officer Vladka Kazda was Chief Marketing Officer at the company for over five years before arriving at the CCO position. Ms Kazda owned and influenced customer experience at every level during her journey to CCO, so it was a logical move.

For others, a natural rise in the ranks via customer experience roles has seen them awarded the CCO role. Damian Hearne, Chief Customer Officer at Auswide Bank, has excelled in the leadership qualities required of a CCO to unite across silos and move the business from delivering an uncoordinated experience to a reliable, deliberate and preferred customer experience.

Mark Reinke, Chief Customer Experience Officer, Suncorp, has also united the critical elements of customer, data and marketing. The customer listening path is critical but alone, it cannot deliver. It needs the proactive and innovative advocate with the leadership skills to drive initiatives. 

CCOs often have a broad remit, but primarily the requirement to develop the competency to operationalise the brand promise: looking at the language, prioritisation and decision-making, bringing together operating groups, transforming the collaboration process, and implementing the customer experience design. There is both faith and science behind the Chief Customer Officer.

Ultimately everyone in the business is involved in putting the customer first, but employee customer advocates are only fostered from a successful customer centric culture. What metrics are being used to measure the impact and success of a CCO?

To learn more about driving change, overcoming the challenges and critically measuring the success of the CCO, join Julie Batch, Vladka Kazda, Damian Hearne, and Mark Reinke as they share their insights at Chief Customer Officer Sydney, 28-29 November 2016.

Learn how other organisations are addressing their CX challenges, learn about new approaches and strategies, whilst making new connections with industry peers. Join The Chief Customer Officer Forum LinkedIn group here.
 

September 18, 2016


Simplified Analytics

What is Cognitive Computing?

Although computers are better for data processing and making calculations, they were not able to accomplish some of the most basic human tasks, like recognizing Apple or Orange from basket of fruits,...

...
 

September 16, 2016


Revolution Analytics

Because it's Friday: A big chart about climate change

The problem with representing change on a geologic timescale is just that: scale. We humans have only been around for a tiny fraction of the planet's history, and that inevitably colours our perceptions of changes that occur over millennia. That's one of the things that make the climate change debate so difficult: it's hard to represent the dramatic changes in the climate over the last 200 years in the context of climate history. (Deliberate FUD also has a lot to do with it.) It's difficult just to chart temperatures over that timescale, because the interesting part to us (human history) gets lost in the expanse of time. As a result, most representations of climate data are either compressed or truncated, which dampens the impact.

Randall Munroe has figured out a clever way to demonstrate the dramatic impact of modern climate change in a recent issue of XKCD: simply plot the history of global temperature since the last ice age in one really, really tall chart -- liberally decorated with the usual XKCD humour, of course:

(Image: excerpt from the XKCD climate chart)

That's just one tiny excerpt of the chart; you need to click through and scroll to the end to really appreciate its impact.

That's all from us here at the blog for this week. Have a great weekend, and we'll see you back here on Monday. Enjoy!



Revolution Analytics

Reflections on EARL London 2016

The Mango Solutions team have done it again: another excellent Effective Applications of R (EARL) conference just wrapped up here in London. The conference was attended by almost 400 R users from...

...
Data Digest

Top 10 Takeaways at the Chief Data & Analytics Officer Melbourne 2016


Feeling empowered and inspired after a fantastic three-day conference in Melbourne last week with over 200 data enthusiasts! The topics were vast and the speakers kept everyone engaged with their wealth of knowledge and stories shared. A massive thank you to all our speakers, sponsors and attendees – we learnt lots and had a lot of fun!

Here are my top 10 takeaways from the conference:



1. On leveraging open data for social good: we can all be superheroes! Thank you Jeanne Holm, City of Los Angeles for your inspiring stories of open data being used to improve the world we live in.


2. On building a culture for data governance: “Work with the willing and win hearts” Kate Carruthers, Chief Data Officer, University of New South Wales

3. On establishing a data quality framework: create a team brand that represents value-add to your organisation, and keep your data quality metrics simple and powerful - Michelle Pinheiro, IAG



4. On ensuring the success of your data analytics projects: agile, agile, agile, AGILE!

5. On IT and business alignment– it’s simple, says World Vision International's John Petropoulos, if your partner doesn’t get it, you need to re-write it!
   
6. Big data personalisation = a world-class machine learning predictive model @Woolworths
   
7. On data privacy – and where is that creepy line? A key consideration from Brett Woolley, NAB, is the content vs. intent of the personal information used.
   
8. On developing a cost model for data governance… you really ought to check out Gideon Stephanus du Toit’s presentation: http://bit.ly/2cvewwW
   
9. On leveraging machine learning for safer flights into Queenstown – a very cool use case from Mark Sheppard, GE Capital



10. On marketing analytics: “don’t fall for vanity metrics” Geoff Kwitko, Edible Blooms

Thanks again all, and we look forward to catching up in Sydney on 6-8 March 2017!

To discuss the Chief Data & Analytics Officer Sydney 2017 event and speaking/ sponsorship opportunities, please get in touch: monica.mina@coriniumintelligence.com   


By Monica Mina:

Monica Mina is the organiser of the CDAO Melbourne, consulting with the industry about their key challenges and trying to find exciting and innovative ways to bring people together to address those issues. For enquiries, monica.mina@coriniumintelligence.com.
 

September 15, 2016

Silicon Valley Data Science

Jupyter Notebook Best Practices for Data Science

Editor’s note: Welcome to Throwback Thursdays! Every third Thursday of the month, we feature a classic post from the earlier days of our company, gently updated as appropriate. We still find them helpful, and we think you will, too! The original version of this post can be found here.

The Jupyter Notebook is a fantastic tool that can be used in many different ways. Because of its flexibility, working with the Notebook on data science problems in a team setting can be challenging. We present here some best practices that SVDS has implemented after working with the Notebook in teams and with our clients—and that might help your data science teams as well.

The need to keep work under version control, and to maintain shared space without getting in each other’s way, has been a tricky one to meet. We present here our current view into a system that works for us—and that might help your data science teams as well.

Overall thought process

There are two kinds of notebooks to store in a data science project: the lab notebook and the deliverable notebook. First, there is the organizational approach to each notebook.

Lab (or dev) notebooks:

Let a traditional paper laboratory notebook be your guide here:

  • Each notebook keeps a historical (and dated) record of the analysis as it’s being explored.
  • The notebook is not meant to be anything other than a place for experimentation and development.
  • Each notebook is controlled by a single author: a data scientist on the team (marked with initials).
  • Notebooks can be split when they get too long (think turn the page).
  • Notebooks can be split by topic, if it makes sense.

Deliverable (or report) notebooks

  • They are the fully polished versions of the lab notebooks.
  • They store the final outputs of analysis.
  • Notebooks are controlled by the whole data science team, rather than by any one individual.

Version control

Here’s an example of how we use git and GitHub. One beautiful new feature of Github is that they now render Jupyter Notebooks automatically in repositories.

When we do our analysis, we do internal reviews of our code and our data science output, using a traditional pull-request approach. When issuing pull requests, however, the differences between updated .ipynb files are not rendered in a helpful way. One solution people tend to recommend is to commit a conversion to .py instead. This is great for seeing the differences in the input code (while jettisoning the output), and is useful for seeing the changes. However, when reviewing data science work, it is also incredibly important to see the output itself.

For example, a fellow data scientist might provide feedback on the following initial plot, and hope to see an improvement:

(Image: initial plot with a rather poor fit to the data)

(Image: improved plot with a better fit)

The plot on the top is a rather poor fit to the data, while the plot on the bottom is better. Being able to see these plots directly in a pull-request review of a team-member’s work is vital.

See the Github commit example here.

Note that there are three ways to see the updated figure (options are along the bottom).

Post-save hooks

We work with many different clients. Some of their version control environments lack the nice rendering capabilities. There are options for deploying an instance of nbviewer behind the corporate firewall, but sometimes that still is not an option. If you find yourself in this situation, and you want to maintain the above framework of reviewing code, we have a workaround. In these situations, we commit the .ipynb, .py, and .html of every notebook in each commit. Creating the .py and .html files can be done simply and automatically every time a notebook is saved by editing the jupyter config file and adding a post-save hook.

The default jupyter config file is found at: ~/.jupyter/jupyter_notebook_config.py

If you don’t have this file, run: jupyter notebook --generate-config to create this file, and add the following text:

c = get_config()
### If you want to auto-save .html and .py versions of your notebook:
# modified from: https://github.com/ipython/ipython/issues/8009
import os
from subprocess import check_call
def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)
    check_call(['jupyter', 'nbconvert', '--to', 'html', fname], cwd=d)
c.FileContentsManager.post_save_hook = post_save

Run jupyter notebook and you’re ready to go!

If you want to have this saving .html and .py files only when using a particular “profile,” it’s a bit trickier as Jupyter doesn’t use the notion of profiles anymore.
First create a new profile name via a bash command line:

export JUPYTER_CONFIG_DIR=~/.jupyter_profile2
jupyter notebook --generate-config

This will create a new directory and file at ~/.jupyter_profile2/jupyter_notebook_config.py. Then run jupyter notebook and work as usual. To switch back to your default profile, set the variable (either by hand, via a shell function, or in your .bashrc) back to: export JUPYTER_CONFIG_DIR=~/.jupyter.

Now every save to a notebook updates identically-named .py and .html files. Add these in your commits and pull-requests, and you will gain the benefits from each of these file formats.

Putting it all together

Here’s the directory structure of a project in progress, with some explicit rules about naming the files.

Example directory structure

- develop # (Lab-notebook style)
 + [ISO 8601 date]-[DS-initials]-[2-4 word description].ipynb
 + 2015-06-28-jw-initial-data-clean.html
 + 2015-06-28-jw-initial-data-clean.ipynb
 + 2015-06-28-jw-initial-data-clean.py
 + 2015-07-02-jw-coal-productivity-factors.html
 + 2015-07-02-jw-coal-productivity-factors.ipynb
 + 2015-07-02-jw-coal-productivity-factors.py
- deliver # (final analysis, code, presentations, etc)
 + Coal-mine-productivity.ipynb
 + Coal-mine-productivity.html
 + Coal-mine-productivity.py
- figures
 + 2015-07-16-jw-production-vs-hours-worked.png
- src # (modules and scripts)
 + __init__.py
 + load_coal_data.py
 + figures # (figures and plots)
 + production-vs-number-employees.png
 + production-vs-hours-worked.png
- data (backup-separate from version control)
 + coal_prod_cleaned.csv

Benefits

There are many benefits to this workflow and structure. The first and primary one is that they create a historical record of how the analysis progressed. It’s also easily searchable:

  • by date (ls 2015-06*.ipynb)
  • by author (ls 2015*-jw-*.ipynb)
  • by topic (ls *-coal-*.ipynb)

Second, during pull-requests, having the .py files lets a person quickly see which input text has changed, while having the .html files lets a person quickly see which outputs have changed. Having this be a painless post-save-hook makes this workflow effortless.

Finally, there are many smaller advantages of this approach that are too numerous to list here—please get in touch if you have questions, or suggestions for further improvements on the model! For more on this topic, check out the related video from O’Reilly Media.

The post Jupyter Notebook Best Practices for Data Science appeared first on Silicon Valley Data Science.

The Data Lab

Calling all Data Scientists to join our "Office Hours" initiative

We would like to invite data scientists and academics to attend our offices at 15 South College Street, Edinburgh EH8 9AA, to come together and discuss problems, share insights, and meet like-minded technicians in a relaxed and informal environment. From 9:30 the attendees will each give a very brief presentation to the group (around 5 minutes each), then afterwards are welcome to use our offices for follow-up discussions, networking, or simply to do their day job!

Please follow the link to the application form. For this first iteration spaces are limited to fifteen and therefore places will be allocated at The Data Lab’s discretion. We intend to run these days regularly with increased capacity across various locations in Scotland, so please give us some information on topics of interest to help us plan future days.

If you have any queries please do not hesitate to contact us at science.group@thedatalab.com.

 

Principa

What is Machine Learning?

Here's a blog post covering some of the most frequently asked questions we get on Machine Learning and Artificial Intelligence, or Cognitive Computing. We start off with "What is Machine Learning?" and finish off with addressing some of the fears and misconceptions of Artificial Intelligence.

So, what is machine learning? A simple search on Google for the answer will yield many definitions that leave most non-analytical people confused and entering more "What is..." queries into Google. So, I asked our Head of Marketing to try his hand at defining Machine Learning in the most simplistic way he can: explain Machine Learning to someone you've just met at a social gathering. Here's his definition - a "Machine Learning for Beginners" definition, if you will.

 

September 14, 2016


David Corrigan

Now You See Me, Now You Don’t

The Trials & Tribulations of the Anonymous Customer I bought an office chair from an office retailer a few months ago.  Seeing as I was buying something I wanted vs. something I needed...

...
Jean Francois Puget

Machine Learning From Weather Forecast

Using weather data in machine learning is promising. For instance, everyone knows that the weather forecast influences buying patterns, be it for apparel, food, or travel. Wouldn't it be nice to capture the effect of weather forecasts on these?

All we need to do is use weather data in addition to the other data we have, then use our favorite machine learning toolbox.

It sounds simple but there is a catch.  I try to explain what it is below.

For the sake of simplicity we will assume we want to predict future sales, but what follows applies to any situation where we want to use weather forecast as part of a machine learning application.

If you omit weather data, then sales forecasting is a classical problem to which several statistical and machine learning techniques can be applied, for instance ARIMA. These techniques deal with time series: past sales data is ordered by time, and the model extrapolates the time series. In a nutshell, the model finds trends in past sales data and applies those trends to the current data.

One way to assess model accuracy is to run it against historical data and compare predictions with actual sales. For this you must use historical data that was not used when creating the model; see Overfitting In Machine Learning if it is not clear why.

Assuming you use held-out historical data, you would, for each week w, run your model with all weeks wi < w as input, then compare the output of the model with the actual sales at week w. If your model is good, you would get something like this, where the predicted values are close to the actual values:

(Image: chart of predicted vs. actual weekly sales)

You can then use the model to predict future sales.
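A minimal sketch of that week-by-week backtest might look like the following, where a naive repeat-last-value model (and made-up sales numbers) stands in for ARIMA or whichever model you actually use:

# Rolling, week-by-week backtest on held-out history.
# A naive last-value forecast stands in for the real model; sales data is invented.
import pandas as pd

sales = pd.Series(
    [100, 110, 105, 120, 118, 130, 128, 140],
    index=pd.date_range("2016-01-03", periods=8, freq="W"),
)

errors = []
for w in range(4, len(sales)):           # hold out the later weeks
    history = sales.iloc[:w]             # all weeks strictly before week w
    prediction = history.iloc[-1]        # naive model: repeat the last observed value
    errors.append(abs(prediction - sales.iloc[w]))

print("mean absolute error:", sum(errors) / len(errors))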

If you want to predict with weather forecast, your model will take two time series as input:

  • Past sales
  • Past weather forecast

and it will output sales forecast for the coming week.

The issue is that past forecasts aren't generally available. Most weather data providers store past observed weather, but they don't store past weather forecasts, because that would require far more storage capacity.

The usual way to get around the lack of past weather forecasts is to approximate them using past observed weather: the weather forecast for past week w becomes the observed weather for week w.

While this seems appealing, it creates an issue. When using actual weather in place of the weather forecast, we assume a perfect forecast. But when we use our model on current data, we will have the current weather forecast as input, which is unlikely to be perfect. Our model will assume it is perfect anyway, and it may rely on it too much. This can negatively impact our sales forecast accuracy.

The cure for this issue is to use real past forecasts. Unfortunately, as said above, this data generally isn't stored. One way to cope with its absence is to reconstruct it by running weather forecasting models on past weather data: for each week in the past, we would run the weather forecast model with the previous weeks as input and store the result. This is what our weather forecast team at IBM does using Deep Thunder technology (see the Wikipedia link). This is the rich man's solution: expensive but quite effective.

If you cannot reconstruct past weather forecasts, a poor man's solution is to add noise to past weather data when you use it as a proxy for the forecast. That way you no longer assume perfect weather forecasts, and your model will probably be better off. 
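A toy version of this "add noise" fix, assuming the past observed weather sits in a pandas DataFrame with temperature and precipitation columns (the column names and noise scales are illustrative assumptions, not taken from the post):

```python
import numpy as np
import pandas as pd

def noisy_forecast_proxy(observed: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Jitter observed weather so the model cannot treat it as a perfect forecast."""
    rng = np.random.default_rng(seed)
    proxy = observed.copy()
    # Roughly mimic forecast error: a couple of degrees on temperature,
    # and non-negative noise-adjusted precipitation.
    proxy["temperature"] = proxy["temperature"] + rng.normal(0.0, 2.0, len(proxy))
    proxy["precipitation"] = np.clip(
        proxy["precipitation"] + rng.normal(0.0, 1.0, len(proxy)), 0.0, None)
    return proxy
```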

Choose whichever solution you want, but do not use raw past weather as a proxy for past weather forecasts. It will lead to disappointing results.

 

 

 


Revolution Analytics

How an algorithm behind Deep Learning works

There are many algorithms behind Deep Learning (see this comparison of deep learning frameworks for details), but one common algorithm used by many frameworks is Convolutional Neural Networks (CNNs)....

...
Teradata ANZ

Hey! Are Your Colleagues Cheating The Business?


If your organisation is ignoring its Big Data potential, your colleagues are cheating the business because data-driven companies get a much higher return on investment than their competitors.

Fact.

And the key to maximising your data potential also happens to be the fourth ‘V’ in Big Data: ‘Value’. Or rather, ‘Value’ spliced with ‘Time-to-Market’.

Now, for the majority of businesses data-driven competition presents a massive challenge. Particularly if they take the traditional, often tortuous approach to the development of products and services. Typically, it can take a ‘last-gen’ IT team anything from six months to a year (or more) to deliver the goods, leaving their hamstrung c-suite helpless in the face of more agile pace setters.

The quickest route to business value

In that kind of torpid environment, scope creep often leads to widespread under-delivery (numbers, KPIs, insights, etc.). Also, the team have to face up to the thorny question of what happens if development fails to confirm the original business case. Or what if the investment isn’t written off immediately (a real dollar-burner)?

Clearly, a fresher, more vigorous approach is required. A radical approach where innovation is the catalyst for a sustainable reduction in development costs, deadlines, and the time-to-market for insights, as well as an increase in business value.

Because in markets where Big Data and IoT, as well as disruptive technologies and business models, are triggering unprecedented change, accelerating the time-to-market for your data insights can make the difference between success and failure.

It pays to know what you don’t know… yet

Therefore, you need to know what’s worth and not worth pursuing as soon as possible so you can divert resources to the most profitable and productive business areas.

This calls for a kind of thrash-metal approach to R&D: researching deeper and developing faster than ever before. Which is not as crazy as it sounds. This discovery-driven approach follows the same fail-fast principles adopted by life sciences companies (new drug development), McDonald’s (new recipes), and most other organisations developing new and uniquely successful IP.

Try. Fail. Repeat. Try. Fail. Repeat. Try, fail, and repeat until you find a data-driven business case that carries enough real value to warrant operationalisation.

Delivering actionable insights – faster

Okay, so a fail-fast approach provides a springboard for improved and sustainable profitability. The trouble is it also causes great upheaval, and that can hit hard. Your whole operation could need transforming.

Scary, huh? Actually, it’s not as difficult to get your head around as you might think. Especially as data science is on hand to help you discover the business value buried in your data.

And that’s the point. To get more business value, you need to know your data in detail. Then visual analytics can turbo-charge your knowledge, providing clarity and a shorter time-to-market en route to a brand new rapid-insight development cycle.

Creating a culture of analytics

The quicker you can identify the business value in your data, the greater (potentially) the ROI. To this end, visual analytics cut straight to the money shot, offering an easy method of fast tracking discovery and breaking down data complexity. And creative visualisations of the results, like the images in the Art of Analytics collection*, make the implications more readily understood.

That said, a method is just a method and to flourish, your organisation needs to establish a pioneering culture; a culture of analytics which encourages the whole team to think outside the box. That means assigning the right people and embedding the right analytical and fail-fast processes, while enabling them with the right technology, the right methodology, and the right incentives (no one should be afraid of failing).

The Art of Analytics

The Art of Analytics* is a series of pictorial data visualisations drawn from real-world use cases and presented as works of art in their own right. Alongside each image are details of the practical benefits gained by the organisation concerned. These stunning artworks are the product of intensive analytical work aimed at creating extra business value and solving real business problems. The visualisation process involves interrogating diverse types and quantities of data with a tailored mix of analytical tools and methods.

Check out the practical benefits of the Art of Analytics in ‘Visual Analytics Beat Even Oprah For Working Out Complex Relationships. #1: Les Misérables’ – the next in this series of Art of Analytics blogs.

This post first appeared on Forbes TeradataVoice on 27/05/2016.

The post Hey! Are Your Colleagues Cheating The Business? appeared first on International Blog.

 

September 13, 2016


BrightPlanet

How to Find and Harvest Dark Web Data from the TOR Network

The Internet is constantly changing, and that’s more apparent on the TOR Network than anywhere else. In this section of the Dark Web, you can see a URL one minute and it’ll be gone the next. These fly-by-night URLs make it challenging to collect TOR data, but it’s important to stay on top of Dark […] The post How to Find and Harvest Dark Web Data from the TOR Network appeared first on BrightPlanet.

Read more »
Teradata ANZ

How about trying complex Data Analytics Solutions for size and fit, before you splash the cash?

The pressure to deliver Alpha – potential above the market average – is immense. And that’s not surprising because in a quick-change disruptive world, Alpha consistency puts an organisation at the sharp edge of its digitally-driven market. If it can be done quickly, that is.

One of the biggest stumbling blocks to creating breakthrough insights that drive value though, has been the time it takes to realise or monetise the business potential in data. But what if I said that instead of waiting months or even years for project results you could deliver Alpha, fully, within 6-10 weeks?

And what if I told you we could predict the business value of analytic solutions before you shell out a ton of money on the technology and other resources?

What would that be worth to your organisation? To you, personally?

The RACE for business value in data

Predicting outcomes is complicated. A myriad of factors that complicate the deployment and business use of analytic solutions need to be taken into consideration, such as new data sources (including IoT sensor data) and new analytic techniques. Yet, in spite of this giant basket of variables, the potential ROI and strategic business impact of any analytic solution is expected to be totted up and delivered to the table before any money changes hands.

Which is where Teradata’s RACE approach comes in. Teradata’s agile, technology-agnostic process, RACE (Rapid Analytic Consulting Engagement), has been developed to complement both agile development (e.g. CRISP) and agile methodology (e.g. SCRUM).

Crossing the Business – IT divide

The RACE process also soothes a number of old wounds. Business departments pass their needs and ideas onto their analysts, who simply respond. And, whereas IT departments have their processes, business and their analysts don’t really have a way of streamlining business value identification before hitting IT with a development request.

Often (surprise, surprise), business departments don’t understand the analytical potential of data. At the same time, neither the analysts nor the IT department understand business processes and ideas. Consequently, business thinks IT is too slow; IT feel they are not taken seriously and have no clue about how the business is really run.

One of the great things about RACE is that it fuses business and IT together through its leadership and commitment model. At the same time it enables both sides to learn intensively from one another.

RACEing involves three primary phases:

  1. Align – together, business and IT identify and align the highest-potential-value use cases, and validate the availability of key data assets to support them.
  2. Create – data scientists load and prepare the data, developing new, or applying existing, analytic models for the selected use cases. This phase involves rapid iterations with the business to ensure the analytic insights hit the right business targets.
  3. Evaluate – business and the analysts / data scientists analyse the results, document the potential ROI of deploying the analytic use cases at scale, and develop a deployment recommendation.

RACE leverages multi-genre analytics to generate new business insights, reducing time to market (it takes an average of six weeks to validate the ROI of the new business insights) and minimising deployment risk (the generated insights act as a prototype for operationalisation). Oh yes, it identifies the Alpha by validating use-case business potential, too.

And the upshot is that you begin each project with a clear ROI roadmap which answers three burning business questions: “How?”, “Where?”, and “What will it be worth?”.

What’s not to like?

The post How about trying complex Data Analytics Solutions for size and fit, before you splash the cash? appeared first on International Blog.

Big Data University

This Week in Data Science (September 13, 2016)

Here’s this week’s news in Data Science and Big Data.

Don’t forget to subscribe if you find this useful!

Interesting Data Science Articles and News

Upcoming Data Science Events

Cool New Courses

The post This Week in Data Science (September 13, 2016) appeared first on Big Data University.

Teradata ANZ

DevOps Decoded: Modular Design

I have been reading and thinking a lot about DevOps recently, specifically in the area of development/test/deployment automation and how it would be best applied to building analytic solutions.

Like metadata, I believe DevOps is coming into its prime, with the advancements in open source and resurgence in programming combining to provide all of the enabling components to build an automated delivery pipeline.

In a simple delivery pipeline you will have a number of stages that the code moves through before final deployment into a production environment.


Code is created by a developer or generated by a tool and committed into a source code repository. The change is detected and triggers the packaging process.

QA checks are performed and a deployment package is built. This contains the set of changes (delta) which must be applied to the production environment to deploy the new code.

The package is then deployed into a series of environments where testing is executed and the package is automatically promoted throughout the test environments based on the “green light” results of the previous round of testing.

Test results are monitored throughout the entire process, and failures are detected and reported back to the developer immediately for correction. The faster you can find and report bugs, the easier it is for the developer to fix the issue, as it will still be “top of mind” and will require the least effort to remedy.

Finally packages are deployed into production, either automatically (continuous delivery) or as part of a scheduled release.

An automated delivery pipeline like this looks simple on the surface, but as soon as you peel back the covers and look into the detail you quickly realise that there is a lot of complexity contained within.

Some of the challenges are technical in nature, “how do I generate delta changes to a database object?” and others are firmly in the design/architecture realm, “how do I perform continuous integration on a monolithic data maintenance process?”.

Whilst I am not going to be able to explore all of these issues within this article, I would like to discuss the core principle which I believe is the key to solving this complex puzzle – modular design.

Modular design certainly is not a new concept; indeed, it has been used in traditional software development for many years. Microservices are a great example of modular design in action, and I believe they will play a far greater role in analytic solution development in the future.

Many analytic solutions (eg: Data Warehouse, Data Lake etc…) will have a modular design to some degree, but most do not extend the concept down to a low enough level to enable the benefits of continuous integration & delivery that DevOps automation can provide.

The monolithic process design encapsulates multiple functions within a single object, commonly a script or piece of SQL code.

To test this object, we must supply object definitions for all inputs, test data for all inputs and expected results for the output.

Testing this single component does not present any particular challenges when done in isolation; however, when integration test requirements are considered, the limitations of monolithic design become apparent.

Consider the case where the output of this process is consumed by 2 downstream processes.

For integration testing, we must also test the downstream processes to ensure that they still produce the expected results. The testing scope has now increased significantly, and this may reduce the frequency of our integration testing, based on the elapsed time of executing the full suite of tests required.

Organisations that have implemented automated deployment pipelines often report that integration testing is only done overnight, as the end-to-end elapsed time is “hours”.

When implementing automated deployment into an existing environment this is going to be the starting position, as you must incorporate the existing monolithic processes and implement improvements over time.

A modular process design can be thought of as a monolithic, complex process decomposed into a series of simple, atomic functions. In the agile world this is analogous to taking a user story and decomposing it into a number of tasks.
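As a toy illustration of that decomposition (see the figures below), here is what the same process might look like as four small Python functions. In a real warehouse these would typically be SQL steps or scripts; the data shapes, the business rule and the "stage" step are assumptions on my part, while "resolve keys", "apply business rule" and "apply changes" follow the function names used later in this post.

```python
from typing import Dict, List

Row = Dict[str, object]

def stage(raw_rows: List[Row]) -> List[Row]:
    """Land the raw input into a working structure (assumed first step)."""
    return [dict(r) for r in raw_rows]

def resolve_keys(rows: List[Row], customer_keys: Dict[str, int]) -> List[Row]:
    """Look up surrogate keys for each row."""
    return [{**r, "customer_id": customer_keys.get(r["customer_ref"], -1)} for r in rows]

def apply_business_rule(rows: List[Row]) -> List[Row]:
    """A single, easily testable rule, e.g. flag large orders."""
    return [{**r, "is_large_order": r["amount"] > 1000} for r in rows]

def apply_changes(target: List[Row], delta: List[Row]) -> List[Row]:
    """Merge the delta into the target table (here a naive append)."""
    return target + delta

# The monolithic version would do all four steps in one script; here each function
# can be versioned, tested and deployed on its own.
```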

We now have four separate objects implementing the previous monolithic process.

In general, there will be a reduced number of inputs to the object, as only the inputs necessary for that specific function are required.

Output objects will tend to be reused in subsequent functions, and there will be an increased number of working/temporary objects to store intermediate results.

The number of objects and associated artifacts (code, scripts, object definitions, test data, etc…) has increased, while the complexity of each artifact has decreased.

Managing the artifacts is a perfect candidate for automation. Many of the artifacts can be generated using templates and customised with parameters.

Unit testing now has a much reduced scope: if we make a change to the business rule, we just need to test its inputs and output objects.

Each individual test will be simpler, and it will be easier to expand the test coverage. The elapsed time of testing will be shorter, allowing more frequent testing.
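For example, a unit test for the toy apply_business_rule function from the earlier sketch stays tiny and fast. This is a pytest sketch only; the module name my_pipeline is hypothetical.

```python
from my_pipeline import apply_business_rule  # hypothetical module holding the sketch above

def test_apply_business_rule_flags_large_orders():
    # One small order and one large order exercise both sides of the rule.
    rows = [{"amount": 50}, {"amount": 5000}]
    result = apply_business_rule(rows)
    assert [r["is_large_order"] for r in result] == [False, True]
```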

How does this impact the scope of integration testing? The impact on external processes in our example should be constrained to the case where we change the “apply changes” function, as that is the only point where output is produced ready for use by other processes. In this case we must integration-test that function and any consumers.

Let us assume that the output table of our changed function is used in both the “resolve keys” and “apply business rule” functions of the dependent processes.

We now have what I call a “minimum testable component” for integration testing, which ensures that all dependent processes are tested but also keeps the scope of the testing to a minimum.

There will tend to be fewer permanent source objects involved in the testing, the functions tested are simple, the individual tests are simple but comprehensive, and the elapsed time will be the minimum possible.

This is the road to the holy grail of continuous integration testing, where the end to end elapsed time of the testing fits within a small enough window that the testing can be performed on demand, for every change committed into the source code repository.

Continuous integration enables continuous delivery, where every change has the potential of automatically flowing through the pipeline to be deployed into production.

Organisations I have seen that have implemented this do not let all changes deploy to production automatically, or indeed let all changes proceed with the minimum testable component for integration testing; that is an aspirational goal rather than normal practice.

Changes are rated according to risk – high, medium & low. Low-risk changes can be automatically deployed to production, provided all testing succeeds. Medium-risk changes have some manual governance checks applied before deployment, and may trigger integration testing with expanded scope. High-risk changes are typically implemented in a scheduled release, with supporting resources (developers and operations) close at hand.

The goal is to minimise the risk of each change. Over time development teams will realise that they can deploy much faster by implementing many simple, low risk changes rather than a smaller number of monolithic high risk changes.

Following modular design techniques will help you to maximise the number of low risk changes, enabling agility in delivery of new functionality within an analytics environment which traditionally has been viewed (and built) as a large monolithic application.

Modular design will also help you to start to unravel the complexities of automating your deployment pipeline, with every small step forward providing benefits through increased code quality, a reduction in production defects and faster time to market for new functionality.

The post DevOps Decoded: Modular Design appeared first on International Blog.

 

September 12, 2016


Revolution Analytics

2016 Data Science Salary Survey results

O'Reilly has released the results of the 2016 Data Science Salary Survey. This survey is based on data from over 900 respondents to a 64-question survey about data-related tasks, tools, and the...

...

Revolution Analytics

Volunteer to help improve R's documentation

The R Consortium, in its most recent funding round, awarded a grant of $10,000 to The R Documentation Task Force, whose mission is to design and build the next generation R documentation system....

...
Principa

How Marketers can use Machine Learning to boost Customer Loyalty

Thanks to mobile technology, wearable devices, social media and the general pervasiveness of the internet, an abundance of new customer information is now available to marketers. This data, if leveraged optimally, can create opportunities for companies to better align their products and services to the fluctuating needs of a demanding market space.


Simplified Analytics

Digital Transformation - Top 5 challenges to overcome

The Digital Tsunami is moving at a rapid pace, encompassing all aspects of business and society. It touches every function of a business from purchasing, finance, human resources, operations, IT and...

...
Data Digest

17 Quotes on Big Data and Analytics that Will Open Your Eyes to Reality


There are times when perception is not a clear representation of reality. Take for example the topic of Big Data and Analytics. The perception is that this ushers in a brave new world where there is actionable intelligence, on-demand data and sexy graphs and charts popping up on our computer screens on the fly. While this could be a reality for some, this is certainly not the case for many – at least, not yet.

In the course of our conversations with noted Chief Data Officers, Chief Analytics Officers and Chief Data Scientists for our conferences and events, some priceless gems of knowledge had been uncovered. Knowledge that can only come from people who’ve actually been on the front lines and understand Big Data and Analytics, warts and all. Here are 17 quotes that will inspire you in your own journey and open your eyes to reality.

1. The reality that ‘real-time’ is not necessarily good all the time.


2. The reality that data governance is an absolute must.


3. The reality that personalization is the name of the game.


4. The reality that CDOs/CAOs/CDSs need to be leaders more than anything.


5. The reality that acceptance to data-driven decision making takes consistent effort.


6. The reality that a CEO buy-in is absolutely important.


7. The reality that the success of Big Data relies heavily on people.


8. The reality that change can only happen when you change yourself.


9. The reality that data and analytics cannot succeed if it’s used as a tool for punishment.


10. The reality that data analytics is not an option. It’s a must.


11. The reality that cool is in.


12. The reality that data breach comes from all angles.


13. The reality that quick wins must happen for long term gain to be sustained.


14. The reality that ‘data ownership’ is passé in today’s environment.


15. The reality that privacy must be ensured at all times.


16. The reality that Big Data will be available to everyone and not just a few select individuals.


17. The reality that expectations today are greater than ever before.


To learn more about Big Data, Analytics and Digital Innovation or to attend our upcoming conferences and meet the leading Chief Data Officers, Chief Analytics Officers and Chief Data Scientists, visit www.coriniumintelligence.com 
 

September 09, 2016


Revolution Analytics

Because it's Friday: The Happy Files

We've looked before at how performing a cheerful song in a minor key makes it mournful. Now, here's the other side of that same coin: the X-File title theme, played in a major key, sounds downright...

...

Revolution Analytics

A predictive maintenance solution template with SQL Server R Services

by Jaya Mathew, Data Scientist at Microsoft By using R Services within SQL Server 2016, users can leverage the power of R at scale without having to move their data around. Such a solution is...

...

Revolution Analytics

The palettes of Earth

Take a satellite image, and extract the pixels into a uniform 3-D color space. Then run a clustering algorithm on those pixels, to extract a number of clusters. The centroids of those clusters them make a representative palette of the image. Here's the palette of Chicago:

Chicago
The palette of Chicago

The R package earthtones by Will Cornwell, Mitch Lyons, and Nick Murray — now available on CRAN — does all this for you. Pass the get_earthtones function a latitude and longitude, and it will grab the Google Earth tile at the requested zoom level (8 works well for cities) and generate a palette with the desired number of colors. This Shiny app by Homer Strong uses the earthtones package to make the process even easier: it grabs your current location for the first palette, or you can pass in an address and it geolocates it for another. That's what I used to create the image above. (Another Shiny app by Andrew Clark shows the size of the clusters as a bar chart, but I prefer the simple palettes.) There are a few more examples below, and you can see more in the earthtones vignette. If you find more interesting palettes, let us know where in the world you found them in the comments.
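For readers who prefer Python, here is a rough sketch of the pixel-clustering idea described above. It is not the earthtones package, and for simplicity it clusters in RGB rather than a perceptually uniform colour space; the file name and cluster count are made up.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def image_palette(path: str, n_colors: int = 5) -> np.ndarray:
    """Return n_colors RGB centroids (0-255) representing the image's palette."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float).reshape(-1, 3)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
    return km.cluster_centers_.round().astype(int)

# Hypothetical usage: image_palette("chicago_tile.png")
```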

 

Broome
The palette of Broome, Australia

 

 

 

Qatar
The palette of the middle of Qatar

Will Cornwell (github): earthtones


Revolution Analytics

In case you missed it: August 2016 roundup

In case you missed them, here are some articles from August of particular interest to R users. An amusing short video extols the benefits of reproducible research with R. A guide to implementing a...

...
 

September 08, 2016


BrightPlanet

Tyson Johnson Session Speaker at ACFE Fraud Conference Canada

Our own Tyson Johnson will be speaking at next week’s ACFE (Association of Certified Fraud Examiners) Fraud Conference Canada. The conference is taking place September 11-14 in Montreal, QC. Tyson’s session is titled Building an Online Anti-Fraud Open Source Monitoring Program. In the session he’ll cover how fraud examiners are becoming more adept at using open sources […] The post Tyson Johnson Session Speaker at ACFE Fraud Conference Canada appeared first on...

Read more »

Revolution Analytics

The elements of scaling R-based applications with DeployR

If you want to build an application using R that serves many users simultaneously, you're going to need to be able to run a lot of R sessions simultaneously. If you want R to run in the cloud, you...

...
Silicon Valley Data Science

Image Processing in Python

Editor’s note: This post is part of our Trainspotting series, a deep dive into the visual and audio detection components of our Caltrain project. You can find the introduction to the series here.

The first step in developing our Caltrain project was creating a proof of concept for the image processing component of the device we used to detect passing trains. We’re big fans of Jupyter Notebooks at SVDS, and so we’ve created a notebook to walk you through that proof of concept.

Check out the notebook here

You can also download the .ipynb version and do some motion detection of your own. Later blog posts in this series will cover making this algorithm robust enough for real-time deployment on a Raspberry Pi.

Let us know any questions in the comments below, or share which pieces of the Caltrain project you’re most interested in. If you’d like to keep up with our activities, please check out our newsletter.

The post Image Processing in Python appeared first on Silicon Valley Data Science.

Principa

From Credit-Worthy to Target-Worthy: How Predictive Scoring is being used in Marketing

As a Marketer or Customer Engagement professional, imagine the cost-savings if you knew who in your database or lead list were likely to be the most profitable customers or most likely to respond? Would you bother mailing a list of a million contacts if you knew that only 100,000 of those contacts were “worthy” of your campaign and very likely to respond?

Innovation is not necessarily the invention of something new, but be the result of finding a new use for an existing product, service, methodology or practice. Take the use of predictive scoring in Marketing.

Data Digest

Chief Analytics Officer Survey: 57% say 'culture' a key barrier in advancing data and analytics strategy


We have recently conducted a survey of the Chief Analytics Officer Forum attendees to find out some of the key issues facing them and their solutions investment plan in the next 12-24 months. In this survey, it was revealed that 57% of respondents found ‘Driving cultural change’ as the biggest barrier to advancing data and analytics strategy. This was closely followed by ‘Integration of new technology with legacy systems’ (52%) and ‘Getting buy in from business units’ (41%).

The result of this survey tallies with key industry findings. In an article entitled ‘4 strategies for driving analytics culture change’ published by CIO.com, it was boldly declared that “Culture change is hard." The article continues that "the solution lies in a mix of tooling and analysis and information delivery architecture. Often, culture changing strategies can fall flat because they approach the problem from purely a tooling perspective. Vendors offering such tools paint a rosy picture of how the right tool can change the culture and behavior within an organization. However, the problem often is more complex.”

Culture change is hard. The solution lies in a mix of tooling and analysis and information delivery architecture.

Meanwhile, the Chief Analytics Officer Survey also looked at the respondents’ investment plans and a whopping 70% of all respondents (81% of whom are C-Suite Decision Makers) revealed that they plan to invest in Data Analytics solutions in the next 12-24 months, followed by Predictive Analytics solutions (64%) and Business Intelligence tools (62%). It’s also interesting to note that 68% of them choose solutions at Conferences and Events.

More of our findings are in the infographic below:


 For more information on how you can join our upcoming events and conferences or if you're interested in sponsorship opportunities, visit www.coriniumintelligence.com   
Teradata ANZ

Analyzing your analytics

‘Eat your own dog food’ and ‘practice what you preach’. I love these sayings and I strongly support the idea behind them. If you believe in something, prove it. In the world of data and analytics, we’ve been able to ignore this wisdom for a long time. The time for change has come though! We are going to analyze the analytics.

In fact, understanding how data is used and what kind of analysis is done might just become a key capability in successfully utilizing company data. Of course, analytics have been the subject of analysis for a long time now. Most organizations have nice trending overviews of their data growth, the daily run times of the ETL jobs, and how many queries were run per application, per user, and per department. We need to take it to the next level though – just like we augmented the traditional (and still very important) dashboards and KPIs with self-service BI and advanced analytics.

Growth and self-service
Because of the rise in self-service analytics, a smarter analysis of how data is used is necessary. The sort of trending analysis mentioned above was sufficient to get a good idea of what’s going on when the realm of data was controlled by data engineers, with strict processes and managed governance. But the more you open your data to savvy business analysts, data scientists, app builders and others, the more you lose sight of who is doing what with which data and for which purpose. And we certainly don’t want to burden our creative people with a pile of administrative paperwork. That’s where the metadata comes in.

Follow the trail
All this experimental work leaves traces. When analyzed properly, these traces can tell us a lot about what’s going on. They can provide insight into which data is used the most, and in combination with what other data. Do people perform their own additional transformations when using the data? Is the data simply exported, or is it used the way it was initially intended? These insights can also shed light on who will get angry if some data is deleted.

If possible, all of this analysis should be performed proactively. If the results are made available in an open, searchable way, everyone can profit – this kind of information is valuable to a wider range of professionals than just the DBAs and the data engineers. If an analyst discovers that everyone else is using a different join condition than he does, it might be good for him to find out why that is the case.
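As a small illustration of the kind of trace mining this implies, here is a sketch that counts table usage and co-usage from a list of raw SQL query strings. The log format and the regex-based table extraction are simplifying assumptions on my part; a production version would parse the SQL properly.

```python
import re
from collections import Counter
from itertools import combinations

# Naive extraction of table names following FROM or JOIN keywords.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([\w\.]+)", re.IGNORECASE)

def table_usage(queries):
    """Count how often each table is referenced, and which tables appear together."""
    usage, pairs = Counter(), Counter()
    for q in queries:
        tables = sorted(set(TABLE_RE.findall(q)))
        usage.update(tables)
        pairs.update(combinations(tables, 2))
    return usage, pairs

# Hypothetical query log entries.
queries = [
    "SELECT * FROM sales s JOIN customers c ON s.cust_id = c.id",
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
]
usage, pairs = table_usage(queries)
print(usage.most_common(5), pairs.most_common(5))
```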

Homegrown solution
As far as I know, the number of organizations that have already started analyzing their analytics is still limited. In the few active cases, homemade solutions are used to manage the analytics. This shows that there is still a long way to go. Nonetheless, a couple of tools addressing this need are already popping up. A personal favorite of mine is Alation. This tool manages to get more value out of the available metadata. One example of how this is done is the use of Google’s PageRank algorithm to determine which tables are most important. Besides the automated analysis of logs and the like, it also aims to use the results efficiently by augmenting them with lots of user input. This makes it a very useful tool for collaboration – another key aspect of being successful in the world of unlimited analytical freedom.

Investing in data
Earlier, I stated that analyzing your own analytics is necessary to get a competitive edge out of using your data. Without this input, it will be impossible to efficiently manage the provisioning and management of all your data. As the move towards an analytical ecosystem with multiple platforms/techniques keeps gaining more steam – and with it the increasing complexity of organizing data – you need all the help you can get. Using these analytics to find out which data is used often and which isn’t, will help you to decide which data should be moved to your cheaper storage platform. Finding out on which data users apply much extra transformation logic can tell you where more modeling is needed. And last but not least, data that is frequently copied or exported may simply reside on a platform with the wrong capabilities.

Just like companies use web analytics to continuously improve their website (based on what they learn from analyzing the web visits), companies need to improve their data by examining the way it is being used. Truly understanding how your data is used requires facts and smart analytics. Lots of them.

 

The post Analyzing your analytics appeared first on International Blog.

 

September 07, 2016

Ronald van Loon

Why Do Television Companies Need a Digital Transformation


Over just a few years, the world of television production, distribution, and consumption has changed dramatically. In the past, with only a few channels to choose from, viewers watched news and entertainment television at specific times of the day or night. They were also limited by where and how to watch. Options included staying home, going to a friend’s house, or perhaps going to a restaurant or bar to watch a special game, show, news story, or event. The TV industry has since completed the move from standard definition to high definition, and the discussion has now shifted to the 4K and 8K video standards. But before all of this can happen, analog broadcasting needs to become digital. That means the TV industry inevitably needs a disruptive transformation of its ICT platform to cope with the new processes of acquisition, production, distribution and consumption.

Fast-forward to today, and you have a very different scenario. Thanks to the rise of the Internet – and, in particular, mobile technology – people have nearly limitless options for their news and entertainment sources. Not only that, but they can choose to get their news and other media on TV or on a variety of smart devices, including phones, tablets, smart watches, and more.

Improved Business Value From New Information and Communication Technologies (ICT)

The world has changed, and continues to change, at a rapid pace. This change has introduced a number of challenges to businesses in the television industry. Making the digital media transformation can do a number of things to resolve these challenges and improve your business and viewership.

With leading new ICT, you can see significant business value and improved marketing and production strategies. For example, making this transformation can vastly improve your television station’s information production and service capabilities. It can also smooth the processes involved with improving broadcasting coverage and performance as well.

With these improvements, your station will have faster response times when handling time-sensitive broadcasts. This delivers to your audience the up-to-the-minute coverage and updates they want across different TV and media devices and platforms.

Improved Social Value with New ICT

A television station that refuses to change and evolve with viewers’ continuously evolving needs and wants will find itself falling behind competitors. However, a TV station that understands the necessity of making the digital media transformation will see significantly improved social value with its audiences.

Television stations that embrace new technology, digital media, storage, cloud computing and sharing will see massive improvements in social value. Consider that this transformation enables your station to produce timely and accurate reports faster, giving your audience the freshest information and entertainment.

By bringing news and entertainment media to your audience when, where and how they want and need it, you can enrich their lives and promote a culture of information sharing that will also serve to improve your ratings and business. With technologies like cloud-based high-definition video production and cloud-based storage and sharing architectures, you can eliminate many of the challenges and pain points associated with reporting news and bringing TV entertainment to a large audience.

Why Do Television, Media, and Entertainment Companies Need a Digital Transformation?

Consider the basic steps that a TV news station must take to get the news to their audience:

  • Acquisition
  • Production
  • Distribution
  • Consumption

For television stations that have not yet embraced a digital media transformation, these steps do not just represent the process of delivering news media to the public. They also represent a series of pain points that can halt progress and delay deadlines. These include:

  • Traditional AV matrices use numerous cables, are limited by short transmission distance for HD signals and require complicated maintenance, slowing down 4K video evolution.
  • Delays when attempting to transmit large video files from remote locations back to the television station.
  • Delays when reporters edit videos because office and production networks in TV stations are separated from each other, requiring them to move back and forth between the production zone and the office zone in their building to do research
  • Delays due to the time it takes to transmit a finished program (between six and twenty-four minutes, depending on the length and whether or not it is a high-definition video) to the audience.
  • 4K video production has much higher requirements on bandwidth and frame rates.

These challenges all occur in traditional structures and architectures for media handling, but they quickly dissolve when a TV station makes the digital transformation and begins using a cloud-based architecture with new ICT.

Keeping Up With Viewer Demand via Ultra High Definition (UHD) Omnimedia

Increasingly, viewers demand more and more individualized experiences. These include interactive programming, rich media, UHD video, and they want it across all applicable devices. Delivering UHD omnimedia is only possible through new ICT, as older IT infrastructures simply cannot scale to the levels necessary to keep up with viewer demands.

Fortunately, through cloud-based architectures and faster sharing, networks and stations may not only keep up with consumer demand but actually surpass it. For example, when using 4K formatting, your station can provide viewers with the highest resolution possible (4096 x 2160 pixels), and your video formatting will be easily scalable for different platforms for the most convenient viewing possible.

Furthermore, by becoming an omnimedia center, your station can enjoy the benefits of converged communications. Essentially, this means that you will be creating information and/or entertainment that can be used in multiple different ways for television, social media, news sites, etc., giving you more coverage and exposure than ever before.

What Is Required to Make the Transformation to Digital Media?

Cloud computing and embracing 4K for video formatting are both essential to digital media transformation, but they are not all that is necessary. Aside from these two elements, television stations can take advantage of advances in technology in a number of ways to improve their marketing and production strategies through the use of new ICTs.

For example, thin clients and cloud computing can enable video editing anywhere and anytime, increasing efficiency. To improve the latency between the thin clients and the cloud, today’s new ICT architectures draw on enhanced display protocols, virtual machines and GPU virtualization technology to enable smooth, audio/video-synchronized editing of 8-track HD video, or even 6-track 4K video editing on clients via the industry’s only IP storage system.

As mentioned earlier, through cloud computing, it is no longer necessary to physically transport video from a news site to the station. Likewise, it is no longer necessary to do all production work and research in separate areas. Thanks to cloud storage and sharing, these pain points can easily be eliminated, as sharing and sending information becomes much simpler and faster.

An all-IP based video injection process is a must if TV stations want to lower network complexity and simplify system maintenance. There are two ways to approach this:

  1. For example, IP cables can replace traditional SDI signals. Each cable transmits 1 channel of 4K video signal. (SDI requires 4 cables to transmit the same video.) Thus, using IP cables can reduce the number of necessary cables by up to 92%, improving O&M efficiency by 60%, and bringing convenience to system interworking and interaction.
  2. With the help of mobile broadband, WAN accelerated networks, smart phones or tablets, journalists in the field can now shorten the video submission process by 90%. Most importantly, cloud computing allows journalists to edit video anywhere and anytime. With the help of fast trans-coding resources in the cloud, real time video reporting is now possible.

Another major factor in any digital media transformation is big data and data analytics. By collecting and analyzing information on your station’s viewers, you can better create more personalized viewing experiences. Netflix has, perhaps, one of the best and most widely known examples of this, as they have created specific algorithms based on previous customer behavior to predict whether or not a viewer will enjoy a certain film or show, and which media to recommend for any viewer.
Through these and other information and communication technologies, such as the Internet of Things (IoT), SDN (software-defined networking), improved mobile broadband, etc., television stations can bring faster, more accurate, and more convenient news and entertainment to their customers and viewers.

Who Is Leading the Way in the Transformation?

In my opinion, the company that achieves agile innovation across cloud-pipe-device collaboration will lead the way in this transformation. One such company, China’s Huawei, is now trying to create an ecosystem for global channel partners and solution partners across the news and media entertainment industry, and it provides an open ICT platform that encourages media industry developers to continue to innovate their products. With strong development in cloud-based architectures, SDN, mobile broadband, and IoT, developers and partners are able to create the most comprehensive solutions that best empower media stations of all kinds to move into the future.

What do you think of the digital media transformation in the Television Industry?

Leave your feedback and questions in the comments, and follow me on LinkedIn and Twitter for more information on Big Data, IoT, Data Science, and Digital Transformation.

 

Ronald

Ronald helps data-driven companies generate business value with best-of-breed solutions and a hands-on approach. He has been recognized as one of the top 10 global influencers by DataConomy for predictive analytics, and by Klout for Data Science, Big Data, Business Intelligence and Data Mining. He is a guest author on leading Big Data sites, a speaker/chairman/panel member at national and international webinars and events, and runs a successful series of webinars on Big Data and on Digital Transformation. He has been active in the data (process) management domain for more than 18 years, has founded multiple companies, and is now director at Adversitement, a leader in Big Data & data process management solutions. He has a broad interest in big data, data science, predictive analytics, business intelligence, customer experience and data mining. Feel free to connect on Twitter or LinkedIn to stay up to date on success stories.


The post Why Do Television Companies Need a Digital Transformation appeared first on Ronald van Loons.

The Data Lab

The Data Lab takes part in Byte Night


Byte Night is Action for Children's biggest annual fundraiser; a national ‘sleep-out’ event. It started in 1998, when 30 friends slept out in London and raised £35,000. Since then Byte Night has raised over £7.3 million to tackle the root causes of youth homelessness. Byte Night is now the UK's largest sleep-out event, with individuals and teams from the technology and business services sectors sleeping out to raise vital funds to prevent youth homelessness. This year Byte Night is bringing together the UK’s largest corporate companies across eight locations with the aim of raising £1.2 million. 

Action for Children have been helping the young and vulnerable for over 145 years, working in local communities to protect and support disadvantaged children as they grow up. Today, they support more children and their families than any other children’s charity. More than 390,000 children, young people, parents and carers across the UK are supported by Action for Children's more than 600 services, to make their lives better today, tomorrow and every day. They work to make sure every child has the love, support and opportunity they need to reach their potential. You can see some of Byte Night's success stories here.

Over 80,000 young people in the UK find themselves homeless each year. Action for Children believes in early intervention, finding the root causes of youth homelessness and providing support to families before their issues reach crisis point. 

Action for Children has never been afraid to push for the best for children, and we strongly relate to that. We believe the work they are doing is saving lives, and so we decided to get involved and help. And you can too! A donation of just £5 could pay for a hot meal for a homeless young person. £10 can buy a set of pots and pans to help a formerly homeless young person start to build their own home. By donating £18, you could pay for a warm shower and hygiene kit to help a young person prepare for interviews.

To get involved and help us raise more funds for this cause, please donate through our Just Giving page.

Thank you for your support!

 

Teradata ANZ

Siemens Sinalytics – Keeping The Customer Satisfied

Don’t wait for a failure and then fix it. Predict the failure and then prevent it.

This was the key message I took away from the keynote address given by Dr. Norbert Gaus from Siemens, at the Teradata Universe conference, Hamburg. He was talking about the company’s Sinalytics program, which they use to provide analytics on their ‘Web of Systems’.

Dr Gaus illustrated this philosophy with a great story about Renfe, the Spanish railway company. Renfe operates Siemens-built Velaro E high-speed trains, which achieve punctuality as high as 99.9 percent. And if, however unlikely, a train is more than 15 minutes late, Renfe guarantees passengers a full refund.

Now, I’m sure that regular train commuters among you are thinking: “If only…”. But Siemens’ ambition is to turn all their devices and systems into smart systems to offer customers optimum availability and performance.


Define. Measure. Act.

I also heard Chris Twogood (Teradata marketing) highlighting the importance of measuring and acting on levels of customer satisfaction. Teradata have launched a Customer Satisfaction Index application which runs on Teradata Aster. It allows organisations to define and monitor the key events that influence customer satisfaction and to create individual customer satisfaction scores.

These talks got me thinking: “If only Siemens had operated my Hamburg flight and my airline had a better handle on customer satisfaction, then I reckon the trip would have been a far more agreeable customer experience”. Here’s what happened.

At 04:45 I got a customer service text message from the airline telling me that my 07:45 flight was going to be delayed and I wouldn’t be flying until 11:00. However, they told me to report to the airport on time in any case. At the airport, I joined the queue for a customer service agent who could advise me on a new schedule because the delay meant that I would miss my connection. Finally, an hour-and-a-half later, I was re-booked on different flights.

A few more zzzzzs would’ve been nice.

From a customer service point of view, it would have been better if my text message not only advised me of the delay but also included details of my revised schedule. It would have been a more positive start to the day. An extra couple of hours in bed – knowing I’d be late maybe, but at least I’d have had the chance to give colleagues a heads-up and to sort out some alternative arrangements.

It was even more galling to find out that the delay was caused by a technical fault on the aircraft. Okay, failures happen and it’s a good idea to fix faults before allowing me and my fellow passengers to board. But the fault had been diagnosed the night before, and the delay was extended by having to fly in a replacement part from another UK airport.

Faultless performance.

Could they have predicted the fault? Possibly. But even if they couldn’t, why wait for a scheduled flight to deliver the part? Wouldn’t the cost of an overnight courier be worth the money to avoid inconveniencing 150 dissatisfied paying passengers?

As organisations like Siemens build more sophisticated smart products they raise expectations, and customer satisfaction and predictive analytics become intrinsically linked. So, no longer is it okay just to inform the customer of problems. A fix or a workaround has to be proffered otherwise the communication is just another source of digital noise and customer frustration.

And that’s no kind of solution.

This post first appeared on Forbes TeradataVoice on 20/05/2016.

The post Siemens Sinalytics – Keeping The Customer Satisfied appeared first on International Blog.

 

September 06, 2016

Silicon Valley Data Science

Introduction to Trainspotting

Here at Silicon Valley Data Science, we have a slight obsession with the Caltrain. Our interest stems from the fact that half of our employees rely on the Caltrain to get to work each day. We also want to give back to the community, and we love when we can do that with data. In addition to helping clients build robust data systems or use data to solve business challenges, we like to work on R&D projects to explore technologies and experiment with new algorithms, hypotheses, and ideas. We previously analyzed delays using Caltrain’s real-time API to improve arrival predictions, and we have modeled the sounds of passing trains to tell them apart. In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible.

If you have ever ridden the train, you know that the delay estimates Caltrain provides can be a bit…off. Sometimes a train will remain “two minutes delayed” for ten minutes after the train was already supposed to have departed, or delays will be reported when the train is on time. The idea for Trainspotting came from our desire to integrate new data sources for delay prediction beyond scraping Caltrain’s API. Since we had previously set up a Raspberry Pi to analyze train whistles, we thought it would be fun to validate the data coming from the Caltrain API by capturing real-time video and audio of trains passing by our office near the Mountain View station.

There were several questions we wanted our IoT Raspberry Pi train detector to answer:

  1. Is there a train passing?
  2. Which direction is it going?
  3. How fast is the train moving?

Sound alone is pretty good at answering the first question because trains are rather loud. To help answer the rest of the questions, we added a camera to our Raspberry Pi to capture video.

We’ll describe this process in a series of posts. They will focus on:

  1. Introduction to Trainspotting (you are here)
  2. Image Processing in Python
  3. Streaming Video Analysis with Python
  4. Streaming Audio Analysis and Sensor Fusion
  5. Recognizing Images on a Raspberry Pi
  6. Connecting an IoT device to the Cloud
  7. Building a Deployable IoT Device

Let’s quickly look at what these pieces will cover.

Walking through Trainspotting

In the upcoming “Image Processing in Python” post, Data Scientist Chloe Mawer demonstrates how to use open-source Python libraries to process images and videos for detecting trains and their direction using OpenCV. You can also see her recent talk from PyCon 2016.
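As a taste of what that looks like, here is a bare-bones frame-differencing sketch with OpenCV. It is a simplified stand-in, not SVDS's actual detector, and the threshold values are arbitrary assumptions.

```python
import cv2

def detect_motion(video_path: str, pixel_thresh: int = 25, area_thresh: int = 5000):
    """Flag frames where many pixels changed versus the previous frame."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
        diff = cv2.absdiff(prev_gray, gray)              # pixel-wise change
        _, mask = cv2.threshold(diff, pixel_thresh, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(mask) > area_thresh:         # enough change => motion
            print(f"motion detected at frame {frame_idx}")
        prev_gray = gray
        frame_idx += 1
    cap.release()
```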

In “Streaming Video Analysis with Python,” Data Scientist Colin Higgins and Data Engineer Matt Rubashkin describe the steps to take the video analysis to the next level: implementing streaming, on-Pi video analysis with multithreading, and light/dark adaptation. The figure below gives a peek into some of the challenges in detecting trains in varied light conditions.

Challenges in detecting trains in varied light conditions

In a previous post mentioned above, Listening to Caltrain, we analyzed frequency profiles to discriminate between local and express trains passing our Sunnyvale office. Since that post, SVDS has grown and moved to Mountain View. Since the move, we found that the pattern of train sounds was different in the new location, so we needed a more flexible approach. In “Streaming Audio Analysis and Sensor Fusion,” Colin describes the audio processing and a custom sensor fusion architecture that controls both video and audio.

After we were able to detect trains, their speed, and their direction, we ran into a new problem: our Pi was not only detecting Caltrains (true positives), but also Union Pacific freight trains and the VTA light rail (false positives). In order to reduce our detector’s false positive rate, we used convolutional neural networks implemented in Google’s machine learning TensorFlow library. We implemented a custom Inception-V3 model trained on thousands of images of vehicles to identify different types of trains with >95% accuracy. Matt details this solution in “Recognizing Images on a Raspberry Pi.”
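The exact retraining pipeline is covered in that post; as a rough, present-day approximation of the idea, the sketch below fine-tunes a pretrained Inception-V3 with Keras. The class count, directory layout and hyperparameters are assumptions, and this is not the code SVDS used.

```python
import tensorflow as tf

IMG_SIZE = (299, 299)   # Inception-V3's expected input size
NUM_CLASSES = 3         # e.g. caltrain, freight, light rail (assumed labels)

# Load Inception-V3 pretrained on ImageNet, without its final classification layer.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
base.trainable = False  # keep the pretrained convolutional features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1, input_shape=IMG_SIZE + (3,)),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Assumes a folder of labelled images, one sub-directory per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "train_images/", image_size=IMG_SIZE, batch_size=32)
model.fit(train_ds, epochs=5)
```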


In “Connecting an IoT Device to the Cloud,” Matt shows how we connected our Pi to the cloud using Kafka, allowing monitoring with Grafana and persistence in HBase.

Monitoring our Pi with Grafana

The tools and next steps

Before we even finished the development of our first device, we wanted to set up more of these devices to get ground truth at other points along the track. With this in mind, we realized that we couldn’t always guarantee a speedy internet connection, and we wanted to keep the devices themselves affordable. These requirements make the Raspberry Pi a great choice. The Pi has enough horsepower to do on-device stream processing, so we can send smaller, processed data streams over internet connections, and the parts are cheap. The total cost of our hardware for this sensor is $130, and the code relies only on open source libraries. In “Building a Deployable IoT Device,” we’ll walk through the device hardware and setup in detail and show you where you can get the code so you can start Trainspotting for yourself.

Device and hardware setup supplies

If you want to learn more about Trainspotting and Data Science at SVDS, stay tuned for our future Trainspotting blog posts, and you can sign up for our newsletter here. Let us know which pieces of this series you’re most interested in.

You can also find our “Caltrain Rider” in the Android and Apple app stores. Our app is built upon the Hadoop Ecosystem including HBase and Spark, and relies on Kafka and Spark Streaming for ingestion and processing of Twitter sentiment and Caltrain API data.

The post Introduction to Trainspotting appeared first on Silicon Valley Data Science.
