
Planet Big Data is an aggregator of blogs about big data, Hadoop, and related topics. We include posts by bloggers worldwide. Email us to have your blog included.


October 18, 2018

Revolution Analytics

Maryland's Bridge Safety, reported using R

A front-page story in the Baltimore Sun reported last week on the state of the bridges in Maryland. Among the report's findings: 5.4% of bridges are classified as "poor" or "structurally deficient"...


October 16, 2018

Making Data Meaningful

Oracle supports Microsoft

I can’t tell you how many times I’ve been in conversations around the topic of “Oracle vs. Microsoft”. I’ve heard both sides of the story, ranging from “SQL Server for mission-critical operations…are you crazy!” to “Oracle costs me my first-born child…year after year!”. While these discussions are often entertaining, the line delineating the two database giants blurs with each subsequent release.

In my years consulting for LÛCRUM, I have worked for numerous clients that run installations of both Oracle and Microsoft in their environments. With recent statistics estimating that Oracle controls more than 50% of the database market and Microsoft more than 50% of the server operating system market, are you surprised? SQL Server runs only on Windows, while Oracle offers more operating system versatility: you’ll see UNIX and Linux installations, but Oracle’s presence on Windows remains strong, and the company keeps improving its support for Microsoft development. Where might an Oracle database deployed on a Windows server make the most sense? In the small and mid-sized business (SMB) market, where Oracle has competitively priced versions such as Oracle Database Standard Edition and Standard Edition One.

So what advantages does running Oracle on Windows offer? First, Oracle has tight integration with Active Directory and the Windows security framework; items such as single sign-on and security via database roles mapped to Active Directory groups fall into this category. Next, Oracle offers 32-bit and 64-bit versions; in the 32-bit version, Oracle can use up to 3GB (out of a 4GB operating system maximum) of system memory for the database. Finally, Oracle has been enhancing its integration with the Windows development suite, specifically Visual Studio 2008. Oracle supports .NET in three ways. First, the Oracle Data Provider for .NET leverages the ADO.NET API and allows .NET applications to access Oracle data; these APIs should be familiar to most Microsoft developers. Second, through a free add-in, developers can work with Oracle services from within Visual Studio 2005 (and 2008, as previously mentioned): wizards for common database tasks (i.e., DDL), a procedure editor for PL/SQL procedures, packages, and functions, a debugger for runtime error interaction, and integrated help for items such as the Oracle error reference and the SQL and PL/SQL user manuals. Third, Oracle has integrated .NET extensions directly inside the database, which allows developers to create stored procedures and functions in C# or VB.NET within Visual Studio; that code can then be deployed to the database and referenced wherever a stored procedure or function is permitted.

Oracle has shown it is advantageous to offer solutions that fit neatly into an operating system that controls the majority of the server market, even if that vendor also happens to be a major competitor in the database market. Offering a product that is extensible and easy to use with development GUIs is sure to give you a seat at the table when it comes time to choose a solution for your organization. That is precisely why Oracle supports Microsoft (most of the time <grin>).

The post Oracle supports Microsoft appeared first on Making Data Meaningful.

Making Data Meaningful

What is Web 2.0?

Wikipedia defines Web 2.0 as “a term describing the trend in the use of World Wide Web technology and web design that aims to enhance creativity, information sharing, and, most notably, collaboration among users.” It also quotes Tim O’Reilly (who is widely credited with coining the term Web 2.0) as saying that “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform.”

Forrester Research defines Web 2.0 as “a set of technologies and applications that enable efficient interaction among people, content, and data in support of collectively fostering new businesses, technology offerings, and social structures.” (from “Global Enterprise Web 2.0 Market Forecast: 2007 To 2013” by G. Oliver Young, dated April 21, 2008.)

The phrase Web 2.0 (which really is more of an idea than a software version) has now come to mean the new Web. In an informal sense, Web 2.0 refers to an enhanced form of the World Wide Web that provides a richer user experience and is more interactive, collaborative, dynamic, participative, and up-to-date. It’s not just about retrieving data anymore but also about creating data and shared knowledge, oftentimes in association with others. So users not only download or read but can actively upload and create or edit content.

When we speak of Web 2.0, we are speaking about particular technologies and features such as Really Simple Syndication (RSS), weblogs (blogs), wikis, forums, mashups, Rich Internet Applications, collaborative or social tagging, podcasting, shared bookmarks, virtual team workspaces, widgets or gadgets, etc. XML, web services, AJAX, and the like are the typical building blocks that enable these Web 2.0 features.
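To make one of those building blocks concrete, here is a minimal sketch of parsing an RSS 2.0 feed, one of the technologies listed above, using only the Python standard library. The feed content and URLs below are invented for illustration; a real aggregator would fetch the XML over HTTP first.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a fetched feed.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Making Data Meaningful</title>
    <item><title>What is Web 2.0?</title><link>http://example.com/web20</link></item>
    <item><title>Enterprise 2.0 and Governance</title><link>http://example.com/e20</link></item>
  </channel>
</rss>"""

def parse_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

items = parse_items(RSS)
for title, link in items:
    print(title, "->", link)
```

The same handful of lines is essentially what lets blogs, podcasts, and aggregators (like this very site) syndicate content to each other.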

And Web 2.0 is big business! In the article quoted above from Forrester, we also read that “Enterprise spending on Web 2.0 technologies will grow strongly over the next five years, reaching $4.6 billion globally by 2013, with social networking, mashups, and RSS capturing the greatest share… In 2008, firms will spend $258 million on social networking tools. After blogs and wikis, mashup technology takes off next, growing from a small base of $39 million in 2007 to $682 million in 2013 — second only to social networking.”

Web 2.0 has already affected social behavior significantly and is now beginning to affect workplace culture and ways of doing business. These technologies enhance communication and foster more collaboration between a business and its employees, partners, and customers. There is increased knowledge sharing and growth in online communities of shared interest. New and exciting ways of presenting the same information are becoming available.

So even as we witness the evolution of the Web into its Web 2.0 form, we can already feel its revolutionary impact!

The post What is Web 2.0? appeared first on Making Data Meaningful.

Making Data Meaningful

BI Maturity: You can’t get there from here!

I spent last weekend fishing with my father, brother, and nephews. Since they live in Connecticut and I in Ohio, we decided to meet halfway – somewhere in “The Alleghenies and Her Valleys,” to quote the brochure. While I usually rely on my trusty Magellan GPS, I had lent it to my oldest daughter, who was driving south where the weather is warm. So that left me driving through the mountains and her valleys around midnight. Driving through a small town on a small road, looking for a very small park, proved more than a challenge. Since I have three daughters and one wife, I have learned to swallow my pride and ask for directions. I did, but on my walk into the “Sheetz” gas station, I was thinking they might give me that response: “Well, you can’t get there from here – you gotta go somewhere else, then loop back around to get there.” However, I received perfect directions! Thank you, Mr. Sheetz!

That got me thinking when I stumbled upon a poster from the TDWI folks about BI maturity and adoption. It was a few years old, but some things stay true. I was excited to see that old friend and took some time to write up my thoughts on the matter – as well as capture their context.

In case you are wondering how I am going to tie the first two paragraphs together, here goes. From my experience, when senior leadership learns of the value that BI can bring, they really want to ‘get this thing going’. They want to launch a large, comprehensive, enterprise-wide, based-on-the-new-tools BI initiative that will hit home runs and win ball games. Well, the problem is that you can’t get there from here. 🙂  There is a progression that must happen. Farmers can’t just throw seed on the ground and expect to make a profit – there is work to be done to prepare the soil correctly, then lots of care and feeding, praying for rain (but not too much rain), and then F  I  N  A  L  L  Y  –> the harvest!

TDWI states that “Most organizations go through six stages when evolving their BI environment from a cost-center operation to a strategic resource that drives the business and shapes the market.”

Using their framework, here is how I break it down…

1.     The Beginning (TDWI calls it ‘Prenatal’): Since this stage is mostly financial, there is a moderate number of standards and not much flexibility. Control is dictated by finance or finance needs. Casual users account for most activity. Power users may take this type of information and leverage it in their own ‘shadow’ systems. The problem is that there is a large IT backlog for these reports, creating an information gap – we get the information after the decision had to be made. This decision latency can send us in the wrong direction, as the data it is built on is often not fresh. However, at this stage a general ‘awareness’ exists – the correct information at least exists. Getting here requires a larger initial investment, and costs are high because economies of scale are not yet a reality.

  • Architecture: Management Reporting
  • Scope: System
  • Type of System: Financial
  • Analytics: Paper Report
  • User: All
  • BI Focus: What Happened?
  • Executive Perception: Cost Center

2.     Army of One (TDWI calls it ‘Infant’): Lots of flexibility and no standards to speak of – other than what is negotiated from one user to another. People think local and resist any global initiatives. Casual use declines, while power users step in to take advantage of this new information. Still, power users rely on IT to set the stage for their data, and IT continues to struggle with a backlog of requests. As we extract the right data and manually assemble it to address business problems, an understanding develops of which factors lead to which business results. Costs are somewhat low, as analysts use their own tools and work with specific data sets extracted by IT.

  • Architecture: Spreadmarts
  • Scope: Individual
  • Type of System: Executive
  • Analytics: Briefing Book
  • User: Analyst
  • BI Focus: What Will Happen?
  • Executive Perception: Inform Executive


3.     Working as a team (TDWI calls it ‘Child’): Flexibility is somewhat high but waning, as people within the department begin to work together. There are still not many standards to speak of – other than what is negotiated from one user to another or driven from within the department. People still think and act from a local perspective and resist global initiatives. Casual use starts to trend up to take advantage of benefits provided at the department level. Once the organization deploys data marts based on the emerging standards, the BI environment becomes self-service, and the bottleneck that once existed within IT is removed. At this stage, an understanding of why things have happened emerges, because knowledge workers use analytical systems to extract data to meet their own needs and draw conclusions about business events. Costs creep up a small amount as some technology is purchased, but overall they are not a big factor.

  • Architecture: Data Marts
  • Scope: Departmental
  • Type of System: Analytical
  • Analytics: Interactive Report
  • User: Knowledge Worker
  • BI Focus: Why Did It Happen?
  • Executive Perception: Empowers Workers


4.     Thinking bigger (TDWI calls it ‘Teenager’): Flexibility starts to fade as division-wide standards arise. People see the need to work together and are driven by common divisional goals. There is an atmosphere of negotiation and consolidation as these standards are built out. Casual use is now on the rise, as the wider standards lead to increased reliance on the data available within the BI ‘system’. Power-user use remains flat, but their ideas are rolled back into divisional solutions; they are seen as subject matter experts and are often tapped to provide leadership and direction for their domain. The focus here is customized delivery at the divisional level: dashboards, scorecards, report cards, and the like. At this stage, managers make use of division-wide dashboards and are given real-time, actionable information – what is happening right now. Costs are rising as we work within the division to develop standards and customized delivery.

  • Architecture: Data Warehouse
  • Scope: Division
  • Type of System: Monitoring
  • Analytics: Dashboard
  • User: Manager
  • BI Focus: What Is Happening?
  • Executive Perception: Monitor Processes


5.     Mature (TDWI calls it ‘Adult’): Standards are formed at the enterprise level. Governance groups are now formal processes with proper structure and sponsorship, and senior-level support is solid. Although flexibility dips at first as the organization learns, it then trends up as efficiencies and learnings are gained. People are truly planning globally to act locally at this stage. Executives are also on board, as the mature BI environment serves to align all players within the organization. Casual users use the system to understand what they should be working on and how their efforts affect the organization as a whole. The delivery mode transitions from a divisional perspective to the enterprise, and it is balanced: balanced or cascading scorecards are the focus of the organization and serve as the single point of truth for questions about our goals and progress toward them. Executives use the BI environment as a communication tool, both to align the organization on goals and objectives and to communicate results and current situations. These pivotal points are balanced all the way through the organization and are socialized in a manner equal to business strategy. The balanced or cascading scorecards open up alternative decisions, as all indicators that lead to a ‘score’ are actionable. Costs rise again because of the enterprise-level investment and collaboration, but the value should increase dramatically.

  • Architecture: Enterprise Data Warehouse
  • Scope: Enterprise
  • Type of System: Strategic
  • Analytics: Cascading Scorecards
  • User: Executive
  • BI Focus: What Should We Do?
  • Executive Perception: Drive The Business


6.     Harvesting Relationships or Partnerships (TDWI calls it ‘Sage’): The last stage is leveraging the mature BI environment by opening up that business service to clients. Standards and control continue to be formed here, but they originate from client relationships. Flexibility is also harvested, as new ideas, thoughts, and concepts are embraced by the mature BI environment. At this stage, our BI environment can be thought of as a BI utility for our customers: it helps them solve their business problems by drawing on the organization’s rich and focused information, in the form of customer-focused solutions deployed to build or bind relationships, thus increasing the value the organization offers them. The BI utility becomes a needed part of the customer’s infrastructure (whether for a single customer’s need for specific and unique information or for a company with complex business processes). Costs take a dip: the data and information infrastructure is in place, the capital expenditure has been amortized, and tools already exist to develop rich applications. The value increases exceedingly abundantly here, as a very narrowly scoped application brings a deep penetration of partnership.

  • Architecture: Analytical Services
  • Scope: Inter-Enterprise
  • Type of System: Business Service
  • Analytics: Embedded BI
  • User: Customer
  • BI Focus: What Can We Offer?
  • Executive Perception: Drive The Market


Now that you know… Where do you see your organization? How can you actualize the next stage? What are some value statements that you can take to senior leadership? What business problems can you address? What type of socialization strategy will work best? Where should you invest, and what will the return look like? Who can you trust to help you get there from here without shortcutting the maturity journey? Proper growth is built on a series of solid foundations. These successes are the underpinnings of the needed BI elements: Trust, Vision, Focus, Value, Momentum…repeat.

Happy Maturing!

The post BI Maturity: You can’t get there from here! appeared first on Making Data Meaningful.

Making Data Meaningful

Enterprise 2.0 and Governance

Prof. Andrew McAfee (see my previous post) in a blog post in Nov 2006 asked his readers to consider this: Imagine two competitors, one of which has the guiding principle “keep security risks and discoverability to a minimum,” the other of which is guided by the rule “make it as easy as possible for people to collaborate and access each others’ expertise.”  Both put in technology infrastructures appropriate for their guiding principles.  Take all IT, legal, and leak-related costs into account.  Which of these two comes out ahead over time? 

My guess would be the latter.

So, what is Governance?

In simple terms I’d say that Governance is the set of policies, procedures and structures you define and establish in an organization to guide and direct the use of technology to achieve organizational goals. So it is about both IT and the business.

Many people think governance is about control and limiting the power of the user. On the contrary I think that good governance actually enables better collaboration – the more clearly the terms of use and the collaboration structures and mechanisms are defined, the easier it becomes to navigate the world of unstructured content. To me, governance is like the banks of a river – without the banks the river waters could cause enormous destruction and would spread out and be wasted. The banks provide direction and enable the river waters to be channeled and to achieve useful purposes.

In the Enterprise 2.0 context, one finds that companies are concerned when it comes to allowing the use of such technologies across the firewall. The typical concerns are around monitoring the content posted in blogs and wikis, and fears of potential lawsuits over published content that is slanderous, hateful, harassing, or discriminatory in any shape or form. There is also concern about trade secrets, new product information, R&D information, and other inside information getting out. At my current client, there is tremendous concern over the possibility of allowing collaboration between internal employees and any external entities.

I wonder if the very same concerns existed when email was first introduced or way back when the telephone was first made available in a business setting.

Most companies tend to have an information security policy, and that has generally sufficed to handle these concerns around email and phone calls. In a similar fashion, I think it’d be useful to extend that information security policy to cover Enterprise 2.0 technologies such as blogs, wikis, RSS, podcasts, social networking, etc. It’d also be useful to highlight and publicize these policies so that employees are aware that instant messages, blog or wiki posts, and comments on a discussion thread are to be treated as public communication. Also, one should consider that online social communities typically tend to be self-policing and self-correcting.

So I would like to suggest that for a company to successfully embrace Enterprise 2.0, it should first decide how it wants to handle the content that will be generated through the use of such technologies. Would it not be reasonable to assume that all such information should be treated as the company’s digital assets?

So when it comes to providing governance around your Enterprise 2.0 solution, it might be useful to look at the following areas:
* Findability – how can you make it easy to find relevant content so users do not have to remember URLs or content locations? What can you do to provide true enterprise search capabilities? Can the search experience be customized?
* Retention – how long should content be retained both from a legal perspective and because of the business necessity to find older content? How about a mechanism to archive content that isn’t being actively used?
* Versioning – how many versions of content would you like to support? Make it easy to go back to a previous version but manage this effectively to minimize storage costs. Can you enforce storage quotas?
* Information Architecture – what guidelines do you want to provide around navigation and search? What kind of metadata do you want captured with different kinds of content to make it easy to find pertinent information? Do you have specific thoughts around taxonomy? Also, would you want to use workflows to manage document state? How about content approval policies? How do you integrate the content in the collaboration system with your enterprise portals?
* Customization – is the system customizable and does it provide the ability to turn functionality on/off as needed? What kinds of user customization of the system would be acceptable? How do you plan to verify that those customizations are safe to be deployed to your Production environment? Will there be a rollback mechanism?
* Security – provide adequate security so that content that needs special security can be effectively protected. Also verify that there are adequate mechanisms to audit and report system usage, and enforce information management policies such as retention, auditing, expiration, etc.
* The terms of use – define what terms govern the use of your system.
* Acceptable content – state upfront what kinds of content are acceptable i.e. regarding text, images, videos, audio, etc. Also specify your policy around sharing this content externally.
* Integration of such systems into the organization’s Enterprise Content Management system – how do you envision the information flowing from the location that provides free collaboration to your enterprise content management system? How do you plan to handle e-discovery? Where should the final, legal record of content reside?
* Tools – finally, evaluate the available Enterprise 2.0 tools to see which one best meets your needs in light of the requirements outlined above.
* Documentation – document your policies and procedures and the custom framework you are going to implement with respect to the software tool(s) selected above and publicize its availability.
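As a toy illustration of the retention point above, an archival sweep can be expressed in a few lines. This is only a sketch in Python; the one-year window and the content records are invented, and a real system would read these from the content repository and its governance policy.

```python
from datetime import date, timedelta

# Hypothetical policy: archive content untouched for more than a year.
RETENTION = timedelta(days=365)

# Invented content records: (name, last-accessed date).
content = [
    ("old-spec.doc",   date(2006, 5, 1)),
    ("team-wiki-home", date(2008, 9, 30)),
]

def to_archive(items, today):
    """Return the names of items whose last access is older than the retention window."""
    return [name for name, last_used in items if today - last_used > RETENTION]

print(to_archive(content, date(2008, 10, 1)))  # only the stale item is flagged
```

The useful part is not the code but the decision it encodes: the retention window and the archive-versus-delete choice are governance questions that must be answered before any tool is configured.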

To exercise the system, a pilot rollout could be considered and the guidelines and policies then be tweaked appropriately based on relevant feedback. But after that, IT should get out of the way and strive to effectively enable and empower the business to use these Enterprise 2.0 technologies.

One other thing I think companies should focus on is identifying and establishing the process for maintaining a single version of the truth when it comes to content management. Having multiple redundant copies of the same content – for example in email, the user’s PC, a shared network folder, a collaboration space, and a content management system – is a bad idea, not only in terms of governance and compliance but also in cost: backup and storage add up, not to mention the time lost working out which is the latest version. I had a colleague use the term “single point of truth,” or SPOT, and thinking in terms of SPOT should be a key focus area for governance. In this regard, you could, for instance, institute a policy stating that internally, email should not be used to forward documents; instead, a link to the document on the intranet or collaboration area should be emailed. The same policy could be adopted for external communication as long as an extranet site is available to share content with external collaborators.

Today it is becoming common for the business to adopt Web 2.0 tools without overt IT involvement. For Web 2.0 this is not a significant issue, since it is mainly consumer-oriented and geared toward the individual user. However, when it comes to Enterprise 2.0, I do not think that is necessarily the best thing to do long term. For Enterprise 2.0, my recommendation is that the business work hand-in-hand with IT and use a corporate vendor that builds and integrates its Enterprise 2.0 offering with existing infrastructure and has the vision, and the proven financial and technical ability, to engineer a solution that scales well and provides the necessary controls and mechanisms outlined above.

I would like to propose that IT be forward-looking and embrace Enterprise 2.0 technologies and strive to empower and enable the business to effectively use such technologies. And I would also propose that the business work in association with IT to achieve its ends instead of pursuing solutions that are good for a single department but may not scale well to the enterprise or are unable to provide the functionality needed long term.

The post Enterprise 2.0 and Governance appeared first on Making Data Meaningful.

Making Data Meaningful

How is Enterprise 2.0 different from Web 2.0?

The term “Enterprise 2.0” was coined by Harvard professor Andrew McAfee in 2006, in an MIT Sloan Management Review article entitled “Enterprise 2.0: The Dawn of Emergent Collaboration”, as opposed to Web 2.0 (which was popularized by Tim O’Reilly in 2004).

When asked “What is Enterprise 2.0?”, the typical response might be “the application of Web 2.0 in the enterprise”. AIIM, the Enterprise Content Management association, states that there is more to Enterprise 2.0 than that: “Web 2.0 is focused on consumer and public-facing Web sites although that distinction was not explicitly made in the original definition. Enterprise 2.0 is much more about businesses’ adoption of “2.0 mindsets” than with the consumer facing side of the coin.” Plus, there is a lack of precision around the term Web 2.0.

AIIM defines Enterprise 2.0 as: “A system of Web-based technologies that provide rapid and agile collaboration, information sharing, emergence, and integration capabilities in the extended enterprise.” (see AIIM Market IQ, Q1 2008, “Enterprise 2.0: Agile, Emergent & Integrated”, by Carl Frappaolo and Dan Keldsen).

A couple of frameworks for Enterprise 2.0 include one from Prof. McAfee, which goes by the mnemonic SLATES (Search, Links, Authorship, Tags, Extensions, and Signals), and another from Dion Hinchcliffe, which goes by the mnemonic FLATNESSES (Freeform, Links, Authorship, Tagging, Network-oriented, Extensions, Search, Social, Emergence, and Signals).

In similar fashion to AIIM, Forrester Research believes that “the term Web 2.0 has come to embody both consumer and business use of next-generation Web technology but that this lumping together of services is too imprecise to be practical” (see Global Enterprise Web 2.0 Market Forecast: 2007 To 2013 by G. Oliver Young dated April 21, 2008). Young states that as a result, “most pundits and technology strategists segment the market between consumer Web 2.0 services and business Web 2.0 services.” Forrester thus refers to “the business Web 2.0 market as enterprise Web 2.0, which encompasses Web 2.0 technology and service investments for both externally facing marketing functions and internally facing productivity and collaboration functions.” So for example, Forrester doesn’t include Blogger, Facebook or Twitter as Enterprise 2.0 services even though they are Web 2.0 services.

Forrester believes that “enterprise Web 2.0 technologies represent a fundamentally new way to connect with customers and prospects and to harness the collaborative power of employees.” They specifically refer to Enterprise 2.0 technologies such as blogs, wikis, RSS, podcasting, social networking, mashups, and widgets.

The list of Enterprise 2.0 technologies provided by AIIM is quite similar and consists of mashups, blogs, wikis, RSS, podcasting, social voting/ranking and social bookmarking.

One has to remember, though, that even if Enterprise 2.0 technologies such as the ones listed above provide for rapid and agile collaboration and empowerment, there has to be a cultural openness to this within an enterprise for it to truly succeed. So the challenge is not only aligning the technology with the business, but also aligning the culture with the technology.

In future posts, we will take a look at some of these Enterprise 2.0 technologies and the cultural issues around the adoption of Enterprise 2.0.

The post How is Enterprise 2.0 different from Web 2.0? appeared first on Making Data Meaningful.

Making Data Meaningful

Taxonomy: It sits in the critical path of …

Where is the excitement around this issue?

It seems that “taxonomy” was my word for the week. This is my third post about it within seven days. It’s not that I am in love with the word; rather, it’s just pretty darn important! With any big initiative, the first thing we look to is a solid foundation for communication. Think about it: we usually address taxonomy, anywhere from casual discussions to formal governance groups, for many initiatives – dare I say any initiative that strives to bring real change to an organization begins with taxonomy (either consciously or subconsciously). Thinking it through, here are some of my top-of-mind game changers that require a solid taxonomy:

 Master data management. By definition this is really an enterprise taxonomy that is the official reference data for an organization.
 Metadata management. Tagging data with information is best performed only after a taxonomy is well-established. Else, with-what-shall-I-tag-it plagues the process.
 Business Intelligence. Without a proper taxonomy, how do we bring together people from diverse business perspectives together to understand data from a central and enterprise viewpoint?
 Data Management. Well, of course we can’t properly manage data without knowing where things are in a hierarchy or what context the data should exist in.
 Data Quality. Here we are really measuring data against the taxonomy; whether implicit or explicit.
 Governance. Strategic decisions are made for specific purposes and they need to rely and depend upon a socialized and accepted taxonomy.
 Data Stewardship. This is the process of holding someone accountable for making tactical decisions to implement strategic direction.
 BPM. When we look to manage business processes, they depend upon real information. So, having a taxonomy to base these data points is crucial.
 SOA. Reusing software components and exposing them at the enterprise level demands a highly accepted understanding of the organizations data. Sure, this view is exposed as a group of web services that are published in a repository that is self aware – but without a canonical data model as your underlying foundation, consistency is not reached. A canonical data model is highly correlated to a mature taxonomy.
 Strategy, Solutions and Architecture. It’s near impossible to calibrate these three without a friction-free flow of communication. Let’s not talk about what should’a, could’a and would’a – but let’s focus on the business problem at hand. Having a living taxonomy that is socialized, accepted and part of our DNA is key to gaining quick momentum as we put distance between us and our competition.

These are just some quickies that I bubbled up. What other initiatives need a solid taxonomy? Thinking about taxonomy, when you look to bring real change to an organization, what happens? From my experience, there are two choices:

1. Address taxonomy early and often. Realize that there are some things that are so important that we need to establish, socialize and enforce them.
2. Jump to build a solution. Then, realizing there are terms misaligned, misused and duplicated, go back and either fix the data models (and all subsequent diagrams and code – this rarely gets done) or create a lot of code to hide these issues. When we do this, we establish a short-term, brittle foundation that breaks when the next change comes, or we end up with a bunch of custom spaghetti that tries to tie things together but really ends with just a lot of confusion.

Bottom line
• Embrace taxonomy within your natural collaboration style. When something is unclear: pause, ask, record, check for understanding, agree on the outcome and move forward. It’s not a development phase; don’t sell it to senior management – no one cares about it. It’s an expected minimum of doing business. Add it to your culture’s DNA.
• Don’t underestimate the issues when terms are not aligned. It wreaks havoc on your foundational infrastructure, and the costs (both hard and soft) can be big. Know it is there and plan for it.
• Scope your risk. If you are working within a group or team, the risk is small. Plan for it and cross it off your list as you develop it. However, if you are aligning silos or working across divisions or bringing others into alignment, or working cross-culturally, or introducing new teams, these issues can be big. Again, plan for it – put someone in charge of its care and feeding.
• Use it as a way to create excitement and ownership. Once you work together, it is always good to look for accomplishments to celebrate. Depending upon your scope, it can also be a way to generate a new level of buy-in. Manage the group right and they walk away with the justified feeling that they had a part in it – that they created it and it reflects their slice of the business. Trust me here – then they will socialize it and ensure that it’s followed in their domain!

Now that is exciting stuff!

The post Taxonomy: It sits in the critical path of … appeared first on Making Data Meaningful.

Making Data Meaningful

Business Intelligence, Done Right

Business Intelligence efforts often fail, not because of the technicians, but because of the disjointed relationships we (IT) have with the business. We fear relationships. We think that our investments and communications are a waste of time because we have a tremendous amount of downward pressure to deliver – so we start quickly and build according to some document that we substitute for face time with our partner. To fix this problem, we need to take the best part of our building experience as ITers and put that together with the best part of relationship management – skills that we often don’t possess, but that are vital to successfully delivering value and meeting our partners’ expectations.
{more on “The Right Way” below}

Kimball’s methodology calls for the segregation of the three layers of BI, and this is fantastic. There are in fact three separate BI tracks that can and should run concurrently. The three tracks are:
1. Technical Architecture
2. Data Architecture
3. Application Architecture
After all, we need to make progress on the technical choices of architecture and tools. What is our environment like, what standard tools do we use, what is our technical strategy to frame and contain our BI effort?
At the same time, we need to start looking at our data. What data will we need to work with, what is it, how should it relate, where does it come from, how do we extract, transform and load it. These are considerations of the data architecture effort.
Again, at the same time, what level of innovation do we need to arrive at? How do we present this to the user and provide the desired interaction to help mine data to craft information.

Kimball’s Methodology

This methodology is right on! It is triggered by, and its scope is managed by, the requirements. These requirements are derived from the box labeled “Program/Project Planning”. But it’s not enough; it’s not there yet! In my mind, this is a bit limited and really only addresses the “Go” portion of the project – to build. All too often, BI projects fail because of things that are ‘above’ or before the Go. Here is what I have found to be the differentiating factors.

Requirements should be built from Strategy. Why are we building this BI application? What are the important questions that need to be answered? What is the value that these answers bring; are we making things better or faster or cheaper? A well-formulated strategy addresses these questions and should manage the deliverables. When we rely on a set of requirements to build a BI solution, we are abstracted from the strategy, and this is not good. When this abstraction occurs, we are removed from the business value and are relegated to a commodity…simply build to this specification.
However, strategy is not born without a birthing process. The birthing process consists of a couple of elements. The first is an alignment, or a focus, at the most senior levels. If we can’t align on the true aspects of the business, we are forced to work in a vacuum and can, at best, produce a solution that is severely slanted. This slant often caters to the loudest voice or the biggest ego. Worse, our solution is then forever raised in a silo that contributes to expensive growth and maturity with limited enterprise appeal. Without this alignment, we are doomed to deliver value slices rather than holistic value.
This alignment is not enough. After we have a focus, we then have to dive deep within the organization to see what raw materials we have to work with. I refer to this phase as ‘Discovery’. Once we are aligned and focused on the business value, what do we have to work with? What data exists in what format and at what availability and quality?
To state this concisely, we have the following necessary phases that will lead us to success.
1. Alignment – A common focus with enterprise-level buy-in.
2. Discovery – An exploration to ascertain our resources and an evaluation to ensure they support the common focus.
3. Strategy – The actual strategy that is an outcome of “What do you want” + “What do we have”.
4. Requirements – The plan of how to get there from here.

Now, we are ready to “Go” forth and multiply! We are ready to build! And we know what we are building (requirements) and how the effort will benefit the organization (strategy). We know what we really have to work with (discovery). And finally, we have those at the top who are committed at the deepest levels of the effort and will both support and defend it, as well as ensure that it’s accepted and reused across the enterprise.

Putting it all together, let me suggest the following methodology…

The Right Way to do BI

Sometimes we are asked, “Why are you just sitting there? Do something!” We need to rally and fight the urge to just do something. We need to turn that around and ask, “Why are you just doing something? Sit there!” (and get focused, aligned, discover, produce strategy, birth requirements…then we can “Go” and do something excellent).

The difference between a commodity and a partner is the level of time you invest in the relationship.
Take the time to do it right and it will be celebrated!
The next series of blogs will outline the above components; alignment, discovery, strategy, requirements and the three team approach to building BI solutions. Feel free to shoot me some questions or thoughts you may have regarding these topics.
Enjoy the ride!

The post Business Intelligence, Done Right appeared first on Making Data Meaningful.

Making Data Meaningful

Data Vault: The Preferred “flavor” for DW Architecture in BI – Part II

In Part-I, I explained the place of Data Vault (DV) in Enterprise Data Warehouse Architecture. Now let’s look at different DV entities, rules for each entity and why Dan Lindstedt calls DV a “hybrid” approach. This minimal understanding is necessary before diving into the differences between the various modeling techniques.

The main entities of Data Vault are Hub, Link and Satellite.

HUB Entity (HUB_): This is a defining entity. It contains a unique list of business keys. These are the keys that businesses utilize in everyday operations – for example, employee number, SSN, or product code. So the attributes of a HUB are:

  • Surrogate Key – This is the Primary Key of the hub and holds a 1-to-1 relationship with the Business Key.
  • Business Key – This is the Primary Key of the source system. It can be a composite key. ETL checks this key’s existence in the hub table and inserts one if it doesn’t exist.
  • Load Date Time – The datetime of the key / record when it was first loaded into the table.
  • Record Source – The name of the source the record originated from. This is useful for data traceability.
  • Record Begin Date Time – The datetime when the record became active in the source (if available) or the datetime when ETL has been run.
  • Record End Date Time – The datetime when the record is closed. This can only be detected if the logical deletes are supplied or derived in some manner.
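The hub-load rule described above – insert the business key only when it is new – can be sketched in a few lines. This is a hypothetical, in-memory Python illustration with made-up names; real Data Vault loads run as ETL/SQL against HUB_ tables:

```python
# Hypothetical sketch of a Data Vault hub load (illustrative only).
from datetime import datetime, timezone

hub_employee = {}      # business_key -> hub record (stands in for HUB_EMPLOYEE)
_next_sk = [1]         # stands in for a surrogate-key sequence

def load_hub_key(business_key, record_source):
    """Insert the business key only if it is new; return its surrogate key."""
    if business_key in hub_employee:
        return hub_employee[business_key]["surrogate_key"]
    sk = _next_sk[0]
    _next_sk[0] += 1
    hub_employee[business_key] = {
        "surrogate_key": sk,                      # 1-to-1 with the business key
        "business_key": business_key,             # key from the source system
        "load_dts": datetime.now(timezone.utc),   # Load Date Time
        "record_source": record_source,           # for traceability
    }
    return sk

sk_first = load_hub_key("E1001", "HR_SYSTEM")
sk_again = load_hub_key("E1001", "PAYROLL")   # key already known: no new row
```

Note that the second load, even from a different source, does not create a second row; the hub only records that the business key exists.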

LINK Entity (LINK_): LINKs are constructed once all the HUBs are identified. Links are relationship entities. They are the physical representation of a many-to-many 3NF relationship, representing the relationship or transaction between hubs. The link table contains the unique list of relationships between hub keys. When a relationship arrives, it simply gets loaded into the table if it doesn’t exist. Typically, the link tables translate into fact tables in the datamart access layer. For example, the link between employee number and project number. The other attributes of a LINK are:

  • Surrogate Key – This is a Primary Key of the table and is useful when a link contains more than two hub keys, as a composite key might cause performance problems. This is also useful when the granularity of the link changes (a hub key is added) or history needs to be maintained on the relationships.
  • Hub Key 1 to Hub Key N – The surrogate keys from the hub tables that are involved in the relationship.
  • Load Date Time – The datetime when the record was loaded into the table.
  • Record Source – The source system name from where the record or relationship was loaded from.
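Following the same pattern, a link row is inserted only when the combination of hub surrogate keys has not been seen before. Again a hypothetical Python sketch with invented names, standing in for ETL against a LINK_ table:

```python
# Hypothetical sketch of a Data Vault link load (illustrative only).
from datetime import datetime, timezone

link_emp_project = {}   # (employee_sk, project_sk) -> link record
_next_link_sk = [1]

def load_link(employee_sk, project_sk, record_source):
    """Insert the relationship only if this hub-key combination is new."""
    key = (employee_sk, project_sk)
    if key not in link_emp_project:
        link_emp_project[key] = {
            "link_sk": _next_link_sk[0],              # surrogate key of the link
            "hub_employee_sk": employee_sk,           # Hub Key 1
            "hub_project_sk": project_sk,             # Hub Key 2
            "load_dts": datetime.now(timezone.utc),   # Load Date Time
            "record_source": record_source,
        }
        _next_link_sk[0] += 1
    return link_emp_project[key]["link_sk"]

first = load_link(1, 7, "TIMESHEET")
repeat = load_link(1, 7, "TIMESHEET")   # same relationship: no new row
```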

SAT Entity (SAT_): SATs hold descriptive information about the hub keys or the relationships. The satellite most closely resembles a Type 2 dimension. When the data changes, a delta record is inserted into the table, and if certain columns change faster than others, they can be split into two different tables to avoid data replication. For example, employee details such as employee name, address, phone number, and email address go in a satellite off of a hub, while the time spent by an employee on a certain project goes in a satellite off of the LINK that stores the relationship between employees and projects. The other attributes of a SAT are:

  • Hub or Link Surrogate Key from HUB or LINK table. This is part of the primary key.
  • Load Date Time – The datetime when the record was inserted into the table. This is part of the primary key.
  • Surrogate Key – This is optional. It is useful when satellites have multiple values such as multiple home addresses.
  • Record Source – The name of the source.
  • Record Begin Date Time – The datetime when the record became active in the source (if known) or the datetime when ETL has been run.
  • Record End Date Time – The datetime when the record is closed.
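The delta-driven satellite behavior described above – append a new row only when the descriptive attributes actually change – might look like this rough Python sketch (names are invented; a real satellite is a table keyed on the parent surrogate key plus load datetime):

```python
# Hypothetical sketch of a delta-driven satellite load (illustrative only).
from datetime import datetime, timezone

sat_employee = []   # append-only rows; the PK would be (hub_sk, load_dts)

def load_sat(hub_sk, attrs, record_source):
    """Append a delta row only when the descriptive data has changed."""
    latest = None
    for row in sat_employee:            # find the most recent row for this key
        if row["hub_sk"] == hub_sk:
            latest = row
    if latest is not None and latest["attrs"] == attrs:
        return False                    # no change: never store duplicate rows
    sat_employee.append({
        "hub_sk": hub_sk,                         # part of the primary key
        "load_dts": datetime.now(timezone.utc),   # part of the primary key
        "attrs": dict(attrs),                     # descriptive data
        "record_source": record_source,
    })
    return True

load_sat(1, {"name": "Ann", "city": "Derby"}, "HR_SYSTEM")
load_sat(1, {"name": "Ann", "city": "Derby"}, "HR_SYSTEM")  # unchanged: skipped
load_sat(1, {"name": "Ann", "city": "Leeds"}, "HR_SYSTEM")  # delta: new row
```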

Stand-alone tables, such as calendar, time, and code/description tables, may also be used.

Modeling Rules for Each Part of the Entity:


Hubs:

  • Hub keys cannot migrate into other hubs (no parent/child hubs).
  • Hubs must be connected through links.
  • More than two hubs can be connected through links.
  • Surrogate keys may be used.
  • Business keys are 1 to 1 relationship with surrogate keys.
  • Hub primary keys always migrate outward.
  • Hub business keys and primary keys never change.
  • If a hub has two or more satellites, then a point-in-time table can be built for ease of joins.
  • An ‘UNKNOWN’ business key record can be inserted into a hub to tie up data in links and sats that has no business key in the source. This kind of data is usually bad/incomplete source data.


Links:

  • Links can be connected to other links.
  • Links must have at least two hubs associated with them in order to be instantiated.
  • Surrogate keys may be used.
  • The combination of the hub surrogate keys makes a unique key.
  • Does not contain descriptive data.
  • Does not contain begin and end dates.


Satellites:

  • Satellites may be connected to hubs or links.
  • Have 1 and only 1 parent table.
  • Satellites always contain either a load date-time stamp, or a numeric reference to a stand-alone load date-time sequence table.
  • Primary key is a combination of ‘surrogate key’ from either hub or link and the load datetime stamp.
  • Surrogate keys may not be used.
  • Must have a Load End Date to indicate when the CHANGE to the data set has occurred.
  • Satellites are always delta driven. Duplicate rows should not appear.
  • Data is separated into satellite structures based on 1) type of information 2) rate of change.

The DV model utilizes bits of both 3rd Normal Form and dimensional modeling concepts. This approach makes the model simple, flexible, expandable, adaptable and consistent.

  • Adapted many-to-many physical relationship structure from 3NF that became a LINK table.
  • The LINK table is also similar to a factless fact table in a star schema.
  • Adapted the notion of 1 to 1 (business key to surrogate key) tracking from dimensional modeling (type 1 dimension).
  • Adapted the notion of “data over time in a separate table/structure” from dimensional modeling (type 2 dimension). This resulted in the SAT table; however, it is fundamentally different, in that it is a dependent child table, whereas the dimension is a parent table to the facts.

This is it for now. In the next post(s) we will look into some examples showing how the Data Vault technique overcomes the limitations of 3NF and dimensional model structures when applied as an Enterprise Data Warehouse.

The post Data Vault: The Preferred “flavor” for DW Architecture in BI – Part II appeared first on Making Data Meaningful.

Making Data Meaningful

Data Vault: The Preferred “Flavor” for DW Architecture in BI – Part I

Business Intelligence (BI) is today’s ‘MANTRA’, chanted by almost every business. Companies want to outsmart the competition. They are ready to invest big bucks and human power to build a sophisticated BI system so that they can have knowledge that others don’t and seize opportunities in the market before others do. BI shows the future value of your business.

BI systems need DATA, and every business has terabytes of real data which can provide the information and knowledge needed to make the right decisions on time. But the key is to turn that data into information in a timely, efficient and effective manner once the WHAT and WHY questions are answered – i.e., what information is needed, what matters, and why it is required. In today’s market, every business is in a RACE: the race to conquer others, the race to generate more gains/profits, the race to foresee risks early so that they can be avoided. So time is of the essence here.

An optimized BI system integrates large volumes of external and internal near-real-time data to allow management to create opportunities by making intelligent decisions after performing predictive analysis of their approach to the business. A good BI system is like a GPS. An effective GPS is one that not only shows you a route to your destination but also guides you when you hit a roadblock, gives up-to-date information on external conditions (construction/traffic), provides multiple routes to choose from, suggests shorter and faster alternatives, predicts the total time based on your driving behavior, and tells you what to expect next. Just knowing the path to your destination is not sufficient. You need to know many other factors during the whole ride to reach your destination on time and without any hurdles.

For a good integrated BI system, a good Data warehouse architecture needs to be in place.  Data warehouse architecture is “an integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting”. Below are the pictorial representations of different “flavors” of DW architectures.

Methodologies used by different architecture:

Kimball’s DW Architecture – Is based on ‘Bottom-UP’ methodology.

Inmon’s DW Architecture – Is based on ‘Top-Down’ methodology.

Dan Lindstedt’s Data Vault DW Architecture – Is based on ‘HYBRID DESIGN’

The first two design methods have some limitations at the Data Warehouse layer, such as inflexibility and unresponsiveness to changing departmental needs during the implementation phase, insufficient auditability of data back to its source system, inability to integrate unstructured data, inability to rapidly respond to changes (organizational changes, new ERP implementations), and difficulty loading type 2 dimensions in real time. This is where the DATA VAULT came to the rescue. Data Vault follows a ‘HYBRID DESIGN’ methodology: ‘TOP-DOWN ARCHITECTURE WITH A BOTTOM-UP DESIGN’.

The model is a mix of normalized modeling components with type 2 dimensional properties. In this model, the DW serves as a backend system that houses historical data integrated by the business keys. All data – ‘good, bad, incomplete’ – gets loaded into the data vault, and all the cleansing and application of business rules takes place downstream, i.e., out of the DW. This means that the Data Vault model is geared to be strictly a data warehouse layer, not a data delivery layer, which still requires physical or virtual star schemas or cubes for business users or BI tools to access.

Bill Inmon in 2008 stated that the “Data Vault is the optimal approach for modeling the EDW in the DW2.0 framework.”

In Parts 2 and 3, I am going to explain the different components of the Data Vault and its power with the help of some examples. That will clearly explain why the Data Vault should be the preferred “flavor” for different businesses.

The post Data Vault: The Preferred “Flavor” for DW Architecture in BI – Part I appeared first on Making Data Meaningful.

Revolution Analytics

A small logical change with big impact

In R, the logical || (OR) and && (AND) operators are unique in that they are designed only to work with scalar arguments. Typically used in statements like while(iter < 1000 && eps...


October 14, 2018

Making Data Meaningful

How Necessary is a Data Warehouse?

The largest and most complex aspect of Business Intelligence (BI) is the data warehouse.  In this context, the data warehouse is the repository of data generally fed from many sources to keep historical perspectives of an entity’s data.   It is a behemoth that is generally expensive, slow to build, complicated in structure and difficult to maintain.  How necessary is it?  Does a company need the actual, physical data warehouse to have a successful and sustainable business intelligence (BI) program?

There are many design methodologies that take these issues into consideration. There are advantages and disadvantages to both traditional and non-traditional methodologies which I do not cover in this post. My goal is to bring up points of view on why and when a data warehouse may or may not be used. What I would like to cover is:

  • The Corporate Information Factory (CIF), based on the Inmon approach
  • The Kimball Style of data warehousing
  • BI using no data warehouse at all

Corporate Information Factory

The Corporate Information Factory methodology, in a nutshell, says there is no way of getting around the inevitable need for a data warehouse. In order to have a successful and sustainable BI program, a data warehouse is needed. Not only is it needed, it needs to be completely designed, built and populated before any further analysis or BI work can be done. This is due to the nature of how business concepts are intertwined with each other, necessitating the big-picture view. This style also views the architecture process more from the IT/data perspective than from the business-need point of view.

Kimball Methodology

The Kimball methodology of data warehouse design is not as structured and regimented as the Corporate Information Factory.  The Kimball data warehouse is the sum of its parts; meaning one area of the business could be designed, developed and deployed providing BI insight while other aspects of the business have not been discussed.  This concept will speed the development of the data warehouse compared to the CIF, but the underlying data warehouse can become much more complex as more and more is added to it along with the possibility of rework.  This style views the architecture process from the business needs point of view compared to the IT/data perspective.

No Data Warehouse?

What about not using a data warehouse? In the new age of Data as a Service (DaaS) and Master Data Management along with Service Oriented Architecture (SOA), why re-store data from disparate systems? Why not store the metadata of where the data is found and attach the business logic to the SOA call? This can be a very powerful way to gain insight into data. The idea is that the work of a data warehouse can be done without the data warehouse itself. There are already tools that will do this. One of them is Qlikview from Qliktech. The basic premise behind this tool is to allow the user to develop the Transform and Load aspects of ETL (Extract, Transform and Load) in memory to deliver very quick analytics in a solid visual manner. This tool is not a methodology, but SOA could be used in a larger context with the same principles. This style views the architecture process as something the business could do, but IT does not have to do.

The idea that a data warehouse is necessary for a successful BI implementation is not necessarily true. A data warehouse is not necessary to have analytics or provide a picture of the data you have. However, I believe it is very questionable whether this process is sustainable enough to leverage every benefit of BI. The one important aspect of BI that cannot be overcome by SOA or in-memory analytic tools like Qlikview is the entire reason the data warehouse first came about.

The decision for building or not building a data warehouse is all about the history of the data. Not the history that is required by law to be kept, like financial data, or what in many cases is considered ‘facts’ in the Kimball style. If this were the only history needed, a data warehouse would be less necessary. The type of history that is important is the history that cannot be reproduced within the source systems – the history of changes that the source system does not keep. In many cases a customer’s address may not be historically important in a transactional/source system, so only the most current record is kept. If that history is not kept somewhere (like a data warehouse), analytics of historical purchases of products will not show a true picture of what actually happened. It will only show the picture of what is in the source system at the current point in time. This situation is the quintessential linchpin for why a data warehouse should be necessary. The ability to track and keep history that is not kept in the source system is something SOA or in-memory BI is not capable of reproducing.
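To make the point concrete, here is a small hypothetical Python illustration of the difference: a transactional source overwrites the customer’s address, while a warehouse-style history table keeps every version (all names here are invented for the example):

```python
# Illustrative contrast between a source system and warehouse-kept history.
source_customer = {}     # transactional source: current address only
address_history = []     # warehouse-style table: every address version

def update_address(customer_id, address):
    source_customer[customer_id] = address   # the source simply overwrites
    record = (customer_id, address)
    if not address_history or address_history[-1] != record:
        address_history.append(record)       # the warehouse keeps the change

update_address("C1", "12 Oak St")
update_address("C1", "99 Elm Ave")

# The source can only say where C1 lives now; the history table can still
# say where C1 lived when an earlier purchase was made.
```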

If the desired BI capability for the business is operational in nature, a data warehouse will not offer any significant benefit over SOA. This is a short-sighted, tactical means of looking at data and cannot provide strategic insight, but it certainly could be the best way to answer that need for data given the circumstances. This would not be the end-all-be-all for BI, but it certainly can provide a means to start a program.

So does this completely answer the question “Is a data warehouse necessary for BI?”  The data warehouse is necessary for a complete and sustainable BI program, but it does not have to be the start of the program.  So… of course the answer to that is still…. “It depends…”

The post How Necessary is a Data Warehouse? appeared first on Making Data Meaningful.


October 12, 2018

Revolution Analytics

Because it's Friday: Hey, it's Enrico Pallazzo!

It seemed like such a simple movie. The Naked Gun (1988) is slapstick comedy through-and-through, but I never would have guessed (h/t Steven O'Grady) how much detail and planning went into the jokes,...


Revolution Analytics

The Economist's Big Mac Index is calculated with R

The Economist's Big Mac Index (also described on Wikipedia if you're not a subscriber) was created (somewhat tongue-in-cheek) as a measure to compare the purchasing power of money in different...


October 11, 2018

Revolution Analytics

How R gets built on Windows

I wasn't at the Use of R in Official Statistics (uRos2018) conference in the Netherlands last month, but I'm thankful to Jeroen Ooms for sharing the slides from his keynote presentation. In addition...


October 09, 2018

Revolution Analytics

R Consortium grant applications due October 31

Since 2015, the R Consortium has funded projects of benefit to, and proposed by, the R community. Twice a year, the R Consortium Infrastructure Steering Committee reviews grant proposals and makes...


October 06, 2018

Simplified Analytics

Interview with Sandeep Raut, Founder & CEO - Going Digital

Sandeep Raut – Founder & CEO – Going Digital published by Onalytica Key Topics: Digital Transformation, AnalyticsLocation: Mumbai, IndiaBio: Sandeep is Digital...


October 05, 2018

Revolution Analytics

Because it's Friday: If IKEA did algorithms

Thanks to Mike Loukides I recently discovered IDEA, a series of algorithm explainers presented as IKEA assembly instructions. It's a brilliant concept: IKEA instructions have to be clear and easy to...


Revolution Analytics

A few upcoming R conferences

Here are some conferences focused on R taking place in the next few months: Oct 26: Nor'eastR Conference (Providence, RI). A one-day R conference, organized by grassroots R community members in the...

Making Data Meaningful

The Heart of the Purpose-Built Cloud

At a cocktail party this past weekend, I spoke with some local business leaders whose job roles run the gamut from IT to finance to operations. When asked about the type of business I’m in, I gave my standard answer: “My company, Data Intensity, provides managed services both on premises and in the cloud for companies’ most critical applications and databases.” I described how the hype of the cloud has impacted our business, and our work in building a purpose-built cloud for Oracle enterprise applications. The response I got was, “Oh, so you guys are like Amazon?”

“Not exactly”, I said.

For me, this conversation captured one of the core issues with understanding cloud computing. All clouds are not created equal. A general-purpose cloud such as Amazon, Softlayer, Rackspace, or Microsoft provides markedly different services from the clouds provided by companies like Data Intensity and even Oracle. It’s not just a question of public cloud versus private cloud, either. In the daily course of my job, I encounter business and IT leaders struggling with one fundamental question: What is the most efficient, flexible, cost-effective model possible for application deployments? At the core is whether to keep IT infrastructure and applications in house, where SaaS fits into the company’s application lifecycles, and how to provide IT infrastructure most easily and effectively. The appeal of a one-stop shop for cloud services versus multiple sources for specific use cases is apparent. But the reality of building and supporting a diverse-use cloud, even for Amazon, is challenging. It’s clear to see why projects and initiatives that develop common frameworks to integrate across clouds, like OpenStack, are gaining in popularity.

In the discussions I have in my role as CTO, I prompt business and IT leaders to give careful consideration to not just the “where”, but the “who”. Who is going to support these applications, and how will your company’s operating model change when constrained by a provider’s restrictions and guidelines? Even after removing the physical infrastructure from the equation, employing talented knowledge workers who understand how to navigate the chosen delivery paradigms is essential. The cloud doesn’t necessarily remove complexity; it just changes the profile of the complexity. The Data Intensity Cloud is distinguished not only by the purpose-built infrastructure that it runs on, but by the application and database management services that are tightly integrated with it.

At the end of the day, it’s the people who make the proverbial trains run on time. As long as the infrastructure an application or database runs on is secure, stable, fast, and flexible, most CIOs, and certainly most business users, don’t really care where it lives. However, it’s important to remember that an exceptional user experience is limited by the weakest link in the system. So even with the most robust, integrated clouds in today’s market, the user will not have a positive experience if the applications are not properly configured and integrated for the particular cloud implementation in which they are installed. For this reason, Data Intensity invests in engineering and automation and organizes in an inter-disciplinary manner that ensures we have captured the full benefits of a cloud solution in our model. Indeed, hiring and retaining competent knowledge workers is one of the most challenging tasks in today’s IT landscape. Even in the case of a pure-play SaaS offering, the greatest software is completely ineffective if you don’t have the people to configure and manage it.

With Oracle OpenWorld shortly upon us, the buzz is all about Larry Ellison’s new role as CTO and Oracle’s big push to become the leading cloud provider for Oracle software. Oracle has the financial resources to build and support a robust, successful cloud model, yet keep in mind that Oracle produces many different kinds of software. Supporting their SaaS products, like Fusion Applications, is very different than building a repeatable, template-driven deployment model for Oracle Database. These products have different performance profiles, security requirements, and maintenance procedures, and even these vary by each customer’s individual use case. And you’re not likely to get a solution that is fully tailored and supported for your particular use case under Oracle.

What really matters is that the infrastructure is tailored to the needs of the individual application, which is then tailored to the individual needs of the specific customer. But the weakest link is the people that support it all, which is why I take heart in knowing that I work with some of the best minds and most dedicated professionals in the industry. And while I believe the cloud that Data Intensity built is unsurpassed for the use cases of each of our individual customers, it’s these people that truly make it a success. That’s evidenced by the various infrastructures our customers run on – our cloud, third-party clouds, and customers’ own internal data centers – and our continued ability to adapt to diverse infrastructures to provide an exceptional service experience. As the cloud landscape continues to unfold before us and evolve around us, I know that will never change.

Stay tuned for more updates from Oracle OpenWorld…

The post The Heart of the Purpose-Built Cloud appeared first on Making Data Meaningful.

Making Data Meaningful

The Time is Now to Create Your Oracle E-Business Suite Roadmap

“There is plenty of life left in Oracle E-Business Suite,” says Marc Caruso, Data Intensity’s Chief Technology Officer, in a recent webinar. He points out that Premier Support for the Oracle E-Business Suite 11i and R12 releases has been extended into 2021 and 2023, respectively, and the upcoming 12.3 release will ensure at least eight more years of support upon general release.

Because Oracle has extended its support timeline for the E-Business Suite, it’s time to map out a fresh approach to your E-Business applications based on where they fit, what hardware they’re running on, your cloud strategy and the new functionalities they provide.

Key to-dos for Release 12.1.3 E-Business Suite users

If you are on 12.1.3 and plan to stay there, implement these four steps:

  • Apply Recommended Patch Collection 5. Released in September 2016, this collection is full of important fixes and improvements and will bring your 12.1 implementation up to date.
  • Make sure you are on the latest supported database patch set. It’s the only supported release of Database 11g, so it’s time to start thinking about upgrading to Database 12c.
  • Make sure you have a browser and JRE compatibility roadmap. Having a roadmap that accounts for Windows 10 and IE 11 is important for maintaining compatibility.
  • Keep current with Oracle CPUs. Schedule testing cycles at least once a year for peace of mind.

New 12.2.6 features may make it an attractive upgrade

12.2.6 shipped in September 2016 with hundreds of functional improvements. New 12.2.X features including subscription billing capabilities, outcome-driven procurement and information discovery are compelling functional enhancements. However, the online patching feature is probably the biggest news. It answers the needs of the many customers with 24 x 7 uptime requirements who have long urged Oracle to develop that functionality. It’s important to be aware of tradeoffs, which include a 2-3X increase in application tier storage capacity and database growth from shadow tables ranging from 5% to 35%. There are many best practices to consider, and Data Intensity offers guidance on how to prepare for the online patching feature, as well as redevelopment of CEMLIs that need to be brought into compliance with the feature.

There are some other features that may be of interest to customers too. First, Oracle has expanded its testing automation products to make it easier to go through change and release cycles. Second, there are enhancements to mobile functionality as well as an expanded lineup of more than 25 smartphone applications, including functionality for workflow approvals and time and expense entry.

Users already on 12.2 and considering the 12.2.6 release should know that:

  • The release includes helpful user interface improvements that should improve the overall user experience.
  • There are important AD and TXK RUP patches (R12 AD.C.Delta 8 Patch 21841299 and R12.TXK.C.Delta.8 – Patch 21830810) that you don’t want to get behind on.

With 12.2.X, the following modules are no longer supported:

  • Daily Business Intelligence
  • Balanced Scorecard
  • Embedded Data Warehouse

Also, these 10G product line technologies have been put out to pasture:

  • SSO 10G
  • OID 10G
  • BPEL 10G
  • Portal 10G

Infrastructure and technical management considerations should be part of planning

Factor in infrastructure and technical management issues when doing E-Business Suite lifecycle planning. For instance, when moving to 12.2, you will surely require additional storage capacity. But you can offset that need by leveraging snapshots, deduplication and thin provisioning. Plan to boost I/O, add both CPU and memory resources to support online patches, and prioritize processes by working with OS administrators. These steps will all contribute to a smoother transition.

On the technical management side, your DBAs will need training on

  • managing WebLogic Server
  • understanding E-Business suite’s topology architecture
  • how online patching works and how to troubleshoot it

Finally, consider offering a virtual desktop for users of older versions. This will allow you to continue offering the interface that they’re already familiar with, accelerating their acceptance of the new release.

Changes in Oracle database support may positively impact plans for 2017 and beyond

If you were pushing forward with a database upgrade out of concern for extended support fees, the pressure is off. Oracle has waived those fees through December 2018 and July 2019, depending on the database release. While the immediate cost impact is no longer a concern, that doesn’t mean you should put thoughts of a database upgrade out of mind. Extended support is still extended support. Start planning your upgrade to Database 12c soon to ensure the business has adequate time to test.


Which upgrade is for you? 12.1 or 12.2?

While 12.1.3 is the path of least resistance (lower effort and cost, especially if you have a lot of customization), it’s probably a better bet to move to 12.2.6. That’s because it gives you nearly two additional years of Premier Support, and possibly more if Oracle extends it. Of course, anyone who wants to take advantage of 12.2’s online patching will want to make the move.


What about Cloud?

While a lot of the material we see from Oracle today is about Oracle’s Cloud Applications, that doesn’t mean there aren’t options for Oracle E-Business Suite in the cloud. Data Intensity provides an industry-leading, purpose-built private cloud for Oracle E-Business Suite and other enterprise technologies. We also provide support for public cloud solutions for E-Business Suite using either Amazon Web Services or Oracle Public Cloud. In fact, we are working closely with Oracle Development on the automation tools that will be used to move E-Business Suite workloads into and out of Oracle Public Cloud.

Just because it’s not the right time for you to move to Cloud Applications doesn’t mean you can’t say you’re in the cloud. The bottom line is that you can remove infrastructure management from the equation, get an improved user experience, and benefit from the automation and higher SLAs that a cloud model can provide.

Roadmap recommendations

The following actions will give you a good head start toward meeting your upgrade goals:

  • Develop a plan that considers Oracle’s support timeline extensions. Involve the business to create the justifications you need, whether you’ve decided to upgrade, keep current with your release, or make technical changes – like CPU patches, TXK or AD updates – that sometimes fall by the wayside during planning.
  • Bring CEMLIs into compliance with 12.2 standards. Have a system integrator do a scan to identify all your CEMLIs. Be sure to search outside of Oracle E-Business applications to catch CEMLIs embedded in integrated third-party applications.
  • Ensure desktop compatibility across your browser (Windows 10 and IE 11) and JRE versions.
  • Use an upgrade as an opportunity to modernize and update platform security. Review and correct processor core sizing as well as older operating systems and compatibility limitations that can crop up when running an older OS with virtualization and cloud platforms.
  • Seize the opportunity to add more up-to-date security components like privileged user access and role mapping. Think about a move to a cloud platform and what that means for the agility of your E-Business Suite investment, and do it with a company that can help you in your journey from E-Business Suite to Oracle Cloud Applications.
  • Licensing. Be safe rather than sorry regarding licensing! If you haven’t been audited by Oracle in a couple of years, you are probably on the list. Stay ahead of this and avoid being hit with unbudgeted expenses. In many cases, you can convert unused on-premises licenses to corresponding cloud capacity, which is certainly worth exploring. Data Intensity provides both one-time soft audit as well as ongoing License Management-as-a-Service (LMaaS) services to help you find compliance issues, resolve them, and stay in compliance.

Find out how Data Intensity can help you make the most of your Oracle E-Business Suite investments by helping you develop the right plan for your organization. Contact us today »

The post The Time is Now to Create Your Oracle E-Business Suite Roadmap appeared first on Making Data Meaningful.

Making Data Meaningful

Five Oracle Tuning and Diagnostics Pack Facts You Should Know

As Oracle continues to audit customers with increasing regularity, one topic that we are frequently hearing about from our clients is related to the licensing of Tuning and Diagnostics Packs. While many customers have been using them for years – some even unwittingly – most are unaware that these tools provide a very critical function in managing the performance of their applications. And at $12,500 per processor, licensing them is not an insignificant spend, as it represents more than 25% of the cost of an Oracle Database Processor license. For this reason, when pricing Oracle Database Enterprise Edition licenses, we tend to bake this number into the cost, which increases the $47,500 list price to a cool $60K.

With all of this in mind, here are five “pack facts” you should be aware of:

  1. The Tuning and Diagnostics Packs are only available with the Enterprise Edition of the Oracle Database.
  2. Diagnostics Packs are a pre-requisite for licensing Tuning Packs. We’ve seen situations in which customers own Tuning Pack licenses without Diagnostics Pack licenses (which makes us wonder why there aren’t better controls in the Oracle sales process for preventing such a situation).
  3. The counts for all optional packs must match the metrics and counts of Oracle Database licenses for each database in which the packs are being used. We see frequent examples where the counts do not match.
  4. If you are using Enterprise Manager (EM) to manage your databases, you are most likely using Tuning and Diagnostics Packs. In 10g, EM went from using legacy statspack utility data to using AWR data and, while this was not a highly publicized event, that fact does not absolve customers from following the rules. There are some steps you need to take to remove links in EM that could trigger a license event.
  5. You don’t need to be using EM to have to license the packs – simply accessing the underlying data structures could trigger a license event. Though there are some exceptions, keeping track of them is not straightforward.

All of this confusion makes both licensing and supporting Oracle Database a challenge. Oftentimes, Oracle Support will not ask a customer whether they are licensed for these packs prior to requesting performance data. In addition, your managed service provider will often assume that these packs are licensed if they find evidence that they have been used previously.

How to determine Tuning and Diagnostics Packs usage

In Oracle Database 11g and higher, determining whether the Tuning and Diagnostics Packs are enabled in your database is as easy as reviewing the control_management_pack_access parameter in v$parameter. Values can be NONE, DIAGNOSTIC, or DIAGNOSTIC+TUNING. To determine whether AWR is actually in use, you would need to run a query on the dba_feature_usage_statistics view. If it is, you are most likely benefiting from using the packs and just didn’t know it. Oracle does provide both the option_usage.sql and used_options_details.sql scripts to determine which packs and features are being used.
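
If you want to script this check yourself, a minimal sketch might look like the following, run as a DBA user. The LIKE patterns on feature names are illustrative assumptions; the exact names recorded in dba_feature_usage_statistics vary by database version.

```sql
-- Which packs are enabled at the instance level?
-- Expected values: NONE, DIAGNOSTIC, or DIAGNOSTIC+TUNING
SELECT value
  FROM v$parameter
 WHERE name = 'control_management_pack_access';

-- Has a pack-related feature (e.g. AWR, ADDM, SQL Tuning Advisor)
-- actually been used on this database?
SELECT name, detected_usages, last_usage_date
  FROM dba_feature_usage_statistics
 WHERE (name LIKE 'Automatic Workload Repository%'
     OR name LIKE 'ADDM%'
     OR name LIKE 'SQL Tuning%')
   AND detected_usages > 0;

-- To prevent accidental pack usage on an unlicensed database:
-- ALTER SYSTEM SET control_management_pack_access = 'NONE' SCOPE=BOTH;
```

Note that setting the parameter to NONE only prevents usage going forward; it does not erase prior usage already recorded in dba_feature_usage_statistics.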

If you are unsure whether you are using the Packs, ask your DBA to check for you. Then reconcile that information against your latest Oracle Support renewal. Remember that licensing is considered on a CPU basis, so you will need to check each database you have under management and cross reference that information with the servers on which those databases are running. While most technical resources aren’t familiar with Oracle licensing policies, Data Intensity does have licensing specialists that can help with this complicated exercise.

If you’re not currently using (or licensed for) the Tuning or Diagnostics Packs, you may want to consider the following benefits of each:

Diagnostics Pack

  • Automatic Database Diagnostic Monitor (ADDM) – An automated tool that focuses on the database’s most intensive operations, drilling down into the performance to proactively determine root cause.
  • Automatic Workload Repository (AWR) – A repository that collects statistics at predetermined intervals on the workloads within the database. The AWR provides an historical reference for performance changes over time, including establishment of performance baselines, and adds great value to the capacity planning process.
  • Active Session History (ASH) – A key component of AWR, ASH samples session activity every second and stores it in views, replacing the need for more manual utilities such as SQL trace. DBAs typically use the v$active_session_history view to isolate performance problems with individual database sessions.
  • Data Dictionary Views – With some exceptions, data dictionary views beginning with dba_addm, dba_hist, or dba_advisor are part of these management packs, and accessing them triggers a licensing event.
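
To illustrate the kind of analysis these views enable (and, per the caveat above, querying them requires a Diagnostics Pack license), a DBA troubleshooting a slowdown might sample recent ASH data roughly like this hypothetical sketch:

```sql
-- Top sessions and wait events over the last 10 minutes of
-- sampled activity. Requires a Diagnostics Pack license.
SELECT session_id, event, COUNT(*) AS samples
  FROM v$active_session_history
 WHERE sample_time > SYSTIMESTAMP - INTERVAL '10' MINUTE
 GROUP BY session_id, event
 ORDER BY samples DESC;
```

Because ASH samples roughly once per second, the sample count for a session/event pair approximates seconds of active time spent there.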

Tuning Pack

  • SQL Access Advisor – Advice on how to optimize schema design in order to maximize query performance. This feature takes input from a variety of sources, including AWR, to analyze a workload and provides recommendations on index creation and deletion, partition creation, and materialized views creation.
  • SQL Tuning Advisor – Statistics analysis, SQL profiling, and access path analysis with recommendations on how to optimize SQL. There is also an automatic mode that allows the database to automatically implement recommendations for conditions in which at least a three-fold improvement would result.
  • Real-Time SQL Monitoring – The most frequent use of Tuning Pack is typically real-time SQL monitoring. If a production environment experiences a performance issue, this is the only way for a DBA to determine what SQL statements are running while the problem is occurring.
  • Data Dictionary Views – Access to the sql_monitor and sql_plan_monitor views requires Tuning Pack licenses.
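
As a rough sketch of how a DBA might use real-time SQL monitoring during an incident (again, this requires a Tuning Pack license; the report call uses the DBMS_SQLTUNE package):

```sql
-- List statements currently (or recently) being monitored.
SELECT sql_id, status, username, elapsed_time
  FROM v$sql_monitor
 ORDER BY sql_exec_start DESC;

-- Pull a text report for one offending statement
-- (substitute a real sql_id from the query above).
SELECT DBMS_SQLTUNE.report_sql_monitor(sql_id => '&sql_id',
                                       type   => 'TEXT')
  FROM dual;
```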

So what’s your next move if you have a performance issue and don’t own licenses for Tuning and Diagnostics Packs? Here are a few suggestions:

  • Customers can use the legacy “statspack” utility, but the functionality will be significantly less than the features contained within the packs. Data is stored in tables owned by PERFSTAT. It’s also much more cumbersome to manage, which can impact the efficiency of your internal DBA team or external managed service provider.
  • The dba_hist_snapshot, dba_hist_database_instance, dba_hist_snap_error, dba_hist_seg_stat, dba_hist_seg_stat_obj, and dba_hist_undostat views are exceptions that can be accessed as part of the “Automatic Segment Advisor” and “Undo Advisor” features without licensing the Diagnostics Pack. SQL Trace and TKPROF are still available, but they will require significant manual work to develop recommendations and conclusions.
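
For reference, the statspack workflow mentioned above amounts to taking bracketing snapshots and then diffing them, roughly as follows (run from SQL*Plus as the PERFSTAT user, assuming statspack has already been installed with spcreate.sql):

```sql
-- Take a snapshot before the problem window...
EXEC statspack.snap;

-- ...reproduce or wait out the workload, then take a second snapshot.
EXEC statspack.snap;

-- Generate a report between the two snapshots
-- (the script prompts for the begin and end snap ids).
@?/rdbms/admin/spreport.sql
```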

Need more information?

Check out My Oracle Support Notes 1490798.1, 276103.1, 94224.1, 1361401.1, and 1674024.1. These are all good sources of reference material. In addition, Chapter 2 (Options and Packs) of the Oracle Database Licensing Information User Manual contains a significant amount of information on this topic.

The post Five Oracle Tuning and Diagnostics Pack Facts You Should Know appeared first on Making Data Meaningful.

Making Data Meaningful

The Business Intelligence Partnership – A Key to Success

“It takes two to tango.”

“There is no “i” in Team.”

“The sum is greater than the parts.”

There are so many sayings that capture the importance of teamwork.  Whether it’s a dance or a sports team, success is often found when team members have a symbiotic relationship; one can’t be successful without the other.  Business Intelligence programs are no different.  They, too, require a team effort or partnership to be successful.  That partnership is between the business and the technology functions of a company.

While IT departments are accustomed to working with business associates for application development purposes, the relationship in a BI program is different.

Application development is mostly focused on delivering a transactional system that is rooted in functional requirements.  Requirements are generally known upfront, detailed, and static for the foreseeable future.  Interaction between the business stakeholder and the IT development team is likely intermittent and as needed.  Once the application is developed, the project ends and any temporary team is disbanded.

By contrast, business intelligence really never ends.  BI exists to meet the ever changing analytical needs of the business.  As a result, a dedicated joint business and IT team, often referred to as a BI Competency Center (BICC), is usually formed to meet those needs.   The BICC team is long-lived with ongoing responsibilities to deliver reliable data and meaningful analytics.  Business questions that likely drive requirements are not usually fully known upfront and will come to light through joint prototyping and iterative development.   Additionally, a BI program will have multiple iterations or development projects underway at once.  Therefore, interaction between the business and IT components of the team is likely to occur on a daily basis.   Because business intelligence is often an enterprise wide, strategic initiative, the BICC is ideally led by an executive sponsorship committee jointly comprised of business and IT leaders.

The interdependency of business and IT in a BI program starts from the very beginning and continues through all functions of the BICC.  For example, establishing an over-arching BI strategy and designing the blue print for the related architecture requires the business members to articulate the company’s strategic direction, how it operates and the relationships in the data, their analytical needs, the timeliness of the analysis, etc.  This enables the technology side of the team to design the architecture, recommend proper tools, and design databases that best support the needs of the business.

Another example is the process to manage the company’s demand for BI.  While the business members of the team will likely collect, inventory and assess the benefit of the ever changing needs of the company, the IT side of the team will provide input to a cost and effort analysis which will help to jointly prioritize the iterative development.

Joint participation is especially apparent in one of the core functions of the BICC, which is to manage the data as a strategic asset.  Data governance is all about managing the data assets to ensure integrity, usability, accessibility and security.  It ensures the right data is available at the right time so that informed decisions can be made based upon reliable data.    Data governance focuses on managing data quality, master data and metadata.  In each of these areas, the business and IT members of the BICC work together to define, measure, cleanse, and publish necessary information to ensure the data is consistent, understood, reliable, and has relevant business context.  Usually, specific processes and tools are implemented to govern the data.

The joint partnership between the business and IT is one of the keys to a successful BI program.  Working via a long-term, value-added relationship brings many benefits.  The business and IT components of the BICC will consistently drive toward the same goals, speak the same language, better understand and even anticipate each other’s needs, respect and leverage each other’s talents, gain momentum faster, and produce a better product overall.  Just like the dance “The Tango,” in the world of BI, it takes two to be successful.

The post The Business Intelligence Partnership – A Key to Success appeared first on Making Data Meaningful.

Making Data Meaningful

Point Fingers! Who Killed ‘My Favorite’ Project?

In the not-too-distant past, My Favorite Company embarked on My Favorite Project to revolutionize My Favorite Industry and leave competitors in the dust. When competitors got wind of this plan, mysterious things started to happen. And before too long, mysterious turned to sinister. Then one day, My Favorite Project was found terminated. How could this happen? Who would do such a thing? A special task force was commissioned to investigate. Who killed ‘My Favorite’ project? You decide.

Victim: ‘My Favorite’ Project

Injuries: Depleted budget, fractured Scope, and fatal hit to Quality

Description of the Suspects: Project Manager

The Project Manager was assigned to achieve the project objectives by understanding and applying tools and techniques to manage the project, including the authority to approve certain types of change requests as defined in the project’s roles and responsibilities.

The Project Manager was diligent about identifying project objectives, addressing the various concerns and expectations of the Stakeholders, and balancing the project constraints, including scope, quality, schedule, budget, resources, and risks. Because of the potential for change, the Project Manager crafted a plan that was iterative and went through progressive elaboration throughout the project’s lifecycle, which allowed for continuously improving and detailing the plan as more specific information and more accurate estimates became available.

Software Product Manager

The Software Product Manager’s main role was representing the product to the customer. The Software Product Manager investigated, selected, and developed the products for the organization, performing the activities of product management at all stages of the product lifecycle. (Note: The four main stages of a product’s life cycle are: 1) Market introduction stage, 2) Growth stage, 3) Maturity stage, 4) Saturation and decline stage.)

The Software Product Manager was also responsible for gathering software requirements using a Marketing Requirements Document (MRD) developed by the product planning/marketing team, and developing a high level Product Requirements Document (PRD). In addition, the Software Product Manager created an elaborate Software Requirements Specification (SRS) for the software engineering/development organization for subsequent design, development, and testing activities. Lastly, the Software Product Manager created User Acceptance Test (UAT) procedures, facilitated UAT sessions with end-users, and ensured that the product met the specifications and that it was deployed successfully.

Possible Motives/Scenarios:

1) The development of a product was a project on its own

2) Or, an existing product might have benefited from a project to add new functions or features

3) Or a project might have been created to develop a new model

Clue1: The product lifecycle consists of generally sequential, non-overlapping product phases determined by the needs of the organization. The last product lifecycle phase for a product is generally the product’s retirement. Project lifecycles on the other hand, occur in one or more phases of a product lifecycle. All projects have a purpose, but in those cases where the objective is a service or result, there may be a lifecycle for the service, not a product lifecycle.

Clue2: The project lifecycle is a collection of generally sequential and sometimes overlapping phases determined by the management and control needs of the organization. While every project has a defined start and end, the specific deliverables and activities that take place will vary widely with the project. The project life cycle provides the basic framework for managing the project, regardless of the specific work involved.

Hint: Many facets of the product life cycle lend themselves to be run as projects (e.g. performing a feasibility study, conducting market research, installing a product). In each of these examples, the project life cycle would differ from the product lifecycle. Did the Product and Project Managers have different agendas? Did the two not make it clear who was driving My Favorite Project? Or did the two simply not communicate, which dealt the fatal blow?

Who Dun it?

The Jury is out for deliberation.

You decide.

The post Point Fingers! Who Killed ‘My Favorite’ Project? appeared first on Making Data Meaningful.

Making Data Meaningful

The True Value of Agile…Don’t Point Fingers!

In my previous post, I proposed that the True Value of Agile is derived from fostering an environment of Continuous Improvement through Communication and Collaboration. To be sure, Communication and Collaboration alone will not bring success. As Vin D’Amico commented, a critical component to Agile Value and success is a shift in corporate culture to support an environment that encourages risk taking and refrains from seeking to assign blame to any individual when things go awry. The challenge here is that this goes against most traditional Western, and Asian, cultures for that matter. In order to improve, we must identify what we do well as well as what needs to be addressed. Sometimes finding a better way requires taking the risk to do something no one has thought of before. That does not always lead to success; however, it provides an opportunity for learning and growth. On Agile teams, we do not work and make decisions in isolation only to be the single “neck in the noose.” To be an effective Agile team there is no “You” and there is no “Me”, and definitely no “Them.” To be successful, the group must live and die as a team.

I am not suggesting Agile teams are like hippie communes, but there are similarities (remember that Steve Jobs lived in a commune before his 15 minutes of fame). Everyone works for the greater good, which is moving the product forward and delivering business value. There is much work to be done and defined timeframes in which to do it. We have tools, and skills, and ideas about how best to complete those tasks. We are jacks of all trades and masters of a few, not one. Effective agile development teams are made up of generalizing specialists. It is this cross-training that helps mitigate single-threaded development as well as unforeseen events that may remove someone from the team for a period of time, such as attrition or illness. Once we have our list of work items, which becomes our product backlog, we are able to self-select which tasks we can commit to completing. This is another shift in culture. Traditionally, work is assigned by management to the individual specialists. In an Agile culture, the people performing the work are trusted to know best how to distribute the load so that they can successfully meet their commitments and provide value quickly. There should always be something for everyone to do to move forward.  To this end, no one should be waiting for work.

We are not alone in our agile commune either. The business product owner is there with us, every day, to prioritize and test, and yes, re-prioritize and add or defer features based on our velocity and changes in scope. This is not normal for the business owner in relationship to traditional requirements planning for software development.   The scrum master, our servant leader, is also with the team, every day, to work with the product owner to find the most valuable work that can be reasonably fit into the sprint. The scrum master is a servant leader in that it is this role that both serves the needs of the development team to remove obstacles, and provide whatever is necessary for the team to be successful and at the same time leads the team by working as a bridge between the product owner and the team, working on the backlog and the schedule. The dynamic is not unlike an orchestra. The Product Owner is the composer. The scrum master is the conductor, and the members of the development team are the musicians.

What I have described is contrary to most traditional organizations where the business and IT only communicate at the beginning and the end of the project, where divided IT teams are comprised of specialists like a relay team that wait on each other to pass the baton. We need to be more like a volleyball team who work together and support each other to move that ball over the net in the most effective way possible.

Changing corporate culture where failure is treated as a cardinal sin to one where it is addressed, reviewed and used as a learning point is challenging and requires buy-in from all levels of the organization. These concepts are taught in Agile training courses; however, they are not often embraced and supported by corporate management culture. The folks sent to class are typically the developers and the project managers, but rarely the upper management and business sponsors that play a key role in making “Agile” successful.

To achieve the true value of agile we must all work together for the greater good of the company by creating and supporting a nurturing environment of continuous improvement and stop pointing fingers.

The post The True Value of Agile…Don’t Point Fingers! appeared first on Making Data Meaningful.

Making Data Meaningful

Does Your Data Keep You Up At Night?

I recently purchased a new car and the process clearly took longer than it needed to.  I am not a person that is typically prone to long, drawn-out decisions, so I began wondering why this one decision took so much longer than previous car-purchasing decisions.

I reflected upon one of my earlier car purchases (back in 1997) as a comparison.  When I purchased this car, the process was fairly straightforward.  I drove to a few local dealerships and browsed the showroom until I saw something in my price range that caught my eye.  Over the course of several weeks, I test drove several cars and talked with family/friends/co-workers to see if anyone had any input on the reliability of the vehicles.  Quickly, I narrowed it down to one vehicle. I didn’t lose any sleep thinking about the decision.  I just went with my “gut feel” and stayed within my budget.   I was very happy with my purchase and kept the car for 140,000 miles.

With the most recent car purchase, the process seemed to take on a life of its own.  I had so much information available to me that I spent months researching cars using every automobile search engine and consumer report that I could find.  I often used my smartphone to identify nearby car dealerships just so I could do “drive-bys” and look at cars.  Each car I researched had so much positive and negative input available that it became overwhelming as to what was the “right choice”.  I have to admit that I had so much information coming at me that I started dreaming of cars at night and often awakened in the morning with cars as the first thought on my mind.  I had clearly found myself with information overload and was in “analysis paralysis”.

I’ve seen this happen in business too. There is so much data and information available at every turn that it is hard for Business Leaders to weed through what is truly relevant and what is just “noise” getting in the way of making the decisions needed to move the business forward.

If we want to keep our business from getting stuck in “analysis paralysis” – we need to identify what information is truly necessary for us to achieve our business goals and implement the correct tools to visually present this data in a way that facilitates our decisions and doesn’t hinder them.  The challenge is determining what data is meaningful to the decisions that need to be made.

To have effective Business Analytics, the business goals must be clear. If the data you are looking at doesn’t help you make decisions to achieve these goals, then this data may be just “noise”.  How much data do you look at each day that really isn’t helping you achieve your goals?

If your data is keeping you up at night, maybe you have too much of it.

The post Does Your Data Keep You Up At Night? appeared first on Making Data Meaningful.


October 04, 2018

Ronald van Loon

AI Transforms Industrial IoT

If you’ve been studying artificial intelligence and its growth, you’ll know that the industry is well past its nascent stage now. There is significant maturity in its growth, and companies from diverse backgrounds are realizing the impact of incorporating data and AI into their ecosystems.

In a bid to understand the dynamics of this data-centered growth, I teamed up with Hewlett Packard Enterprise (HPE) to analyze their international survey on the present and the future of AI within the industrial sector. Through the survey, HPE wanted to find answers to some of the most-asked questions in the field of AI:

What percentages of companies are working on AI?

How can AI transform industrial IoT for the better?

Will it prove to be a job killer for us humans?

The survey answers these and many of the other questions on the minds of today’s doubters.

The Promise of AI
AI has promised exceptional value addition to organizations that have incorporated it into the activities of their value chain. Not only has it enabled organizations to increase the efficiency of their operations, but it has also led to the creation of a better customer experience.

This progress has been enabled, among other things, by the exceptional processing power and capabilities of systems that run AI, both at the ‘edge’, i.e. close to the industrial equipment, and in corporate data centers or clouds. In combination with years of experience working with analytics tools, and an increasing availability of AI frameworks, consulting and professional services, this has enabled organizations to use AI as a tool to add new features to their products and services, and center them on customer needs.

Why is AI an Opportunity Now?
AI is deemed by many organizations as an opportunity to achieve certain aims and goals. Based on results from within the survey, we had a look at what some executives wanted to achieve through the use of AI in their business.

Interestingly, 57% of the respondents said that they wanted to increase their operational, supply chain, and maintenance efficiency through the use of AI. In addition, 45% wanted to improve customer experience and 41% were looking forward to enhancing the practicability of their offerings.

Industrial companies also wanted to use AI to increase employee productivity, and to create new business models, products, and services. This offers a hint that the end goal these organizations are seeking from using AI is rapidly changing. As time goes by, organizations are beginning to realize that AI has multi-faceted benefits, and should be used accordingly.

The Hindrances and Challenges
From the results of HPE’s survey, we learned that over 50% of companies said they are engaged with AI, with 11 percent having already implemented the technology in core functions or activities, 14 percent planning to do so within the next twelve months, and 36 percent evaluating the implementation. However, 39% of companies still had no plans to implement AI.

Additionally, respondents were asked about the obstacles to implementing AI. Nearly half of respondents consider the lack of data quantity and quality as a hindrance in the implementation of AI.

One example that springs to mind is a client HPE worked with from the tobacco industry. The client had five separate machines operating in the same area, yet one of them would fail every once in a while. This was extremely perplexing: one machine failed repeatedly while all the others in the same place operated well. After endless research, the cause of the continuous failures was traced to the humidity levels in the factory, which caused issues with the metal used in that specific machine. This was the kind of issue an organization could hardly uncover on its own; it couldn’t correlate humidity during the day with the type of metal. Fast-forward to now, and these are exactly the types of issues you can handle with the proper use of AI.

Moving on, 42% of all respondents believed that a lack of AI skills and knowledge hindered the implementation of AI in their organization, while 34% believed that a lack of data governance and enterprise data architecture was an obstacle to providing high-quality data for AI models, stopping them from widespread implementation.

How to Start with AI Now
When asked about the departments they would like to implement AI solutions in, most executives pointed towards obvious answers, but in a rather interesting order: 38% mentioned they would use AI in research and development if they had the opportunity. Maintenance and operations followed at 34% and 32% respectively; services came next, with 29% of respondents saying they would implement AI solutions in this department.

The interest in the use of AI in research and development has led to an interesting use case for AI.

Why are organizations currently bent on using AI in research and development, more than any other department? A couple of reasons for this could be:

  • As new product development is one of the end goals organizations want from AI, they believe research and development should be the department benefitting most from it.
  • Data in research and development has increased significantly over time. The research and development team provides proofs of concept to test new AI technologies before they are moved into full production.

While R&D received the most votes in the survey, respondents are deploying AI in a broad range of departments and working together with partners in the ecosystem. An ecosystem stitches the model together and makes it work.

What to Expect from the Future: Is It a Job Killer?
With AI expected to become a bigger and better part of the organizational ecosystem in the future, HPE’s survey also asked respondents which scenarios they were expecting from the technology by 2030. Almost 55% of respondents voted in favor of predictive manufacturing, which includes superior predictions of demand and other metrics.

Moreover, 45% voted for self-repairing and self-configuring machines, and 44% for mass customization. This is not only about making maintenance easier through machines repairing and configuring themselves; it is also a foundation for implementing key visions of Industry 4.0, such as order-driven production.

When asked about AI phasing out human input by 2030, two-thirds of respondents said that AI will not be a job killer. AI will enable us to work better and more efficiently, and to do new things we have never been able to do. It also holds the potential to change and disrupt business processes and models for the better. While the survey results are positive, how it actually pans out remains to be seen.


Ronald helps data-driven companies generate business value with best-of-breed solutions and a hands-on approach. He has been recognized as one of the top 10 global influencers by DataConomy for predictive analytics, and by Klout for Data Science, Big Data, Business Intelligence and Data Mining. He is a guest author on leading Big Data sites, a speaker/chairman/panel member at national and international webinars and events, and runs a successful series of webinars on Big Data and on Digital Transformation. He has been active in the data (process) management domain for more than 18 years, has founded multiple companies, and is now director at a data consultancy company that is a leader in Big Data & data process management solutions. He has a broad interest in big data, data science, predictive analytics, business intelligence, customer experience and data mining. Feel free to connect on Twitter or LinkedIn to stay up to date on success stories.


The post AI Transforms Industrial IoT appeared first on Ronald van Loons.


October 03, 2018

Revolution Analytics

In case you missed it: September 2018 roundup

In case you missed them, here are some articles from September of particular interest to R users. R code by Barry Rowlingson to replicate an XKCD comic about curve fitting. The rayshader package...


Multi-channel marketing campaign optimization system

A real-time application based on predictive Data Mining algorithms, supporting the selection of offers, products and/or additional services, as well as the selection of channels of communication with customers.

Who Will Benefit

The above-mentioned solution is addressed to companies with large customer databases and access to data concerning their customers' shopping preferences. Such information makes it possible to create individual customer profiles.

Nowadays, companies that aim to maximize profit and operate in a competitive market have to adapt their offer to the individual needs of each customer. Therefore, they strive to offer as many additional products or services as possible.

At the same time, finding the optimal offer is becoming more and more difficult, as product offers tend to change often. Moreover, companies sell dozens or hundreds of product variants instead of a single product. On the other hand, customers should not be spammed with offers that are not tailored to their preferences. Nor should they be recipients of inconsistent marketing communication. It is also important to bear in mind the budgetary constraints of marketing activities.

Conclusion: tailoring the offer to customer needs and choosing the best time and communication channel are key to successful closing of the transaction.

Solving the problem

The solution is to implement a system for creating offer recommendations (linking a given offer with a specific customer) and building a marketing campaign aimed at meeting customer expectations.

To be effective, such a solution must meet the following conditions:

  • The customer receives marketing communication only when the expected financial effectiveness of the contact exceeds a certain threshold.
  • The maximum workload of the communication channels is taken into account.
  • The campaign plan aims to maximise one criterion, e.g. sales effectiveness.
  • Budgetary constraints are taken into account.


Benefits of implementation:

  • higher percentage of positive responses to campaigns, increased sales and, consequently, increased income
  • a positive image of the company that meets the expectations of customers
  • lower campaign costs
  • improved customer satisfaction, lower number of complaints
  • lower probability of churn
  • possibility to determine the optimal sales strategy

Examples of applications in specific sectors

Banking and finance

Currently, banks and financial institutions offer a wide range of products, starting from traditional financial products (loans, savings accounts, credit cards, etc.), through additional ones (investment funds, insurance, leasing, currency exchange), to products offered by their partners (e.g. accounting, telephone subscription, Internet, electricity sales). A properly matched offer increases the chance of customer acquisition and of building relationships with customers, as well as gaining trust, which is of key importance for institutions managing financial assets.

Telecommunications


Apart from selling traditional products, such as phone subscriptions or prepaid cards, telecommunication operators also offer other additional products, e.g. data transmission packages, devices, products of their partners (applications, music, banking). At the same time, part of the offer is addressed to a specific group of recipients for whom time and communication channel are of key importance. Problems with effective sales are caused by a wide range of products and coordination of marketing activities, which should be aimed at increasing communication consistency across multiple channels.

E-commerce


E-commerce companies use the Internet as their main sales channel, yet they operate in a difficult and competitive business environment. Their main task is active communication via the Internet, which correlates with an increased positive response to campaigns. E-commerce services are aimed at optimizing marketing communication and how it is targeted (addressing a specific offer to specific customers). What is more, if a company wants to offer an additional product when a customer makes a purchase, it has to do so quickly and in line with the customer’s preferences.

Utilities – electricity

Similar to telecommunications, the electricity market is becoming more and more difficult, and the offers of the companies operating in it are becoming more and more complex. In addition, the possibility of changing energy supplier increases competition, which also leads to offers better customized to customers. At the same time, with the development of new technologies, the concept of the intelligent network (a network which automatically reports the level of energy consumption) is becoming more and more important. Based on such data, suppliers may tailor their offer to the needs of customers. To achieve the best result, the offer must be targeted at the right customer, at the right time, and through the right communication channel.

How the system works

For each product offered by the company, the system creates a predictive model that forecasts the impact of marketing communication on the customer’s purchase decision. It is also possible to estimate the profit for each product. Using this information, the efficiency factor for a given marketing communication is calculated. If the product database is large and changes dynamically, this task should be automated. The application uses Automatic Business Modeler (ABM) to build scoring models. The next step is to implement optimization, which takes into account the limitations introduced by the company: budget size, capacity of communication channels, etc. As a result, the application exports to the transaction system a database in which each customer has an assigned campaign plan, i.e. a product and communication channel corresponding to their needs.
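The score-then-optimize flow described above can be sketched in Python. Everything here is illustrative: the candidate data is made up, and a simple greedy selection stands in for whatever optimizer the real system uses; it is not Algolytics code.

```python
# Illustrative sketch of the scoring + constrained-selection flow.
# Candidate tuples, scores, profits, and limits are example values.

def efficiency(p_response, profit):
    """Efficiency factor: expected financial effect of one contact."""
    return p_response * profit

def plan_campaign(candidates, channel_capacity, budget, threshold=1.0):
    """Greedy selection: most effective contacts first, subject to
    the conditions listed earlier (threshold, channel load, budget)."""
    plan, spent = [], 0.0
    used = {ch: 0 for ch in channel_capacity}
    # candidates: (customer, offer, channel, p_response, profit, cost)
    ranked = sorted(candidates,
                    key=lambda c: efficiency(c[3], c[4]), reverse=True)
    for cust, offer, ch, p, profit, cost in ranked:
        if efficiency(p, profit) < threshold:
            continue  # expected effectiveness below threshold
        if used[ch] >= channel_capacity[ch]:
            continue  # channel workload limit reached
        if spent + cost > budget:
            continue  # budget constraint
        plan.append((cust, offer, ch))
        used[ch] += 1
        spent += cost
    return plan

candidates = [
    ("c1", "loan",      "email", 0.30, 50.0, 0.1),
    ("c2", "insurance", "phone", 0.10,  5.0, 2.0),
    ("c3", "card",      "email", 0.20, 40.0, 0.1),
]
plan = plan_campaign(candidates, {"email": 2, "phone": 1}, budget=1.0)
# c2 is filtered out (efficiency 0.5 < threshold); c1 and c3 are kept
```

In the real system the `p_response` values would come from the ABM scoring models described below, and the selection step would be a proper optimizer rather than a greedy pass.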

Available components developed by Algolytics used to create the solution

  • Scoring One – highly efficient calculation of results for millions of records using hundreds of models
  • Automatic Business Modeler (ABM) – a tool for automatic construction and updating of predictive models. It provides full automation of necessary but time-consuming actions, such as variable selection, transformations, interaction modeling or selection of the best model.

How to use ABM – the machine learning part of the system

In the following steps we will show you how to use Automatic Business Modeler. You will also learn how to create and use models online, as well as via the API.

Automatic Business Modeler:

Information about the API in ABM can be found here:

Uploading a table with data for modeling

Uploading a table to ABM can be done in two ways: from the website or using the API. It is simple and intuitive on the website. Go to the repository and press the Upload button. Then search for the .csv file you want to use.

[Screenshot 1: Multi-channel marketing campaign optimization system]

The API requires the following curl command using the POST method:

# The "content" field holds the column names and data rows of the table
$ curl '' -i -X POST \
    -H 'Accept: */*' \
    -H 'Content-Type: application/json; charset=UTF-8' \
    -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>' \
    -d '{
      "content" : "var1,var2,target\\n1,2,1",
      "fileName" : "fileName.csv",
      "tableName" : "tableName1",
      "csvSettings" : {
        "columnSeparator" : ",",
        "decimalSeparator" : ".",
        "textSeparator" : "\\\"",
        "hasHeader" : "true",
        "encoding" : "UTF-8"
      }
    }'

The token is an API key, generated in ABM in the API tab or via curl using the POST method:

$ curl '

In response, “access_token” is returned, which remains active for half an hour; after that, a new one must be generated.
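Since the access token is only valid for half an hour, a client will typically cache it and request a new one only once it has expired. A minimal Python sketch of that pattern, with a hypothetical `fetch` callable standing in for the token-request curl above:

```python
import time

TOKEN_TTL = 30 * 60  # ABM access tokens are valid for half an hour

class TokenCache:
    """Re-use an access token until it expires, then fetch a new one.
    `fetch` is a stand-in for the POST request that returns "access_token"."""
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch, self._clock = fetch, clock
        self._token, self._expires = None, 0.0

    def get(self):
        # Regenerate only when no token is cached or the TTL has passed.
        if self._token is None or self._clock() >= self._expires:
            self._token = self._fetch()
            self._expires = self._clock() + TOKEN_TTL
        return self._token

# Demo with a fake fetcher that counts how often it is called.
calls = []
cache = TokenCache(fetch=lambda: calls.append(1) or f"tok{len(calls)}")
t1 = cache.get()
t2 = cache.get()  # still fresh: no second fetch happens
```

The cached token would then be passed in the `Authorization: Bearer` header of every subsequent request.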

Creating a predictive model – a new project

The model is the result of the created project.

In the GUI, add a new project, select the data on which you want to create a model, and then adjust the appropriate settings.

[Screenshot 2: Multi-channel marketing campaign optimization system]

[Screenshot 3: Multi-channel marketing campaign optimization system]

Create a project using the POST API:

$ curl '' -i -X POST \
    -H 'Accept: */*' \
    -H 'Content-Type: application/json; charset=UTF-8' \
    -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>' \
    -d '{
      "name" : "project1",
      "tableName" : "dataTable",
      "processType" : "classification",
      "processParameters" : {
        "processMethod" : "quick",
        "target" : "Class",
        "positiveTargetValue" : "good",
        "variableRoles" : [                                # optional
          { "name" : "var1", "role" : "ACTIVE", "type" : "NUMERICAL" },
          { "name" : "var2", "role" : "ACTIVE", "type" : "CATEGORICAL" }
        ],
        "samplingSize" : 30000,                            # optional
        "samplingStratificationMode" : "CONST_NUM",        # optional
        "samplingPositiveTargetCategoryRatio" : 0.5,       # optional
        "samplingMode" : "MANUAL",                         # optional
        "classificationThreshold" : 0.5,                   # optional
        "cutoff" : 0.1,                                    # optional
        "qualityMeasure" : "LIFT",                         # optional
        "useTestData" : true,                              # optional
        "classificationThresholdType" : "MANUAL PROBABILITY", # optional
        "profitMatrixOptimized" : false,                   # optional
        "profitMatrixCurrency" : "euro",                   # optional
        "profitMatrixTruePositive" : 1,                    # optional
        "profitMatrixFalseNegative" : 0,                   # optional
        "profitMatrixFalsePositive" : 0,                   # optional
        "profitMatrixTrueNegative" : 1                     # optional
      }
    }'

Project building is done via curl using the PUT method:

$ curl '{project_id}/build' -i -X PUT -H 'Accept: */*' -H 'Content-Type: text/plain; charset=ISO-8859-1' -d '{ "action" : "start" }' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

The project building process can be verified using the GET method:

$ curl '{project_id}/build' -i -X GET -H 'Accept: */*' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

Model statistics

ABM also allows you to export statistics about the project. This can be done via the GET method:

$ curl '{project_id}/statistics' -i -X GET -H 'Accept: */*' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

In ABM you can view the statistics by pressing Model statistics, which is the penultimate stage of building the model.

[Screenshot 4: Multi-channel marketing campaign optimization system]

[Screenshot 5: Multi-channel marketing campaign optimization system]

[Screenshot 6: Multi-channel marketing campaign optimization system]

Application of the model

Automatic Business Modeler allows you to use the created model through the API. You can use the model to score a single record, as well as to score the entire file with records.

In order to score the table through the website, simply select Score data and follow the instructions.

To use a model through the API you will need a model identifier.

It can be found in the project menu. After you press Deploy, a message with an identifier will appear.

[Screenshot 7: Multi-channel marketing campaign optimization system]

You can also use the following curl (POST method):

$ curl '{project_id}/deploy' -i -X POST -H 'Accept: */*' -H 'Content-Type: application/json' -d '{ "overwriteIfExists" : false }' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

When the model identifier is known, you can proceed to prediction. Below is a curl command that applies the predictive model to a single record.

$ curl '{model_id}/score' -i -X POST -H 'Accept: */*' -H 'Content-Type: application/json; charset=UTF-8' -d '{ "dataRow" : "{ \"col1\": \"value1\", \"col2\": \"value2\", \"col3\": \"value3\" }" }' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

The probability of success for the given data will be returned.

Scoring code

Sometimes you may need more than just the score value – the scoring code itself may also be of interest. Naturally, there are two options: the project menu or the API.

First option: select Scoring code in the project menu.

[Screenshot 8: Multi-channel marketing campaign optimization system]

The API can be used with the GET method:

$ curl '{project_id}/scoringCode?dialect=JAVA' -i -H 'Accept: */*' -H 'Authorization: Bearer <<ENTER_YOUR_TOKEN_HERE>>'

The language in which the scoring code is to be exported should be entered as a dialect: JAVA, SQL, SQL_MSSQL, SQL_TERADATA, SQL_ORACLE, SQL_POSTGRES.

You can easily – and, above all, quickly – create a model, connect it with your application, and score new data in real time. All of this in a fully automated way.

The post Multi-channel marketing campaign optimization system appeared first on Algolytics.


October 02, 2018

Revolution Analytics

AI, Machine Learning and Data Science Announcements from Microsoft Ignite

Microsoft Ignite, Microsoft's annual developer conference, wrapped up last week and many of the big announcements focused on artificial intelligence and machine learning. The keynote presentation...

Making Data Meaningful

A Piece of Christmas Pie

The most criticized tool in Data Visualization is the pie chart. There are many areas of debate in the world of Data Visualization, but there is little debate among the experts about the pie chart. The number one rule about pie charts is “Don’t Use Pie Charts”. Personally, I’m not offended by them. I understand that it has been the tool of choice and that it has become ingrained into society and business. However, I am in complete support of the expert opinions. Pie charts are deficient in displaying and comparing data. There are a few acceptable uses for them, but in most cases a simple bar chart would be a better tool overall and provide a much better visual comparison.

I have heard people argue that pie charts take up less space or that they are easier to understand, but even these arguments are not valid. There are just too many fundamental problems with pie charts and this is why I advocate that they should not be used. Let’s examine a very simple data set and compare. Here is a table of The Twelve Days of Christmas.

Below is a pie chart of the Twelve Days of Christmas, essentially the default view from Excel. To help this visual I’ve followed a common rule of pie charts, which is to start at noon and move clockwise from the largest to the smallest slice. The other common practice, as described by Dona Wong in The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures, is to place the largest slice at noon and the second largest slice to the left of noon, then continue clockwise with the remaining slices from largest to smallest. I find this practice to be even more confusing, unless the last category is “Other” or “Misc.” and therefore an aggregation of the remaining smaller categories. Also, I added the data to the legend and resized it as large as reasonably possible to make the text readable.

Note the following problems with the pie chart:
• To visually compare the reader must go back and forth from the pie chart to the legend to determine which present matches which color. It would be impossible to list the labels within each slice because the text would be too long. Another popular option is to create lines from the pie chart pointing to each label and place the labels around the pie chart. This creates a very busy chart and clutters the chart with extra lines.
• The use of many different colors is required to create a categorical comparison color scheme. This makes it difficult to see the difference in colors from the shades of blue, red and purple.
• The comparison between the categories is very difficult. The eye cannot easily discern between the size of the “Drummers Drumming” and the “Pipers Piping”. This is because the size of the pie slice is not easily calculated.
• The beginning of one category starts at the end of the previous category. This means that you cannot compare multiple categories from the same baseline, because the baseline shifts from one category to the next.
• Finally, to generate a pie chart it is necessary to calculate the percentages of the categories; after all, a pie chart by nature shows 100%, not 78 total gifts. This may be done manually, but that is not necessary, as the software used to create the pie chart will do it automatically (these example charts were built in Microsoft Excel). Now in some cases a percentage might be the correct measure, but in other cases the values may be more appropriate. Below are the calculated fields for what the pie chart is actually showing.
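That behind-the-scenes percentage calculation is trivial to reproduce. A short Python sketch using the gift counts from the song (1 through 12, 78 gifts in total):

```python
# Gift counts from The Twelve Days of Christmas (one count per gift type).
gifts = {
    "Partridge in a Pear Tree": 1, "Turtle Doves": 2, "French Hens": 3,
    "Calling Birds": 4, "Gold Rings": 5, "Geese a-Laying": 6,
    "Swans a-Swimming": 7, "Maids a-Milking": 8, "Ladies Dancing": 9,
    "Lords a-Leaping": 10, "Pipers Piping": 11, "Drummers Drumming": 12,
}
total = sum(gifts.values())  # 78 total gifts
shares = {gift: n / total for gift, n in gifts.items()}  # what the pie shows

# The ratios the pie chart hides are explicit in the raw counts:
geese_vs_hens = gifts["Geese a-Laying"] / gifts["French Hens"]    # 2.0
ladies_vs_hens = gifts["Ladies Dancing"] / gifts["French Hens"]   # 3.0
```

The raw counts make the two-to-one and three-to-one comparisons obvious at a glance, which is exactly what the bar chart below preserves and the pie chart obscures.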

There is nothing wrong mathematically with the pie chart. There are twice as many Geese a-Laying as there are French Hens, and three times as many Ladies Dancing as French Hens. However, the comparison between these is exactly the point. The pie chart does not make that comparison easy. It’s hard enough to tell which slice is bigger; it would be impossible to discern twice as much or three times as much.

Here is the same data graphed using a simple bar chart.

This chart solves all of the problems mentioned above.
• Comparisons are made easily from one category to the other because the baseline is now the same for each category. Turtle Doves is clearly twice as many as the Partridge in the Pear Tree. There is no question if there are more Ladies Dancing or Maids a Milking.
• Color is easily managed. There is no color requirement to discern between categories. In fact, this graph could be done in gray scale and printed on a black and white printer or copy machine and it would still be usable.
• The axis labels are now adjacent to the data and the bar. This allows for a very compact chart and is easy to read.
• Finally, unless the pie chart is shrunk to a tiny graphic, for example as a data layer on top of a map, then there is no real space savings. In fact, the bar chart takes up less room on the page and is more readable than the pie chart.

Hopefully this holiday example illustrates the problems associated with using pie charts and the better alternatives. Best wishes for a safe and happy holidays and please keep checking back for more on Data Visualization.

The post A Piece of Christmas Pie appeared first on Making Data Meaningful.

Making Data Meaningful

Project Management – Critical Success Factors for Consultants

Project Management can be intimidating for internal project managers, but for an external consultant the challenges are amplified.  In addition to the typical responsibilities of defining tasks, resources, durations and dependencies, consultants also must address client policies, procedures, organizational structures, and the ever-present internal politics, while remaining focused on the project deliverables.

The following provides some suggestions for achieving success as a consultant project manager.


–          Every organization considers themselves unique and different.  It is critical for you to understand what makes this specific entity different in their eyes.  It might be the organizational structure, their culture, or their operating philosophy.  A good place to start your search for these answers is to review the company’s annual report and/or its website.  Look for their Mission Statement, Investor Relations, Company History, and Goals & Objectives.  If at all possible, you should do this research before your first day at the client

Project Sponsorship

–          When you start any consulting assignment, it is imperative to identify the key individual who is responsible for you being there.  This person may not always be the project sponsor, so you need to take the additional step of determining who has the final decision-making authority for the project and introducing yourself to them.  This person is typically at an executive level, so getting an audience may be difficult.  Ask for the meeting by providing a clear objective to the sponsor.  For example, “I would like 15 minutes of your time to introduce myself and discuss your specific goals for the project including how you would like me to communicate with you going forward.”

Project Team

–          The odds are good that you will not be the only resource working on the project.  How you manage the pool of resources that is available to the project is arguably the most challenging aspect of project management.  They normally have full-time jobs, and the project work only adds to their workload.  They also possess the expertise and skills necessary for project success, or at least their management thinks they should or they wouldn’t be assigned to the project.  As project manager, you are expected to make this group of individuals into a team with a focus on achieving the project goals, but as an outside consultant, you will not have any context from which to determine a person’s capabilities or performance.  An excellent way to assess the available talent is to have them provide input into the detailed project plan.  I will explain how to do this in more detail in the “Project Methodology” section below, but people will typically perform at a higher level when they not only understand the goals, but have helped to define what needs to be done to achieve those goals.  If someone tells you what, when, and how they are going to do something, they will feel a sense of ownership much more than if someone else just tells them what to do.

Project Communications

–          There is a myriad of tools available to manage a project.  The most common is Microsoft Project, which contains all the functions needed to manage a typical project.  While I strongly recommend developing an expertise in MS Project, I have also found this tool can be overwhelming for non-project managers.  Just because you need a good project management tool to do your job, don’t force it upon everyone else involved.  As indicated above, you should already know what your Project Sponsor is looking for in terms of communications, but also find out what the management team and the project members want in terms of details, and determine what the organization uses for these types of communications.  It may be Excel, Word, or maybe even just email.  Whatever it is, use it!  You should structure your project using phases, summary tasks and milestones so you can easily report at the level of detail needed.  I have found that a simple Excel file with four spreadsheets in it is a great way to communicate key project items.  The document is called a RAID, an acronym for the four sheets: Risks, Action Items, Issues, Decisions.  The action items come straight from the project plan, but list only the key tasks in the period being reported.  The risks, issues, and decisions tabs provide a historical record of those key areas that make or break any project.  Whether you use Excel as I do, or some other tool, these four sections are the important ones to communicate.
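A RAID file of this kind can be sketched as plain data structures. The column headings below are illustrative, not a prescribed template; the point is simply the four-sheet shape, here rendered as CSV text that Excel opens directly:

```python
import csv
import io

# Four sheets: Risks, Action Items, Issues, Decisions (RAID).
# Column headings are example choices, not a mandated layout.
raid = {
    "Risks":        ["ID", "Description", "Likelihood", "Impact", "Mitigation", "Owner"],
    "Action Items": ["ID", "Task", "Owner", "Due Date", "Status"],
    "Issues":       ["ID", "Description", "Raised By", "Resolution", "Status"],
    "Decisions":    ["ID", "Decision", "Decided By", "Date", "Rationale"],
}

def render_sheet(name, rows):
    """Emit one RAID sheet as CSV text: header row, then data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(raid[name])
    writer.writerows(rows)
    return buf.getvalue()

sheet = render_sheet("Action Items",
                     [["A1", "Publish kick-off plan", "PM", "2018-10-12", "Open"]])
```

In practice each sheet would live as a tab in one workbook, but even four CSV files convey the same four sections.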

Project Methodology

–          If you have taken all the above steps you are well on your way to being a successful project manager, but there is one last area that pulls all this together: the methodology you use to define and manage the project.  I am a strong proponent of having an involved project team.  The best way to get and keep people involved is to provide an environment that fosters participation, ownership, and responsibility.  Here is an approach that has served me very well:

  • Clearly define the project goals. (Remember your meeting with the project sponsor?)
  • Gather the management team and define key project milestones and resources.
  • Invite the project team to a kick-off meeting where you should explain the goals and the key milestones, and use this meeting to define the key tasks necessary to achieve each milestone.  Once a task is defined, ask who should perform the task.  As the group dynamics take effect, it will be easy to identify the leaders, the followers, the supporters, and the detractors.
  • After the kick-off meeting, publish the basic task plan to the project team, showing the key milestones with the tasks and resources supporting each.  Ask each person to provide a time estimate for each of their tasks, and a list of what they need in order to complete them.
  • Once you have the above information, you have everything you need to develop your detailed project plan with resource assignments, durations and dependencies.  If you use MS Project, gaps and conflicts will be easy to identify and address.
  • Now you can publish the detailed project plan to your team for final review before it goes out to the management team.
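As a rough sketch of how the assembled plan can be checked for gaps and conflicts (the kind of thing MS Project surfaces automatically), here is a minimal Python version; the field names and sample tasks are invented for illustration:

```python
# Tasks roll up to milestones; simple checks flag missing owners, missing
# estimates, and dependencies on tasks that were never defined.

def find_plan_problems(tasks):
    """tasks: list of dicts with name, milestone, owner, days, depends_on."""
    problems = []
    names = {t["name"] for t in tasks}
    for t in tasks:
        if not t.get("owner"):
            problems.append(f"{t['name']}: no resource assigned")
        if not t.get("days"):
            problems.append(f"{t['name']}: no duration estimate")
        for dep in t.get("depends_on", []):
            if dep not in names:
                problems.append(f"{t['name']}: depends on unknown task {dep!r}")
    return problems

plan = [
    {"name": "Define schema", "milestone": "Design", "owner": "Ana", "days": 5,
     "depends_on": []},
    {"name": "Load pilot data", "milestone": "Build", "owner": None, "days": 3,
     "depends_on": ["Define schema", "Provision server"]},
]
```

Running the check before the plan goes to the management team turns the review into a short list of named gaps rather than a page-by-page read-through.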

As a Consultant Project Manager, you may face some additional challenges, but I trust that the above is useful in helping you overcome them and keeping you on track for achieving customer success.

The post Project Management – Critical Success Factors for Consultants appeared first on Making Data Meaningful.

Making Data Meaningful

Big Data, Intelligence and Analytics Strategy for the CIO

I’ve had the opportunity to meet and collaborate with over 50 CIOs and IT executives over the last 6 months. Through these encounters, I have learned from my own trials as well as through the examples of my colleagues. For those CIOs who may not have already fully embraced a Big Data and Analytics strategy, here are my thoughts and reactions as to how a CIO should shape a strategy for the remainder of 2014.   Disclaimer:  I believe this strategy will not be static and will not remain holistically the same for 2015. We’ll update this article by the end of December to see how our predictions turned out and what’s in store for 2015.

To begin, I put myself in the shoes of my CIO colleagues, having seen how strategy is created and then executed in today’s corporate IT world. These strategies remind me of Mike Tyson’s famous quote: “Everyone has a plan until they get punched in the face”. In the world of data, reporting, analytics, business intelligence, and Big Data, the challenge is that all of the planning you would normally do on a large IT project doesn’t necessarily count. Do you plan?  Of course you do. It’s just different.

Through my experience over the last six months, the 3 Swim Lane Strategy has come up time and time again. Below is a description of exactly what the strategy is, as well as how to execute it efficiently in your IT projects.

First and foremost, CIOs are constantly being asked by their CEOs, “What is the IT strategy?” for Big Data and Analytics. I’ve heard this time and time again.   I believe it is almost the right question, but certainly NOT the right question. Rather than asking for an “IT strategy”, the CEO should ask: “How do we increase short-term productivity via data in our business?” and “Where can we use data analytics as a tool to increase the quality and quantity of our revenue and margins?” Nowadays, many businesses are short of seasoned and knowledgeable workers.  The CEO should ask, “How do I leverage data analytics to fill the job gap by giving the existing personnel better information that may be timelier, more granular, more varied, and more accurate?”

I am not questioning that the CIO needs an IT strategy.  Rather, it’d be a better description to call it an “iterative strategy process”.  In this area of IT and business, the strategy develops over time and needs to have constant management and measurements from both IT and business perspectives.

It would be setting false expectations if a CIO labeled Big Data an IT project. Calling Big Data an IT project implies, by pure definition, that it has a beginning as well as an end. The truth is Big Data is a strategy to increase productivity in many areas of an enterprise.   Big Data has large implications that range from increasing the intimacy of “one view of a customer” to measuring the frying temperature of every deep fryer in a large fast food chain. Applications leveraging existing data are everywhere.  That data is not always in a formal relational database table; largely, it is in a mostly unstructured format.  Remember, keep saying “This is a journey, not an event”.  To both the business sponsor and the IT professional, it’s not a classic waterfall project; it is very counter-intuitive and unnatural.

Here’s the second “swim lane”. Rather than shoe-horn the journey into your existing infrastructure, the CIO needs to consider whether to “outsource” the platform to the cloud. Whether you choose Amazon, Azure, Peak 10, or another platform is not the issue. Choosing a platform that minimizes the internal IT staff support and maximizes the business value is the key. Use the old licenses and old platform tools? Maybe. Maybe not. I believe you need to look at starting over, with the business requirements fully in mind. One of my clients recently bought 250 web visualization tool user licenses. By buying the client-based tool, the business can immediately leverage their power users and IT support to catch up on the backlog of information reporting requests. By providing immediate momentum in information reporting functionality, the business outcome leads to IT gaining credibility that they are engaged in the success of the business.

To summarize our 2nd admittedly tactical swim lane:  There’s a lot of pent-up demand for more information, updated reports, and general data.   Do we wait until we can truly get the Big Data strategy and subsequent platform?    “What would John do?”  Two things.   Provide more reporting in both the old format and new format.  Even if you have to buy or renew software licenses and hire consultants, you still have to keep credibility of the present demand while looking long-term at a more sustainable environment.

The third “swim lane” is all about technology. With the advent of Big Data, Cloud, and new database formats, there’s going to be a large requirement for new technology while continuing to upgrade the transport of that technology. Furthermore, a well-thought-through roadmap leveraged by key technology decisions accelerates the organization’s momentum for Big Data. A well-developed plan should show timeframe, functionality, service level (SLA), resources, and budget.  This “refresh” will have to be managed by a savvy PMO combining the new technology with a highly iterative project plan. A lot of project managers have harnessed themselves to the well-developed Waterfall project. But this is Big Data. This is not the traditional project: signed-off, fully detailed requirements followed by a discrete plan with detailed milestones and discrete deliverables. It’s more focused on research and development than production.   In manufacturing terms, it’s a pilot plant. With the “journey” in mind, it’s not a bad thing for pilots to become full production environments, as long as we understand some of the potential rework that may come later from this approach.

I hope the three swim lanes make sense. I have witnessed several CIOs leveraging this formula for both short-term and long-term survival and growth. To conclude, I hope that you start “swimming” for success!

The post Big Data, Intelligence and Analytics Strategy for the CIO appeared first on Making Data Meaningful.

Making Data Meaningful

What Good is a Data Warehouse If I Can’t Trust the Data?

In this age of “Big Data”, and even bigger plans for how companies intend to use their data to gain a competitive advantage, one pre-Big Data era concept remains relevant: data warehouses are of no value if the data cannot be trusted.

Data Quality is the true backbone of any data warehousing solution, and that term can be split into two simple concepts:  1) understanding what you have, and 2) comparing what you have to what you want.  In technical terms, we call #1 “Data Profiling” and #2 “Data Auditing”.

At Making Data Meaningful, we have designed a successful Data Profiling process that enables us to extract the underlying source table structures within seconds and share that information in a meaningful way with Business Users.  Once we have completed that initial assessment, we then work with the Business Users to determine the business rules that will become the basis for the Data Auditing processes.
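A minimal sketch of what such a profiling pass computes, assuming rows arrive as a list of dicts (say, from csv.DictReader); the column names are illustrative, and a real profiler would add type inference, pattern analysis, and value distributions:

```python
# Per-column counts are the starting point of profiling: how much is
# populated, how much is missing, and how many distinct values exist.

def profile(rows):
    """Return per-column counts: non-null, nulls, distinct values."""
    stats = {}
    for row in rows:
        for col, val in row.items():
            s = stats.setdefault(col, {"non_null": 0, "nulls": 0, "distinct": set()})
            if val in (None, ""):
                s["nulls"] += 1
            else:
                s["non_null"] += 1
                s["distinct"].add(val)
    # Collapse the distinct-value sets to counts for reporting.
    return {c: {"non_null": s["non_null"], "nulls": s["nulls"],
                "distinct": len(s["distinct"])}
            for c, s in stats.items()}

rows = [
    {"customer_id": "C1", "state": "OH"},
    {"customer_id": "C2", "state": ""},
    {"customer_id": "C3", "state": "OH"},
]
```

Even this crude summary is enough to start a conversation with Business Users: a column that is 40% null, or a “state” column with 200 distinct values, is immediately visible.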

In order to make it easier for the Business Users to participate in the Data Auditing process, Making Data Meaningful has designed a system that uses Error Handling tables to capture data that does not conform to the business rules they have identified. The data in those tables is exposed to the Business Users using user-friendly BI tools like Tableau, MicroStrategy, Business Objects, etc., or even Excel, so that these records can be viewed by the Business in a format that is simple to understand.
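The error-handling pattern might be sketched like this, with each business rule expressed as a predicate plus a message, and failing rows routed to an error table (here just a list). The rules and field names are invented for illustration, not the actual implementation:

```python
# Each rule pairs a human-readable message with a check; rows failing any
# rule land in the "error table" carrying the list of rules they broke.

RULES = [
    ("order_total must be positive", lambda r: float(r["order_total"]) > 0),
    ("ship_date required",           lambda r: bool(r.get("ship_date"))),
]

def audit(rows):
    """Split rows into (clean, errors); error rows carry the failed rules."""
    clean, errors = [], []
    for row in rows:
        failed = [msg for msg, check in RULES if not check(row)]
        if failed:
            errors.append({**row, "failed_rules": failed})
        else:
            clean.append(row)
    return clean, errors

orders = [
    {"order_id": 1, "order_total": "19.99", "ship_date": "2014-06-01"},
    {"order_id": 2, "order_total": "-5.00", "ship_date": ""},
]
```

Because the error rows keep the failed-rule messages alongside the original fields, they can be surfaced as-is in a BI tool or Excel for the Business to triage.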

Lastly, Making Data Meaningful understands that data quality is an ongoing process so we work with our clients to ensure that they have a Business Intelligence Competency Center (BICC) in place so that Data Governance remains one of the top tasks within the data warehouse solution even after we’re gone.  We help organizations define the role of Data Stewards and assist those Stewards with the Data Auditing process by working with them to generate a plan of action to either correct non-compliant data or modify the business rule.

Although Data Profiling and Data Auditing may not get the same glamorous attention that Big Data does, they are terms that are critical to the success of any data warehousing solution.  They are processes that Making Data Meaningful has invested a lot of effort into refining for the benefit of our clients.

The post What Good is a Data Warehouse If I Can’t Trust the Data? appeared first on Making Data Meaningful.


September 30, 2018

Simplified Analytics

Ask the Expert: What are the most important elements to consider when undertaking a digital transformation?

Sandeep Raut, CEO and founder of Going Digital, tackles for Enterprise Management 360 the obstacles and essentials companies must keep in mind with regard to digital transformation...


September 29, 2018

Making Data Meaningful

It’s All About the Data

I have worked on many projects over the years, and the success or failure of those projects boils down to one thing: data. One of the most underestimated parts of a project is gathering and cleaning the data. Management is focused on the final reporting, sponsors are focused on the cost, IT is focused on the technology, and users are focused on the interface; but none of these things can work correctly without clean, quality data sources.

Why Is the Data So Important?

A question that should always be asked at the start of a project is, “Why is this system being built?” It might be used to input sales, or ship orders, or record service time, but what is its ultimate benefit to the business? Its benefit should be to provide metrics for business leaders to make decisions. Sure, it may have other important functionality, but one reason to automate a process is to get the data business leaders need to answer questions like: What areas or products have our highest profit margin? Who is selling the most products? Who are our best customers? Is customer service treating the customer right? In order to answer these questions, quality data must be captured. Having incorrect or incomplete data will only produce inaccurate analysis and reporting. Inaccuracies in reporting can cause business leaders to make decisions that lower company profits or lead the business in the wrong direction. Even a small misstep can take a lot of time and manpower to correct.

Data may come from many sources, but can be categorized into three areas. Depending on the data integrity of the sources, there may be different levels of cleaning that need to be performed to achieve “grade A” data.

Common Data Sources

1. External to company – Records are sent from a vendor or partner company supplying details on orders or sales. Make sure that records contain all the fields needed, the data is clean, complete, correctly ordered and there are keys to join this data to other related datasets. It may be difficult to get corrections made to data at this source.

2. Internal to company – Records are sent from another department or division within the company. These may be supply manufacturing costs, quantity produced/sold, sales or client information. Make sure that records contain all the fields needed, the data is clean, correctly ordered and there are keys to join this data to other related datasets. It should be easier to work with these source owners if corrections are needed.

3. Data entry – Records are keyed in by customer service or the sales department through applications or tools like Salesforce. This source is one of the most difficult to get clean, consistent data from. Application changes may need to be made to ensure clean entry, and these can cause delays, as changes may take months to complete.

Getting the data sources lined up can be very time consuming if changes are required. Source systems may take months to accommodate a change request. Find the problems and start addressing them early.
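The checks listed for all three source types can be sketched as one routine: required fields present and non-empty, and join keys that actually match a related dataset. The field and key names here are illustrative assumptions:

```python
# Validate an incoming feed: every required field populated, and every
# customer_id joinable to the customer master we already hold.

REQUIRED = ["order_id", "customer_id", "amount"]

def validate_source(rows, known_customer_ids):
    """Return a list of (row_index, problem) for the incoming feed."""
    problems = []
    for i, row in enumerate(rows):
        for field in REQUIRED:
            if not row.get(field):
                problems.append((i, f"missing {field}"))
        cid = row.get("customer_id")
        if cid and cid not in known_customer_ids:
            problems.append((i, f"unmatched key customer_id={cid}"))
    return problems

feed = [
    {"order_id": "A1", "customer_id": "C1", "amount": "100"},
    {"order_id": "A2", "customer_id": "C9", "amount": ""},
]
```

Running a check like this on day one, against each external, internal, and data-entry source, is how you find the months-long correction requests early instead of late.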

One common solution to data problems is to push them off until later in the project because deliverables and timelines have to be met. “We’ll come back and fix that later,” or “It’s not that big of a problem. It will be fixed in phase 2,” are often costly oversights on management’s part. When that data issue is not fixed or the project is rushed into production early, these seemingly little issues can cause major problems. Below are a few examples that will hopefully drive my point home.

A Costly Problem

Applications that are used to enter data into a system should be thoroughly tested to make sure validation and record creation is correct. It’s difficult, time consuming, and expensive to fix data issues made months ago by an application bug.

A few years ago I went to a client to help resolve problems in a CRM billing system. While working on the issue, I uncovered a previously unknown problem: the flag that tracked the progression of a service ticket through the system was being set incorrectly, causing tickets to be lost. After the application bug and records were fixed, it was discovered that $75,000 of service tickets had not been billed. Many of the tickets were no longer eligible to bill, but $57,000 was successfully recouped.

This problem could have been avoided with better testing, but also with a report tracking ticket creation, billing and settlement. Dirty data can be a problem affecting decision-making, but lost data can quickly affect the bottom line.

Not Quite There Yet

Another time, I went to work on a project for a charitable organization which was running a $300 million fund drive over a number of years. They were moving the system from a mainframe to a client-server platform. I had to read through the old program code and create a new reporting system. While reviewing the old code (which had been in place for years) I found an issue in the way records were categorized and totaled. The bug was causing certain types of donations to be counted twice. The result of this little bug was that reporting was overstating donations by $3 million.
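A simple reconciliation check, comparing the categorized totals against the raw records, would have caught a double-count like this one; the categories and amounts below are invented for illustration:

```python
# The invariant: summing the category rollup must reproduce the sum of
# the raw donation records. A nonzero difference means records were
# dropped or, as in this case, counted more than once.

def reconcile(donations, category_totals):
    """Return categorized total minus raw total (0 means they agree)."""
    raw_total = sum(d["amount"] for d in donations)
    categorized_total = sum(category_totals.values())
    return categorized_total - raw_total

donations = [
    {"id": 1, "amount": 100, "category": "pledge"},
    {"id": 2, "amount": 250, "category": "matching gift"},
]
# Buggy rollup: the matching gift was counted under two categories.
buggy_totals = {"pledge": 100, "matching gift": 250, "corporate": 250}
```

The check costs a few lines and one pass over the data, which is cheap insurance against overstating a fund drive by millions.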

A Taxing Situation

I joined the project team of a client who was writing their own payroll system so they would not have to lease costly software. The deadline was getting close and the staff was working hard to meet it. I worked on the reporting team. The new reporting was used to validate the system, and any problem was assumed to be with the reporting until it could be traced back to a problem with data entry or system processing. Once a problem was located it was quickly fixed. The bigger problem was that management had already decided to start using the system and had begun running payrolls with it. Things hummed along well and the system was up and working … until tax time came around. When the client went to run W2s, they discovered that although the bugs had been fixed in the system, often the data was not. Thousands of W2s were incorrect, and the staff worked around the clock to fix data problems, rerun months of payroll, and reproduce thousands of W2s. I don’t know what the total cost was to the company, but it did cost a few jobs.

I hope you can see why data quality is so important and should not be left as a bottom task on the project plan. Quality source data is a critical factor for a successful system. Management is counting on the correct reporting and analysis to guide the direction of the company. Just remember the old adage: garbage in = garbage out.

The post It’s All About the Data appeared first on Making Data Meaningful.

Making Data Meaningful

Five Ways Analytics Leads to Better Business Decisions

Has this happened to you? The time to make a decision comes before the feeling that you have enough information to reach the best decision. You settle on a decision to buy enough time to get the information you are missing.  When you finally get the information you need, you realize the option you want is no longer on the table.

Analytics has emerged as a key ingredient for delivering meaningful data to decision-makers in a timely fashion. Here are five ways that analytics leads to better business decisions:

  1. Process: Analytics requires the gathering and analysis of data. The process of determining what to measure, and what meanings to attach to those measures, focuses attention on the elements that will make a decision a good one.
  2. Data Exploitation: Transaction data systems begin feeding into data analysis systems.  The expense of gathering all the data needed to operate the business becomes an investment in ways to manage the business. Analytics points the way to decisions that focus resources where acceptable risks will generate the highest returns.
  3. Data Quality: The importance of data increases when the data life cycle does not end with the invoice payment and the tax return. When an organization recognizes that the quality of the data has a direct impact on million dollar decisions, systems emerge to monitor, assess, and improve data quality.
  4. Transparency: When options are chosen with a clear vision of the basis for the decision, it is easier to get broad-based support for the choice. Not only do more participants understand the decision, but they also understand what it will take to make it a reality and the impact its success will have on the organization.
  5. Confidence: A decision based on properly applied analysis of meaningful data gives the decision-maker confidence that gets communicated to those executing the decision, which in turn increases its chance of success, making it an overall better decision.

The post Five Ways Analytics Leads to Better Business Decisions appeared first on Making Data Meaningful.

Making Data Meaningful

The Evolution of Marketing: A Digital Conversation

Traditionally, most people view marketing as promoting and selling a particular product or service; but marketing also entails research, branding, public relations, community relations, web management, data capture, social media, and much more. Because of this broad range of responsibilities, marketers must be able to act using both the analytical and creative sides of their brain. A left-brained or logical marketing person would typically stick to research, while their right-brained counterparts draw up the tantalizing images and sell them to the world. At this juncture in our data-laden world, each of these professionals needs an understanding of the practice of gathering data, analyzing its results, and then being able to make magic with this information.

The client experience is much different now than ever before. Clients create a vast amount of information that allows a company to gain intelligence about their habits and analyze this data to react more effectively. Digital marketing has given companies so much more to handle. However, the best companies seem to no longer be reacting. Instead, they are driving the digital experience for you.

We are all well aware of the fact that companies can see what we’re viewing through the internet and tailor promotions based on our engagement and activities. They can target digital marketing messages using our location if we’re accessing certain things on our mobile devices. For example, I went through a McDonald’s drive-thru a few months back and was viewing my Twitter feed while waiting for my food. Once I refreshed the feed I saw a promoted tweet from McDonald’s about a new breakfast deal – it was 9 AM and I do not follow the McDonald’s Twitter feed. I remember thinking, “how did they do that?”

Normally, the social media communications aspect would be left up to one area of marketers, while the act of gathering geographical information of customers would be kept separate until a monthly meeting where this information would then be presented to the team as a whole. Using business intelligence and real-time analytics allows information to be gathered, combined, and used immediately. As a result, every marketing team member needs to be able to analyze the presented data and have a plan already in place to capitalize on each opportunity. The digital conversation is revolutionizing the way marketing works.

We used to think that the sales person was always the most influential point of contact with a client. The marketing team would send out feelers, spark interest, and the sales person would come in to do the grunt work and close the deal. Marketers hoped that their message would resonate; their communications were a broadcast, not the conversation. So much has changed in that digital interactions are now considered to be a major conversation piece between company and client. Many clients’ minds are made up about a company or product well before they enter into the sales cycle with a specific person. The information gathered from social media, search engines, and website engagement is huge, relevant, and actionable.

For example: I know that John Smith from ABC Company has viewed a particular service that we offer. I target my follow-up with him to include, not only what he was already viewing, but also even more services we provide that would complement his interests. If Mr. Smith needs additional information on data management, I can provide him with information on how to begin a BI project from the ground up and who needs to be involved in that process. I will show him how we build a data warehouse and then how we would implement specific measures to ensure the security of his information. And because I know that he is mobile – accessing our site from a tablet and his smartphone – I then promote the benefits of a cloud-based system that he can access from anywhere.

The “conversation” has changed from a “shotgun” approach to precision shooting with a rifle. We know what we are dealing with and we are able to aim accordingly. With the detailed information being gathered, marketing has evolved into a weapon that should be harnessed and used daily, rather than monthly via a status report.

The post The Evolution of Marketing: A Digital Conversation appeared first on Making Data Meaningful.

Making Data Meaningful

Plugging In To Oracle 12c

Oracle 12c is over a year old now, so even firms wary of change are anxious to start taking advantage of its new features and capabilities. With the arrival of the first patch set release and its in-memory enhancements, it’s time to utilize some of the most exciting concepts: Container Databases (CDB) and Pluggable Databases (PDB). Pluggable databases give administrators enormous flexibility in large and shared systems, letting them make the best use of shared resources without sacrificing a database’s self-contained identity.

How does it work?

Let’s start from the bottom up, with the Pluggable database. PDBs are complete databases, with their own catalogs and tablespaces. Because they are self-contained, they can easily be moved or cloned onto different systems as needs change, or to propagate test systems quickly.

One or more PDBs are plugged into a Container database. The Container is a regular database, but its process structures are shared by the PDBs it holds: things like the SGA and PGA, redo logs, etc. That means fewer processes to manage, fewer upgrades to think about, and better use of increasingly large pools of memory and processor strength.

Making it happen

There are lots of ways to plug, copy, and unplug, but creation from an existing database is easy too. In fact, it can be done in just a few commands. The following example shows how to create a PDB from the seed:

SQL> alter system set PDB_FILE_NAME_CONVERT='/u01/datafiles/pdbseed/','/u01/datafiles/Anuj/' scope=both;

System altered.

SQL> CREATE PLUGGABLE DATABASE Anuj ADMIN USER PDB_Anj IDENTIFIED BY PDB_Anj ROLES=(DBA) DEFAULT TABLESPACE users DATAFILE '/u01/datafiles/Anuj/users_01.dbf' SIZE 1000M;

Pluggable database created.

SQL> alter pluggable database Anuj open;

Now you’ve created a pluggable database. So what changes do you have to make to your code? None. Remember, a PDB is still a true database, not just a schema, despite the shared process and memory space. No application alterations are required now that it’s pluggable. Cloning to a new destination is just as simple with similar commands.

As always, there are implications to be aware of. Unsurprisingly, there are charges associated with additional databases in a single container. And, while PDBs sharing a container mean more applications per physical server, they also bind those applications together for startup, shutdown, and upgrade. For many, though, those tradeoffs are more than worth it for the ability to clone to remote servers and maximize their efficiency.

Learn how Clear Measures helps you manage and utilize your Oracle Database.

CLEAR MEASURES has a decade of experience successfully working with Oracle in organizations of all sizes, including Fortune 1000 enterprises. Our dedicated team of Oracle DBAs is available and works around the clock, 24 × 7, ensuring our customers have Peace of Mind When IT Matters. Oracle’s rich features can require advanced skills and resources to operate. CLEAR MEASURES has these skills to make sure your enterprise will maximize the security, reliability, speed, and value of Oracle, accelerating the ROI of your data. Whether you are running Oracle 9i, 9.2, 10g, 10.2, 11g or 12c, CLEAR MEASURES is your source for Oracle expertise and operations. We have an entire team dedicated to remote Oracle database administration, ensuring 24 × 7 operation of your production environment.

Oracle Remote DBA Services

  • Oracle Database Monitoring
  • Oracle Routine Maintenance
  • Oracle Database Critical Patches Updates
  • Oracle Database Security & User Access
  • Oracle Database Backups & Recovery
  • Oracle Troubleshooting
  • Oracle Capacity Planning & Management

Oracle Project Services

  • Oracle Installations & Migrations
  • Oracle Upgrades
  • Oracle Database Architecture
  • Oracle Database Health Check
  • Oracle Database Performance Tuning
  • Oracle Database Audits
  • Learn More about Oracle Database Project Services

Oracle Specialized Services

Oracle databases have a variety of tools and add-ons that help you better manage and use your data. CLEAR MEASURES can help you support your entire Oracle infrastructure. Learn more about our Oracle Specialized Services below.

Other Databases Supported

What makes CLEAR MEASURES unique is our ability to manage databases across today’s diverse IT environment. Check out the other databases we support, in addition to Oracle. 

The post Plugging In To Oracle 12c appeared first on Making Data Meaningful.

Making Data Meaningful

Mobile Intelligence – The Rising Trend

We are coming into the golden age of analytics.  This is the vision that the speaker – CEO of a company that develops data visualization software – illuminated for the audience at a recent customer conference.  “We are putting the power of data into the hands of creative people to explore the worlds of possibility.”

The idea that we can use our talents to solve client data puzzles is like an adventure that makes it fun to come to work every day.  We’re explorers in an unknown land, moving from all the standard questions (and the standard answers) to a place where the questions themselves haven’t been formulated yet.  New thinking, contemporary sensibilities, and breakthrough technologies are disruptive factors in this age.

One area I pay attention to is the continued evolution of business intelligence (BI) in a mobile environment.  Mobile BI received a lot of fanfare in the past year, and all the major platforms promote a mobile solution.

Now that some of the hype is starting to settle down, what is the real story?  Here are a few thoughts based on my own observations and experience.

1. Mobile BI deployments will favor use on tablets, rather than smartphones, given current screen sizes.

It’s easy to refer to “mobile” like we refer to “Europe” as a single form factor or entity.  The reality is more complex, as there is a range of devices from the smartphone, with relatively small screen sizes, to tablets, with screens just a bit smaller than you’d see on a typical ultrabook laptop.  More screen space gives us more room to place data and provide interactivity.  I see organizations prioritizing tablet deployment over smartphone deployment in most cases.

2. Business gains so far have been incremental, providing efficiencies rather than game-changing breakthroughs.

Many mobile BI efforts seem to focus on converting the oodles of reports hanging around every corporate office into a mobile format.  That makes data more portable, which is an improvement.  But what we should seek to do is to make the analysis and decision-making that goes along with running a business happen in a portable way too – where you are, right now, as soon as information becomes available and action is needed.  This concept of “right-time mobile analytics” (not just mobile BI) is where I believe transformational gains will be found.

3. Design principles need to evolve to better anticipate the needs of mobile tablet users.

Most mobile BI efforts seem to be focused on cramming the dashboard or report that was designed for a desktop user into a mobile screen.  There are several problems with this approach.  First, a dizzying array of formats, resolutions, and screen sizes is present in the market.  It’s reminiscent of a challenge with website design, where you don’t know what kind of screen the user will have for their desktop.

Beyond screen size, you quickly discover that dashboards and interactive visualizations that are crammed onto a mobile device are balky to navigate when you substitute fine-point control of a mouse cursor with the more generalized notions of a tap or swipe.  Analysts and designers should invest the time to redesign existing visualizations and reports so that they can be easily and efficiently used with these new human interfaces.

4. Mobile BI tools do best at providing answers to known questions, rather than providing a platform for rich and interactive data exploration and discovery.

Largely due to the factors and challenges related to the interface, we have not seen a good mobile implementation of the interface needed to explore the data and design new visualizations.  Sleek interfaces that enable and facilitate data discovery, such as the forthcoming update to Tableau’s Desktop Professional software (v8), are getting there on the desktop, but are very limited on their mobile implementations.

I’m not sure we should even care, because this may be a square-peg-in-a-round-hole problem.  I don’t see the compelling need right now to port that capability to mobile, when most of the value in mobile will come from deploying effective, efficient visualizations to those who need better information to make right-time decisions, rather than enabling analysts to design new analyses on-the-go (and burning through their data plan in the process).

5. Standardization on a single mobile platform can significantly reduce development timelines.

This will help reduce complexity and allow you to dip your toe into the water with less up-front investment.  Fewer permutations of screen size, operating systems, and wireless carriers will reduce the time needed to deploy your solutions (little secret: this is one of the reasons that vendors originally delivered on Apple devices – you could predict how your software would look to the user!).

Apple, with its line of iOS devices, has the best track record in this area.  Apple has smartly limited the iOS hardware platform to a very small number of screen-size variants.  Several studies have also shown that, historically, users of iOS devices generated a significant majority of all mobile device traffic compared with any other platform.  I therefore believe that an engaged user base and a streamlined development lifecycle will favor adoption of mobile analytics solutions that operate on iOS devices.

6. Support the rollout of cellular-enabled mobile devices throughout the enterprise for knowledge workers that can best leverage mobile analytics applications.

Yes, this makes each tablet more expensive, and yes, cellular data plans are not free.  But neither is the time lost fumbling for the client guest intranet login, or roaming the highway off-ramps looking for a coffee shop with free WiFi.  The return on all of this investment in valuable data and analytics applications will be reduced if your knowledge workforce cannot connect where they work, live, and travel.

Looking to the future, I believe we are now well positioned to generate competitive differentiation through mobile, BI-integrated, right-time analytical applications.  The growing maturity of mobile BI platforms, and their support for the little-known capability called write back, has the potential to be a turning point for the field.  Write back in the context of BI gives the user the ability both to consume data through their visualization or BI application and to generate data that is put back into the database.  This is the next secret sauce.

The actual capability to do write backs has been around for a long time.  It’s even built into Excel and can be used with Microsoft Analysis Services OLAP cubes, if they are configured for this purpose.  It’s also a part of enterprise BI tools such as MicroStrategy.

This is a big deal, because now we can combine analytical data (and the processing power of real-time analytics engines) with information that is entered by a user, in context, on site, in the moment while they are working on a particular problem.

Let’s say you’re a medical supply sales representative, and you go on site to visit a hospital client.  Your mobile BI solution provides you with historical purchase patterns.  Then, you conduct an inventory check, inputting the data while you are standing in the supply room.  That information goes back to the database, and the purchasing models apply past history, seasonality, and metadata about trends at your other healthcare clients in that area (think: regional flu outbreak), generating a purchase forecast and preliminary order.  The solution also recommends a product change, from buying individual packages to large count bulk packs, which would save the customer $10,000 this year.  You review with the administrator, make a few adjustments, and you’re done.
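A minimal sketch of the write-back loop in that vignette, with hypothetical table and column names and SQLite standing in for the analytics database: the user consumes historical data, enters an observed count on site, and that count is written back into the same database so the purchasing model can use it.

```python
import sqlite3

# Hypothetical schema for illustration: one table the BI layer reads,
# one table that captures user-entered (written-back) observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_history (sku TEXT, qty INTEGER)")
conn.execute("CREATE TABLE observed_inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)")
conn.executemany("INSERT INTO purchase_history VALUES (?, ?)",
                 [("GLOVES-100", 40), ("GLOVES-100", 60), ("MASKS-50", 25)])

def write_back(sku, on_hand):
    """Persist a user-entered count so downstream models can use it."""
    conn.execute("INSERT OR REPLACE INTO observed_inventory (sku, on_hand) VALUES (?, ?)",
                 (sku, on_hand))
    conn.commit()

# The rep checks the supply room and writes the count back.
write_back("GLOVES-100", 15)

# A deliberately naive reorder suggestion: average historical demand
# minus what the user just reported as on hand.
row = conn.execute(
    "SELECT AVG(p.qty) - o.on_hand FROM purchase_history p "
    "JOIN observed_inventory o ON o.sku = p.sku WHERE p.sku = 'GLOVES-100'"
).fetchone()
print(int(row[0]))  # suggested reorder quantity: 35
```

The point is not the toy forecast, but the round trip: the same store that feeds the visualization also receives data generated by the user, in context, in the moment.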

This simple vignette may seem far off, but it isn’t a dream.  Right-time analytics applications can become critical competitive differentiators for current and future market leaders.   The complexity here is in gathering the data, understanding behavior, and building the analytical models that will help us optimize processes in our daily work.  It’s complex, but definitely within reach for those that are willing to invest the time and effort to see it through.

The post Mobile Intelligence – The Rising Trend appeared first on Making Data Meaningful.

Making Data Meaningful

Data Lifecycle Management

Data has a lifecycle. Every part of the lifecycle is important to a different type of user.  Every business will have their own rules for what is considered “hot”, “warm”, and “cold” data.

Being able to consistently meet these rules is a process that is built into every data warehouse effort. However, when these particular business rules change, how flexible is the architecture you have chosen at adapting to these changes? Let’s look at a scenario where the business rules may change, and how using a data vault architecture as the foundation of your data warehouse effort can mitigate risk, and facilitate the rapid deployment of new standards to meet your business goals.

As an example, consider an organization with three primary applications: Sales, Service, and Inventory Management. The Sales data arrives via a low-latency feed into a data vault, while Service and Inventory populate the data vault through a nightly update process. The data from the Sales system is stored on low-latency SSD drives, whereas the other two systems' data is stored on 15k-RPM spindles. The data vault produces multiple data marts.

The business rules under which this architecture was originally deployed stated that all data sourced from Sales less than two weeks old is stored on SSD drives.

Based on the evolution of events in the business, whether a regulatory requirement or a desire to review data sooner than previously accepted, it is decided to migrate all Servicing data less than six weeks old onto SSD drives, for both the data vault and the data marts it feeds.

What architectural changes are required to meet this new business goal?

Answer: 0

No architectural change is required to either the data vault or the data marts. The data vault architecture itself is resilient enough to absorb this type of change without modification. Some changes are required at the physical layer where the data is stored; however, once those are made, the data vault structures, and the rules that govern loading data into the data vault, show you exactly which data needs to be migrated from one physical storage area to another.

Having a logical abstraction layer above your physical storage, and breaking the data apart into business keys, the relationships between those keys, and the contextual information describing either the relationships or the keys themselves, gives you the most flexibility in adapting to new business rules that affect the performance of the integration layer of your data warehouse.
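A minimal sketch of that breakdown, with all table and column names hypothetical and SQLite used for illustration: the business key lives in a hub, the context lives in a satellite stamped with a load date, and finding the rows that the new six-week rule moves to SSD becomes a simple query over load dates rather than a structural change.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hub holds only the business key; the satellite holds context plus load metadata.
conn.execute("CREATE TABLE hub_service_ticket (ticket_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE sat_service_ticket (ticket_id TEXT, load_date TEXT, status TEXT)")

today = date(2013, 6, 1)  # fixed date so the example is reproducible
rows = [("T1", today - timedelta(days=10), "open"),
        ("T2", today - timedelta(days=50), "closed"),
        ("T3", today - timedelta(days=30), "open")]
for tid, loaded, status in rows:
    conn.execute("INSERT OR IGNORE INTO hub_service_ticket VALUES (?)", (tid,))
    conn.execute("INSERT INTO sat_service_ticket VALUES (?, ?, ?)",
                 (tid, loaded.isoformat(), status))

# New business rule: Servicing data newer than six weeks belongs on the SSD tier.
cutoff = (today - timedelta(weeks=6)).isoformat()
hot = [r[0] for r in conn.execute(
    "SELECT ticket_id FROM sat_service_ticket WHERE load_date >= ? "
    "ORDER BY ticket_id", (cutoff,))]
print(hot)  # rows to migrate to SSD: ['T1', 'T3']
```

Changing the rule from six weeks to, say, eight is a one-line change to the cutoff; the hub and satellite structures are untouched.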

The post Data Lifecycle Management appeared first on Making Data Meaningful.

Making Data Meaningful

Agile Methodology to Create Business Intelligence Reports

Reporting is a major aspect of Business Intelligence (BI), and it’s possible to utilize the Agile methodology when creating or editing reports. A report should always answer a particular question for the business: Who are the top customers by sales? How many accounts did we lose this year? What is our most profitable item by state? And so on. Without Agile BI, a report can be created from a vague request by users who are not always sure about the true need of the business.

The Agile methodology ensures that the question is formulated to be SMART*, resulting in a report that delivers the best possible answer. A report that fails to answer any question is of no use; it is yet another useless spreadsheet that someone will need to work on. Agile business intelligence helps us get away from this scenario by enabling developers to interact with the business and give regular demos quickly.

Just as a recipe is tweaked to suit the taste of a particular chef, the same can be said of Agile BI. The basic rules remain the same, but certain aspects of the methodology need to be adjusted when reporting is involved. In BI reporting, the focus is not on a single product but on delivering various reports that are mostly independent of each other. The reports are deployed independently, so at the end of the process, the reports answering that big question can be put into action.

Traditionally, business intelligence developers spend a lot of time mapping data sources and creating pseudo-designs before development starts, and quite often the design has to be changed. Agile BI improves this process by allowing developers to break a report down into smaller solutions that can be deployed after approval. By breaking down a report into smaller solutions, the business gains insight into the capabilities of the tool used to create the report. It also helps to finalize the requirements and scope of the report.

Author Experience: I have encountered multiple occasions where the business completely revamped the requirements because their initial idea did not answer the question at hand. With agile business intelligence, I was able to deliver a report that satisfied all of their needs because of the constant communication that Agile BI provides. Had this been waterfall, we would have spent months on the project before discovering the same result, and changing the requirements over and over would have added additional time.

*SMART – Specific, Measurable, Attainable, Relevant and Time-Bound. 

The post Agile Methodology to Create Business Intelligence Reports appeared first on Making Data Meaningful.

Making Data Meaningful

Power to the People… the Business Intelligence way

When entering into any technology project, the team typically comprises a blend of business and IT professionals: developers, analysts, business liaisons, and business analysts, up to the project’s sponsors. When looking at a Business Intelligence project, this structure appears very similar on the surface. However, there is one component that changes drastically… the business involvement.

Business intelligence (BI) is an enterprise-level project focused on the analysis, distribution and contextual use of information for informed decision making. BI supports activities from the strategic level down to the operational level. When implemented in an organization, BI can drive the transformation of business knowledge from high level reporting to information-rich and integrated analysis.

By working with individual business units, IT will help identify the information needed to support and monitor business processes. The aim is to find the information needed to support business decision making, where the data comes from, and how it is used. Work alongside the business and ask why certain processes are the way they are and how they can be made easier. Look at the underlying data, present it to the business so they understand why processes are the way they are, and expose the anomalies within the data so processes can be improved across the organization. This data discovery is a critical piece of the process, as it embraces the thoughts and ideas of the business units and exposes the data in a way businesses typically don’t see.

Working in an agile environment, IT can then have this data extracted and moved from operational data sources and transactional systems, into the BI environment. Business rules developed in the data discovery phase are then applied and data validation from the business units is needed to ensure data integrity for accuracy and completeness. Data Governance is implemented at this juncture, led by business unit leaders, to address not just the data anomalies, but also define business terminology.

The BI environment is loaded, rules are applied, and reports and dashboards are being generated. This is where the business really becomes the focal point. The business needs to understand how to work with the reports and dashboards and fully understand the information being presented. Feedback on report design and content is incorporated in an iterative fashion. In-depth training is then required at all levels of the organization so people know what the reports are telling them, how that information can impact their work, and what decisions they are empowered to make. The ultimate goal is to enable the business to make informed decisions at all levels of the organization on a unified platform.

With the new BI knowledge, an organization will have the ability to gain insight as to departmental strengths and weaknesses, streamline processes, and grow profitability. The power of knowledge has thus shifted to the business units making information available for critical business decisions.

The post Power to the People… the Business Intelligence way appeared first on Making Data Meaningful.

Making Data Meaningful

What Makes Data Meaningful?

This site has presented a lot of meaningful commentary and information since its inception. Our mission has long been to provide our customers with the skills and strategy required to extract value from their data—from ETL and data visualization, to complex analytics designed to reveal new patterns in information. Today we redefine and extend that mission, even as we adopt an exciting new brand.

We have long enjoyed a close partnership with dbaDIRECT, the industry’s premier provider of remote infrastructure support services. Now, we rebrand together as MDM (Making Data Meaningful), with the unified goal of aiding customers in collecting, defining, analyzing and capitalizing on the data that runs their business, while simultaneously ensuring that the infrastructure required to meet that goal is available, recoverable, and functioning as designed. In short, MDM will offer a more complete understanding of meaningful data than ever before. After all, in order to be meaningful, data must first be:

  • Available

Availability begins with good design, but also requires a watchful eye and ready hands to address issues. It means keeping systems up and recoverable, and optimizing processes like data extraction and transformation. MDM designs with the end in mind, and offers complete support, 24X7.

  • Accurate

To be accurate, data must be complete. MDM helps unify its customers’ understanding of data across their environments.

  • Actionable

Like a coworker at a water cooler, data always tells a story. But the story needs to go somewhere. Meaningful data drives action; it fuels better decisions. MDM works to understand its client’s goals and power meaningful choices.

dbaDIRECT brings to “Making Data Meaningful” a toolset that helps customers balance the need for Capacity, Coverage, and Cost across their IT environment. We’ll also continue to offer business-focused data management methodologies designed to add value to your enterprise.

If you are already a customer, your contacts and services haven’t changed. Our new MDM brand simply adds exciting new services to the lineup you’ve already enjoyed. If you’re not yet a client, now is the perfect time to find out how your data—the data you are required to preserve, secure, and store—can work for you, instead of the other way around.

The post What Makes Data Meaningful? appeared first on Making Data Meaningful.

Making Data Meaningful

BI, Mobile, and Cloud Technologies Drive IT Spending in 2013

CIO Magazine’s survey of industry leaders on IT spending makes a good starting point: “The majority of IT executives are expecting budgets in 2013 to increase over 2012,” while only 23% expect their budgets to shrink. The CIO Poll is consistent with Gartner’s spending forecast, which shows overall growth to be low, at 3-5%, with global IT spending over $3.6 trillion in 2013.

Top business priorities over the next year include overall revenue growth, exceeding customer expectations, attracting and keeping customers, and improving quality. IT alignment with the top business priorities is driving a shift in the ways that technology is transforming business processes. There will be IT categories actually experiencing modest growth, while other categories shrink.

The category at the top of the growth list is Mobile/Wireless. It’s no wonder, when the mobile device itself can be used to execute transactions, monitor delivery of products or services, promote sales, provide customer support, and control manufacturing processes both on site and remotely. As technology shifts, most expect expenditures in the Applications category to continue to grow to support the opportunities presented by these innovations.

Of course, the increased mobility of the company workforce in turn promotes the movement toward cloud-based access to enterprise IT resources. The number of enterprise organizations planning to increase budgets for outsourced IT services – which include cloud services – is the highest since April 2012.

One of the more interesting results of the CIO poll was the response to the question, “In your opinion, how likely are organizations that have implemented tablets to adopt technologies such as cloud, social, and mobile solutions sooner than organizations that have not?” The surprise is not so much that 70% thought this was a good indicator, but that this one question makes up 25% of the generally distributed version of the CIO Tech Poll: Economic Outlook.

The most popular phrase during next year’s networking events will be, “Say, do your employees get iPads?”

Adam Dennison, vice president/publisher of CIO, commented on the results of the poll. “While the research reveals budget increases, it is specific technology categories that are seeing growth. To meet business priorities and stay aligned with technology transformation, mobile/wireless, outsourced IT services and apps are receiving a larger percentage of the budget increases. Enterprises are in a race to grow revenue and exceed customer expectations, which can be met through technology innovation and implementation of mobile to enhance customer experience.”

Mobile and cloud technologies are driving IT from its traditional role as an information repository and processor and into the executive offices as communicator and decision support tool. IT services now reach outside the back office to touch the customer and supplier directly in ways that are just being dreamed of. The CIO Poll highlights the challenges and opportunities that IT now shares with the business it serves.

The post BI, Mobile, and Cloud Technologies Drive IT Spending in 2013 appeared first on Making Data Meaningful.

Making Data Meaningful

Day to Day Data: March Madness

Love it or hate it, the madness is upon us.  Every March, the country gets a healthy serving (or three) of college basketball.  Each year, approximately 40 million people fill out brackets for the NCAA Men’s Basketball Tournament, and each year, every single one of those people swears that they picked everything perfectly.  If you were about to Google “What are the odds of completing a perfect bracket?”, I will save you the trouble: it is 1 in 9.2 quintillion.  If you were about to Google “What on Earth is a quintillion?”, the answer is a 1 with 18 zeros behind it.  To put this in perspective, the odds of winning the Powerball are about 1 in 175 million.  You have a better chance of winning the Powerball multiple times than of picking that bracket correctly.
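That 9.2 quintillion figure comes from treating all 63 games in the modern 64-team bracket as coin flips, giving 2^63 possible brackets. A quick check of both numbers cited above:

```python
# 63 games in a 64-team single-elimination bracket; a blind 50/50 guess on each.
perfect_bracket_odds = 2 ** 63
print(perfect_bracket_odds)  # 9223372036854775808, i.e. about 9.2 quintillion

powerball_odds = 175_000_000
# Winning the Powerball twice in a row is still likelier than one perfect bracket.
print(powerball_odds ** 2 < perfect_bracket_odds)  # True
```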

These, however, are just numbers.  I began to wonder how I could slice and dice tournament history data.  Sure, I can find which teams have won the most or lost the most.  But can I dive even further and find out which states, cities, or teams have the most wins or championships?  Which teams constantly underperform, and which teams exceed expectations?

The Research

Using a data dump of NCAA Tournament history from 1939 to 2012, I was able to dive in very quickly and start seeing results.  I first wanted to see which states produced the most tournament victories.  Using Tableau, I was able to visualize the top ten states in terms of victories.

Visual of the Winningest States

Using a filled map, I was able to visualize the number of wins for the top ten states.  North Carolina and California are the top two states, no doubt fueled by the powerhouse schools of North Carolina, Duke, and UCLA.  I wanted to go even further and see which cities brought the championships home for their respective states.  To create this visualization, I used a dual-axis map combining my filled map with a symbol map.

Visual of Winningest Cities within the Winningest States

Using this visualization, you can see which cities put their states on my first map.  Los Angeles and Lexington are home to the schools that have brought home the most national championships.  Instead of using strictly numbers and labels, I was able to represent their success using a circle symbol: the bigger the symbol, the more championships achieved.

I have a clear picture of which teams succeed, but how can I find out which teams succeed, or don’t, when they are supposed to?  To do this, I needed to find out how many upsets occurred over the years.  Using the teams’ designated seeds at the beginning of the tournament, I was able to determine every upset in tournament history.  I took this data and created visualizations for teams that get upset and teams that create the upset.
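The seed-based upset rule described above can be sketched in a few lines; the game results below are made up purely for illustration:

```python
def is_upset(winner_seed, loser_seed):
    """A higher seed number means a weaker team on paper, so a win by it is an upset."""
    return winner_seed > loser_seed

# (winner_seed, loser_seed) for a few hypothetical tournament games
games = [(1, 16), (12, 5), (9, 8), (2, 15)]
upsets = [g for g in games if is_upset(*g)]
print(upsets)  # [(12, 5), (9, 8)]
```

Applied to every game in the tournament history dump, counts of this flag per team are exactly what feeds the two stacked bar charts below.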

Underachieving Teams Visual

Overachieving teams visual

I used a stacked bar chart to visualize whether teams were upset more often than not when they held the higher seed, and, vice versa, whether they were prone to upsetting their competitor when they held the lesser seed.  The stacked bar also helped show that while teams like Duke and North Carolina were upset the most, it was because they had the most opportunities to be upset.  The data above shows that Kansas is an overachieving team: in 34 of 49 opportunities, they upset their opponent in the tournament.

The Analysis

History shows that our top-performing states are North Carolina, California, and Kentucky.  The cities that make those states successful are Lexington, Los Angeles, Chapel Hill, and Durham.  We can also see that teams such as Brigham Young, Pennsylvania, and Utah State have a habit of underperforming in the tournament, while teams such as Florida, Duke, and North Carolina tend to overperform when they are the underdog.

The Conclusion

March Madness is an event loved by many, and the benefits of visualization allow me to recognize these findings very quickly.  Imagine this type of data at your fingertips when you are filling out your bracket; I certainly wish I had used it to my advantage.  Now, imagine these types of visualizations fueled by your company’s data.  Replace the “wins” data with company revenue data: you would be able to identify where you are successful, and then drill down to see which cities are producing that success.  This allows a quick look at your business.  Use sales-lead data to fuel your stacked bar charts, and see which of your offices are receiving and submitting leads and how well they are closing them.  Data is powerful, but using visualization tools makes data meaningful.

The post Day to Day Data: March Madness appeared first on Making Data Meaningful.


September 28, 2018

Revolution Analytics

XKCD "Curve Fitting", in R

You probably saw this XKCD last week, which brought a grimace of recognition to statisticians everywhere: It's so realistic, that Barry Rowlingson was able to reproduce all but two of the "charts"...


September 26, 2018

Revolution Analytics

3-D shadow maps in R: the rayshader package

Data scientists often work with geographic data that needs to be visualized on a map, and sometimes the maps themselves are the data. The data is often located in two-dimensional space (latitude and...

Knoyd Blog

The 3 Do's & Dont's Of Hiring Your First Data Scientist

Whether you are a startup or a big corporation, every company today has one problem in common. At some point, there is a need to develop your own analytics capabilities so you can leverage your data efficiently. While outsourcing and getting help from consultants can be a great way to get things off the ground, eventually it does make sense to have dedicated people in-house. And voila, you are searching for your first Data Scientist.

“Ok, this is a task for human resources. How is it different from hiring anyone else?”

There actually are a bunch of differences. Three to be exact. The key differences between hiring your first ‘data person’ and hiring for any other role are:

  1. Multiple areas of expertise - you want your first hire to be a great generalist: skilled not only technically (programming, databases, machine learning theory, statistics), but also an excellent communicator with a strong feel for the business side of things.

  2. Lack of reliable tests - while you can test reasonably well whether a developer can code, testing Data Science skills is much more difficult. Especially in your first data hire, you are searching not only for quality execution of ideas; you want someone who can generate new ideas and evaluate their feasibility as well.

  3. Not knowing what to search for - most of the resources out there are aimed at hiring people into already existing Data Science / Analytics teams. While these are useful and can help you avoid some common mistakes, they also assume that there is someone who can reliably evaluate the candidates (i.e., someone with similar knowledge).

“Got it! So what can I do?”


  1. Hire for skills & mindset - in more mature tech teams with clearly defined roles and responsibilities, hiring people who are really good at a really specific thing makes a lot of sense. Your first Data Scientist has a much harder role, though. This person needs business domain knowledge, needs to be a good generalist, and needs great self-activation. The ability to spot opportunities, prioritise, and execute without guidance from the top is really important to get your team started.

  2. Focus on people skills - let’s be honest here: when hiring for technical positions, it is not uncommon to compromise in the area of soft skills. While this might work in a lot of scenarios, it is a bad compromise to make for your first Data Scientist. This person will spend a lot of time collecting buy-in, explaining findings, and convincing people why the results matter to the business. Great communication is key here.

  3. Get help! - there are plenty of outsourcing options for HR to pick from. However, replacing your own recruiter with an external one with the same skill set will not make much difference. Make sure you are trusting someone who has the skills and a track record of hiring first-time Data Scientists.

Follow this advice and you will leverage your data in no time. Here at Knoyd, we have helped companies get their Data Science teams off the ground, and we know how hard it is to do properly. Get in touch if you are looking for help.

September 25, 2018

Revolution Analytics

R developer's guide to Azure

If you want to run R in the cloud, you can of course run it in a virtual machine in the cloud provider of your choice. And you can do that in Azure too. But Azure provides seven dedicated services...


September 24, 2018

Revolution Analytics

Applications of R presented at EARL London 2018

During the EARL (Enterprise Applications of the R Language) conference in London last week, the organizers asked me how I thought the conference had changed over the years. (This is the conference's...

InData Labs

How Face Recognition Technology Shapes the Future across Multiple Industries

The last decade has been a good one for face recognition technology. Although the technology has existed since the 1960s, early large-scale attempts to implement it failed, mainly because it lacked precision and scalability. Everything changed in the late 2000s. In 2011, both the Pinellas County Sheriff’s Office and Panama’s Tocumen airport...

The post How Face Recognition Technology Shapes the Future across Multiple Industries first appeared on InData Labs.


September 21, 2018

Revolution Analytics

Because it's Friday: Fly Strong

I was about the same age as student pilot Maggie Taraska when I had my first solo flight. Unlike Maggie, I didn't have to deal with a busy airspace, or air traffic control, or engines (I was in a...


Forrester Blogs

Nike Scores A Customer-Values Touchdown

Unless you were on the Appalachian Trail for a few weeks, you know what I’m talking about. I’ll start with this paragraph from our newly published analysis of Nike’s “Just Do It” campaign featuring...




Forrester Blogs

Bad Bots Are Stealing Data And Ruining Customer Experience

Bad Bots Are Affecting Your Company More Than You Might Think. Every online customer touchpoint — including websites, mobile apps, and APIs — is being attacked by bots. What are these bad bots doing?...


Forrester Blogs

Adobe Changes Its Marketing Cloud Trajectory With Marketo Acquisition

I am still catching my breath. Adobe has agreed to acquire Marketo for $4.75 billion. The deal is the biggest in Adobe’s history, and a massive encore to the acquisition of Magento for a mere $1.7...