Secrets of Building a Successful Big Data Team

Note: an edited version of this story appeared on the Hortonworks blog on November 10, 2017.

Having a dedicated big data team is critical to achieving successful outcomes with the data you’re collecting. This post will discuss the elements of building a successful big data team and how this team operates within your organization.

Step One: Achieve Executive Buy-In

Ask any business school professor or seasoned project manager what one has to do first and they will all say the same thing: obtain executive buy-in. The most fundamental part of any business transformation is buy-in and enrollment from the top. A top-down approach, with approvals from senior leadership, is a key step in building a big data story in your organization. Until this step is done, don’t pass go, don’t collect $200: keep working until you have executive approval.

Step Two: Set Up a Center of Excellence

Building a center of excellence, a shared employee facility that develops resources and best practices, is the next step in this process. This CoE could be as small as one person, or as big as you want. Its members should include a subject matter expert from each affected group across the organization in order to adequately represent how the transformation is unfolding. In this way, not only have you gotten buy-in from the executive level to start the overall transformation, but you also have participation across the organization, so all affected departments can feel heard and acknowledged.

Part of building a proper and maximally effective center of excellence is to encourage the naysayers. Platitudes are fantastic, and open-mindedness is a great thing to strive for, but the reality is that everyone has their own agenda and point of view. In building the CoE, you want cheerleaders as well as skeptics in order to keep debate alive and valuable. Ironically enough, the naysayers often end up leading the CoE; they become the most passionate over time, because once their objections have been overcome, they understand why the transformation is so important.

Step Three: Building the Team

Once your center of excellence is in place, the bulk of building your big data team lies in finding individual employees to flesh it out. This is about 75% art, 25% science. From the science perspective, you’ll screen candidates against the job description, background requirements, and appropriate thresholds of experience. You will want workers with 7-12 years of experience in IT and some exposure to production Linux; data warehouse skills and experience are a very big plus. Unfortunately, this won’t get you all the way there: the industry doesn’t yet have a large enough body of practitioners with these skills to make it easy to find workers on those merits alone. The current H-1B situation is actively contributing to the dearth of objectively qualified candidates. It is akin to trying to find a needle in a haystack.

This is where the 75% art comes in: you build your team from people with personality and passion who also come from relevant backgrounds. How many of the candidates you interview are both willing to learn Hadoop and genuinely passionate about doing so? The interview and subsequent in-person conversations are where you will find this passion and sense of opportunity. These soft skills have to be assessed face to face, and your interview process should dig into what the candidate’s exact experience translates into. You may also want to consider putting candidates through a live demo where they perform a real-world task and then discuss how they solved the problems you staged. Candidates are often unsuccessful at completing the demo, but the real question is: can they explain why?

You will also find the best candidates are either toward the beginning of their careers or toward the end, and not necessarily in the middle. More experienced staff are familiar with the “wild west” that is Big Data these days, since it resembles IT 20 years ago: things weren’t integrated, and staff had to do the heavy lifting themselves, building, scripting, and troubleshooting. That innate ability to self-start is an asset. Younger staff, for their part, are quick to adopt tools for scripting and automation, which is also a useful skill.

Leaders of these teams should be neutral parties in the organization with no self-serving interest other than helping the company change overall. If a leader comes from the department that is funding the project, that department’s interest will often eclipse the greater good. The ideal leader is an employee who is not fully embedded in any one camp, almost a third party.

Finding Talent

Finding this talent is also a challenge. One route is Hortonworks University, a program through which 12-15 colleges nationwide establish Hadoop as a fully accredited part of their curriculum; Hortonworks covers the costs incurred by the university so long as the school makes the courses part of the curriculum. You might also consider recruitment days at local universities, asking professors: Who stands out? Who solves problems? Then provide internship and trial opportunities for the names you receive.

Word of mouth is also a proven way to find candidates. The raw truth is that the Big Data community is a small enough world that if you happen to be really good at Hadoop, people know who you are.

The Last Word

Ultimately, a big data transformation is an enablement opportunity to get your entire organization to go learn. Over time, we can all get stale, but this transformation can be a driver of learning, a place to get hands dirty with something new, and an opportunity to create new subject matter experts. Don’t be afraid to use this rubric to build a successful team.

Making Big Data More Accessible

An edited version of this article was published on the Hortonworks blog. Credit: Hortonworks

Big data, and the work of storing and analyzing it, has a reputation for being complex and confusing. However, because of the great insight that can be drawn from big data, it’s important to make data science more easily accessible.

How can business leaders lower the barrier to entry to make data science more accessible to everyone? I see four ways this is possible.

Invest in natural language capabilities. Math sometimes has a tendency to scare people away, so if your plan is for everyone to learn Python and reduce their questions to formulas, I have news: that will not bode well for successful uptake. Instead, look for platforms that allow users to query data sets using plain English questions, like “How many more sales did I make this year than last year?” or “What are the top five things my customers purchase along with Widget Model A?” This type of natural language processing makes accessing large quantities of data easier and more approachable, and it has the side benefit of encouraging interest and curiosity in further analysis. As your users ask natural questions and receive straightforward answers, they think of more questions, which leads to more answers and more insight. Save the math and SQL-like queries for your developers and put a friendly front end on your data.
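To make the idea concrete, here is a minimal sketch of a natural language front end that maps a couple of English question templates onto SQL. Real platforms use far more sophisticated language processing; the patterns, table, and column names below are illustrative assumptions, not any specific product’s API.

```python
import re
import sqlite3

def answer(question, conn):
    """Map a few English question templates onto SQL queries (toy example)."""
    q = question.lower().strip("?")

    # "How many sales did I make in <year>?"
    m = re.match(r"how many sales did i make in (\d{4})", q)
    if m:
        row = conn.execute(
            "SELECT COUNT(*) FROM sales WHERE year = ?", (int(m.group(1)),)
        ).fetchone()
        return row[0]

    # "What are the top <n> products?"
    m = re.match(r"what are the top (\d+) products", q)
    if m:
        rows = conn.execute(
            "SELECT product FROM sales GROUP BY product "
            "ORDER BY COUNT(*) DESC LIMIT ?", (int(m.group(1)),)
        ).fetchall()
        return [r[0] for r in rows]

    return "Sorry, I don't understand that question yet."

# Tiny in-memory demo data set (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Widget A", 2017), ("Widget A", 2017), ("Widget B", 2016)],
)

print(answer("How many sales did I make in 2017?", conn))  # 2
print(answer("What are the top 2 products?", conn))        # ['Widget A', 'Widget B']
```

The point is the interface, not the plumbing: the user types a question in plain English, and the SQL stays hidden behind the front end.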

Make visualization an integral part of your analysis and presentation. You have surely heard the old phrase, “I’m a visual person.” Some people simply absorb and digest information better if it is presented visually. There are many platforms and frameworks on the market that can take a raw data query result and turn it into a rich graphical answer to a user’s question. Further, the more sophisticated of these platforms let you interact with subsections of a visualization, such as exploding a pie chart into smaller pieces or layering statistics onto a map of a geographic region. Even heat maps and gradients can make statistics pop to life and really enhance your users’ understanding of exactly what the data is telling them. Visualizations also encourage users to ask more questions and interrogate the data further by creating a rich, inviting atmosphere for discovering new ways of thinking about data.
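Even a crude rendering shows the principle of turning a raw query result into a visual answer. The sketch below uses a plain-text bar chart as a stand-in for the rich, interactive charting a real BI platform would provide; the data and labels are invented for illustration.

```python
def bar_chart(rows, width=20):
    """Render (label, value) rows as a horizontal text bar chart."""
    top = max(value for _, value in rows)  # scale bars to the largest value
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<12} {bar} {value}")
    return "\n".join(lines)

# A raw result set such as a SQL query might return (hypothetical figures).
sales_by_region = [("North", 420), ("South", 180), ("East", 300), ("West", 90)]

print(bar_chart(sales_by_region))
```

At a glance the reader sees that North dominates and West lags, which a wall of numbers would not convey nearly as quickly.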

Start small and then aim to grow your user base. Chances are, there is a vocal community of employees or contractors within your organization already clamoring for new data science tools. Perhaps you have already harnessed their enthusiasm and enrolled them in a controlled pilot of your data deployment. If you have not, consider finding a group of individuals excited about the potential of big data and data science, invite them to preview your plans and test systems, and gather their feedback about what they like, what they do not like, and what they wish to see or need to have to augment their roles. After all, the best campfires start with proper kindling; a data portal project is no exception. Beginning with a small but dedicated group of users is a tried-and-true method of getting a project off the ground before expanding it so quickly that it outgrows its capabilities.

Embrace the power of data on the go. Mobility has changed the world around us, with pocket-sized computers more powerful than the space shuttle’s flight computers available off the shelf at Target for $80 or less. Many, I dare say most, professionals have smartphones or tablets, some issued or paid for by their employer. This means you have an audience that expects to get answers on their mobile devices whenever and wherever they are. Your data science projects and deployments must be available in a mobile-friendly format, whether through responsive web design, native apps for the popular mobile platforms, or some combination of both. That mobile experience should also satisfy the points already established, including natural language support and rich visualizations with the ability to touch, drag, pinch, zoom, and scroll. The bottom line: in this day and age, you simply must make data accessible on mobile devices.