
How Red Hat Runs

This past week at Red Hat Summit 2019 (May 7 – 9, 2019) has been exhausting. It’s not an overstatement to say that they run analysts ragged at their events, but that’s not why the conference made me tired. It was the sheer energy of the show, the kind of energy that keeps you running with no sleep for three days straight. That energy came from two sources – excitement and fear.

Two announcements, in particular, generated joy amongst the devoted Red Hat fans. The first was the announcement of Red Hat Enterprise Linux version 8, better known as RHEL8. RHEL is the granddaddy of all major Linux distributions for the data center. RHEL8, however, doesn’t seem all that old. As well as all the typical enhancements to the kernel and other parts of the distro, Red Hat has added two killer features to RHEL.

The first, the web console, is a real winner. It provides a secure browser-based system to manage all the features of Linux that one typically needs a command line on the server to perform. Using Telnet or SSH to log in to a remote box and make a few adjustments is no big deal when you have a small number of machines, physical or virtual, in a data center. When there are thousands of machines to care for, it is too cumbersome. With the web console plus Red Hat Satellite, the same type of system maintenance is much more efficient. It even has a terminal built in if the command line is the only option. I predict that the web console will be an especially useful asset to new sysadmins who have yet to learn the intricacies of the Linux command line (or just don’t want to).
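
To make the contrast concrete, here is a minimal sketch of the old way: fanning a status check out over SSH with Python and the paramiko library. The hosts, username, and command are hypothetical stand-ins, not anything Red Hat ships; the point is that this per-box plumbing is exactly the chore the web console and Satellite take off your plate.

```python
# Minimal sketch (hypothetical hosts/credentials): the per-box SSH chore
# that a central web console replaces at scale.
import paramiko

HOSTS = ["rhel8-node-%03d.example.com" % i for i in range(1, 4)]

def run_remote(host, command):
    """Open an SSH session to one host and return the command's output."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="sysadmin")  # assumes key-based auth
    try:
        _stdin, stdout, _stderr = client.exec_command(command)
        return stdout.read().decode()
    finally:
        client.close()

if __name__ == "__main__":
    # One connection per host per check: fine for three machines,
    # painful for three thousand; hence the appeal of a central console.
    for host in HOSTS:
        print(host, run_remote(host, "systemctl is-active sshd").strip())
```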

The new image builder is also going to be a big help for DevOps teams. Image builder uses a point and click interface to build images of software stacks, based on RHEL of course, that can be instantiated over and over. Creating consistent environments for developers and testing is a major pain for DevOps teams. The ability to quickly and easily create and deploy images will take away a major impediment to smooth DevOps pipelines.
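
The point-and-click interface is the headline, but image builder can also be scripted. As a rough sketch of what a repeatable build might look like, the snippet below drives composer-cli, the command-line client that ships alongside the image builder UI in RHEL 8; the blueprint file and name here are hypothetical.

```python
# Sketch (hypothetical blueprint): scripting a repeatable image build
# with composer-cli, the CLI counterpart to the image builder UI.
import subprocess

def composer(*args):
    """Run one composer-cli command, raising if it fails."""
    subprocess.run(["composer-cli", *args], check=True)

# Push a TOML blueprint describing the stack, then build a qcow2 image
# from it; rerunning these two steps yields the same environment each time.
composer("blueprints", "push", "devstack-blueprint.toml")
composer("compose", "start", "devstack", "qcow2")
```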

The second announcement that gained a lot of attention was the impending GA of OpenShift 4, which represents a major change in the Red Hat container platform. It incorporates all the container automation goodness that Red Hat acquired from CoreOS, especially the Operator framework. Operators are key to automating container clusters, something that is desperately needed for large-scale production clusters. While Kubernetes has added a lot of features to help with some automation tasks, such as autoscaling, that’s not nearly enough for managing clusters at hyperscale or across hybrid clouds. Operators are a step in that direction, especially as Red Hat makes it easier to use Operators.
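
For readers new to the pattern, an Operator is essentially a control loop with operational knowledge baked in: observe the desired state, compare it to the actual state, and act to converge the two. The toy sketch below shows that cycle in Python (real Operators are typically written in Go against the Kubernetes API); the state functions are placeholders, not a real client.

```python
# Toy sketch of the Operator pattern's reconcile loop.
# The state functions are placeholders, not a real Kubernetes client.
import time

def desired_replicas():
    """Read desired state, e.g. from a custom resource's spec."""
    return 3  # placeholder

def actual_replicas():
    """Observe the actual state of the cluster."""
    return 2  # placeholder

def scale_to(n):
    """Act to move actual state toward desired state."""
    print(f"scaling to {n} replicas")

def reconcile():
    want, have = desired_replicas(), actual_replicas()
    if have != want:
        scale_to(want)

while True:          # an operator runs this loop continuously,
    reconcile()      # encoding the steps a human operator would take
    time.sleep(30)
```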

Speaking of OpenShift, Satya Nadella, CEO of Microsoft, appeared on the main stage to help announce Azure Red Hat OpenShift. This would have been considered a mortal sin at pre-Nadella Microsoft and highlights the acceptance of Linux and open source at the Windows farm. Azure Red Hat OpenShift is an implementation of OpenShift as a native Azure service. This matters a lot to those serious about multi-cloud deployments. Software that is not a native service for a cloud provider does not have the integrations for billing, management, and especially setup that native services do. That makes it a second-class citizen in the cloud ecosystem. Azure Red Hat OpenShift elevates the platform to first-class status in the Azure environment.

Now for the fear. Although Red Hat went to considerable lengths to address the “blue elephant in the room”, to the point of bringing Ginni Rometty, IBM’s CEO, on stage, the unease around the acquisition by IBM was palpable amongst Red Hat customers. Many that I spoke to were clearly afraid that IBM would ruin Red Hat. Rometty, of course, insisted that was not the case, going so far as to say that she “didn’t spend $34B on Red Hat to destroy them.”

That was cold comfort to Red Hat partners and customers who have seen tech mergers start with the best intentions and end in disaster. Many attendees I spoke with drew parallels with the Oracle acquisition of Sun. Sun was, in fact, the Red Hat of its time – innovative, nimble, and with fierce loyalists amongst the technical staff. While products created by Sun still exist today, especially Java and MySQL, the essence of Sun was ruined in the acquisition. That is the giant cloud hanging over the IBM-Red Hat deal. For all the advantages that this deal brings to both companies and the open source community, the potential for a train wreck exists, and that is a source of angst in the Red Hat and open source world.

In 2019, Red Hat is looking good and may have a great future. Or it is on the brink of disaster. The path it takes now depends on IBM. If IBM leaves Red Hat alone, the acquisition may turn out to be an amazing deal and the capstone of Rometty’s and Jim Whitehurst’s careers. If IBM allows internal bureaucracy and politics to change the current plan for Red Hat, it will be Sun version 2. Barring that, expect Red Hat to continue to make open source enterprise-friendly and to drive open source communities. That would be very nice indeed.


Perspectives 2019: The 20th Anniversary of Skillsoft and Todd’s Top Takeaways!

On April 15 – 17, 2019, I attended Skillsoft’s Perspectives 2019 in Orlando, Florida. Last year’s event was spectacular, and I was not sure Skillsoft could outdo it, but they did!

The conference opened with a keynote from the Executive Chairman, Ron Hovsepian, who reminded the audience that this was Skillsoft’s 20th anniversary as a company. He also discussed the significant progress made in the last year, including the expansion of Skillsoft’s Aspire Learning Journeys in the Technology, Developer and Certification Solutions, mobile experiences for Skillsoft Compliance, expanded localization, Business Skills development, and much more. Ron also defined an aggressive roadmap for the coming year. This was followed by two interesting panel discussions with customers and was topped off by a fascinating presentation by Daniel Pink on the influence of time and mood on judgment, decision-making and performance. I am writing this in the morning, while my “analytic” mind is at its best!

I attended a number of presentations and client panels, and had several one-off conversations with customers. One-off conversations with customers are always my favorite. Customers don’t pull any punches! I also had a number of one-on-one meetings with Skillsoft executives. I really enjoy these meetings because of the genuine passion and excitement that Skillsoft’s executives display. To me, this is one of the major strengths of Skillsoft. Their leadership is “down to earth”, they are passionate about their mission, and they are eager to meet, and exceed, their goals. Thanks to Heide Abelli, Mike Hendrickson, Tara O’Sullivan, Norm Ford, and Mark Onisk for taking time out of their busy schedules to meet with me, and thanks to the ever-diligent Tom Francoeur for keeping the trains running on time.

There were many announcements at Perspectives, and I could provide a list in the blog, but instead, I would like to highlight the topics that I found most interesting. Acknowledging up front that these topics reflect my personal biases, here goes.

The Importance of People (aka Soft) Skills Continues to Grow and is Reflected in Skillsoft’s Offering

People (aka soft) skills are behavioral skills. They are about what we do, how we do it and our intent. It is one thing to know “what” to do, but something completely different (and mediated by distinct learning systems in the brain) to know “how” to do it. People skills include showing empathy, effective communication, listening, collaboration, embracing diversity, and being inclusive.

Simply put, Skillsoft “gets it”. Skillsoft has been following the workplace research showing the importance of people skills and is listening to its clients about the importance of people skills in their workplaces. Skillsoft emphasizes people skills training in all of its content areas. Sure, people skills are critical in leadership development, and Skillsoft’s Leadership Development Program emphasizes people skills, but people skills are also central in Skillsoft’s Technology and Development offerings, in Compliance, and in Digital Skills, to name a few. [As an aside, if you have not met Ken, who is featured in Skillsoft’s harassment awareness training content, get a demo! The subtleties of harassment and the relevant people skills that you will learn from Ken and the rest are memorable.]

The modern workplace needs to develop the “T-shaped” employee. This employee has depth of knowledge (the vertical segment of the T), but also has breadth of knowledge (the horizontal piece). It is one thing to receive effective training and depth of knowledge in data science or DevOps. It is another to simultaneously receive people skills training on effective communication, collaboration or team building. This is a critically important combination that can make the difference between an organization with a positive workplace environment and an efficient software development lifecycle, and an organization that is dysfunctional. Skillsoft is committed to emphasizing people skills in all of their offerings.

A Partnership with IBM Watson and the Promise of Personalized Learning

Ron Hovsepian announced a partnership between Skillsoft and IBM Watson Talent. This is an exciting development and one that will be groundbreaking for a number of reasons. First, talent assessment, which has generally been restricted to employee recruitment, can add significant value in Learning and Development. In fact, I make this case in a recent report entitled “Assessment in Talent and Human Capital Management: A Psychological Science Evaluation” (available upon request). Talent assessment can facilitate the identification of strengths and weaknesses in a candidate and can be used to curate personalized learning paths.

Second, by leveraging the power of IBM Watson Talent, career paths can be recommended to employees in an objective manner. As those of you who have read Sheryl Sandberg’s “Lean In” or know Brené Brown’s work (among others) will recognize, women are, on average, much less likely to pursue careers or apply for jobs when they believe that they fulfill only 60% of the qualifications, whereas men are likely to pursue the career or job with that same 60%. IBM Watson Talent can neutralize these biases by applying a fixed recommendation criterion regardless of gender. Women and men with equivalent skill sets will be “tapped on the shoulder” to pursue promotions or new career paths. This will level the playing field and democratize learning.
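
The mechanics are simple to picture. Here is a minimal sketch of a fixed recommendation criterion; the skills, threshold, and candidate are invented for illustration, not drawn from IBM Watson Talent.

```python
# Illustrative sketch (invented skills/threshold): recommend a role to
# anyone whose skill match clears one fixed bar; demographics are never
# an input, so the criterion is applied identically to everyone.
REQUIRED_SKILLS = {"sql", "statistics", "python", "communication", "ml"}
THRESHOLD = 0.6  # the same bar for every candidate

def skill_match(candidate_skills):
    return len(candidate_skills & REQUIRED_SKILLS) / len(REQUIRED_SKILLS)

def should_recommend(candidate_skills):
    return skill_match(candidate_skills) >= THRESHOLD

print(should_recommend({"sql", "python", "statistics"}))  # True: 3/5 = 0.6
```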

Finally, the “big data” generated by the partnership between Skillsoft and IBM Watson Talent can be analyzed and explored in a number of ways. More than likely, these data will suggest new and emerging career and learning paths that have not been considered. They will uncover relationships between career aspirations that were previously thought of as non-overlapping but that may in actuality overlap in interesting ways. The Talent, Career and Learning landscapes that emerge from analysis of these large data sets will be exciting to explore.

Skillsoft’s Aspire: Meeting Employees’ Desire for Continuous Learning and Employers’ Desire to Retain Talent

Skillsoft has been developing Aspire Learning Journeys at a rapid pace. Organizations desperately need to retain talent. Too often, talented employees leave a job because they see no avenue for enhancing their capabilities. When talent leaves, employers must find new high-quality talent and start from scratch. Skillsoft’s Aspire Learning Journeys address these needs by offering a sequenced path of instruction, training and credentials that allows employees to aspire to new heights and allows employers to keep talent.

If you are a data analyst with expertise in spreadsheets who typically works with siloed data sources, but you aspire to become a data scientist, then Skillsoft’s Aspire is for you. If you are an employer with a talented data analyst in your organization and you want to make sure to keep them by suggesting that they “aspire” toward being a data scientist, then Skillsoft’s Aspire is for you. Aspire offers a combination of courses, multimodal content, hands-on practice labs, and certification preparation and assessment that can take a learner along the journey from data analyst to data wrangler, to data ops, and finally to data science. Aspire Learning Journeys in cybersecurity, cloud computing, software development, and many other areas have been developed, with many more in the works. With the cost of a university education skyrocketing, and employers finding value in upskilling their current employees, I fully expect Aspire to continue to grow its reach.

Closing Remarks

Skillsoft is a leader in developing and delivering engaging learning content that drives business impact for organizations across the globe and in nearly every industry. Check it out for yourself by requesting a demo of the Leadership Development program or the Harassment Training content. It is high-quality, well-designed, engaging and compelling. Content delivery is solid. Percipio’s multi-modal “watch”, “read”, “listen” delivery offers learners choice, and the 24/7 access-on-any-platform approach is a must. The new “practice” offering in Percipio is an exciting addition as well. As with any offering, more work could be done; optimized delivery that effectively engages the task-appropriate learning systems in the brain is a work in progress. The ultimate goal of any L&D platform is to speed initial learning, enhance long-term retention, and prime the learner for behavior change. Although no platform yet meets all of these goals, Skillsoft is working hard and has its eye on the prize.

Skillsoft’s progress in the last year is impressive, and the roadmap for the coming year is ambitious. I look forward to following Skillsoft on its journey toward excellence, and to talking with the Skillsoft team over the coming months. Most importantly, I look forward to Perspectives 2020. Perspectives 2019 will be hard to beat, but then again, that is what I said last year.


Amalgam Insights Publishes Highly Anticipated SmartList on Service Mesh and Microservices Management

Amalgam Insights has just published my highly anticipated SmartList Market Guide on Service Mesh. It is available this week at no cost as we prepare for KubeCon and CloudNativeCon Europe 2019, which I’ll be attending.

Before you go to the event, get prepared by catching up on the key strategies, trends, and vendors associated with microservices and service mesh. For instance, consider how the Service Mesh market is currently constructed.

For a deep dive into the three key sectors of the Service Mesh market, insights describing the current State of the Market for service mesh, and a look at where key vendors and products including Istio, Linkerd, A10, Amazon, Aspen Mesh, Buoyant, Google, Hashicorp, IBM, NGINX, Red Hat, Solo.io, Vamp, and more fit into today’s microservices management environment, download my report today.


Data Science and Machine Learning News Roundup, April 2019

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.

Alteryx Acquires ClearStory Data to Accelerate Innovation in Data Science and Analytics

Alteryx acquired ClearStory Data, an analytics solution for complex and unstructured data with a focus on automating Big Data profiling, discovery, and data modeling. This acquisition reflects Alteryx’s interest in expanding its native capabilities to include more in-house data visualization tools. ClearStory Data’s visual focus on data prep, blending, and dashboarding with its Interactive Storyboards complements Alteryx’s ongoing augmentation of internal visualization capabilities throughout the workflow, such as Visualytics.

Dataiku Announces the Release of Dataiku Lite Edition

Dataiku released two new versions of its machine learning platform, Dataiku Free and Dataiku Lite, targeted towards small and medium businesses. Dataiku Free will allow teams of up to three users to work together simultaneously; it is available both on-prem and on AWS and Azure. Dataiku Lite will provide support for Hadoop and job scheduling beyond the capabilities of Dataiku Free. Since Dataiku already partners with over 1000 small and medium businesses, creating more financially accessible versions of its existing platform lowers a significant barrier to entry for such organizations and grooms smaller companies to grow their nascent data science practices within the Dataiku family.

DataRobot Celebrates One Billion Models Built on Its Cloud Platform

DataRobot announced that as of mid-April, its customers had built one billion models on its automated machine learning platform. Vice President of Product Management Phil Gurbacki noted that DataRobot customers build more than 2.5 million models per day. Given that the majority of models created are never successfully deployed – a common theme cited this month at both Enterprise Data World and last week’s Open Data Science Conference – it seems likely that DataRobot customers don’t currently have one billion models operationalized. If the percentage of deployed models is significantly higher than the norm, though, this would certainly boost DataRobot in potential customers’ eyes, and serve to further legitimize AutoML software solutions as plausible options.

Microsoft, SAS, TIBCO Continue Investments in AI and Data Skills Training

Microsoft announced a new partnership with OpenClassrooms to train students for the AI job marketplace via online coursework and projects. Given an estimate projecting that 30% of AI and data jobs will go unfilled by 2022, OpenClassrooms’ recruitment of 1000 promising candidates seems like just the beginning of a much-needed effort to address the skills gap.

SAS provided more details on the AI education initiatives it announced last month. First, it launched SAS Viya for Learners, which will allow academic institutions to access SAS AI and machine learning tools for free. A new SAS machine learning course and two new Coursera courses will provide access to SAS Viya for Learners to those wanting to learn AI skills without being affiliated with a traditional academic institution. SAS also expanded on the new certifications it plans to offer: three SAS specialist certifications in machine learning; natural language and computer vision; and forecasting and optimization. Classroom and online options for pursuing these certifications will be available.

Meanwhile, TIBCO continued expanding its partnerships with educational institutions in Asia to broaden analytics knowledge in the region. Most recently, it has augmented its existing partnership with Singapore Polytechnic to train 1000 students in analytics and IoT skillsets by 2020. Other analytics education partnerships TIBCO has announced in the last year include Yuan Ze University in Taiwan, Asia Pacific University of Technology and Innovation in Malaysia, and BINUS University in Indonesia.

The big picture: existing data science degree programs and machine learning and AI bootcamps are not providing a large enough volume of highly-skilled job candidates quickly enough to fill many of these data-centric positions. Expect to hear more about additional educational efforts forthcoming from data science, machine learning, and AI vendors.


Docker Enterprise 3.0 is the Docker We’ve Been Waiting For

For the past few years, one of the big questions in the software industry has been what direction Docker would take. Much of their unique intellectual property, such as Docker images, had been open sourced and many of their products have underperformed. Docker Swarm is an excellent example of a product that was too little too late. While loved by Docker customers I spoke with, Docker Swarm simply couldn’t surf the swell that is the Kubernetes wave.


Quick AI Insights at #MSBuild in an Overstuffed Tech Event Week

We are in the midst of one of the most packed tech event weeks in recent memory. This week alone, Amalgam Insights is tracking *six* different events.

This means a lot of announcements this week that will be directly comparable. For instance, Google, Microsoft, Red Hat, SAP, and ServiceNow should all have a variety of meaty DevOps and platform access announcements. Google, Microsoft, SAP, and possibly IBM and ServiceNow should have interesting new AI announcements. ServiceNow and Red Hat will both undoubtedly be working to one-up each other when it comes to revolutionizing IT. We’ll provide some insights and give you an idea of what to look forward to.



How is Salesforce Taking on AI: A Look at Einstein at Salesforce World Tour Boston

On April 3rd, Amalgam Insights attended Salesforce World Tour 2019 in Boston. Salesforce users may know this event as an opportunity to meet with their account managers and catch up with new functionalities and partners without having to fly to San Francisco and navigate through the colossus that is Dreamforce.

Salesforce also uses this tour as an opportunity to present analysts with the latest and greatest changes in its offerings. Amalgam Insights was interested in learning more about Salesforce’s current positioning from a data perspective, including the vendor’s acquisition of MuleSoft, as well as its progress with both Einstein Analytics and the Einstein Platform in providing value-added insights and artificial intelligence to Salesforce clients.


Zoho Is Moving to Austin!

I recently attended Zoholics 2019 in Austin, Texas. It was quite an event. The conference opened with big news: Zoho is moving its headquarters to Austin! This made headline news on the front page of the Austin American-Statesman, and Austin mayor Steve Adler offered words of excitement and encouragement during his keynote address.

Zoho also announced two new product offerings. Zoho Commerce Plus is a comprehensive e-commerce platform that provides an end-to-end solution for the commerce vertical. Zoho MarketingHub allows businesses to coordinate marketing with sales by integrating with a number of Zoho apps (e.g., Zoho CRM, Zoho Campaigns, etc.), as well as external applications such as Facebook, Twitter, and LinkedIn.

Given my focus on talent management and learning and development, two Zoholics topics were of particular interest to me. One was an update on the success of Zoho University, and the second was the announcement that Zoho has a Learning Management System (LMS) currently in beta.

Zoho University

Rajendran Dandapani, Evangelist and Raconteur at Zoho, gave an enthusiastic presentation on the success of Zoho University. Zoho University is a “crusade against academic credentialism”. It was built on the observation that the majority of new Zoho employees did not find their four-year degree useful in their job, the need for good employees, and the realization that product managers were frustrated by how little new employees appeared to learn in college. As a former university professor, the word “ouch” comes to mind, but when I take a step back and think about it, there is merit in this crusade, especially in the software development industry.

One of the main advantages of Zoho University is that students (and their families) do not incur debt during the education process. Instead, students are paid to attend Zoho University, not the other way around.

There are also a number of learning science (the marriage of psychology and neuroscience) advantages to the Zoho University approach to teaching software development. First, the emphasis is on “learning by doing”. Students spend the majority of their time in labs working on real-world software problems, and very little time in lectures. In the end, developing software solutions is more about trial and error and behavioral learning than it is about learning facts and figures. Learning by doing targets these behavioral learning centers in the brain directly. Second, the learning happens in teams, is highly collaborative, and centers around solving specific, current, real-world problems. The software development industry is becoming more cross-functional and collaborative by the day. Given this fact, it is highly efficient to instill this way of thinking and approach to problem solving directly into the educational process from day one. Finally, when lectures are necessary, a “flipped classroom” approach is utilized. The lecture material is provided as video content, and the classroom setting is reserved for discussion and hands-on practice. This integration of knowledge acquisition and behavioral training is ideal for software development.

Rajendran also mentioned that Zoho University plans to expand its curriculum to include Technology, Design and Marketing. Finally, the new Austin headquarters will double as a new Zoho University campus. As an Austin local, I believe that Zoho University will be highly sought after by students in the Austin metro area.

Zoho LMS

In my individual meeting with Chandrashekar L S P (Zoho Evangelist) and Raja Ramasamy (Head of Product Management for Zoho People Plus), I was delighted to hear that Zoho is currently developing an LMS. This is exciting news, as an LMS is one product currently missing from the Zoho One suite. I hope to obtain a detailed briefing and to learn more about the LMS in the coming months. Stay tuned.

Final Thoughts

I was impressed by the enthusiasm and loyalty of the Zoho users to the Zoho product line. Whether from organized customer and partner panels, or one-off happenstance conversations, the message that I heard was clear: Zoho users like the products, feel “heard” when they have a problem, and find the overall customer service experience to be outstanding.

I will continue to follow Zoho, with particular interest in Zoho University and Zoho’s upcoming LMS. It will be exciting to have Zoho’s headquarters “right down the block” so to speak.



Enterprise Data World 2019: Data Science Will Take Over The World! … Eventually.

Amalgam Insights attended Enterprise Data World, a conference focused on data management, in late March. Though the conference tracks covered a wide variety of data practices, our primary interest was in the sessions on the AI and Machine Learning track. We came away with the impression that the data management world is starting to understand and support some of the challenges that organizations face when trying to get complex data initiatives off the ground, but that the learning process will continue to have growing pains.

Data Strategy Bootcamp

I began my time at Enterprise Data World with the Data Strategy Bootcamp on Monday. Often, organizations focus on getting smaller data projects done quickly in a tactical fashion at the expense of consciously developing their broader data strategy. The bootcamp addressed how to incorporate these “quick wins” into the bigger picture, and delved into the details of what a data strategy should include and what the process of building one looks like. For people in data analyst and data scientist roles, understanding and contributing to your organization’s data strategy is important because well-documented and properly-managed data means data analysts and data scientists can spend more of their time doing analytics and building machine learning models. The “data scientists spend 80% of their time cleaning and preparing data” number continues to circulate without measurable improvement. To build a successful data strategy, organizations will need to identify data-centric business goals that align the organization’s data strategy with its business strategy, assess the organization’s maturity and capabilities across its data ecosystem, and determine long-term goals and “quick wins” that will provide measurable progress towards those goals.

Getting Started with Data Science, Machine Learning, and Artificial Intelligence Initiatives

Actually getting started on data science, machine learning, and artificial intelligence initiatives remains a point of confusion for many organizations looking to expand beyond the basic data analytics they’re currently doing. Kristin Serafin and Lizzie Westin of FINRA and Vinay Seth Mohta of Manifold led sessions discussing how to turn talk about machine learning and artificial intelligence into action in your organization, and how to do so in a way that can scale up quickly. Key takeaways: your organization needs to understand its data to understand what questions it wants answered that require a machine learning approach; it needs to understand what tools are necessary to move forward; it needs to understand who already has pertinent data capabilities within the organization, and who is best positioned to improve their skills in the necessary manner; and it needs to obtain buy-in from relevant stakeholders.

Data Job Roles

Data job roles were discussed in multiple sessions; I attended one from the perspective of how analytical jobs themselves are evolving, and one from the perspective of analytical career development. Despite the hype, not everyone is a data scientist, even if they may perform some tasks that are part of a data science pipeline! Data engineers are the difference between data scientists’ experiments sitting in silos and those experiments getting into production where they can affect your company. Data analysts aren’t going anywhere – yet. (Though Michael Stonebraker, in his keynote Tuesday morning, stated that he believed data science would eventually replace BI, pending the upskilling of a sufficient number of data workers.) And data scientists spend 80% of their time doing data prep instead of building machine learning models; they’d like to do more of the latter, and because they’re an expensive asset, the business needs them to be doing less prep and more building as well.

By the same token, there are so many different specialties across the data environment, and the tool landscape is incredibly large. No one will know everything; even relatively low-level people will need to provide leadership in their particular roles to bridge the much-bemoaned gap between IT and Business. So how can data people do that? They’ll need to learn to talk about their initiatives and accomplishments in business terms – increasing revenue, decreasing cost, managing risk. By doing this, data strategy can be tied to business strategy, and this barrier to success can be surmounted.

Data Integration at Scale

Michael Stonebraker’s keynote highlighted the growing need for people with data science capabilities, but the real meat of his talk centered around how to support complex data science initiatives: doing data integration at scale. One example: General Electric’s procurement system problem. Clearly, the ideal number of procurement systems in any company is “one.” Given mergers and acquisitions, over time, GE had accumulated *75* procurement systems. They could save $100M if they could bring together all of these systems, with all of the information on the terms and conditions negotiated with each vendor via each of these systems. But this required a rather complex data integration process. Once that was done, the same process remained for their supplier databases, their customer databases, and a whole host of other data. Machine learning can help with this – once there are sufficient people with machine learning skills to address these large problems. But data integration at scale will remain a significant challenge for enterprises for now, with machine learning skills relatively costly and rare, data accumulation continuing to grow exponentially, and third-party data being brought in to supplement existing analyses.
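
To make the matching problem concrete, here is a toy sketch of the entity-resolution step using only Python’s standard library. The vendor records and the similarity cutoff are invented, and a production system of the kind Stonebraker described would use trained models rather than a single string metric.

```python
# Toy sketch (invented records): matching one vendor spelled differently
# across two procurement systems via string similarity.
from difflib import SequenceMatcher

system_a = ["General Electric Co.", "Acme Industrial Supply"]
system_b = ["GENERAL ELECTRIC COMPANY", "Acme Indus. Supply Inc"]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for a in system_a:
    best = max(system_b, key=lambda b: similarity(a, b))
    if similarity(a, best) > 0.7:  # invented cutoff
        print(f"probable match: {a!r} <-> {best!r}")
```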

Knowledge Graphs and Semantic AI

A number of sessions discussed knowledge graphs and their importance for supporting both data management and data science tasks. Knowledge graphs provide a “semantic” layer over standard relational databases – they prioritize documenting the relationships between entities, making it easier to understand how different parts of your organization’s data are interrelated. Because having a knowledge graph about your organization’s data provides natural-language context around data relationships, it can make machine learning models based on that data more “explainable” due to the additional human-legible information available for interpretation and understanding. Another example: if you’re trying to perform a search, most results rely on exact matches. Having a knowledge graph makes it simple to pull up “related” results based on the relationships documented in that knowledge graph.
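
A toy example makes the point: once relationships are recorded explicitly, “related” results fall out of a one-hop graph traversal even when the search strings share nothing. The entities and relationships below are hypothetical.

```python
# Toy sketch (hypothetical graph): a knowledge graph as an adjacency map
# of (relationship, target) edges, and "related search" as a one-hop walk.
GRAPH = {
    "customer": [("places", "order"), ("assigned_to", "sales_rep")],
    "order":    [("contains", "product"), ("billed_via", "invoice")],
    "product":  [("supplied_by", "vendor")],
}

def related(entity):
    """Entities one relationship hop away from the query term."""
    return [target for _relation, target in GRAPH.get(entity, [])]

print(related("order"))  # ['product', 'invoice'], with no string overlap
```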

Data Access, Control, and Usage

My big takeaway from Scott Taylor’s Data Architecture session: data should be a shared, centralized asset for your entire organization; it must be 1) accessible by its consumers, 2) in the format they require, 3) via the method they require, 4) only if they have permission to access it (security), and 5) used in a way that abides by governance standards and laws. Data scientists care about this because they need data to do their job, and any hurdle in accessing usable data makes it more likely they’ll avoid using official methods to access the data. Nobody has three months to wait for a data requisition from IT’s data warehouses to be turned around anymore; instead, “I’ll just use this data copy on my desktop” – or more likely these days, in a cloud-hosted data silo. Making centralized access easy to use makes data users much more likely to comply with data usage and access policies, which helps secure data properly, govern its use appropriately, and prevent data silos from forming.

Digging a bit more into the security and governance aspects mentioned above, it is surprisingly easy to identify individuals in a set of anonymized data. In separate presentations, Matt Vogt of Immuta demonstrated this with a dataset of anonymized NYC taxi data, even as more and more information was redacted from it. Jeff Jonas of Senzing took this further in his keynote: as context accumulates around data, it gets easier to make inferences, even when your data is far from clean. With GDPR on the table, and CCPA coming into effect in nine months, how data workers can use data, ethically and legally, will shift, significantly affecting data workflows. Both the use of data and the results provided by black-box machine learning models will be challenged.
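
The mechanics of such a linkage attack fit in a few lines. The trip records in the sketch below are fabricated, but they show how each quasi-identifier an attacker already knows shrinks the candidate set until a single “anonymous” record remains.

```python
# Fabricated data: re-identifying a rider in "anonymized" trip records
# by filtering on quasi-identifiers known from outside context.
trips = [
    {"pickup_zone": "Midtown", "hour": 8,  "fare_band": "high"},
    {"pickup_zone": "Midtown", "hour": 8,  "fare_band": "low"},
    {"pickup_zone": "SoHo",    "hour": 23, "fare_band": "high"},
]
known = {"pickup_zone": "Midtown", "hour": 8, "fare_band": "high"}

candidates = [t for t in trips
              if all(t[k] == v for k, v in known.items())]
print(len(candidates))  # 1: the record, and the rider, are identified
```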

Recommendations

Data scientists and machine learning practitioners should familiarize themselves with the broader data management ecosystem. Said practitioners understand why dirty data is problematic, given that they spend most of their work hours cleaning that data so they can do the actual machine learning model-building, but there are numerous tools available to help with this process, and possibly obviate the need for a particular cleaning job that’s already been done once. As enterprise data catalogs become more common, this will prevent data scientists from spending hours on duplicative work when someone else has already cleaned the set they were planning to use and made it available for the organization’s use.

Data scientists and data science managers should also learn how to communicate the business value of their data initiatives when speaking to business stakeholders. From a technical point of view, making a model more accurate is an achievement in and of itself. But knowing what it means from a business standpoint builds understanding of what that improved accuracy or speed means for the business as a whole. Maybe your 1% improvement in model accuracy means you save your company tens of thousands of dollars by more accurately targeting potential customers who are ready to buy your product – that’s what will get the attention of your line-of-business partners.

Data science directors and Chief Data or Chief Analytics Officers should approach building their organization’s data strategy and culture with the long-term view in mind. Aligning your data strategy with the organization’s business strategy is crucial to your organization’s success. Rather than having departments tugging on opposite ends of the rope, develop an understanding of each other’s needs and capabilities and apply that knowledge to keep everyone focused on the same goal.

Chief Data Officers and Chief Analytics Officers should understand their organization’s capabilities by assessing both the data capabilities and capacity available by individual, and the general maturity in each data practice area (such as Master Data Management, Data Integration, Data Architecture, etc.). Knowing the availability of both technical and people-based resources is necessary to develop a scalable set of data processes for your organization with consistent results no matter which data scientist or analyst is in charge of executing on the process for any given project.

As part of developing their organization’s data strategy, Chief Data Officers and Chief Analytics Officers must work with their legal department to develop rules and processes for accumulating, storing, accessing, and using data appropriately. As laws like GDPR and the California Consumer Privacy Act start being enforced, data access and usage will be much more scrutinized; companies not adhering to the letter of those laws will find themselves fined heavily. Data scientists and data science managers who are working on projects that involve sensitive or personal data should talk to their general counsel to ensure they remain on the right side of the law.


Google Goes Corporate at Google Next

There’s no doubt that Google exists to make money. They make money by getting companies to buy their services. When it comes to selling ads on search engines, Google is number one. When it comes to their cloud business, Google is… well, number three.

I’m guessing that irks them a bit, especially since they sit behind a company whose main business is selling whatever stuff people want to sell and a company that made its name in the first wave of PCs. Basically, a department store and a dinosaur are beating them at what should be their game.