Data Science – Page 19 – Amalgam Insights

Posted on April 16, 2019 (updated January 3, 2022) by Lynne Baer

Enterprise Data World 2019: Data Science Will Take Over The World! … Eventually.

Amalgam Insights attended Enterprise Data World, a conference focused on data management, in late March. Though the conference tracks covered a wide variety of data practices, our primary interest was in the sessions on the AI and Machine Learning track. We came away with the impression that the data management world is starting to understand and support some of the challenges that organizations face when trying to get complex data initiatives off the ground, but that the learning process will continue to have growing pains.

Data Strategy Bootcamp

I began my time at Enterprise Data World with the Data Strategy Bootcamp on Monday. Often, organizations focus on getting smaller data projects done quickly in a tactical fashion at the expense of consciously developing their broader data strategy. The bootcamp addressed how to incorporate these “quick wins” into the bigger picture, and delved into the details of what a data strategy should include and what the process of building one looks like. For people in data analytics and data science roles, understanding and contributing to your organization’s data strategy is important because well-documented and properly managed data means data analysts and data scientists can spend more of their time doing analytics and building machine learning models. The “data scientists spend 80% of their time cleaning and preparing data” number continues to circulate without measurable improvement. To build a successful data strategy, organizations will need to identify data-centric business goals so that the data strategy aligns with the business strategy, assess the organization’s maturity and capabilities across its data ecosystem, and determine long-term goals and “quick wins” that will provide measurable progress towards those goals.

Getting Started with Data Science, Machine Learning, and Artificial Intelligence Initiatives

Actually getting started on data science, machine learning, and artificial intelligence initiatives remains a point of confusion for many organizations looking to expand beyond the basic data analytics they’re currently doing. Kristin Serafin and Lizzie Westin of FINRA led one session, and Vinay Seth Mohta of Manifold led another, on how to turn talk about machine learning and artificial intelligence into action in your organization, and how to do so in a way that can scale up quickly. Key takeaways: your organization needs to understand its data well enough to know which of the questions it wants answered require a machine learning approach; it needs to understand what tools are necessary to move forward; it needs to understand who already has pertinent data capabilities within the organization, and who is best positioned to improve their skills in the necessary manner; and it needs to obtain buy-in from relevant stakeholders.

Data Job Roles

Data job roles were discussed in multiple sessions; I attended one from the perspective of how analytical jobs themselves are evolving, and one from the perspective of analytical career development. Despite the hype, not everyone is a data scientist, even if they may perform some tasks that are part of a data science pipeline! Data engineers are the difference between data scientists’ experiments sitting in silos and getting them into production where they can affect your company. Data analysts aren’t going anywhere – yet. (Though Michael Stonebraker, in his keynote Tuesday morning, stated that he believed data science would eventually replace BI, pending upskilling a sufficient number of data workers.) And data scientists spend 80% of their time doing data prep instead of building machine learning models; they’d like to do more of the latter, and because they’re an expensive asset, the business needs them to be doing less prep and more building as well.

By the same token, there are so many different specialties across the data environment, and the tool landscape is incredibly large. No one will know everything; even relatively low-level people will need to provide leadership in their particular roles to bridge the much-bemoaned gap between IT and Business. So how can data people do that? They’ll need to learn to talk about their initiatives and accomplishments in business terms – increasing revenue, decreasing cost, managing risk. By doing this, data strategy can be tied to business strategy, and this barrier to success can be surmounted.

Data Integration at Scale

Michael Stonebraker’s keynote highlighted the growing need for people with data science capabilities, but the real meat of his talk centered around how to support complex data science initiatives: doing data integration at scale. One example: General Electric’s procurement system problem. Clearly, the ideal number of procurement systems in any company is “one.” Given mergers and acquisitions, over time, GE had accumulated *75* procurement systems. They could save $100M if they could bring together all of these systems, with all of the information on the terms and conditions negotiated with each vendor via each of these systems. But this required a rather complex data integration process. Once that was done, the same process remained for dealing with their supplier databases, and their customer databases, and a whole host of other data. Machine learning can help with this – once there are sufficient people with machine learning skills to address these large problems. But doing data integration at scale will remain a significant challenge for enterprises for now, with machine learning skills being relatively costly and rare, data accumulation continuing to grow exponentially, and third-party data increasingly being brought in to supplement existing analyses.

Knowledge Graphs and Semantic AI

A number of sessions discussed knowledge graphs and their importance for supporting both data management and data science tasks. Knowledge graphs provide a “semantic” layer over standard relational databases – they prioritize documenting the relationships between entities, making it easier to understand how different parts of your organization’s data are interrelated. Because having a knowledge graph about your organization’s data provides natural-language context around data relationships, it can make machine learning models based on that data more “explainable” due to the additional human-legible information available for interpretation and understanding. Another example: if you’re trying to perform a search, most results rely on exact matches. Having a knowledge graph makes it simple to pull up “related” results based on the relationships documented in that knowledge graph.
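The “related results” idea can be sketched in a few lines of Python. This is a minimal illustration, not any vendor’s implementation: the entity names, the relationship labels, and the helper functions `build_graph` and `related` are all hypothetical, and a production knowledge graph would normally live in a dedicated graph store rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical triples (subject, relationship, object); illustrative only.
TRIPLES = [
    ("invoice", "issued_by", "vendor"),
    ("vendor", "party_to", "contract"),
    ("contract", "governs", "purchase_order"),
    ("customer", "places", "sales_order"),
]


def build_graph(triples):
    """Index each entity by its documented relationships, in both directions."""
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add((relation, obj))
        graph[obj].add((f"inverse of {relation}", subject))
    return graph


def related(graph, term, max_hops=1):
    """Return the entities reachable from `term` within `max_hops` relationships."""
    seen = {term}
    frontier = {term}
    for _ in range(max_hops):
        next_frontier = set()
        for entity in frontier:
            for _relation, neighbor in graph.get(entity, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return sorted(seen - {term})


graph = build_graph(TRIPLES)
print(related(graph, "vendor"))              # direct relationships only
print(related(graph, "vendor", max_hops=2))  # widen the search by one hop
```

An exact-match search for “vendor” would return only vendor records; walking the documented relationships also surfaces the invoices and contracts tied to that entity, and widening the hop count trades precision for breadth.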

Data Access, Control, and Usage

My big takeaway from Scott Taylor’s Data Architecture session: data should be a shared, centralized asset for your entire organization. It must be 1) accessible by its consumers, 2) in the format they require, 3) via the method they require, 4) only if they have permission to access it (security), and 5) used in a way that abides by governance standards and laws. Data scientists care about this because they need data to do their job, and any hurdle in accessing usable data makes it more likely they’ll avoid using official methods to access the data. Nobody has three months to wait for a data requisition from IT’s data warehouses to be turned around anymore; instead, “I’ll just use this data copy on my desktop” – or more likely these days, in a cloud-hosted data silo. Making centralized access easy to use makes data users much more likely to comply with data usage and access policies, which helps secure data properly, govern its use appropriately, and prevent data silos from forming.

Digging a bit more into the security and governance aspects mentioned above, it’s surprisingly easy to identify individuals in a set of anonymized data. Matt Vogt of Immuta demonstrated this with a dataset consisting of anonymized NYC taxi data, where individuals remained identifiable even as more and more information was redacted from it. In his keynote, Jeff Jonas of Senzing took this further – as context accumulates around data, it gets easier to make inferences, even when your data is far from clean. With GDPR on the table, and CCPA coming into effect in nine months, how data workers can use data, ethically and legally, will shift, significantly affecting data workflows. Both the use of data and the results provided by black-box machine learning models will be challenged.
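One way to see why redaction alone falls short is to count how many records are unique on the columns that remain. The sketch below is not the method shown in either presentation; the records, the column names, and the `unique_fraction` helper are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical "anonymized" trip records: names are gone, but pickup zone,
# hour, and fare (quasi-identifiers) remain. Illustrative values only.
TRIPS = [
    {"pickup_zone": "A", "hour": 8, "fare": 12.5},
    {"pickup_zone": "A", "hour": 8, "fare": 12.5},
    {"pickup_zone": "B", "hour": 23, "fare": 41.0},
    {"pickup_zone": "C", "hour": 8, "fare": 9.0},
]


def unique_fraction(records, columns):
    """Fraction of records whose combination of `columns` values appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(record[column] for column in columns) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


print(unique_fraction(TRIPS, ["hour"]))                         # 0.25
print(unique_fraction(TRIPS, ["pickup_zone", "hour", "fare"]))  # 0.5
```

Each additional column makes more records unique, and a unique record can be tied back to a person by anyone holding outside context about that trip. That is the sense in which accumulating context makes inference easier.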

Recommendations

Data scientists and machine learning practitioners should familiarize themselves with the broader data management ecosystem. Said practitioners understand why dirty data is problematic, given that they spend most of their work hours cleaning that data so they can do the actual machine learning model-building, but there are numerous tools available to help with this process, and possibly obviate the need for a particular cleaning job that’s already been done once. As enterprise data catalogs become more common, this will prevent data scientists from spending hours on duplicative work when someone else has already cleaned the set they were planning to use and made it available for the organization’s use.

Data scientists and data science managers should also learn how to communicate the business value of their data initiatives when speaking to business stakeholders. From a technical point of view, making a model more accurate is an achievement in and of itself. But translating that improved accuracy or speed into business terms is what shows stakeholders why it matters for the business as a whole. Maybe your 1% improvement in model accuracy means you save your company tens of thousands of dollars by more accurately targeting potential customers who are ready to buy your product – that’s what will get the attention of your line-of-business partners.
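As a back-of-the-envelope sketch of that translation, the arithmetic might look like the following. Every number here is hypothetical, and the function assumes, simplistically, that each percentage point of accuracy converts directly into correctly targeted customers; a real estimate would need your own campaign data.

```python
def value_of_accuracy_gain(contacts, accuracy_gain_points, value_per_customer):
    """Estimate the dollar value of an accuracy gain, in percentage points.

    Simplifying assumption: each point of accuracy means 1% more of the
    contacted prospects are correctly identified as ready to buy.
    """
    extra_customers = contacts * accuracy_gain_points / 100
    return extra_customers * value_per_customer


# Hypothetical campaign: 50,000 prospects contacted, $100 margin per sale.
print(value_of_accuracy_gain(50_000, 1, 100.0))  # 50000.0
```

Stating the result as “about $50,000 per campaign” rather than “one point of accuracy” is the kind of reframing that line-of-business partners can act on.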

Data science directors and Chief Data or Chief Analytics Officers should approach building their organization’s data strategy and culture with the long-term view in mind. Aligning your data strategy with the organization’s business strategy is crucial to your organization’s success. Rather than having the data and business sides tugging on opposite ends of the rope, develop an understanding of each other’s needs and capabilities and apply that knowledge to keep everyone focused on the same goal.

Chief Data Officers and Chief Analytics Officers should understand their organization’s capabilities by assessing both the data capabilities and capacity available from each individual and the general maturity of each data practice area (such as Master Data Management, Data Integration, and Data Architecture). Knowing the availability of both technical and people-based resources is necessary to develop a scalable set of data processes that yields consistent results no matter which data scientist or analyst is in charge of executing the process for any given project.

As part of developing their organization’s data strategy, Chief Data Officers and Chief Analytics Officers must work with their legal department to develop rules and processes for accumulating, storing, accessing, and using data appropriately. As laws like GDPR and the California Consumer Privacy Act (CCPA) start being enforced, data access and usage will be much more scrutinized; companies not adhering to the letter of those laws will find themselves fined heavily. Data scientists and data science managers who are working on projects that involve sensitive or personal data should talk to their general counsel to ensure they remain on the right side of the law.

Posted on April 12, 2019 (updated January 3, 2022) by Tom Petrocelli

Google Goes Corporate at Google Next

There’s no doubt that Google exists to make money. They make money by getting companies to buy their services. When it comes to selling ads on search engines, Google is number one. When it comes to their cloud business, Google is… well, number three.

I’m guessing that irks them a bit especially since they sit behind a company whose main business is selling whatever stuff people want to sell and a company that made its name in the first wave of PCs. Basically, a department store and a dinosaur are beating them at what should be their game.

Posted on April 8, 2019 (updated January 3, 2022) by Hyoun Park

Accounting Tech Market Alert: FloQast Provides AI-Powered Transaction Matching to Accelerate the Financial Close

Practice: Accounting Tech

Key Stakeholders: Chief Financial Officers, Chief Accounting Officers, Vice President of Finance, Vice President of Accounting, Corporate Controllers, Financial Close Directors and Managers, Accounting Directors and Managers, Finance Directors and Managers, Accounts Payable Directors and Managers, Financial Analysts, Staff Accountants

Why This Matters: Transaction matching, a key function in the account reconciliation process, is one of the most time-consuming challenges for timely financial close. Amalgam Insights estimates that 80% of mid-market companies currently conduct transaction matching either manually or with only the assistance of ungoverned spreadsheets. Current transaction matching solutions are either limited in transaction scope or extremely challenging for mid-market organizations to implement in a cost-effective and time-efficient manner.

Key Takeaway: With FloQast Matching, mid-sized enterprises and organizations have access to a scalable and usable transaction matching solution that will significantly reduce time-to-close by eliminating painful manual reviews for the vast majority of transactions and reducing matching error rates.

Posted on April 3, 2019 (updated January 3, 2022) by admin

Todd Maddox Explains Why Extended Reality (xR) Technologies Will Disrupt Corporate L&D

Research Fellow Todd Maddox, Ph.D. has just published a new Analyst Insight: Leveraging Learning Science: Why Extended Reality (xR) is Poised to Disrupt Corporate Learning and Development.

In this Analyst Insight, Todd Maddox, Ph.D. provides guidance on why Augmented and Virtual Reality are set to disrupt corporate learning. This report focuses on a learning science evaluation of the potential for extended reality (xR) technologies to disrupt corporate L&D, and shows how xR technologies have the potential to improve the quality and quantity of training, to accelerate learning, and to enhance retention in all aspects of corporate learning to provide the following benefits:

Posted on April 2, 2019 (updated January 3, 2022) by Lynne Baer

Data Science and Machine Learning News Roundup, March 2019

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.

Dataiku Releases Version 5.1 in Anticipation of AI’s Surge in the Enterprise

Dataiku released version 5.1 of their software platform. This includes a GDPR framework for governance and control, as well as user-experience upgrades such as the ability to copy and reuse analytic workflows in new projects, coders being able to use their preferred development environment from within Dataiku, and easier navigation of complex analytics projects where data sources may number in the hundreds.

Being able to document when sensitive data is being used and prevent inappropriate use of such data is key for companies trying to work within GDPR and similar laws and not lose significant funds to violations of these laws. Dataiku’s inclusion of a governance component within its data science platform distinguishes it from its competitors, many of whom lack such a component natively, and enhances Dataiku’s attractiveness as a data science platform.

Domino Data Lab Platform Enhancements Improve Productivity of Data Science Teams Across the Entire Model Lifecycle

Domino announced three new capabilities for their data science platform. Datasets is a high-performance data store that will make it easier for data scientists to find, share, and reuse large data resources across multiple projects, saving time in the search process. Experiment Manager gives data science teams a system of record for ongoing experiments, making it easier to avoid unnecessary duplicate work. Activity Feed provides this type of information for data science leads to understand changes in any given project when they may be tracking multiple projects at once. Together, these three collaboration capabilities enhance Domino users’ ability to do data science in a documented, repeatable, and mature fashion.

SAS Announces $1 Billion Investment in Artificial Intelligence (AI)

SAS announced a $1B investment in AI across three key areas: Research and Development, education initiatives, and a Center of Excellence. The goal is to enable SAS users to use AI to some degree even without a significant baseline of AI skills, to help SAS users improve their baseline AI skills through training, and to help organizations using SAS to bring AI projects into production more quickly with the help of AI experts as consultants. A significant percentage of SAS users aren’t currently using SAS to perform complex machine learning and artificial intelligence tasks; helping these users to get actual SAS-based AI projects into production enhances SAS’ ability to sell its AI software.

NVIDIA-Related Announcements

H2O.ai and SAS both announced partnerships with NVIDIA this month. H2O.ai’s Driverless AI and H2O4GPU are now optimized for NVIDIA’s Data Science Workstations, and NVIDIA RAPIDS will be integrated into H2O as well. SAS disclosed future plans to expand NVIDIA GPU support across SAS Viya, and plans to use these GPUs and the CUDA-X AI acceleration library to support SAS’ AI software. Both H2O.ai and SAS are using NVIDIA’s GPUs and CUDA-X to make certain types of machine learning algorithms operate more quickly and efficiently.

These follow prior announcements about NVIDIA partnerships with IBM, Oracle, Anaconda, and MathWorks, reflecting NVIDIA’s importance in machine learning. With NVIDIA GPUs making up an estimated 70% of the world market share, data science and machine learning software programs and platforms need to be able to work well on the de facto default GPU.

Posted on April 2, 2019 (updated January 3, 2022) by admin

Tom Petrocelli Releases Groundbreaking Technical Guide on Service Mesh

On April 2, 2019, Amalgam Insights Research Fellow Tom Petrocelli published Technical Guide: A Service Mesh Primer, which serves as a vital starting point for technical architects and developer teams to understand the current trends in microservices and service mesh. This report provides enterprise architects, CTOs, and developer teams with the guidance they need to understand the microservices architecture, service mesh architecture, and OSI model context necessary to conceptualize service mesh technologies.

In this report, Amalgam Insights provides context in the following areas:

Posted on March 26, 2019 (updated January 3, 2022) by Todd Maddox

Why Extended Reality (xR) is Poised to Disrupt Corporate Learning and Development – Part IV: xR Behavioral Skills Applications, and Recommendations

Note: If you missed Parts I, II, and III of this blog series, catch up and read

This is part of a four-blog series exploring the psychology and brain science behind the potential for extended reality tools to disrupt corporate Learning & Development.

xR and Behavioral Skills Learning: Whereas hard skills learning involves knowing what to do, behavioral skills learning involves knowing how to do it. People (aka soft) skills, such as the ability to communicate, collaborate, and lead effectively, or to show empathy and to embrace diversity, are behavioral skills. Similarly, technical skills, such as the ability to learn how to use new software, to upskill to a new software release, or to use and maintain a piece of hardware or equipment, are behavioral skills.

Posted on March 25, 2019 (updated January 3, 2022) by Tom Petrocelli

Coming Attractions: Groundbreaking Service Mesh Research

In early January, I started researching the service mesh market. To oversimplify, a service mesh is a way of providing the kind of network services necessary for enterprise applications deployed using a microservices architecture. Since most microservices architectures are being deployed within containers and, most often, managed and orchestrated using Kubernetes, service mesh technology will have a major impact on the adoption of these technologies.

As I began writing the original paper, I quickly realized that an explanation of service mesh technology was necessary to understand the dynamic of the service mesh market. Creating a primer on service mesh and a market guide turned out to be too much for one paper. It was unbearably long. Consequently, the paper was split into two papers, a Technical Guide and a Market Guide.

The Technical Guide is a quick primer on service mesh technology and how it is used to enhance microservices architectures, especially within the context of containers and Kubernetes. The Market Guide outlines the structure of the market for service mesh products and open source projects, discusses many of the major players, and addresses the current Istio versus Linkerd controversy. The latter is actually a non-issue that has taken on more importance than it should given the nascence of the market.

The Technical Guide will be released next week, just prior to Cloud Foundry Summit. Even though service mesh companies seem to be focused on Kubernetes, anytime there is a microservices architecture, there will be a service mesh. This is true for microservices implemented using Cloud Foundry containers.

The Market Guide will be published roughly a month later, before Red Hat Summit and KubeCon+CloudNative Summit Europe, which I will be attending. Most of the vendors discussed in the Market Guide will be in attendance at one or the other conference. Read the report before going so that you know who to talk to if you are attending these conferences.

A service mesh is a necessary part of emerging microservices architectures. These papers will hopefully get you started on your journey to deploying one.

Note: Vendors interested in leveraging this research for commercial usage are invited to contact Lisa Lincoln (lisa@amalgaminghts.com).

Posted on March 19, 2019 (updated January 3, 2022) by Todd Maddox

Why Extended Reality (xR) is Poised to Disrupt Corporate Learning and Development – Part III: xR Hard Skills Applications

Note: If you missed Parts I and II of this blog series, catch up and read Part I: The Problem, and Part II: The Brain Science. This is part of a four-blog series exploring the psychology and brain science behind the potential for extended reality tools to disrupt corporate Learning & Development.

xR Applications in Corporate L&D

The key ingredient of xR technology in corporate L&D is the experiential and immersive nature of the technology that provides rich, coordinated contextual cues that lead to a sense of “presence”. You are either in a real-world experience augmented with information (Augmented Reality or AR), or you are transported into a new virtual world (Virtual Reality or VR). In both cases, experiential learning systems are engaged in synchrony with cognitive, behavioral, and emotional learning systems in the brain. I elaborate below.

Posted on March 15, 2019 (updated January 3, 2022) by Tom Petrocelli

Network Big Iron f5 Acquires Software Network Vendor NGINX

I woke up last Tuesday (March 12, 2019) to find an interesting announcement in my inbox. NGINX, the software networking company, well known for its NGINX web server/load balancer, was being acquired by f5. f5 is best known for its network appliances which implement network security, load balancing, etc. in data centers.

The deal was described as creating a way to “bridge NetOps to DevOps.” That’s a good way to characterize the value of this acquisition. Networking has begun to evolve, or perhaps devolve, from the data center into the container cluster. Network services that used to be the domain of centralized network devices, especially appliances, may be found in small footprint software that runs in containers, often in a Kubernetes pod. It’s not that centralized network resources don’t have a place – you wouldn’t be able to manage the infrastructure that container clusters run on without them. Instead, both network appliances and containerized network resources, such as a service mesh, will be present in microservices architectures. By combining both types of network capabilities, f5 will be able to sell a spectrum of network appliances and software tailored toward different types of architectures. This includes the emerging microservices architectures that are quickly becoming mainstream. With NGINX, f5 will be well positioned to meet the network needs of today and of the future.

The one odd thing about this acquisition is that f5 already has an in-house project, Aspen Mesh, to commercialize very similar software. Aspen Mesh sells an Istio/Envoy distribution that extends the base features of the open source software. There is considerable overlap between Aspen Mesh and NGINX, at least in terms of capabilities. Both provide software to enable a service mesh and provide services to virtual networks. Sure, NGINX has market share (and brain share) but $670M is a lot of money when you already have something in hand.

NGINX and f5 say that they see the products as complementary and that the combination will allow f5 to build a continuum of offerings for different needs and scale. In this regard, I would agree with them. Aspen Mesh and NGINX are addressing the same problems but in different ways. By combining NGINX with Aspen Mesh, f5 can cover more of the market.

Given the vendor support of Istio/Envoy in the market, it’s hard to imagine f5 just dropping Aspen Mesh. At present, f5 plans to operate NGINX separately but that doesn’t mean they won’t combine NGINX with Aspen Mesh in the future. Some form of coexistence is necessary for f5 to leverage all the investments in both brands.

The open source governance question may be a problem. There is nervousness within the NGINX community about its future. NGINX is based on its own open source project, one not controlled by any other vendors. The worry is that the NGINX community will run into the same issues that the Java and MySQL communities did after they were acquired by Oracle, which included changes to licensing and disputes over what constituted the open source software versus the enterprise, and hence proprietary, software. f5 will have to reassure the NGINX community or risk a fork of the project or, worse, the community jumping ship to other projects. For Oracle, that led to MariaDB, a new rival to MySQL.

NGINX will give f5 both opportunity and technology to address emerging architectures that their current product lines will not. Aspen Mesh will still need time to grow before it can grab the brain share and revenue that NGINX already has. For a mainstream networking company like f5, this acquisition gets them into the game more quickly, generates revenue immediately, and does so in a manner that is closer to their norm. This makes a lot of sense.

Now that the first acquisition has happened, the big question will be “who are the next sellers and the next buyers?” I would predict that we will see more deals like this one. We will have to wait and see.
