TWIET Episode 43

Welcome back to This Week in Enterprise Technology, where Hyoun Park and Charles Araujo analyze the latest enterprise technology announcements and how they will affect your business and your bosses’ expectations.

Join TWIET as we guide CIOs and technical managers through the strategic ramifications behind the vendor hype, product innovation, and the avalanches of money going in and out of enterprise tech. As always, this podcast is available in audio, video, and broken up into sections for your benefit.

Audio Podcast: https://www.buzzsprout.com/2319034/episodes/16252199

This Week in Enterprise Technology, Hyoun Park and Charles Araujo critically assess last week’s biggest tech news:

  1. AWS Enhances Amazon Connect with Generative AI Tools
  2. AWS Takes on AI Hallucination Challenges
  3. AWS Bedrock Adds Multi-Agent Orchestration and Model Routing
  4. AWS Centralizes AI Efforts with SageMaker
  5. Casey Newton Examines AI Skepticism’s Comforts
  6. Emergence AI Coordinates Multi-Vendor Agents
  7. Exa Redefines Generative Search Experiences
  8. MLCommons Benchmarks LLM Output Risks
  9. South Korea’s Unrest Threatens Global Memory Supply
  10. Werner Vogels on Managing “Simplexity”
  11. Broadcom Adjusts to Minimize VMware Migration Risks

AWS Upgrades Amazon Connect with New Generative AI Features


Amazon Connect has been a successful cloud contact center product, and the contact center has been one of the clearest areas for AI to provide productivity benefits and increase potential revenue transactions. AWS re:Invent was an opportunity to announce the latest generative AI advancements within Connect. Charles and Hyoun discuss the opportunities for contact centers to adopt AI.

Source:
Maria Deutscher from Silicon Angle: https://siliconangle.com/2024/12/01/aws-upgrades-amazon-connect-new-generative-ai-features/ 


AWS Tackles AI Hallucinations

AWS launched Automated Reasoning checks to cross-reference outputs with known facts and enterprise data. Although this is not as novel as AWS claimed, it is a valuable step forward. Hyoun and Charles debate the utility of these Automated Reasoning checks and whether AI hallucinations really matter or are just a sign of AI immaturity and inexperience.

Source:

Kyle Wiggers on TechCrunch: https://techcrunch.com/2024/12/03/aws-new-service-tackles-ai-hallucinations/ 


AWS Bedrock Updates: Multi-Agent Collaboration, Model Routing

AWS announced interesting AI management updates for Amazon Bedrock. Both multi-agent management and prompt routing across models will be useful for enterprises seeking to expand the utility and improve the cost structure of AI. Charles and Hyoun wonder if this agent management will fit the bill given the wide variety of agents that are starting to appear in the enterprise.

Source:

AWS: https://aws.amazon.com/blogs/aws/introducing-multi-agent-collaboration-capability-for-amazon-bedrock/ 


AWS Wraps Everything Together Under SageMaker

AWS created a new umbrella brand that includes data studio, data lake, analytics, and data management capabilities. Hyoun and Charles argue about whether SageMaker, best known as a data science tool, was the right umbrella brand for these data efforts.

Source:

AWS: https://aws.amazon.com/blogs/aws/introducing-the-next-generation-of-amazon-sagemaker-the-center-for-all-your-data-analytics-and-ai/ 


Casey Newton Examines AI Skepticism’s Comforts

One of TWIET’s favorite journalists, Casey Newton, weighs in on the false comfort of AI skepticism. Newton argues that the potential harm of AI is being underestimated by those who simply think that AI is full of lies or incompetent. Charles and Hyoun discuss a more realistic path for IT departments to consider as they deploy AI.

Source:

Casey Newton on Platformer: https://www.platformer.news/ai-skeptics-gary-marcus-curve-conference/ 


Emergence AI Coordinates Multi-Vendor Agents

Startup Emergence AI announced its autonomous multi-agent AI orchestrator. At a time when every enterprise platform seems to be coming out with its own set of agents, Hyoun and Charles think it is about time for a third-party agent orchestration solution to hit the market and gain some traction.

Source:

Carl Franzen on VentureBeat: https://venturebeat.com/ai/emergences-ai-orchestrator-launches-to-do-what-big-tech-offerings-cant-play-well-with-others/ 


Exa Redefines Generative Search Experiences

The MIT Technology Review covered a startup named Exa taking a novel approach to Gen AI based web searches with the goal of using the web like a database. Charles and Hyoun discuss the scale and results for this approach.

Source:

Will Douglas Heaven on MIT Technology Review: https://www.technologyreview.com/2024/12/03/1107726/the-startup-trying-to-turn-the-web-into-a-database/ 


MLCommons Benchmarks LLM Output Risks

MLCommons has released its AIluminate 1.0 benchmarks to describe several categories of harm including sex crimes, violence, and defamation risks. Hyoun and Charles discuss past challenges regarding model benchmarking and risks. 

Source:

MLCommons: https://ailuminate.mlcommons.org/benchmarks/ 


South Korea’s Unrest Threatens Global Memory Supply

South Korea saw government unrest in an attempted military coup last week. Although we are not expert political scientists, international supply chains do affect our ability to source IT. We discussed the ramifications of South Korea holding 60% of the global memory chip market and considerations for the CIO in looking at geopolitical strife.

Source:

Prasanth Aby Thomas on CIO.com: https://www.cio.com/article/3617847/south-koreas-political-unrest-threatens-the-stability-of-global-tech-supply-chains.html 


Werner Vogels On Managing “Simplexity”

At AWS re:Invent, Amazon CTO Werner Vogels pointed out both that complexity is inevitable and that there are two types of complexity that are important for technical audiences to consider, introducing a new concept of “simplexity”. Hyoun is reminded of Nassim Taleb’s concept of antifragility while Charles digs deeper into the strategic issues of technical debt.

Source:

Tom Krazit on Runtime News: https://www.runtime.news/werner-vogels-complexity-is-inevitable/ 


Broadcom Adjusts to Minimize VMware Migration Risks

Broadcom has had to pull back from its initial plans of making its top 2,000 customers all direct and has handed much of that business back to its channel. With help from The Register and Canalys, Hyoun and Charles discuss the repercussions for tech sourcing.

Source:

Simon Sharwood on The Register: https://www.theregister.com/2024/12/05/vmware_user_migration_plans/ 

8 Keys to Managing the Linguistic Copycats that are Large Language Models

Over the past year, Generative AI has taken the world by storm as a variety of large language models (LLMs) appeared to solve a wide variety of challenges based on basic language prompts and questions.

A partial list of market-leading LLMs currently available includes:

Amazon Titan
Anthropic Claude
Cohere
Databricks Dolly
Google Bard, based on PaLM 2
IBM Watsonx
Meta Llama
OpenAI’s GPT

The biggest question regarding all of these models is simple: how do you get the most value out of them? And most users fail because they are unfamiliar with the most basic concept of a large language model: these models are designed to be linguistic copycats.

As Andrej Karpathy of OpenAI stated earlier this year,

"The hottest new programming language is English."

And we all laughed at how clever the concept was as we started using tools like ChatGPT, but most of us did not take it seriously. If English really is being used as a programming language, what does this mean for the prompts that we use to request content and formatting?

I think we haven’t fully thought through what it means for English to be a programming language, either in terms of how to “prompt” the model to do things correctly or in terms of the assumptions an LLM carries as a massive block of text that is otherwise disconnected from the real world, lacking the sensory input or broad-based access to new data that would allow it to “know” current language trends.

Here are 8 core language-based concepts to keep in mind when using LLMs or considering the use of LLMs to support business processes, automation, and relevant insights.

1) Language and linguistics tools are the relationships that define the quality of output: grammar, semantics, semiotics, taxonomies, and rhetorical flourishes. There is a big difference between asking to “write 200 words on Shakespeare” vs. “elucidate 200 words on the value of Shakespeare as a playwright, as a poet, and as a philosopher based on the perspective of Edmund Malone and the English traditions associated with blank verse and iambic pentameter as a preamble to introducing the Shakespeare Theatre Association.”

I have been a critic of the quality that LLMs provide from an output perspective, most recently in my perspective “Instant Mediocrity: A Business Guide to ChatGPT in the Enterprise” (https://amalgaminsights.com/2023/06/06/instant-mediocrity-a-business-guide-to-chatgpt-in-the-enterprise/). But I readily acknowledge that the outputs one can get from LLMs will improve. Expert context will provide better results than prompts that lack subject matter knowledge.

2) Linguistic copycats are limited by the rules of language that are defined within their model. Asking linguistic copycats to provide language formats or usage that are not commonly used online or in formal writing will be a challenge. Poetic structures or textual formats referenced must reside within the knowledge of the texts that the model has seen. However, since Wikipedia is a source for most of these LLMs, a contextual foundation exists to reference many frequently used frameworks.

3) Linguistic copycats are limited by the frequency of vocabulary usage that they are trained on. It is challenging to get an LLM to use expert-level vocabulary or jargon to answer prompts because the LLM will typically settle for the most commonly used language associated with a topic rather than elevated or specific terms.

This propensity to choose the most common language associated with a topic makes it difficult for LLM-based content to sound unique or have specific rhetorical flourishes without significant work from the prompt writer.

4) Take a deep breath and work on this. Linguistic copycats respond to the scope, tone, and role mentioned in a prompt. A recent study found that, across a variety of LLMs, the prompt that provided the best answer for solving a math problem and providing instructions was not a straightforward request such as “Let’s think step by step,” but “Take a deep breath and work on this problem step-by-step.”

Using a language-based perspective, this makes sense. The explanations of mathematical problems that include some language about relaxing or not stressing would likely be designed to be more thorough and make sure the reader was not being left behind at any step. The language used in a prompt should represent the type of response that the user is seeking.

5) Linguistic copycats only respond to the prompt and the associated prompt engineering, custom instructions, and retrieval data that they can access. It is easy to get carried away with the rapid creation of text that LLMs provide and mistake it for something resembling consciousness, but the response being created is a combination of grammatical logic and the computational ability to take billions of parameters into account across possibly a million or more different documents. This ability to access relationships across 500 or more gigabytes of information is where LLMs truly do have an advantage over human beings.

6) Linguistic robots can only respond based on their underlying attention mechanisms that define their autocompletion and content creation responses. In other words, linguistic robots make judgment calls on which words are more important to focus on in a sentence or question and use that as the base of the reply.

For instance, in the sentence “The cat, who happens to be blue, sits in my shoe,” linguistic robots will focus on the subject “cat” as the most important part of the sentence. The phrase “happens to be” implies that this is not the cat’s most important trait. The cat is blue. The cat sits. The cat is in my shoe. The words include an internal rhyme and are fairly nonsensical. The next stage of the process is to autocomplete a response based on the context provided in the prompt.
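The weighting described above can be sketched with a toy scaled dot-product attention calculation. The embedding vectors below are made up purely for illustration; real models learn thousands of dimensions per token, but the mechanics of scoring and normalizing are the same.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(query, keys):
    """Scaled dot-product attention: score each token's key vector
    against the query, then normalize the scores into weights."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    return softmax(scores)

# Toy 2-dimensional embeddings for a few tokens (invented for this sketch)
tokens = ["The", "cat", "blue", "sits", "shoe"]
keys = np.array([[0.1, 0.0],   # "The"
                 [1.0, 0.9],   # "cat"
                 [0.4, 0.3],   # "blue"
                 [0.6, 0.5],   # "sits"
                 [0.5, 0.2]])  # "shoe"
query = np.array([1.0, 1.0])   # what the model is "asking about"

weights = attention_weights(query, keys)
for tok, w in zip(tokens, weights):
    print(f"{tok:>5}: {w:.2f}")
# "cat" receives the largest weight because its key vector aligns
# most closely with the query vector.
```

With these invented vectors, “cat” dominates the weight distribution, mirroring the intuition that the subject of the sentence anchors the model’s reply.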

7) Linguistic robots are limited by a token limit for inputs and outputs. Typically, a token is about four characters while the average English content word is about 6.5 characters (https://core.ac.uk/download/pdf/82753461.pdf). So, when an LLM talks about supporting 2048 tokens, that can be seen as about 1260 words, or about four pages of text, for concepts that require a lot of content. In general, think of a page of content as being about 500 tokens and a minute of discussion typically being around 200 tokens when one is trying to judge how much content is either being created or entered into an LLM.
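The rough arithmetic above can be captured in a pair of small helpers. The four-characters-per-token and 6.5-characters-per-word ratios are the approximations from the text, not exact tokenizer behavior, so treat the results as ballpark estimates.

```python
# Approximations from the text: ~4 characters per token,
# ~6.5 characters per average English content word.
CHARS_PER_TOKEN = 4
CHARS_PER_WORD = 6.5

def tokens_to_words(token_count: int) -> int:
    """Approximate word count for a given token budget."""
    return round(token_count * CHARS_PER_TOKEN / CHARS_PER_WORD)

def estimate_tokens(text: str) -> int:
    """Very rough token estimate from raw character length."""
    return round(len(text) / CHARS_PER_TOKEN)

print(tokens_to_words(2048))  # → 1260 words, roughly four pages
print(tokens_to_words(500))   # → 308 words, about a page of content
```

Real tokenizers split text unevenly (punctuation, rare words, and non-English text consume more tokens per character), so these helpers are only for budgeting, not billing.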

8) Every language is dynamic and evolves over time. LLMs that provide good results today may provide significantly better or worse results tomorrow simply because language usage has changed or because there are significant changes in the sentiment of a word. For instance, the English word “trump” gained a variety of political relationships and emotional associations in 2015 that are now standard in language usage in 2023. Be aware of these changes across languages and time periods in making requests, as seemingly innocuous and commonly used words can quickly gain new meanings that may not be obvious, especially to non-native speakers.

Conclusion

The most important takeaway of the now-famous Karpathy quote is to take it seriously, not only in terms of using English as a programming language to access structures and conceptual frameworks, but also in understanding that there are many varied nuances built into the usage of the English language. LLMs often incorporate these nuances even if those nuances haven’t been directly built into models, simply based on the repetition of linguistic, rhetorical, and symbolic language usage associated with specific topics.

From a practical perspective, this means that the more context and expertise provided in asking an LLM for information and expected outputs, the better the answer that will typically be provided. As one writes prompts for LLMs and seeks the best possible response, Amalgam Insights recommends providing the following details in any prompt:

Tone, role, and format: This should include a sentence that shows, by example, the type of tone you want. It should explain who you are or who you are writing for. And it should provide a form or structure for the output (essay, poem, set of instructions, etc…). For example, “OK, let’s go slow and figure this out. I’m a data analyst with a lot of experience in SQL, but very little understanding of Python. Walk me through this so that I can explain this to a third grader.”

Topic, output, and length: Most prompts start with the topic or only include the topic. But it is important to also include perspective on the size of the output. Example, “I would like a step by step description of how to extract specific sections from a text file into a separate file. Each instruction should be relatively short and comprehensible to someone without formal coding experience.”

Frameworks and concepts to incorporate: This can include any commonly known process or structure that is documented, such as an Eisenhower Matrix, Porter’s Five Forces, or the Overton Window. As a simple example, one could ask, “In describing each step, compare each step to the creation of a pizza, wherever possible.”

Combining these three sections together into a prompt should provide a response that is encouraging, relatively easy to understand, and compares the code to creating a pizza.
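The three recommended sections can be assembled with a simple template. The function name and structure here are illustrative, not from any particular prompting library; the example strings are the ones used in the sections above.

```python
def build_prompt(tone_role_format: str, topic_output_length: str,
                 frameworks: str) -> str:
    """Assemble a prompt from the three recommended sections:
    tone/role/format, topic/output/length, and frameworks to apply."""
    return "\n\n".join([tone_role_format, topic_output_length, frameworks])

prompt = build_prompt(
    "OK, let's go slow and figure this out. I'm a data analyst with a lot "
    "of experience in SQL, but very little understanding of Python. Walk me "
    "through this so that I can explain this to a third grader.",
    "I would like a step by step description of how to extract specific "
    "sections from a text file into a separate file. Each instruction should "
    "be relatively short and comprehensible to someone without formal "
    "coding experience.",
    "In describing each step, compare each step to the creation of a pizza, "
    "wherever possible.",
)
print(prompt)
```

Keeping the three sections as separate arguments makes it easy to swap in a different role, output format, or framework without rewriting the whole prompt.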

In adapting business processes based on LLMs to make information more readily available for employees and other stakeholders, be aware of these biases, foibles, and characteristics associated with prompts as your company explores this novel user interface and user experience.