
Troubleshooting scenarios for Azure OpenAI


Troubleshooting Azure OpenAI and API calls to OpenAI can be challenging, especially when you don't know where to start!

The idea of this article is to give you not only a starting point for some common scenarios you may run into, but also a way of thinking to help you troubleshoot. This is not a technical 'get your hands dirty, delve into those logs' type of article.

I am a big fan of the Kepner-Tregoe (KT) problem analysis methodology, and I have used it in many scenarios throughout my career to discover and test the root cause of various problems. We will use this methodology as the base for troubleshooting the scenarios discussed in this article.

Kepner-Tregoe Method

❓ The Kepner-Tregoe (KT) Method


The Kepner-Tregoe (KT) Problem Management methodology incorporates elements such as:

  • "Is" (what we know to be true about the problem)
  • "Is Not" (what we know to be false or different from the problem)
  • "Could Be" (possible causes)
  • "Could Not Be" (what is not a possible cause)
  • "Distinctive Clarity" (what sets this problem apart from others)

and "Next Steps" (actions to further diagnose or solve the problem)

You may find that we instinctively do a lot of this with some issues. Still, this method gives you structure and context, lets you check any bias you may have when looking for the root cause, and gives you some great tools to rule out and test theories. It has been a few years since I went through the formal training, but it's one of those methodologies that has stuck with me, and I always keep it in mind when troubleshooting issues. You can, of course, use this methodology for far more than OpenAI issues, but that is the scenario we are going to cover today.

The key to successful troubleshooting of any IT (Information Technology) issue is having a clear problem statement and, in the 'real world', concentrating on that one single problem statement to remain effective. For this article, however, we will cover a mix of common scenario problems to give you high-level ideas and context when troubleshooting issues.

❗ Problem Statements

Today, we are going to look at the following statements:

| Type of Problem Statement | Problem Statement |
| --- | --- |
| Chunking Control | "Inconsistent accuracy during high-volume transactions suggests that the chunking process is not fully controlled. The problem manifests where API consumption occurs and is especially prominent when handling complex inputs of varying sizes." |
| Token Limit Checks | "Unexpected API call failures, which are confirmed to occur before the API request is made, indicate that the token limit pre-check may not be accurately estimating token sizes, particularly at the time of calling and in cases of complex requests." (A token pre-check sketch follows this table.) |
| Region Usage | "Increased latency and occasional service disruptions are observed in a specific default region during peak usage times, suggesting that network latency or regional service performance may not be optimized for the workload, impacting certain regions more than others." |
| Model Versioning | "Encoding and performance issues have arisen across all API endpoints following model or API updates, which are more pronounced in certain versions, indicating that using outdated model versions or incompatibilities between model versions and API might be the underlying cause." |
| Streaming Response | "User experience issues with the streaming implementation on the front-end application during real-time interactions suggest that there might be backend streaming service limitations or insufficient front-end optimization, affecting certain user interactions." |
| Token Volume Expectancy | "The system occasionally experiences overload or underperformance across all API endpoints during 24/7 operation, implying that token volume expectancy might not be accurately predicted or that the system's scaling and load balancing are not adequately configured." |
| Logging Practices | "Issues in diagnostic effectiveness within the logging system arise during error occurrences, which may be due to incomplete logging data or incorrect logging configurations, affecting the resolution of problems by not capturing comprehensive data." |
| API Versioning | "Persistent aborted-request issues and doubts regarding production readiness after using a preview API version suggest that stability might be compromised due to the continued use of a less stable preview version rather than the GA version across the API service." |
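
To make the "Token Limit Checks" statement above a little more concrete, here is a minimal sketch of a pre-call token estimate using the tiktoken library. The model name, token budget, and helper function are illustrative assumptions rather than values from the scenarios, and a real chat payload carries extra per-message overhead tokens on top of this estimate.

```python
# Minimal sketch: estimate prompt tokens before making the API call.
# Assumes the tiktoken package; the model name and budget below are illustrative.
import tiktoken

MODEL = "gpt-4o"       # hypothetical base model behind your deployment
TOKEN_BUDGET = 8_000   # illustrative budget, not an official service limit


def estimate_tokens(messages: list[dict], model: str = MODEL) -> int:
    """Rough token estimate for a chat-style payload (message content only)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")  # fallback encoding
    return sum(len(encoding.encode(m.get("content", ""))) for m in messages)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this quarter's transaction report..."},
]

if estimate_tokens(messages) > TOKEN_BUDGET:
    raise ValueError("Prompt exceeds the configured token budget; chunk or trim the input.")
```

Running a check like this before the call, and logging the estimate, makes it much easier to confirm or rule out the 'misestimation of token size' cause in the table below.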

🎯 Is and Is Nots

Let's look at each problem and work out the various Is and Is Nots, along with potential causes, at a high level.

| Troubleshooting Criteria | Is (What is True) | Is Not (What is False) | Could Be (Possible Causes) | Could Not Be (What is Not a Cause) | Distinctive Clarity (What Sets This Apart) | Next Steps |
| --- | --- | --- | --- | --- | --- | --- |
| Control of Chunking | Chunking process is implemented | Perfect control over chunking | Inadequate chunk size management | A problem with the API itself | How chunking affects the accuracy of responses | Evaluate and refine the semantic chunking process |
| Token Limit Check | Pre-call token limit check is in place | Always under the token limit | Misestimation of token size | A network connectivity issue | Instances when token limits are exceeded | Implement stricter checks and alerts for token limits |
| Region Selection | Using a specific default region | Region is the cause of all issues | Network latency or regional service disruption | Model versioning issues | The impact of region selection on response times | Test performance in different regions; consider geo-redundancy |
| Model Version | Using a specified model version | All versions have the same performance | Outdated model causing issues | A problem with user input | Differences in encoding and performance between versions | Upgrade to the latest GA or recommended version of the model |
| Streaming Response | Streaming is being used | Streaming is flawless | Streaming implementation affecting user experience | Token limit issues | User experience with streaming vs. non-streaming responses | Optimize streaming experience based on user feedback |
| Token Volume | Expected volume is known | The system can handle any volume | Insufficient API rate limiting or scaling | An issue with the model's capabilities | Peak token volume times or patterns | Plan for scaling and load balancing based on expected volume |
| Logging | Logging practices are in place | All necessary data is being logged | Incomplete or incorrect logging | An issue with the model's accuracy | The level of detail and usefulness of logs | Enhance logging for better diagnostics and problem resolution |
| API Version | Currently using a non-GA API version | GA version has been tested | Use of preview API versions causing issues | An issue unrelated to the API version | The stability and features of GA vs. preview API versions | Test and migrate to the GA version of the API |
| System Message Control | System messages are being used | Full control over system messages | System message limits not being respected by the model | An issue with the API call structure | How system messages guide user interactions | Ensure system message limits are enforced and informative |
| System Message Compliance | Checks for model adherence to system messages | The model always adheres to system message limits | Oversights in system message enforcement | An issue with user expectations | Instances of non-compliance impacting user experience | Regularly verify model compliance with system message limits |
| Hallucinations and Accuracy | Occurrences of hallucinations and inaccuracies | The model is always accurate | Model training or complexity of queries | An issue with the input data quality | Specific scenarios where inaccuracies are more frequent | Investigate model choice and implement LLMOps for evaluation |
| Request and Timeout Issues | Experiencing request or timeout issues | All requests are processed promptly | High server load, inadequate resources, or network latency | Complete system failure | Frequency and conditions of timeouts and request failures | Analyze server and network logs, monitor system performance, and optimize configuration (see the sketch after this table) |
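
To ground the "Logging" and "Request and Timeout Issues" rows, here is a minimal sketch of an Azure OpenAI chat call wrapped with a client-side timeout, a simple retry with exponential backoff, and structured logging. The endpoint and key environment variables, deployment name, api-version, and retry settings are assumptions for illustration; check the current GA api-version and your own deployment names before relying on anything like this.

```python
# Minimal sketch: log each attempt and back off on transient failures.
# Endpoint/key variables, deployment name, api-version, and retry settings are illustrative.
import logging
import os
import time

from openai import APITimeoutError, AzureOpenAI, RateLimitError

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("aoai-troubleshooting")

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # prefer a GA api-version over a preview one
    timeout=30,                # seconds; surfaces slow regional responses early
)


def chat_with_retries(messages, deployment="my-gpt-deployment", max_attempts=3):
    """Call chat completions, logging every attempt and backing off on transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            log.info("attempt=%d deployment=%s messages=%d", attempt, deployment, len(messages))
            response = client.chat.completions.create(model=deployment, messages=messages)
            log.info("success attempt=%d usage=%s", attempt, response.usage)
            return response
        except (APITimeoutError, RateLimitError) as err:
            wait = 2 ** attempt  # 2s, 4s, 8s...
            log.warning("transient error on attempt %d (%s); retrying in %ss", attempt, err, wait)
            time.sleep(wait)
    raise RuntimeError("Exhausted retries; check region latency, quotas, and service health.")
```

Even this level of logging (attempt count, deployment, token usage, and the exception type) is often enough to separate 'Could Be' causes such as regional latency or rate limiting from the items in the 'Could Not Be' column above.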

KT - Dimensions of a Problem

Although this is a very theoretical article, hopefully it points you in the right direction and the right way of thinking when you need to troubleshoot issues with Azure OpenAI. As a consultant or engineer, it can be very tempting to jump straight into solutions without fully understanding the extent of the issue or the problem you are actually trying to solve.
