High-Load Support AI Chatbot for an Online Fashion Store

Highlights

  • Delivered AI-Driven Chatbot for Customer Support: We used the LangChain framework as the backbone for orchestrating LLMs (large language models) and data embeddings for question-answering tasks.
  • Process 99% of Queries Under 10 Seconds: The chatbot is deployed on autoscaling infrastructure, which lets it absorb waves of inquiries and respond quickly, keeping availability consistent and increasing customer satisfaction.

Client

Our client is a B2C eCommerce company operating in the fashion niche. The organization sells a wide range of clothes and accessories online to customers all over France.

Product

The product is an AI-powered chatbot for resolving customer queries in the online store. It can process a high load of customer requests about the website, including navigation, purchasing methods, policies, and other information. The chatbot automates responses to user queries and supports customers directly, freeing up the human support team. The key components of the AI chatbot are:

  • Chatbot API connected to the web solution;
  • RAG (Retrieval-Augmented Generation) database with automatic website parsers for online updates;
  • Deployment of the AI part on AWS, including the parsers, with autoscaling in mind.

Goals and objectives

  1. Automate Customer Support: We needed to reduce the number of requests handled by human agents, helping them to avoid receiving millions of emails. For that, we were required to ensure that the chatbot provided accurate automated responses. This approach was aimed at improving efficiency and reducing the workload on the support team.
  2. Make Chatbot Handle High Peak Loads: Our team needed to develop the product with scalability in mind, so that the chatbot would maintain high availability during critical times.
  3. Provide Well-Documented Integration Guidelines: We were required to ensure that the client’s web development team could easily integrate, tune, and configure the chatbot.

Project challenge

  1. Create a Unified Data Format: Convert the data from the website into a format that the chatbot can use. This involves gathering, parsing, and formatting the data, then storing it in a vector database.
  2. Ensure Accuracy of Chatbot Responses: Tune the chatbot to retrieve information from the vector database and answer user questions accurately. This required us to create an agent chatbot that uses appropriate tools and additional guardrails so that it provides correct and complete responses without hallucination.
  3. Scale the Chatbot: Deploy the chatbot to handle high peak loads, particularly during working hours and weekends. For that, our team needed to ensure the system could scale up and down when needed to maintain performance. 

Solution

As the first step, we designed the chatbot’s architecture. We chose a LangChain agent powered by the Mistral LLM as the foundation of the solution’s design. The agent receives user queries and searches several data sources to craft the most relevant response. The data, including parsed website content stored in S3 buckets (AWS storage), is housed in a vector database built on PostgreSQL and deployed on AWS. All information is automatically converted into vector format, enabling fast retrieval for accurate responses.
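The retrieval step described above can be illustrated with a minimal, self-contained sketch. It is not the project's code: the toy `embed` function (bag-of-words counts) stands in for the AWS embeddings model, an in-memory list stands in for the PostgreSQL vector database, and the sample documents and all function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical snippets of parsed website content.
DOCS = [
    "Returns are accepted within 30 days of delivery",
    "We accept card and bank transfer payment",
    "Delivery takes three to five working days",
]

if __name__ == "__main__":
    print(retrieve("delivery working days", DOCS, k=1))
```

In a real deployment the ranked documents would be passed to the LLM as context for the answer, and the similarity search would run inside the vector database rather than in application code.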

To feed the chatbot with reliable, high-quality data, we implemented guardrails that check the trustworthiness of responses. Additionally, data anonymisation was designed to be GDPR-compliant.
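The case study does not describe how anonymisation was implemented. As one common approach, a minimal sketch could mask obvious personal identifiers before text is stored or sent to a model. The patterns, placeholder labels, and function name below are assumptions for illustration, and pattern-based masking on its own would not be enough to establish GDPR compliance.

```python
import re

# Illustrative patterns only; production rules would need to be far more thorough.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s.-]{7,}\d")


def anonymise(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(anonymise("Contact me at jane.doe@example.com or +33 1 23 45 67 89"))
```

Short digit runs such as a five-digit order number are left untouched, since the phone pattern requires at least nine characters.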

Having established the chatbot’s logic and secured accurate data, we moved on to model selection. We chose:

  • Mistral LLM in AWS Bedrock for query understanding and efficient processing of text requests;
  • An AWS embeddings model for embedding data in the vector database.

These models were selected for their robust performance and compatibility with AWS infrastructure. We did not need to train the models from scratch. Instead, we used prompt engineering with pre-trained models. The data sources included pre-parsed website content stored in S3 buckets, which was used for creating the vector database.

After the models were ready, we created the main stack of the LangChain agent and then ran a testing phase to tune the prompts and guardrails, collaborating with QA testers to refine the system. Our team used black-box testing methodologies, including a set of tests, known as the RAG Triad method, that used Mistral to evaluate language consistency and the quality of answers based on retrieved documents.
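The RAG Triad is usually described as three checks: context relevance (is the retrieved context relevant to the question?), groundedness (is the answer supported by the context?), and answer relevance (does the answer address the question?). The sketch below shows the shape of such a test under stated assumptions. In the project the judge was Mistral; here a toy word-overlap function stands in for it, and every name and threshold is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# A judge scores how well `candidate` is covered by `reference`, from 0 to 1.
Judge = Callable[[str, str], float]


@dataclass
class TriadScores:
    context_relevance: float
    groundedness: float
    answer_relevance: float

    def passed(self, threshold: float = 0.5) -> bool:
        """A test case passes only if all three scores reach the threshold."""
        lowest = min(self.context_relevance, self.groundedness, self.answer_relevance)
        return lowest >= threshold


def overlap_judge(reference: str, candidate: str) -> float:
    """Toy judge: fraction of candidate words that also occur in the reference."""
    ref_words = set(reference.lower().split())
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    return sum(word in ref_words for word in cand_words) / len(cand_words)


def rag_triad(question: str, context: str, answer: str,
              judge: Judge = overlap_judge) -> TriadScores:
    return TriadScores(
        context_relevance=judge(context, question),  # context covers the question
        groundedness=judge(context, answer),         # answer is supported by context
        answer_relevance=judge(answer, question),    # answer addresses the question
    )


if __name__ == "__main__":
    ctx = "delivery time is three days"
    print(rag_triad("delivery time", ctx, "delivery time is three days"))
    print(rag_triad("delivery time", ctx, "we ship to mars").passed())
```

Swapping `overlap_judge` for a function that prompts an LLM and parses a numeric score keeps the rest of the test harness unchanged.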

Once ready, the entire solution was packaged as an API with configurations. We also provided extensive documentation to facilitate easy integration with the client’s existing web infrastructure.

Finally, the chatbot deployment was managed using EC2 instances with Kubernetes scripts to enable autoscaling, ensuring the system could handle high peak loads efficiently.
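The case study does not say which autoscaling mechanism the Kubernetes scripts used. Assuming the standard Horizontal Pod Autoscaler, the scaling decision follows the documented rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below reproduces that rule for illustration. The per-pod target of 10 requests per second, the replica limits, and the function name are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA scaling rule, clamped to the limits."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 30 requests per second spread over 2 pods is 15 per pod against a target of 10.
    print(desired_replicas(current_replicas=2, current_metric=15, target_metric=10))
```

The same rule scales the deployment down again when the average load per pod falls well below the target.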

Tech Stack

  • Amazon AWS
  • AWS Bedrock
  • Kubernetes
  • FastAPI
  • LangChain
  • Mistral LLM

Process

  1. Business Analysis: We analyzed the target data and assessed how many users were asking questions to understand the volume and nature of the queries.
  2. Architecture Design and Implementation: We chose a LangChain agent because it uses a language model to dynamically choose the best course of action based on user input and context. It also communicates in natural language, offering a better customer experience.
  3. Documentation and Integration Help: We created an API with configuration for easy integration with the existing eCommerce website and provided documentation for using this API.

Our results

The developed chatbot completely transformed customer service for our client. By combining a LangChain agent that handles user queries and interacts with various data sources with multiple pre-trained ML models for NLP and data embedding, we ensured that the chatbot responds accurately even during peak hours. On top of that, the chatbot now performs with:

  1. Latency P99 <10s: The chatbot delivers responses in under 10 seconds for 99% of user queries.
  2. Uptime 100% During Peak Hours of 30 Requests Per Second: The system stays available during peak usage times and handles a high volume of user queries without interruption.
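For readers unfamiliar with the P99 metric, the sketch below shows one way such a figure can be checked from measured response times, using the nearest-rank percentile definition. The function names and the sample latencies are made up for illustration and are not measurements from the project.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples: list[float], limit_seconds: float = 10.0) -> bool:
    """True if the 99th-percentile latency is below the limit."""
    return percentile(samples, 99) < limit_seconds


if __name__ == "__main__":
    # Hypothetical latencies in seconds: 99 fast responses and one slow outlier.
    latencies = [2.0] * 99 + [14.0]
    print(percentile(latencies, 99), meets_latency_target(latencies))
```

With this definition a single slow response out of a hundred does not break the target, while two would.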