Leveraging AI/ML for Expanded Product Range and Enhanced eCommerce Product Listings 

Highlights

  • AI/ML for Coherent Product Listing: Leveraged cutting-edge AI/ML technology, including ChatGPT API and image recognition models, to accurately categorize products, clean up irrelevant data, and improve search functionality, resulting in streamlined operations and improved user engagement.
  • Added Over 1,000,000 Products from the Major Global Retailer: We extended the product range of the client’s inventory, with an efficient scraping of 300,000 products in just 3 hours, enabling the client to significantly expand their product line and cater to a broader customer base.
  • 29 to 97 Google PageSpeed Score Boost: Achieved a remarkable increase in loading speed with a Google PageSpeed score improvement from 29 to 97 (web) and from 12 to 90 (mobile), enhancing user experience and search engine rankings.

Client

Our client is an online retail business based in the United States, operating according to the dropshipping model. With a keen focus on efficiency and flexibility, our client has embraced the dynamic landscape of eCommerce to deliver clothing, electronics, beauty items, home supplies and others to customers at competitive prices across the nation.

Country
Industry
Team Size:

Product

The product is a B2C web solution, an online retail store providing access to a diverse selection of over 1,000,000 products. To achieve this extensive inventory, the website seamlessly integrated with TopDawg dropshipping products provider. This integration allows access to a vast array of products from an expansive wholesale catalog, meticulously curated from top dropshipping suppliers, totaling over 300,000  items.

Goals and objectives

Upon engaging with our company, the client’s primary objective was to ensure website stabilization to enhance the customer experience. To achieve this, we needed to perform:

  • Performance Optimization: Resolve loading speed issues, impacting user experience and search engine rankings.
  • SEO Enhancement: Overcome limited visibility in search engine results pages (SERPs) hindering organic traffic and growth potential.
  • UI/UX Improvements: Introduce enhancements to the website’s interface and navigation to facilitate a seamless browsing and shopping experience.

However, following successful collaboration, our scope of tasks expanded to:

  • Expanding Product Range: Create a tailored parsing solution to efficiently gather products from the major global online retailer’s website, allowing an inventory expansion to over one million items.
  • Improving Product Categorization: Overcome the challenge of product categorization discrepancies between the eCommerce website structure and the major global online retailer. 
  • Product Cleanup: Employ image-to-text and image captioning solutions to remove irrelevant text and images from product descriptions.
  • Autosuggestions: Implement autosuggestions, enhancing user experience and product discoverability.

Project challenge

  1. Product Categories Classification: Reconciling TopDowg’s categories with those of the major global online retailer proved challenging due to discrepancies. 
  2. Products Scrapping Scalability: web scraping thousands of products at a high speed.
  3. Product Cleanup: A significant number of items sourced from the major retailer contained irrelevant text in product names, descriptions, and images. 
  4. Autosuggestions: The inability to access search history on the eCommerce site hindered the implementation of autosuggestions. 

Solution

Performance and SEO Enhancement

Our focus on performance optimization encompassed a multifaceted approach aimed at enhancing usability and site responsiveness. This included strategies such as optimizing page loading speed, streamlining navigation pathways, and refining user interface elements to ensure a seamless browsing experience.

Some of the existing SQL queries took too long to execute even with the current volume of products and there would have been at least 10 times more products in the future. Therefore, we analyzed database queries and applied different optimization techniques: created multiple indexes, rewritten some queries and created materialized views achieving lower execution times than in the beginning with a larger amount of products.   

In parallel, our SEO strategy was meticulously crafted to boost visibility and search engine rankings. Through comprehensive keyword research, content optimization, and strategic backlinking efforts, we aimed to elevate customer’s products’ online presence. Additionally, we implemented measures such as cleaning up 404 pages, optimizing robot.txt files, and refining indexing practices to enhance search engine crawlability and indexing efficiency. By improving semantics and ensuring alignment with search engine algorithms, we aimed to position the products favorably in search engine results pages, driving organic traffic and fostering increased conversions.

UI/UX enhancements and the uniqueness of the product detail page provide significant attractiveness from SEO perspective.

Furthermore, we developed customized email forms tailored for marketing campaigns, enabling targeted communication with potential customers and fostering deeper engagement with the client’s brand.

Custom Parsing Solution

In response to the client’s ambitious goal of expanding their product range by over a million items sourced from the major online retailer, we embarked on creating a parsing solution. Despite extensive searches for pre-existing solutions, none could handle the scale required, peaking at 300,000 products. Thus, we devised a tailored approach, executed in three stages:

  • Product Collection: We initiated by gathering product data from a curated list of links.
  • Data Extraction: Each product’s vital information, including name, description, image, and tags, was meticulously extracted from the provided links.
  • Categorization: Leveraging the GhatGPT categorization capability, we seamlessly categorized the parsed products, ensuring efficient organization and accessibility.

The peak of our efforts was the staggering throughput of 300,000 items sourced within three hours, demonstrating the scalability, speed, and automation capabilities of our solution.

Enhanced Product Categorization with ChatGPT

In our quest for precise product categorization, we harnessed the capabilities of the ChatGPT API. Our first step involved crafting a tailored list of business-specific categories and subcategories, laying the groundwork for efficient organization. 

To enhance this categorization structure, we developed a script that worked along with ChatGPT, fine-tuned to classify products into specific categories. Subsequently, we submitted this classification along with over a million of product items, comprising names, descriptions, and metadata to the ChatGPT model.

By meticulously selecting prompts and optimizing parameters such as Temperature and top_p sampling, we optimized the model’s performance, achieving an impressive 90% accuracy in categorization. This strategic approach ensured that ChatGPT effectively assigned each product to its appropriate category with remarkable precision.

Product Cleanup with Tesseract OCR and HuggingFace Salesforce Blip Image Captioning Model

In our pursuit of excellent product listings sourced from the major retailer, we encountered challenges such as irrelevant text in product descriptions and names, as well as redundant content within product images. To rectify this, we implemented a comprehensive cleanup system designed to ensure product data integrity.

  • Description and Name Cleanup: Leveraging ChatGPT, we systematically identified and removed irrelevant text from both product descriptions and names. To make the cleanup more accurate, we additionally programmed the logic for curating lists of forbidden words and relevant keywords, meticulously removing irrelevant data while preserving essential product description. 
  • Image Cleanup: We used Tesseract OCR and a custom-programmed solution to extract text descriptions from product images, determining their relevance (e.g., containing details like size and dimensions) for inclusion on the client’s eCommerce site. In cases where images lacked text but provided visual product information, we employed the Salesforce Blip image captioning model to assess their business relevance. After cleaning up the images, we utilized ChatGPT to associate images with corresponding products, enabling seamless image-to-product mapping. 

Autosuggestions with Algolia and ChatGPT

Initially, our efforts to implement auto-suggestions were impeded by the lack of access to search history on the eCommerce site. So, we developed a logic wherein autosuggestions were based on keywords related to products and categories, prioritizing relevance. However, to enhance the effectiveness of autosuggestions, we devised a more sophisticated approach. This involved leveraging ChatGPT for search category suggestions and utilizing another major retailer website API for category suggestions.

Moreover, suggestions for products were generated by ChatGPT, powered by Algolia. Algolia enables developers to implement fast and relevant search experiences in their applications, offering a range of tools such as typo tolerance and faceted search.

Scalability

Our solution leverages map reduce algorithms and AWS Batch with Fargate Spot Fleet to run the web scrapping process in parallel on multiple server instances. To distribute processes such as categorization and cleanup, ensuring optimal performance and resource utilization even during peak loads, we used AWS Elastic Beanstalk and Java CompletableFuture parallelism capabilities.

Tech Stack

Front-end
  • HTML Front-end HTML
  •  CSS Front-end CSS
  • SAAS Front-end SAAS
  • Vue.js Front-end Vue.js
  • JQuery Front-end JQuery
Back-end
  • PHP Back-endPHP
  • Laravel Back-endLaravel
  • Botble CMS Back-endBotble CMS
  • MySQL Back-endMySQL
Online retail platform product scraper
  • Node.js Online retail platform product scraperNode.js
  • Puppeteer Online retail platform product scraperPuppeteer
ChatGPT categorization script
  • Java ChatGPT categorization scriptJava
  • Quarkus ChatGPT categorization scriptQuarkus
AWS Infrastructure
  • AWS Elastic Beanstalk AWS InfrastructureAWS Elastic Beanstalk
  • AWS RDS AWS InfrastructureAWS RDS
  • AWS EC2 AWS InfrastructureAWS EC2
  • AWS Lambda AWS InfrastructureAWS Lambda
  • AWS Batch AWS InfrastructureAWS Batch
  •  AWS Step Functions AWS Infrastructure AWS Step Functions
  • AWS Fargate AWS InfrastructureAWS Fargate
Third-Party Services
  • HuggingFace Third-Party ServicesHuggingFace
  • NewRelic Third-Party ServicesNewRelic
  • Sentry Third-Party ServicesSentry
  • OpenAI Third-Party ServicesOpenAI

Our results

We have successfully fulfilled the agreed-upon scope of work, achieving key objectives and milestones through meticulous planning, diligent execution, and a creative approach to leveraging AI/ML technologies. Our innovative methods enabled us to find new approaches to implement the required functionality effectively.

  1. 3x Product Expansion Enhanced by AI/ML: Leveraging AI/ML capabilities, we accomplished the monumental task of incorporating over 1,000,000 products from the major global retailer’s website into the client’s inventory. Utilizing advanced algorithms, we efficiently scraped 300,000 products in a mere 3 hours. 
  2. 3x Performance Boost: Achieved a remarkable increase in website loading speed, enhancing user experience and search engine rankings with a Google PageSpeed score improvement from 29 to 97 (web) and from 12 to 90 (mobile).
  3. 56% Traffic Increase: Within the following six months, our SEO initiatives resulted in a substantial surge in website traffic.
Next Project
Software Audit of One of the Biggest Libraries in the MENA Region 

Highlights Client The client’s company is as a cultural hub and symbolizes a shift in focus towards people, interactions,...

Explore Case