Categorization of Suppliers with ChatGPT

Categorizing: the foundation for spend analysis

Why supplier categorization?

A thorough supplier categorization is the key to gaining valuable insights and optimizing your spend analysis.

The five crucial reasons why a thorough supplier categorization forms the basis for a successful spend analysis are:

  • Understanding spending patterns
  • Vendor dependency identification
  • Strengthening of negotiating position
  • Supply chain risk management
  • Performance review and supplier management

How good is ChatGPT in categorizing?

ChatGPT uses several large Language Models on the backend. To determine how well suppliers can be categorized, we compared the results of the free version (which works with GPT-3.5Turbo) with the model that is only accessible with the paid version (which also works with GTP-4). GPT-4 is the most advanced model here.

In our 2022 experiments, we have already seen promising results using GPT-3 to categorize suppliers. By leveraging the powerful API that OpenAI makes available to developers, we can interact directly with the various advanced language models that OpenAI offers.

To determine whether the latest and most advanced model, GPT-4, outperforms GPT-3.5 Turbo, we took up the gauntlet ourselves. We ran a test where we submitted our predefined list of vendors multiple times to both GPT-3.5 Turbo and GPT-4 and asked them to categorize them as they saw fit.

Results from our research

We compared the output of the models to the classifications we found on the internet to determine their similarity.

The results show a difference between GPT-3 Turbo and GPT-4. GPT-4 scores better than GPT-3.5 Turbo. At GPT-4, approximately 70% of the symilarity score is higher than 0.5. With GPT-3.5 Turbo, that percentage is 60%. For a good and useful categorization, the score must be above 0.5. The category may have a different name, but it will be correct. A score below 0.5 means the category is incorrect or unknown.

Supplier Classification Analysis: A Comparative Study of GPT-3.5 and GPT-4 for Procurement:

What's the meaning of this result?

Based on the results of this study, we can conclude that less suppliers are incorrectly classified when using GPT-4 than in the case of GPT-3.5 Turbo. By adding the categories generated by GPT4 to the spend data, the basis for a spend analysis is quickly laid. For both models, the categories with a score lower than 0.5 must be classified manually.

If you would like to know more about how categorization using GPT4 takes place, or if you have a list of suppliers that you would like to categorize using GPT-4, don't hesitate to contact us. We are happy to help you.