European Patent Office | Service Industry

EPO – Maintenance of data acquisition and transformation processes

About the client

The European Patent Office (EPO) offers inventors a uniform application procedure which enables them to seek patent protection in up to 40 European countries.
The EPO is one of the leading international organisations in Europe, with some 7,000 staff from over 30 countries working at five locations in four countries. Under the European Patent Convention, inventors can obtain this protection by filing a single application.

Mission:

A critical upgrade of an ageing system to a cloud-based system, improving its data model, costs and performance, and managing synchronisation of patent data between the EPO and the USPTO.


  • The system is able to allocate symbols at > 95% accuracy for the first five categorization levels in the allocation symbol system, providing a fit-for-purpose automated system.


Infotel has been working with the European Patent Office (EPO), of which it has in-depth knowledge, on a critical upgrade of its patent allocation system. Developers at the Infotel Innovation Lab have extracted Cooperative Patent Classification data and used natural language processing to train and refine a deep learning model for automating the process of ‘reading’ patent abstracts to determine which symbols (of some 250,000) should be allocated per patent.

  • The training process involves a number of programming and assessment steps:
    Load data > Pre-process and embed data with rules and weights > Define model > Train > Evaluate > Re-train and refine
  • The key is to iterate and refine, beginning as simply as possible (fewer records, minimal viable ‘success’), then adding complexity.
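The loop above can be sketched in miniature. The example below is an illustrative stand-in, not the EPO system: it uses a tiny hand-made dataset and a plain bag-of-words logistic-regression "model" in pure Python, simply to make the Load > Pre-process and embed > Define > Train > Evaluate sequence concrete, starting as simply as possible before adding complexity.

```python
# Minimal sketch of the Load > Pre-process > Define > Train > Evaluate loop.
# All records, labels and hyper-parameters are illustrative, not the EPO's.
import math

# Load data: (abstract, symbol) pairs -- a toy stand-in for CPC records.
records = [
    ("rotary engine with improved cooling", 0),
    ("combustion engine cooling circuit", 0),
    ("antibody binding assay for proteins", 1),
    ("protein assay using labelled antibody", 1),
] * 10                                   # 40 records, classes interleaved
train, test = records[:32], records[32:]

# Pre-process and embed: build a vocabulary, map text to bag-of-words vectors.
vocab = sorted({w for text, _ in train for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def embed(text):
    vec = [0.0] * len(vocab)
    for w in text.split():
        if w in index:
            vec[index[w]] += 1.0
    return vec

# Define model: logistic regression (one weight per vocabulary word + bias).
w = [0.0] * len(vocab)
b = 0.0

def predict(vec):
    z = sum(wi * xi for wi, xi in zip(w, vec)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Train and re-train: plain stochastic gradient descent over several epochs.
for epoch in range(20):
    for text, label in train:
        vec = embed(text)
        err = predict(vec) - label
        for i, xi in enumerate(vec):
            w[i] -= 0.5 * err * xi
        b -= 0.5 * err

# Evaluate on held-out records before the next refinement iteration.
accuracy = sum((predict(embed(t)) > 0.5) == bool(y) for t, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

In the real project this skeleton grows at each iteration: more records, pretrained embeddings instead of word counts, and a deep model in place of the linear one.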


  • Refine and ‘translate’ lexicographic data for use in deep learning training
  • Work with data scientists to research and develop methods to train and automate, through assessment, adjustment, test and validation
  • Extract classification data to create deep learning datasets
  • Pre-process, format for machine learning, embed meanings, and apply weightings and convolution layers for training
  • Train, validate and test algorithm, rinse and repeat to refine
  • Apply deep learning to achieve fit-for-purpose automation
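The "apply weightings and convolution layers" step can be illustrated with the core operation of a text CNN: a kernel of weights slid across a sequence of token embeddings, followed by max-pooling. The sizes and numbers below are made up for illustration; a real model stacks many such filters inside a framework like Keras.

```python
# Sketch: a 1-D convolution over token embeddings, followed by max-pooling,
# as used in text CNNs. All dimensions and values are illustrative.

# A sequence of 5 tokens, each embedded in 3 dimensions.
embeddings = [
    [0.1, 0.0, 0.2],
    [0.4, 0.1, 0.0],
    [0.0, 0.3, 0.5],
    [0.2, 0.2, 0.1],
    [0.5, 0.0, 0.3],
]

# One convolution filter spanning 2 consecutive tokens (kernel size 2).
kernel = [
    [0.5, -0.2, 0.1],   # weights applied to the first token in the window
    [0.3, 0.4, -0.1],   # weights applied to the second token
]

def conv1d(seq, kern):
    """Slide the kernel over the sequence, producing one value per window."""
    k = len(kern)
    out = []
    for start in range(len(seq) - k + 1):
        total = 0.0
        for offset in range(k):
            total += sum(w * x for w, x in zip(kern[offset], seq[start + offset]))
        out.append(total)
    return out

feature_map = conv1d(embeddings, kernel)   # one value per 2-token window
pooled = max(feature_map)                  # global max-pooling
print(len(feature_map), round(pooled, 3))
```

Each filter learns to respond strongly to one local word pattern; pooling keeps only the strongest response, so the classifier sees whether the pattern occurred anywhere in the abstract.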


  • Python programming using Jupyter notebooks
  • Keras deep-learning framework with TensorFlow
  • FastText (Facebook)
  • Wiktionary
  • GloVe library (Stanford University)
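Pretrained vectors such as GloVe's are distributed as plain-text files, one token per line followed by its float components (FastText's text format is similar, with an extra header line giving vocabulary size and dimension). A sketch of turning such a file into an embedding lookup table, using three made-up 4-dimensional vectors in place of a real download:

```python
# Sketch: parsing pretrained word vectors into an embedding matrix.
# The tokens and numbers below are invented for illustration.
import io

glove_text = """\
engine 0.12 -0.30 0.05 0.88
cooling 0.09 -0.25 0.11 0.79
antibody -0.61 0.44 0.02 -0.15
"""

def load_vectors(handle):
    """Read 'token f1 f2 ...' lines into a {token: [floats]} dict."""
    vectors = {}
    for line in handle:
        token, *values = line.rstrip().split(" ")
        vectors[token] = [float(v) for v in values]
    return vectors

vectors = load_vectors(io.StringIO(glove_text))

# An embedding layer is then a lookup table: row i holds the vector for the
# i-th vocabulary word, with a zero vector for out-of-vocabulary words.
vocab = ["engine", "cooling", "antibody", "flux-capacitor"]
dim = 4
matrix = [vectors.get(word, [0.0] * dim) for word in vocab]
print(len(matrix), len(matrix[0]))  # rows x dimensions
```

With a real file, the same loop runs over `open(path, encoding="utf-8")`, and the resulting matrix initialises the model's embedding layer.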


  • 1 Project Owner
  • 1 Project Lead
  • 3 Full-Stack Developers
  • 3 Data Scientists


Patent categorization is a job done by expert examiners, who read abstracts (concise summaries of the inventions in applications) and allocate defined symbols. The project goal was to develop an algorithm that automates the process of scanning patent abstracts and assigning allocation symbols, flagging discrepancies in the allocation system, with the potential for audit cost savings.
The goal was achieved by providing a solution that extracts and processes CPC patent data (abstracts and their correctly allocated symbols, from some 1 million records) to use as the basis for training, testing and refining deep-learning models, before applying them to unseen (un-allocated) data for testing.
The key success factor: the system is able to allocate symbols at > 95% accuracy for the first five categorization levels in the allocation symbol system, providing a fit-for-purpose automated system.


«Machine learning is already being used in smart cars, cancer research and facial recognition. As a developer, being able to work in this field with a car manufacturer and the EPO has been very satisfying, and has allowed me to see how essential machine learning is, and how useful it could be in the future.»