European Patent Office | Service Industry

EPO – Maintenance of data acquisition and transformation processes

About the client

The European Patent Office (EPO) offers inventors a uniform application procedure: under the European Patent Convention, a single application can secure patent protection in up to 40 European countries.
The EPO is one of the leading international organisations in Europe, with some 7,000 staff from over 30 different countries at five locations in four countries.

Infotel UK - Service Industry Case Study

Mission:

A critical upgrade of an ageing system to a cloud-based one, improving its data model, costs and performance, and managing the synchronization of patent data between the EPO and the USPTO.

KEY ACHIEVEMENTS

The system allocates symbols with over 95% accuracy for the first 5 categorization levels of the allocation symbol system, providing a fit-for-purpose automated system.

THE CHALLENGE

Infotel has been working with the European Patent Office (EPO) on a critical upgrade of its patent allocation system, a system of which Infotel has in-depth knowledge. Developers at the Infotel Innovation Lab have extracted Cooperative Patent Classification (CPC) data and used natural language processing to train and refine a deep learning model that automates the process of ‘reading’ patent abstracts to determine which symbols (of some 250,000) should be allocated to each patent.

The training process involves a number of programming and assessment steps:
Load data > Pre-process and embed data with rules and weights > Define model > Train > Evaluate > Re-train and refine
The key is to iterate and refine, beginning as simply as possible (fewer records, minimal viable ‘success’), then adding complexity.
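The iterate-and-refine loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the token-counting "model" is a deliberately simple stand-in for the deep learning model used in the project, and every name, label and record here (train, evaluate, iterate_and_refine, the toy abstracts) is hypothetical rather than taken from the EPO system. It shows the idea of starting with very few records and adding more until a target accuracy is reached.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Pre-process: lowercase and split on whitespace."""
    return text.lower().split()

def train(records):
    """Toy stand-in for 'define model' + 'train': count token/label co-occurrences."""
    model = defaultdict(Counter)
    for abstract, label in records:
        for token in tokenize(abstract):
            model[token][label] += 1
    return model

def predict(model, abstract):
    """Return the label with the most token votes, or None if no token is known."""
    votes = Counter()
    for token in tokenize(abstract):
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None

def evaluate(model, records):
    """Fraction of records whose predicted label matches the expected one."""
    hits = sum(predict(model, abstract) == label for abstract, label in records)
    return hits / len(records)

def iterate_and_refine(train_set, test_set, start=1, target=0.95):
    """Train on a small slice first, then double the slice until the target is met."""
    n = start
    history = []
    while True:
        model = train(train_set[:n])
        accuracy = evaluate(model, test_set)
        history.append((n, accuracy))
        if accuracy >= target or n >= len(train_set):
            return model, history
        n = min(n * 2, len(train_set))

# Illustrative records only: (abstract text, top-level class label).
train_set = [
    ("engine piston cylinder", "F02"),
    ("encryption key network", "H04"),
    ("piston valve combustion", "F02"),
    ("network packet protocol", "H04"),
]
test_set = [
    ("combustion engine valve", "F02"),
    ("packet encryption protocol", "H04"),
]

model, history = iterate_and_refine(train_set, test_set)
print(history)
```

In a real setting the train and evaluate steps would wrap the deep learning framework, and the loop would also vary the model and pre-processing, not just the number of records.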

THE SOLUTION

Refine and ‘translate’ lexicographic data for use in deep learning training
Work with data scientists to research and develop methods to train and automate, through assessment, adjustment, test and validation
Extract classification data to create deep learning datasets
Pre-process, format for machine learning, embed meanings, and apply weightings and convolution layers for training
Train, validate and test the algorithm, then repeat the cycle to refine it
Apply deep learning to achieve fit-for-purpose automation
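As an illustration of the "pre-process, format for machine learning" step listed above, the sketch below turns abstracts into fixed-length integer sequences and symbol lists into multi-hot vectors, which is the general shape of input a deep learning framework expects. It uses only the Python standard library; the function names, the reserved ids and the two sample records are assumptions made for the example, not details of the project's pipeline.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens

def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]

def build_vocab(texts, max_size=20000):
    """Map the most frequent tokens to integer ids, starting after PAD and UNK."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}

def encode(text, vocab, max_len):
    """Convert text to a list of exactly max_len ids (truncated or padded)."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))

def multi_hot(symbols, symbol_index):
    """Encode a list of allocated symbols as a 0/1 vector over all known symbols."""
    vector = [0] * len(symbol_index)
    for symbol in symbols:
        vector[symbol_index[symbol]] = 1
    return vector

# Illustrative records only: (abstract, allocated symbols at subclass level).
records = [
    ("A valve for a combustion engine.", ["F01L", "F02B"]),
    ("A method for encrypting network packets.", ["H04L"]),
]

vocab = build_vocab([text for text, _ in records])
all_symbols = sorted({s for _, symbols in records for s in symbols})
symbol_index = {s: i for i, s in enumerate(all_symbols)}

x = [encode(text, vocab, max_len=8) for text, _ in records]
y = [multi_hot(symbols, symbol_index) for _, symbols in records]
print(x)
print(y)
```

The integer sequences in x would typically feed an embedding layer, and the multi-hot vectors in y would serve as training targets, since one patent can carry several symbols.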

THE TECHNOLOGIES

Python programming using Jupyter Notebook
Keras deep-learning framework with TensorFlow
fastText (Facebook)
Wiktionary
GloVe (Stanford University)

THE RESOURCES

1 Project Owner
1 Project Lead
3 Full-Stack Developers
3 Data Scientists

THE RESULTS

Patent categorization is done by expert examiners, who read abstracts (concise summaries of the inventions in applications) and allocate defined symbols. The project goal was to develop an algorithm that automates the scanning of patent abstracts and the assignment of allocation symbols, and that flags discrepancies in the allocation system, with the potential for audit cost savings.
The goal was achieved by a solution that extracts and processes CPC patent data, consisting of abstracts and correctly allocated symbols from some 1 million records, as the basis for training, testing and refining deep-learning models before they are tested on unseen (unallocated) data.
Success factor: the system allocates symbols with over 95% accuracy for the first 5 categorization levels of the allocation symbol system, providing a fit-for-purpose automated system.
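To make the per-level accuracy figure concrete, the sketch below shows one way such a measurement could be computed. The case study does not define the five levels or the exact metric, so this assumes they correspond to the CPC hierarchy of section, class, subclass, main group and subgroup, and it simplifies to one symbol per patent. The function names and the sample symbols are illustrative only.

```python
import re

# Assumed symbol layout, e.g. "H04L 9/32":
# section "H", class "04", subclass "L", main group "9", subgroup "32".
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d+)/(\d+)$")

def cpc_prefix(symbol, level):
    """Truncate a CPC symbol to the given level (1 = section ... 5 = subgroup)."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"Unrecognised CPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = match.groups()
    subclass_code = section + cls + subclass
    prefixes = [
        section,
        section + cls,
        subclass_code,
        f"{subclass_code} {group}",
        f"{subclass_code} {group}/{subgroup}",
    ]
    return prefixes[level - 1]

def accuracy_by_level(predicted, expected, levels=5):
    """Share of predictions that agree with the expected symbol at each level."""
    result = {}
    for level in range(1, levels + 1):
        hits = sum(
            cpc_prefix(p, level) == cpc_prefix(e, level)
            for p, e in zip(predicted, expected)
        )
        result[level] = hits / len(expected)
    return result

# Illustrative data only, not project results.
predicted = ["H04L 9/32", "H04L 9/08", "G06F 21/60", "F02B 3/06"]
expected = ["H04L 9/32", "H04L 9/32", "G06N 3/08", "F02B 3/06"]

scores = accuracy_by_level(predicted, expected)
print(scores)
```

Measuring agreement level by level shows where an automated allocation starts to diverge from the examiners' choices, which is also where discrepancies would be flagged for review.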

INFOTEL’S EXPERT SAYS:

“Machine learning is already being used in smart cars, cancer research and facial recognition. As a developer, being able to work in this field with a car manufacturer and the EPO has been very satisfying and allowed me to see how essential machine learning is, and how useful it could be for the future.”