Enhancing Query Understanding with a Hybrid Classification Model

August 27, 2025

As part of the ongoing development of the FAME marketplace, a key objective is to provide an intelligent search experience that helps future users discover relevant resources through simple, free-text queries.

To improve the search experience, the FAME Search Engine uses filters that can be inferred semantically from the query, such as:

  1. The developer or organization responsible for the asset.
  2. The type of the asset (e.g., model, dataset, documentation).

We have designed a hybrid classification system that interprets natural language queries and predicts these attributes, even when the phrasing is informal, abbreviated, or contains typos.

Figure 1 End-to-End Query Processing Pipeline for Metadata Inference

Preprocessing: Preparing the Query for Analysis

Before any semantic processing takes place, each user query undergoes a preprocessing step designed to improve input quality. This includes operations such as lowercasing, punctuation removal, and basic typo correction using rule-based and dictionary-based methods.
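
The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the FAME implementation: the vocabulary `KNOWN_TERMS`, the function name `preprocess`, and the `cutoff` value are assumptions, and the standard-library `difflib` module stands in for whatever rule-based and dictionary-based typo correction the real service uses.

```python
import difflib
import string

# Hypothetical dictionary of known terms (assumption: the real vocabulary
# is not described in the post).
KNOWN_TERMS = ["jot", "model", "dataset", "documentation", "performance", "predictor"]

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(query, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Lowercase, strip punctuation, and apply dictionary-based typo correction."""
    tokens = query.lower().translate(_PUNCT_TO_SPACE).split()
    corrected = []
    for token in tokens:
        # Replace the token with its closest dictionary entry when one is
        # similar enough; otherwise keep the token unchanged.
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


cleaned = preprocess("JOT Perfomance Predictor!")
```

A higher `cutoff` makes the correction more conservative, which may be preferable when the dictionary is small and the risk of rewriting a valid word is high.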

Semantic Embeddings: Capturing Meaning

After preprocessing, each query is transformed into a semantic embedding using a pretrained language model. This process generates a numerical representation that reflects the meaning of the query, not just the words it contains. For example, the phrases “JOT performance predictor” and “predictive model by JOT-Internet Media” are mapped to similar vectors, enabling the system to recognize their shared intent.

Figure 2 Query vectorization by Embedding Model
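
To make the idea of query vectorization concrete, here is a toy sketch. It does not use a pretrained language model: as a deliberately simple stand-in, it hashes character trigrams into a fixed-size vector, so it only captures surface overlap rather than meaning. The names `embed`, `cosine_similarity`, and `DIM` are hypothetical, and the post does not say which embedding model the FAME service uses.

```python
import math
import zlib

DIM = 1024  # Assumed vector size for this toy example.


def embed(text, dim=DIM):
    """Map text to a unit-length vector of hashed character-trigram counts."""
    padded = " " + text.lower() + " "
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(trigram.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a, b):
    """Cosine similarity; for unit-length vectors this is the dot product."""
    return sum(x * y for x, y in zip(a, b))


related = cosine_similarity(
    embed("JOT performance predictor"),
    embed("predictive model by JOT-Internet Media"),
)
unrelated = cosine_similarity(
    embed("JOT performance predictor"),
    embed("ESG scorecard"),
)
```

In this sketch the two JOT phrases score higher than the unrelated pair only because they share character sequences; a pretrained model would also be expected to relate phrases that share meaning without sharing spelling.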

A Hybrid Strategy: Matching and Classification

Once the query is embedded, the system compares it to known developer names and asset types using vector similarity. If a confident match is found, the metadata is inferred directly.
If the similarity score is below a defined threshold, the system relies on a supervised classification model. This fallback mechanism ensures robustness, especially for queries that are ambiguous.
By combining semantic matching with machine-learned classification, the hybrid model balances precision and adaptability.
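
The decision logic of the hybrid strategy can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold value `0.75`, the function name `infer_label`, the three-dimensional example vectors, and the constant `fallback_classifier` are all hypothetical, and the fallback is a placeholder for a trained supervised model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def infer_label(query_vec, label_vecs, fallback, threshold=0.75):
    """Return (label, source): direct match if confident, else the classifier."""
    best_label, best_score = max(
        ((label, cosine(query_vec, vec)) for label, vec in label_vecs.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= threshold:
        return best_label, "similarity"
    # Score below the threshold: defer to the supervised fallback.
    return fallback(query_vec), "classifier"


# Hand-made vectors for illustration only; real embeddings would come from
# the embedding model.
ASSET_TYPES = {
    "model": [1.0, 0.0, 0.0],
    "dataset": [0.0, 1.0, 0.0],
    "documentation": [0.0, 0.0, 1.0],
}


def fallback_classifier(query_vec):
    # Placeholder for a trained classifier (assumption); always answers "dataset".
    return "dataset"


confident = infer_label([0.9, 0.1, 0.0], ASSET_TYPES, fallback_classifier)
ambiguous = infer_label([0.5, 0.5, 0.5], ASSET_TYPES, fallback_classifier)
```

Returning the source of the prediction alongside the label makes it easy to measure how often the fallback is used, which is one signal for tuning the threshold.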

Looking Ahead

The system is designed to evolve based on real user behaviour. By logging anonymized queries and model predictions, we plan to establish a feedback loop that enables periodic retraining of both the embedding-based similarity component and the classification models.

Retraining with production data should help the service remain accurate, relevant, and responsive to the needs of its users.

 

Author: Raúl Encinas (Data Scientist at JOT Internet Media)


DISCLAIMER

The FAME project has received funding from the European Union’s Horizon 2023 Research and Innovation Programme under grant agreement No. 101092639.

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or Horizon Europe. Neither the European Union nor the granting authority can be held responsible for them.
