LogoTrust: a validated dataset of brands, domain names, and logos

Phishing is highly effective because attackers reproduce trusted visual identifiers, most notably brand logos. Whether impersonating Mastercard, La Banque Postale, or WhatsApp, adversaries rely on users' immediate visual recognition to create a false sense of legitimacy. The presence of a brand logo on a webpage is, therefore, insufficient to determine whether the website displaying the logo is associated with the brand.

In our recent paper, LogoTrust: Leveraging BIMI to Build a Validated Dataset of Brands, Domain Names, and Logos, we explore how Brand Indicators for Message Identification (BIMI), currently still a draft specification, can be used not just as an email feature, but as a foundation for a high-integrity dataset suitable for security applications such as phishing detection. The key idea is that BIMI does not just "show a logo": it publishes, via DNS, pointers to both the logo and a certificate, called a Mark Certificate (MC), that cryptographically links the logo to an organization's authorized domain names. By collecting these records at scale and validating the certificates, we can build a dataset that is far harder to poison than logos pulled from the open web.

While BIMI was originally designed to allow email clients to display verified logos for authenticated emails, we leverage the same domain name-to-logo bindings to support phishing website detection, where the key question is whether the domain name of a web page showing a brand logo is actually owned by the brand. The resulting dataset, LogoTrust, is publicly available at https://github.com/korlabsio/logo-trust.

Summary

  • We compiled a global-scale list of registered domain names from multiple sources, including ICANN CZDS zone files, passive DNS, Certificate Transparency logs, and the Tranco list.
  • We performed an Internet-wide DNS scan to locate default BIMI DNS TXT records (default._bimi.<domain>).
  • We followed BIMI assertions to download both the SVG logos and the Mark Certificates (MCs) referenced in each record.
  • We validated the collected Mark Certificates and discarded logos with invalid ones to avoid impersonation attempts.
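
As a rough illustration of the scanning step above, the sketch below turns a merged domain list into the query names for default BIMI records. The function name bimi_query_names and the normalization rules (lowercasing, stripping a trailing dot, dropping duplicates) are assumptions made for this example, not the paper's actual pipeline.

```python
def bimi_query_names(domains, selector="default"):
    """Yield one BIMI TXT query name per unique, normalized domain.

    Hypothetical helper: the normalization below is an assumption
    for illustration, not the exact procedure used in the paper.
    """
    seen = set()
    for raw in domains:
        # Normalize: trim whitespace, drop a trailing root dot, lowercase.
        domain = raw.strip().rstrip(".").lower()
        if not domain or domain in seen:
            continue  # skip empty entries and duplicates across sources
        seen.add(domain)
        yield f"{selector}._bimi.{domain}"


# Example input mixing duplicates from different (hypothetical) sources.
names = list(bimi_query_names(["Example.com", "example.com.", "example.org", ""]))
print(names)
```

The resulting names could then be written one per line and handed to a bulk DNS resolver for TXT lookups.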

What is BIMI?

Brand Indicators for Message Identification (BIMI) is a recent security mechanism, introduced in 2021. It extends the DMARC email authentication standard by giving brands a way to display visual elements like a logo as an indicator of trust directly in the mailbox.

Figure 1 below shows an example email from PayPal. In the sender section at the top, right next to the sender's email address, the PayPal logo is displayed along with a blue checkmark as an indicator of authenticity. This was possible because three conditions were met: (i) the email client (Gmail) supports BIMI; (ii) the domain name appearing in the email's "RFC5322.From" header (paypal.fr) has DMARC configured and participates in BIMI; and (iii) the domain provides a valid logo-certificate pair.

Example of a PayPal email displaying a verified BIMI logo and blue checkmark, illustrating how BIMI adds visual trust indicators to authenticated emails.
Figure 1: Example email from PayPal with its BIMI logo and a blue checkmark.

Instead of relying solely on DMARC-related textual authentication cues, BIMI allows organizations with strong DMARC enforcement (specifically, a policy of quarantine or reject) to display their verified logo next to authenticated messages in supported email clients. In practice, whether (and how) the logo or checkmark is displayed ultimately depends on the receiving mailbox provider's policies. For example, Gmail displays the blue checkmark only for a specific type of Verified Mark Certificates or VMCs (more details later).
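
The DMARC enforcement condition described above can be sketched as a small check on a domain's DMARC TXT record. This is a minimal sketch that only tests the policy tag (p) for quarantine or reject, as stated in the text; the function names are hypothetical, and real mailbox providers may apply further conditions that are not modeled here.

```python
def parse_dmarc_tags(record):
    """Split a 'tag=value; tag=value' DMARC record into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_allows_bimi(dmarc_record):
    """Return True if the record enforces a quarantine or reject policy.

    Assumption: only the 'v' and 'p' tags are inspected; any additional
    provider-specific requirements are deliberately left out.
    """
    tags = parse_dmarc_tags(dmarc_record)
    if tags.get("v", "").upper() != "DMARC1":
        return False
    return tags.get("p", "").lower() in {"quarantine", "reject"}


# Illustrative records (made up for this example).
enforced = dmarc_allows_bimi("v=DMARC1; p=reject; rua=mailto:dmarc@example.com")
monitoring_only = dmarc_allows_bimi("v=DMARC1; p=none")
print(enforced, monitoring_only)
```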

Behind the scenes, BIMI adds two important layers of verification. First, domain owners must submit their logo to a Certificate Authority, called a Mark Verifying Authority (MVA), which issues a special type of X.509 certificate called a Mark Certificate (MC) that cryptographically binds a logo to a domain to prevent spoofing. Second, the domain publishes a TXT record in DNS that points to both the logo and the certificate. BIMI logos are also constrained (notably by a specific SVG profile and size limits), so they render consistently across email clients.

For example, querying the DNS TXT record of default._bimi.amazon.com returns:

v=BIMI1;l=https://d3frv9g52qce38.cloudfront.net/amazondefault/order_1152306678_logo.svg;a=https://d3frv9g52qce38.cloudfront.net/amazondefault/amazon_web_services_inc_2025.pem

This is a compact set of instructions that tells email clients exactly where to find the logo and the certificate proving the right to use the logo. With these components working together, BIMI turns brand identity into a visual security signal, making emails not just more recognizable but more trustworthy.

In the previous amazon.com example, the BIMI TXT record has three fields: the v tag indicating the version of BIMI, the l tag containing a URL leading to the logo image, and the a tag specifying a URL where the Mark Certificate can be retrieved. Figure 2 below depicts the main verification steps that an email client performs. These include authenticating an email using DMARC, finding out which visual indicator the domain owner intends to display, and using the certificate to make sure the domain owner is allowed to display it.
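
To make the three fields concrete, here is a minimal sketch that splits a BIMI TXT record into its v, l, and a tags, using the amazon.com record quoted above. The function name parse_bimi_record and the returned dictionary keys are choices made for this example; the sketch does not fetch or validate the logo or the certificate.

```python
def parse_bimi_record(txt):
    """Parse a BIMI TXT record into version, logo URL, and certificate URL."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # ignore empty segments, e.g. after a trailing ';'
        key, value = part.split("=", 1)
        tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "BIMI1":
        raise ValueError("not a BIMI1 record")
    return {
        "version": tags["v"],
        "logo_url": tags.get("l") or None,         # 'l' tag: SVG logo location
        "certificate_url": tags.get("a") or None,  # 'a' tag: Mark Certificate location
    }


record = (
    "v=BIMI1;"
    "l=https://d3frv9g52qce38.cloudfront.net/amazondefault/order_1152306678_logo.svg;"
    "a=https://d3frv9g52qce38.cloudfront.net/amazondefault/amazon_web_services_inc_2025.pem"
)
parsed = parse_bimi_record(record)
print(parsed["logo_url"])
print(parsed["certificate_url"])
```

A record without an a tag yields certificate_url set to None, which a pipeline like the one described here would treat as lacking a verifiable certificate.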

Diagram showing BIMI verification steps, including DMARC authentication, DNS BIMI record lookup, logo retrieval, and mark certificate validation by email clients.
Figure 2: BIMI verification steps to be performed by email clients.

There are two types of Mark Certificates: Verified Mark Certificates (VMCs) and Common Mark Certificates (CMCs). VMCs are issued for registered trademarks and for government marks (the marks of official governmental entities). Unlike VMCs, CMCs do not require the logo to be trademarked, making them accessible to a wider range of organizations. CMCs cover Prior Use Marks, which recognize logos with established commercial use that are not formally registered, and Modified Registered Marks, which are adapted versions of existing registered trademarks for broader use.

Why use BIMI?

Existing logo datasets mostly rely on web scraping or manual labeling, making them prone to noise and mistakes. By contrast, the BIMI approach links logos to domain names via cryptographic certificates issued by trusted Mark Verifying Authorities, creating an auditable link between the brand identity, domain names, and logos. This makes the resulting dataset particularly trustworthy for applications where false positives are costly, such as logo-based phishing detection and domain name verification. It also makes the dataset harder to poison, because an attacker cannot simply upload or edit a logo online and have it “accepted” without a valid certificate binding it to the domain.

Methodology

In our paper, we first combined domain names from several complementary Internet-wide sources (CZDS zone files, passive DNS, CT logs, and Tranco) to support an Internet-wide scan for BIMI records. Then, we queried each domain for default._bimi.<domain> to collect BIMI assertion records and downloaded the referenced SVG logo and Mark Certificate. We ran these lookups at scale using zdns, extending prior BIMI measurement efforts to a global scan of the domain population rather than focusing only on top lists or small samples. We then validated the MCs to filter out impersonation and misconfiguration, aggregated organization names from the certificate common names to build brand-domain-logo mappings, and deduplicated logos using SHA-256 hashes (which we preferred over pHash because perceptual hashing produced false positives).
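
The deduplication step can be illustrated with a short sketch that groups logos by the SHA-256 digest of their bytes. The input shape (pairs of domain name and SVG bytes) and the function name are assumptions for this example; the sample SVG payloads are placeholders, not real logos.

```python
import hashlib


def deduplicate_logos(logos):
    """Group domains by the SHA-256 digest of their logo bytes.

    logos: iterable of (domain, svg_bytes) pairs (hypothetical input shape).
    Returns a dict mapping hex digest -> list of domains sharing that logo.
    """
    by_digest = {}
    for domain, svg_bytes in logos:
        digest = hashlib.sha256(svg_bytes).hexdigest()
        by_digest.setdefault(digest, []).append(domain)
    return by_digest


# Placeholder data: two domains serve byte-identical logos, one differs.
sample = [
    ("shop.example", b"<svg>brand-a</svg>"),
    ("shop.example.org", b"<svg>brand-a</svg>"),
    ("other.example", b"<svg>brand-b</svg>"),
]
groups = deduplicate_logos(sample)
print(len(groups), "distinct logos")
```

Exact hashing only merges byte-identical files, so two visually identical logos with different bytes remain separate entries; in exchange, it never merges two different logos, which is the false-positive risk noted for perceptual hashing.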

Key findings

We discovered 55,650 logos and 5,430 certificates in the wild. After eliminating misconfigured BIMI records and invalid Mark Certificates, and deduplicating logos, we distilled the dataset down to a high-integrity set of:

  • 2,821 domain names
  • 1,811 distinct, verified logos
  • 1,680 brands

This filtering step is necessary because some domains in the wild advertise logos they do not own in their BIMI TXT records, which is exactly the kind of tampering that would poison a scraped dataset.

Figure 3 below illustrates some cases of impersonation in which domain names advertised BIMI DNS TXT records whose logo field (i.e., the l tag) contained URLs leading to logos of brands such as Apple, Microsoft, Meta, and PayPal.

Examples of logo impersonation where domains falsely advertise BIMI records pointing to logos of major brands such as Apple, Microsoft, Meta, and PayPal.
Figure 3: Examples of logo impersonation.

Verified Mark Certificates constituted the majority of valid Mark Certificates, roughly 95.4%, while the remaining 4.6% were Common Mark Certificates. Most of the VMCs were Registered Marks (2,682), with only 9 Government Marks. The CMCs comprised 18 Prior Use Marks and 12 Modified Registered Marks.

The graph below (Figure 4) shows the distribution of logos and domain names for the top 30 brands, ordered by descending domain count. The vast majority of brands (94.22%) use a single BIMI logo, even when operating across multiple domains. For example, IKEA uses a single logo across more than 50 domains. Overall, the results suggest that despite BIMI's still-early adoption, our methodology achieves meaningful brand coverage while also capturing each brand's verified domain footprint (e.g., PayPal and Rakuten each span more than 30 domains).

Bar chart showing the number of unique domain names and BIMI logos per brand, highlighting how most brands use a single verified logo across multiple domains.
Figure 4: Number of unique domain names and logos per common name.

Practical impact for phishing detection

Logo-based phishing detection typically works in two steps: (i) detect and recognize a brand logo on an email or webpage, and (ii) verify whether the domain name matches what would be expected for that brand. The second step is crucial because attackers can easily reuse legitimate logos on web pages they control.
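
The second step can be sketched as a lookup against a brand-to-domains mapping. The mapping below is made-up placeholder data, not entries from LogoTrust, and the function name, the three-valued result, and the choice to accept subdomains of a trusted domain are assumptions for this illustration. Logo recognition itself (step i) is outside the scope of the sketch.

```python
def domain_matches_brand(hostname, brand, brand_domains):
    """Check whether a page's hostname belongs to the recognized brand.

    Returns True if the hostname equals or is a subdomain of a trusted
    domain, False if the brand is known but the hostname does not match,
    and None if the brand is not covered by the mapping.
    """
    trusted = brand_domains.get(brand)
    if trusted is None:
        return None  # no verified data for this brand: cannot decide
    host = hostname.strip().rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


# Placeholder mapping (hypothetical brand and reserved example domains).
brand_domains = {"ExampleBank": {"examplebank.example", "examplebank.test"}}

print(domain_matches_brand("login.examplebank.example", "ExampleBank", brand_domains))
print(domain_matches_brand("examplebank.example.evil.invalid", "ExampleBank", brand_domains))
print(domain_matches_brand("shop.example", "UnknownBrand", brand_domains))
```

Returning None for uncovered brands keeps "no verified data" separate from "mismatch", so a detector does not flag a page merely because the brand has not adopted BIMI.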

Compared to other brand logo datasets, ours provides the following advantages. First, the binding between a brand and its logos and domain names is verified by a trusted Mark Verifying Authority, unlike publicly modifiable databases (e.g., Wikidata) that lack rigorous verification processes. Second, in addition to logos, our dataset provides a set of trusted domain names per brand, exactly as specified by the brand owners at the time the certificate was issued. This is particularly useful when brands operate many legitimate domains (e.g., under multiple top-level domains, or TLDs): the mapping helps avoid flagging these as phishing simply because the domain looks unfamiliar.

Final thoughts

LogoTrust presents a new direction in dataset construction: instead of relying solely on human curation or uncontrolled web scraping, it leverages an authentication protocol (BIMI) and certificate authorities to provide verifiable logo-to-domain mappings. It is not a replacement for large, diverse logo datasets (yet), but it fills an important niche where provenance and auditable authenticity matter, particularly for anti-phishing research and tools. As BIMI adoption grows, this technique will become increasingly valuable for building more trustworthy, continuously updated logo datasets.

Funding

European Cybersecurity Competence Centre and Network
Co-funded by the European Union

This project, funded by the European Union under Grant Agreement No. 101128042, is supported by the European Cybersecurity Competence Centre. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Cybersecurity Competence Centre. Neither the European Union nor the European Cybersecurity Competence Centre can be held responsible for them.

Project details

  • Project number: 101128042
  • Call: DIGITAL-ECCC-2022-CYBER-03
  • Topic: DIGITAL-ECCC-2022-CYBER-03-UPTAKE-CYBERSOLUTIONS
  • Type of action: DIGITAL JU SME Support Actions
  • Project starting date: 1 October 2023
  • Project end date: 30 September 2026

Contact