Web Almanac 2024: data and insights on SEO, UX and performance

Resources such as robots.txt are being better configured, Core Web Vitals are showing steady improvement, and some modern standards, such as hreflang attributes or structured data, are finally gaining traction. In short, in baby steps, the Web is improving and “more sites have fewer problems,” although we are still far from ideal and significant technical challenges persist, such as the gap between mobile and desktop, poorly managed duplicate content or limited adoption of advanced configurations. This is the picture taken by the Web Almanac, which returns with a detailed analysis of the health of the Web based on a factual and unbiased observation of 16.9 million websites and the processing of as many as 83 terabytes of data. In addition to the numbers and statistics, this work charts a clear path between the successes achieved and the opportunities yet to be seized, highlighting how effective optimization results in a faster, more accessible web that is suited to the needs of an increasingly demanding and mobile-first user base, while, on the contrary, many sites are still slowed down by poorly optimized resources and ineffective solutions. So let’s explore the main themes and trends of the Web Almanac 2024, trying to derive an operational framework of practical advice and insights for dealing with the modern Web in a knowledgeable way.

What is the Web Almanac

The Web Almanac is an annual report published by the HTTP Archive, compiled with input from dozens of industry experts who analyze the state of the web through data collected from millions of sites. It provides an in-depth overview of technologies, trends and best practices in development, SEO and design.

Since its first edition in 2019, this unique project has been a benchmark for understanding how the web transforms, offering a combination of large-scale data and in-depth insights from leading industry professionals.

Specifically, the 2024 report is based on a meticulous analysis of 83 terabytes of data derived from millions of websites, with the goal of translating complex information into actionable insights for developers, marketers, SEO professionals, and designers, helping them make informed decisions to improve the quality and competitiveness of their online activities.

A lens on the state of the web

What sets Web Almanac apart in the landscape of digital studies is its ability to provide a systematic and articulate view of the state of the web, combining detailed data and critical trend analysis. Each year, this project involves dozens of professionals – including analysts, developers, SEO experts and authors – who contribute to the planning and writing of its chapters until a document is produced that can serve both as a universal knowledge base on the web and as a practical tool for those who live and work online.

The chapter on SEO in the 2024 edition, for example, looks at key issues such as managing the robots.txt file, the importance of Core Web Vitals for user experience, and the strategic use of the canonical tag to solve duplicate content problems. What emerges is an increasingly complex and sophisticated web in which digital professionals are challenged to contend with constant technological and behavioral changes.

The project’s distinctiveness lies in its ability to transform numbers and trends into applicable best practices, encapsulating concrete solutions and insights in a single accessible platform. Thanks to its focus on complementary disciplines such as performance, accessibility, design and SEO, the Web Almanac contributes to outlining a holistic approach to digital: it does not just improve a single area but aims to build a more efficient, inclusive and sustainable web.

How the Web Almanac is implemented

The Web Almanac is based on a rigorous and collaborative methodological approach, combining automated processing of large datasets with human intervention to ensure contextualized and targeted analysis. The entire project is made possible thanks to the HTTP Archive, a huge repository of data on the web collected using advanced measurement tools such as Lighthouse and WebPageTest. For the 2024 edition, more than 16.9 million websites were analyzed, with data collected from both home pages and internal pages.

The amount of information processed, amounting as mentioned to more than 83 terabytes, is designed to provide detailed insight into aspects such as adoption of modern technologies, adherence to technical performance standards, and implementation of SEO and UX strategies. But it is the human contributions that make the real difference: each chapter is the result of close collaboration between analysts, authors and reviewers, who synthesize the metrics collected to construct meaningful conclusions.

For example, Core Web Vitals metrics, such as Largest Contentful Paint (LCP) or Interaction to Next Paint (INP), are analyzed both from a statistical perspective and based on their practical impact on site performance. This mixed approach, in which technology meets human expertise, makes the Web Almanac an extremely reliable resource suited to real-world application.

The importance of the Web Almanac for the modern web

Documenting the state of the web is a necessary step in keeping the web on the cutting edge and responding to the challenges that emerge each year; thanks to the Web Almanac, developers and practitioners can identify promising trends, anticipate risks, and adopt tools aligned with the latest standards.

For example, analysis of the growing adoption of technologies such as the WebP image format and VideoObject markup provides crucial insights into optimizing visual and multimedia resources. At the same time, the survey of mobile performance issues highlights the need to bridge the gap with the desktop, especially considering that more than 60 percent of global web traffic now comes from mobile devices.

Similarly, the focus on inclusivity and accessibility issues helps to create a web that is truly open to all, ensuring that every user can interact unhindered. This is especially important in an era when connection to the Internet is becoming, according to the United Nations, a basic human right.

The state of technical SEO: crawlability and indexing

Crawlability and indexing represent the two pillars of technical SEO, determining a site’s ability to be discovered and valued by search engines. These processes, once considered primarily a technical responsibility, now emerge as strategic elements in the dynamics of ranking and accessibility. According to data collected by Web Almanac 2024, optimizing these aspects remains a challenge for many sites, but signs of progress in applying best practices are emerging.

The analysis shows a growing awareness toward improved configurations, including more widespread and accurate use of robots.txt, a key tool for guiding crawlers. However, areas for improvement remain, such as handling robots meta tags and consistency between what is crawled and what is destined for the index. In 2024, the evolution of search engines and the advance of new crawlers, including those related to artificial intelligence, are changing the way rules are defined and applied, creating opportunities but also new challenges.

These data underscore the need to refine the technical infrastructure of sites to improve both bot interaction and end-user experience in an increasingly competitive and dynamic digital landscape.

The robots.txt file and its proper configuration

Among the most important tools for managing crawlability, the robots.txt file has seen an increase in its correct application: 83.9 percent of requests to the analyzed sites produced responses that met the standards, up from 82.4 percent in 2022. The improvement is attributed to the adoption of practices more aligned with the formalization of the protocol through RFC 9309 in 2022, which ensured uniformity of interpretation among the major bots.

One interesting aspect concerns the use of the catch-all user-agent “*” in 76.9 percent of mobile and 76.6 percent of desktop files, which allows generic rules to be defined for all crawlers where no specific instructions exist. However, not all crawlers respect these rules; for example, Google AdsBot ignores the catch-all, requiring dedicated configurations.

[Figure: robots directives implementation]

In addition to basic configurations, robots.txt files are becoming an increasingly strategic tool for managing specific resources and blocking access to crawlers that do not bring value. The Web Almanac analysis shows an increase in control over bots related to artificial intelligence, such as OpenAI’s GPTBot, which are increasingly mentioned and regulated in the files. This trend reflects not only the need to preserve crawl budgets, but also a focus on preventing exploitation of content for unwanted purposes.
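
As an illustration of these patterns, here is a minimal, hypothetical robots.txt sketch (domain and paths are invented for the example) that combines a catch-all rule, a dedicated section for Google AdsBot, which ignores the catch-all, and an explicit block for an AI crawler such as GPTBot:

```
# Hypothetical example: generic rules for all crawlers that honor the catch-all
User-agent: *
Disallow: /cart/
Disallow: /internal-search/

# Google AdsBot ignores "*" and must be addressed explicitly
User-agent: AdsBot-Google
Disallow: /cart/

# Block an AI crawler from the entire site
User-agent: GPTBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```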

The increased precision in configurations is an important sign of evolution, but there remain sites that do not fully exploit the potential of robots.txt, limiting their SEO performance due to absent or incorrect implementations.

The evolution of the robots.txt file: new challenges

The role of the robots.txt file has changed over time, evolving from a simple unofficial tool to a standardized and globally recognized component. With RFC 9309, 2022 marked a turning point that led crawlers to adopt a uniform interpretation of the rules. This shift has fostered greater clarity in the management of crawlable resources, but the new challenges of technical SEO are dictated by two major forces: the evolution of traditional bots and the advent of new artificial intelligence crawlers.

A significant finding emerges from the Web Almanac study: some sites are specifically blocking bots such as AhrefsBot and MJ12Bot, crawlers of analysis tools commonly used by competitors. This customization of rules reflects a clear focus on protecting the crawl budget and minimizing the dispersion of server resources. At the same time, controls on bots related to generative AI, such as the aforementioned GPTBot, are growing, marking a break with the past, when robots.txt files were geared almost exclusively to traditional search engines.

The robots meta tag guidelines: managing crawler behavior

Parallel to robots.txt, robots meta tags play a crucial role in controlling content indexing. These tools, applicable at the individual HTML page level, offer more granular customization than the general file. However, the Web Almanac 2024 analysis highlights some major issues: among the 24 directives valid today, the most widely used – index and follow – turn out to be unsupported by Googlebot and thus completely ignored. This misapplication, still present on many pages, demonstrates an understanding gap in technical configurations.

A case in point that emerged from the study concerns the use of the noarchive directive, which seems to be applied strategically when targeted to specific bots. For example, while only 1 percent of pages use this rule under the generic name “robots,” the percentage rises to 36 percent for Bingbot, a clear attempt to prevent content from appearing in the results of the new artificial intelligence-based Bing Chat.
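
As a purely illustrative sketch of this kind of targeting (the page and policy are invented), a noarchive rule can be addressed to a single crawler by naming it in the meta tag, while other bots keep the default behavior:

```html
<head>
  <!-- Applies only to Bingbot: do not show a cached/archived copy of this page -->
  <meta name="bingbot" content="noarchive">
  <!-- Note: "index, follow" is the default behavior and is ignored by Googlebot,
       so declaring it explicitly adds nothing -->
</head>
```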

This targeted approach indicates that webmasters are becoming more aware of the importance of customizing the rules given in meta tags, adapting them to the different behaviors of search engines and technology platforms. However, errors and ambiguities persist, especially when directives are not congruent with other signals, such as canonical tags or robots.txt settings.

To ensure effective configuration, the report reiterates the importance of monitoring rule enforcement through tools such as Google Search Console, avoiding misunderstandings that could penalize indexing or organic visibility.

Managing duplicate content with canonicalization

Managing duplicate content remains among the greatest complexities of technical SEO. When multiple URLs display identical or very similar content, search engines can get confused, with the risk of distributing ranking signals among different pages or, worse, ignoring those most relevant for organic ranking. To consolidate these signals onto a single resource and reduce algorithmic confusion, the canonical tag is the most effective tool available today, guiding crawlers to the preferred version of a page.

According to the data reported in Web Almanac 2024, the use of the canonical tag has reached an important level of maturity: 65 percent of the analyzed sites correctly implement this technique, confirming a slight, but steady, progression from the surveys of previous years. Moreover, the growth in adoption is accompanied by a partial maturation in implementation: the canonical tag is now implemented uniformly in both desktop and mobile versions, reducing the mismatch observed in past years.

However, technical errors remain that undermine the effectiveness of the tool. Among the most common problems identified are canonical chains, unreachable URLs or incorrect implementations in sites using dynamic rendering technologies, making rigorous monitoring necessary to ensure optimal results. Proper optimization is no longer just a technical issue, but a strategic component of the entire SEO ecosystem, on which a site’s ability to compete for top queries depends.

The use of the canonical tag in 2024

The implementation of the canonical tag is now more widespread than ever, and the Web Almanac highlights an important milestone: for the first time, there is a near-perfect balance between adoption on desktop and mobile devices. Data show that 65 percent of mobile sites and 69 percent of desktop sites are using this technique correctly, eliminating a historical discrepancy that had penalized mobile-first sites.

[Figure: canonical tag implementation]

The canonical tag is mainly applied at the HTML code level and in most cases (over 97%) points to the desired version of the page without errors. However, the report points out that 3% of sites have serious inefficiencies, such as unreachable or missing URLs. Another interesting finding concerns the increasing adoption of self-referential canonical: this configuration, used to prevent unintentional duplication, was found in the majority of the analyzed pages. This shows an increased attention by webmasters, who prefer to adopt preventive control even in the absence of obvious duplicates.

An innovation increasingly adopted in 2024 is the indexifembedded directive, designed to properly handle content displayed via iframes, widgets or embedded videos. About 7.6 percent of the mobile pages analyzed include content displayed via iframe, and most of these have implemented canonical tags and specific meta tags to control how embeddable resources are indexed.

Another factor to consider is the increase in canonical tag implementations through HTTP headers, especially for non-HTML resources, such as images and PDF files. Although this practice has not yet overtaken traditional implementation in source code, it is a sign of technological evolution and more dynamic content management.
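
To make these configurations concrete, the snippet below is a minimal sketch (URLs are invented) of a self-referential canonical declared in the page markup, with the equivalent HTTP header for non-HTML resources shown as a comment:

```html
<head>
  <!-- Self-referential canonical: the page declares itself as the preferred version -->
  <link rel="canonical" href="https://www.example.com/products/blue-widget/">
</head>
<!-- For non-HTML resources such as a PDF, the same signal can be sent as an
     HTTP response header instead of markup:
     Link: <https://www.example.com/docs/white-paper.pdf>; rel="canonical" -->
```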

Mistakes to avoid when using canonical

Despite the progress observed, misconfigurations of the canonical tag are still a major problem, often underestimated by webmasters. Web Almanac 2024 identifies some common errors that can compromise the effectiveness of the tag and negatively affect indexing and page ranking.

One of the most common errors involves canonical chains: instead of pointing directly to the preferred URL, some configurations point from one page to another, creating unnecessary intermediate steps. This practice not only complicates crawling for search engines, but can also disperse ranking signals. Similarly, cases of canonical loops, in which pages circularly refer to each other without ever indicating a final version, are also detected.

Another problematic aspect concerns inconsistencies between raw HTML and rendered HTML. About 1 percent of the pages analyzed have discrepancies between the two levels of implementation, causing confusion for crawlers and risking thwarting the intent of the tag. This is especially common on sites using dynamic rendering technologies or unoptimized frameworks.

Finally, an emblematic error is the conflict between the canonical tag and other signals, such as the robots.txt file or robots meta tags. For example, some pages with a canonical pointing to a main URL turn out to be uncrawlable due to a block imposed in the robots.txt file, rendering the entire setup ineffective.

Avoiding these errors requires constant monitoring and a strategic approach to implementation. Tools such as Google Search Console and Screaming Frog are irreplaceable for analyzing canonical paths and identifying any inconsistencies before they become structural problems for the site. The data show that proper application of canonical is not only a good practice, but an essential requirement for optimizing resources and maximizing the impact of one’s pages in search engines.

Looking at Core Web Vitals and their actual application

Core Web Vitals (CWVs) have become established as a central standard for evaluating the technical quality and user experience of a Web site; these metrics are a key parameter for search engines, reflecting the direct effect of performance on accessibility, usability, and organic rankings. The Web Almanac 2024 analysis provides a detailed picture of the adoption and effectiveness of CWV metrics, with a focus on how sites are responding to the pressure for a more responsive, stable, and accessible web.

While there are improvements in overall scores, adoption is still uneven. Only 54 percent of desktop sites and 48 percent of mobile sites meet minimum performance standards, with issues still evident in improving Largest Contentful Paint (LCP) and greater difficulty in ensuring optimal visual stability during loading. These data are an immediate reminder of the importance of ongoing optimization to close the gaps that remain, especially in a digital context where mobile is increasingly a major player.

Technical overview: what the report data tells us

Since the launch of the Core Web Vitals in 2020, the three core metrics – LCP, Interaction to Next Paint (heir to First Input Delay) and Cumulative Layout Shift – have become an essential benchmark for UX performance, but many web pages still struggle to meet the required thresholds, especially on mobile devices.

According to the report’s data, only 54 percent of desktop sites meet the criteria for Core Web Vitals, while this percentage drops to 48 percent for mobile sites. The transition to more precise metrics, such as the INP, has made the gap in responsiveness of interactions even more pronounced: many sites cannot guarantee a quick response to user commands. The Largest Contentful Paint, which measures the time it takes for the main content element on the page to become visible, remained the worst-performing metric, penalized by high loading times and inefficient use of resources.

Cumulative Layout Shift, which evaluates visual stability during page loading, is another critical area, with 40 percent of sites experiencing significant shifts in layouts, a problem that affects not only user experience but also conversion rates. These results highlight the importance of an ongoing focus on technical optimization, combining strategies such as resource compression, content preloading, and reducing unnecessary scripts.

Optimization techniques to improve CWVs

The obvious difficulties in meeting the requirements of CWV metrics highlight the need for a more systematic and focused approach to optimizing Web site performance. The Web Almanac 2024 highlights several effective techniques adopted by sites that succeed in exceeding the optimal thresholds.

To improve Largest Contentful Paint, which should be less than 2.5 seconds, the most relevant strategies include:

  • Image optimization: converting visual resources to lightweight formats such as WebP and specifying static sizes to avoid rendering delays.
  • Caching and use of CDNs: reduce response times by serving static resources through content delivery networks closer to users.
  • Critical CSS loading: minimize or embed essential CSS directly in the initial markup to speed up the display of key elements.

In the context of responsiveness, INP, which measures how quickly pages respond to interactions, requires a focus on optimized management of JavaScript. Techniques include:

  • Eliminating nonessential scripts through lazy loading and deferring loading.
  • Reduction of work in the main thread through asynchronous processes, load sharing and use of lighter frameworks.

As for visual stability, assessed through CLS, the report suggests interventions that prevent unwanted movement during loading (a combined markup sketch follows this list), such as:

  • Specifying sizes for images, videos and iframes to reserve a default space before they load.
  • Preloading fonts to avoid sudden text replacements.
  • Avoiding dynamic content that is inserted into the layout without reserving its space.
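
Several of these interventions can be expressed directly in the markup. The snippet below is an illustrative sketch (file names and URLs are invented) that preloads the resources driving LCP, defers non-critical JavaScript for INP, and reserves space for media to protect CLS:

```html
<head>
  <!-- Preload the hero image and the main web font so they are discovered early (LCP) -->
  <link rel="preload" as="image" href="/img/hero.webp" fetchpriority="high">
  <link rel="preload" as="font" href="/fonts/main.woff2" type="font/woff2" crossorigin>
  <!-- Defer non-critical JavaScript so it does not block the main thread (INP) -->
  <script src="/js/app.js" defer></script>
</head>
<body>
  <!-- Explicit width and height reserve space and prevent layout shifts (CLS) -->
  <img src="/img/hero.webp" alt="Product overview" width="1200" height="630">
  <!-- Below-the-fold media can be lazy-loaded to lighten the initial load -->
  <img src="/img/gallery-1.webp" alt="Gallery photo" width="600" height="400" loading="lazy">
  <iframe src="https://www.example.com/embed/demo" width="560" height="315"
          loading="lazy" title="Demo video"></iframe>
</body>
```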

Sites that implement these practices experience not only improvements in CWV scores, but also a positive impact on business KPIs, such as increased conversion rates and reduced abandonment rates.

Why mobile still suffers compared to desktop

Despite the well-established priority on mobile-first indexing, the gap between desktop and mobile performance continues to be a major obstacle for many websites. The share of sites meeting Core Web Vitals on mobile devices is six percentage points lower than on desktop (48 percent versus 54 percent), a gap that has widened further compared with previous years. The Web Almanac identifies the main reasons for this phenomenon.

[Figure: percentage of good CWV experiences on desktop]

First of all, many sites fail to optimize for the hardware limitations of mobile devices, which often have less powerful processors and less capable memories than desktop computers. This is compounded by the challenges associated with mobile connections, which are generally slower and less stable than wired or Wi-Fi connections.

On the other hand, many adopted development frameworks are not designed to ensure smooth responsive experiences, with unnecessary resources weighing down pages and slowing loading times on small screens. Failure to optimize interactive elements or the interfaces themselves also contributes to a poorer experience for mobile users.

The introduction of INP as a metric to measure interactivity has highlighted a further gap between desktop and mobile performance: in 2024, only 48% of mobile pages meet the minimum thresholds of Core Web Vitals, compared to 54% of desktop versions, a relative differential of about 11%, more than double the gap recorded in 2022 (about 5%).

[Figure: percentage of good CWV experiences on mobile]

To close this gap, the report stresses the importance of adopting strategies such as lazy loading for nonessential resources, responsive design that eliminates superfluous elements for mobile, and implementing server-side caching systems. Tools such as Lighthouse and PageSpeed Insights remain critical for diagnosing mobile-specific problems and acting in time.

Reducing the gap between mobile and desktop must become a priority for brands, especially considering that more than 65 percent of global web traffic now comes from mobile devices. Only by ensuring smooth browsing experiences and optimal performance will it be possible to achieve significant results, not only from a ranking perspective but also in terms of engagement and conversions.

User experience and SEO: an increasingly strong combination

The evolution of search engines has led to a profound change in the value placed on user experience (UX) as a determining factor for organic ranking. Today, SEO is no longer limited to simple content optimization, but includes metrics and technologies that measure ease of use, speed of loading and page stability. Thanks to tools such as Core Web Vitals and the widespread adoption of mobile-first indexing, it is now clear that excellent performance is critical not only to winning over search engines, but also to meeting the ever-higher expectations of users.

Web Almanac 2024 highlights how the relationship between SEO and UX is now inseparable: sites that offer an intuitive, fast, and accessible experience tend to perform better in terms of both ranking and user engagement. However, despite the improvements observed in recent years, many pages still show significant gaps, especially on mobile devices. This highlights the need to refine targeted strategies that combine technical optimization and user-centered design.

The importance of a mobile-first experience

With the transition to mobile-first indexing, fully implemented by Google, the digital landscape has changed dramatically. Today, the mobile version of a site is the primary focus for crawlers during crawling and indexing. This approach, born out of the fact that more than 65 percent of global traffic now comes from mobile devices, sets a higher standard in building a user experience designed specifically for smartphones and tablets.

Data provided by Web Almanac 2024 show progress in the ability of pages to adapt to mobile needs, but also highlight recurring issues. Among the 17 million sites analyzed, only 48 percent of mobile versions meet the standards of Core Web Vitals, compared to 54 percent of desktop versions. The gap between performance on mobile and desktop, instead of narrowing, has even widened from previous years, indicating that many sites are not ready to provide an optimized experience on mobile devices.

The main critical issues observed relate to three key factors:

  1. Poor loading performance: longer times on mobile devices due to slow connections, limited hardware and poorly optimized web resources.
  2. Not fully responsive design: layouts that are not adapted to the mobile viewport, with obvious problems such as fonts that are too small, buttons that are difficult to click, and spacing that is not ideal.
  3. Lack of specific technical optimizations: too many mobile pages do not take full advantage of techniques such as lazy loading, image compression, or reducing non-essential JavaScript.

These problems not only penalize user satisfaction, but also negatively affect SEO ranking, as Google crawlers prioritize smooth, high-performing experiences on mobile devices.

What is lacking in less optimized sites is an integrated vision that combines technical performance goals with a truly user-focused approach. For a site to improve its impact, some best practices need to be implemented:

  • Reduce loading times through image optimization (e.g., using the WebP format) and deferred loading (lazy loading) of nonessential resources.
  • Design fully responsive interfaces that put readability and ease of interaction at the center. For example, well-spaced buttons, care for text formatting and touch-friendly elements are now prerequisites.
  • Regularly test mobile performance with tools such as Google Search Console and Lighthouse, to identify and correct any gaps.

The needs of mobile users pose urgent questions for those developing and optimizing websites. Failure to adapt quickly to this environment means not only missing growth opportunities, but risking being excluded from the dynamics of the digital marketplace. In the mobile-first era, delivering a high-performing experience is no longer an option; it is the starting point for remaining competitive.

The state of on-page SEO: optimizing structural elements

In the complex landscape of on-page SEO, optimization of structural elements continues to play a crucial role in the ranking and usability of web pages. Tools such as titles (H1-H6), title tags, meta descriptions and heading tags not only help search engines better understand content, but also directly influence user experience and engagement. According to data from Web Almanac 2024, while attention to the proper use of these elements has increased, errors and shortcomings persist that limit the impact of many pages.

Structural elements, in fact, do more than just facilitate crawlers; they are a bridge between content and users. Proper semantic hierarchy can improve crawling and indexing, while well-written meta descriptions and titles contribute directly to increasing CTR (Click-Through Rate). However, the report points out that many pages ignore good optimization practices, resulting in content that is less visible and performs worse.

Content structure and semantic hierarchy

Semantic hierarchy is the basis on which to build effective content from the perspective of on-page SEO: titles, headings and meta tags are not mere technical details, but key tools for communicating to both users and search engines what a page offers, organizing the information in a clear and logical way.

Data from Web Almanac show that about 68 percent of the pages analyzed use a consistent semantic heading hierarchy, an advance over previous editions’ data, but still insufficient in an increasingly competitive Web. Common errors, such as duplicate H1s or the absence of intermediate levels (e.g., jumping directly from an H1 to an H3), are still common, especially among sites built with less optimized CMSs.

Looking at the raw numbers, mobile home pages show an average of 364 visible words versus 400 for desktop versions, while internal pages register slightly lower figures (317 words on mobile and 333 on desktop). Word counts also increase significantly after rendering compared with the raw HTML, by 13.6 percent on mobile and 17.5 percent on desktop, a figure that quantifies how much content is added by JavaScript rendering on top of the initial HTML.

[Figure: home page visible words rendered, by percentile]

As for individual elements, title tags and meta descriptions remain the most influential structural elements in SERPs. 98% of pages include a title tag, but nearly 30% of these titles were found to be generic, redundant, or not optimized for specific queries. This shortcoming affects both ranking and overall visibility. As for meta descriptions, only 66% of the pages analyzed include one, and of these, more than 70% are rewritten by Google to better suit users’ search intent, often due to poor or out-of-context wording.

How to manage title tags and meta descriptions

Title tags represent the main title of a page for search engines and are one of the main variables affecting visibility in search results. In addition to their centrality to ranking, they play a key role in convincing the user to click on a particular page. However, Web Almanac 2024 points out that too many titles neglect critical optimization elements, such as emphasis on strategic keywords or clear structure.

According to the report, optimized titles that adhere to the recommended length (between 50 and 60 characters) get 15 percent higher CTRs than those that are too short or too long. Although meta descriptions do not directly affect ranking, they are even more critical for increasing click-through rates: a well-written description provides a quick overview of the page content, aligning with user needs and expectations.

In addition, the report shows that the use of the title tag dropped slightly between 2022 and 2024: 98.8 percent of pages in 2022 used it, while today it is 98.0 percent on desktop and 98.2 percent on mobile. The adoption of meta descriptions also decreased, dropping from 71% to 66.7% on desktop and 66.4% on mobile, signaling less attention to these important on-page components.

The main problem highlighted by the report, however, concerns automatic rewriting by Google, which affects more than 70 percent of the meta descriptions examined. This frequently occurs in cases where descriptions are generic, unrelated to page content or longer than the optimal range (150-160 characters). Avoiding these issues requires a focus on specific, search intent-oriented descriptions capable of directly answering users’ questions or needs.
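
By way of example only (the wording is invented), a title tag within the recommended length and a specific, intent-oriented meta description might look like this:

```html
<head>
  <!-- About 50 characters, leading with the topic and closing with the brand -->
  <title>Trail Running Shoes: Buying Guide | Example Store</title>
  <!-- About 155 characters, describing the actual page content and the user benefit -->
  <meta name="description"
        content="Compare lightweight trail running shoes for mud and rocky terrain, with sizing advice, cushioning guides and free returns on every order from Example Store.">
</head>
```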

[Figure: meta description characters by percentile]

Managing these elements as well as possible also means continually testing their effectiveness: tools such as Google Search Console can offer useful insights to identify pages with rewritten titles or descriptions that do not convert, allowing timely action to be taken.

Heading tags: readability and hierarchy

Heading tags (H1-H6) are critical for organizing web content in a way that is readable for both users and crawlers. A well-planned structure not only improves the browsing experience by making it easier for users to understand the main points of content, but also allows search engines to better understand the hierarchy of information.

According to data from Web Almanac 2024, 70% of desktop pages include at least one H1, but only 55% use a correct semantic hierarchy, highlighting numerous inefficiencies. Common errors include using multiple H1s on the same page, confusing layers (e.g., inserting an H3 without going through H2), and not having headings that clearly represent the content.
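
A minimal sketch of a consistent hierarchy (the topics are invented for illustration), with a single H1 and no skipped levels:

```html
<h1>Complete guide to technical SEO</h1>
  <h2>Crawling and indexing</h2>
    <h3>Configuring robots.txt</h3>
    <h3>Robots meta tags</h3>
  <h2>Core Web Vitals</h2>
    <h3>Improving LCP</h3>
<!-- Anti-patterns: a second <h1> on the same page, or jumping from <h1>
     straight to <h3>, break the hierarchy shown above -->
```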

These errors not only confuse crawlers, causing more complex and potentially incomplete crawling, but also negatively affect users, who may perceive the content as disorganized or of little value.

Another problem reported concerns the lack of readability for headings: by using wording that is too short or generic, many tags fail to effectively communicate the topics covered. In contrast, the best results are obtained with headings that are descriptive and, at the same time, include relevant keywords while maintaining a close link to the search intent.

Correcting these problems requires greater attention to planning and a thorough study of queries. Creating a consistent semantic structure, with a single H1 and progressive use of headings, provides not only technical benefits but also a smoother and more engaging experience for users.

HTML errors in the <head> tag: what they are and how they impact SEO

The <head> section of the HTML code is the area that provides search engines with essential information about a web page: from its title to the CSS file for design to metadata useful for SEO. When this section includes invalid elements, problems such as crawlers interrupting processing or not reading strategic tags (e.g., canonical or hreflang) can occur. This, according to data from Web Almanac 2024, continues to be a challenge for many sites.

[Figure: pages with invalid HTML in the head]

Among the top errors detected in 2024, the <img> tag (used for images) accounted for 29% of errors on desktop pages and 22% on mobile, numbers up sharply from the 10% found in 2022. The use of invalid tags such as <div>, designed for the body of the page and not the <head>, also increased, reaching 10% of mobile pages and 11% of desktop pages, the highest values in recent years.

In contrast, the elements considered valid in the <head> tag are as follows:

  1. <title>: page title, essential for SERPs.
  2. <meta>: metadata for descriptions, language and crawler directives.
  3. <link>: links to external resources, such as CSS stylesheets or RSS feeds.
  4. <script>: JavaScript files for advanced functionality.
  5. <style>: CSS styles for page design.
  6. <base>: base URL of the page.
  7. <noscript>: fallback for browsers that do not support JavaScript.
  8. <template>: predefined templates for reusable content.

Any tag not included in this list may be considered invalid, leading to a number of negative effects (a minimal example of a well-formed <head> follows the list below):

  • Crawl interruption: crawlers may stop processing the <head> when they encounter non-standard elements, ignoring some of the metadata included in the section.
  • Indexing problems: inclusion of errors could compromise the reading of canonical tags, hreflang or meta description, reducing the SEO effectiveness of the page.
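
The following is a minimal example (all values are placeholders) of a <head> limited to valid elements, with a note on what should never appear there:

```html
<head>
  <title>Example page title</title>
  <meta charset="utf-8">
  <meta name="description" content="Short description of the page.">
  <link rel="canonical" href="https://www.example.com/page/">
  <link rel="stylesheet" href="/css/main.css">
  <script src="/js/app.js" defer></script>
  <!-- Invalid here: <img>, <div> or any other body-level element; crawlers may
       treat the <head> as closed at that point and ignore everything that follows -->
</head>
```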

Media, visual resources and interactivity

Images that capture attention at first glance, videos that entertain and inform, interactive content that transforms a simple visit into an immersive experience: the modern Web thrives on media, which are not mere aesthetic elements, but essential tools for capturing attention, communicating effectively, and improving overall UX. However, they are also among the most problematic assets to manage: if not properly optimized, they penalize site performance and, consequently, user experience and search engine rankings.

Web Almanac 2024 shows that nearly 96 percent of the pages analyzed use at least one image, confirming the growing importance of visual content. Videos are slightly less prevalent but growing strongly, thanks in part to their impact on dwell time and engagement. Despite these numbers, it is clear that a significant percentage of visual assets are still not optimized for technical performance and fast loading, causing significant slowdowns, especially on mobile devices.

The adoption of modern formats and advanced solutions, such as WebP for images and VideoObject markup for video, is growing, but many opportunities remain unexplored. Common errors, such as unreported sizes or failure to use lazy loading, highlight that there is still a way to go for widespread and effective optimization.

The state of media optimization on web pages

Images and video content are the backbone of multimedia communication on the Web. However, the Web Almanac 2024 analysis reveals that not all websites are taking full advantage of modern optimization technologies. Crucial data from the report show several trends and issues to consider.

The WebP format, designed to reduce image file size without sacrificing visual quality, is now implemented in 56 percent of web pages analyzed, a significant increase from the 40 percent recorded in 2022. The move to lighter images results in significantly improved loading times, especially on mobile connections. However, the report points out that many sites do not provide adequate fallbacks for older browsers, creating potential compatibility issues for a portion of users.
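
One common way to serve WebP while keeping a fallback for older browsers is the <picture> element; the snippet below is an illustrative sketch with invented file names:

```html
<picture>
  <!-- Browsers that support WebP pick the lighter version -->
  <source srcset="/img/product.webp" type="image/webp">
  <!-- Older browsers fall back to the JPEG -->
  <img src="/img/product.jpg" alt="Product photo" width="800" height="600" loading="lazy">
</picture>
```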

On the video front, the VideoObject markup, which is essential for improving the visibility of video content in rich results, has been adopted by only about 1 percent of the pages surveyed; to be precise, it has now reached 0.9 percent, more than double the 0.4 percent recorded in the 2022 analysis. This figure falls short of expectations, as sites that properly use metadata for video achieve significant benefits in terms of discoverability and engagement in SERPs. The limited adoption indicates an underestimation of the potential offered by structured data for media.

[Figure: percentage of pages with video]

These numbers show a growing gap between sites that invest in advanced optimization of visual assets and those that ignore such opportunities, directly impacting both technical performance and user experience.

Improving the impact of visual assets

Optimizing images and videos means not only improving site performance, but also offering users more usable and interactive content. Web Almanac 2024 highlights some key practices to ensure that visual assets do not hinder loading speed, a crucial element for Core Web Vitals.

Among the most effective techniques for images, the increasing use of the WebP format stands out, offering compression that is roughly 30 percent more efficient than traditional formats such as JPEG or PNG; however, many pages continue to ignore the importance of declaring static image dimensions (width and height), causing unwanted layout shifts that penalize CLS.

For videos, the report suggests adopting deferred loading, which loads content only when actually needed. This strategy is particularly useful to prevent video files, which are often heavy, from slowing down the initial page load. Using VideoObject markup to provide search engines with useful details, such as duration or descriptions, is another step toward advanced optimization.

Common errors related to visual asset management include:

  • Lack of compression: images or videos that are too heavy, negatively affecting loading times.
  • Absence of lazy loading: immediate loading of all resources, regardless of their actual visibility in the viewport.
  • Missing or nondescript alt text: 42% of the analyzed images do not include correct alt text, reducing both accessibility and SEO optimization.

To improve the impact of visual assets, the report recommends taking an integrated approach to their management: advanced compression for images and videos, intelligent preloading for prioritized assets, and strategic use of structured markup to improve the visibility and quality of content.

The report specifically indicates that only 58 percent of images on mobile pages have alt text, an improvement from 54 percent in 2022. However, a significant percentage (14% of the images that do include an alt attribute) have empty or generic descriptions, compromising accessibility and SEO optimization.

[Figure: percentage of img elements missing alt text]

Investing in visual optimization is not just a technical matter: it also means building an engaging and efficient experience for users, while protecting rankings in SERPs and increasing the likelihood of conversion.

Links and relationships in the web ecosystem

Links are the backbone of the digital ecosystem: they connect pages, content and users, facilitating navigation and helping search engines understand the structure and relevance of a site. In the context of SEO, managing internal and external links has always been a strategic component: if well planned, they can strengthen a site’s authority, improve user experience, and optimize crawlability for bots.

Web Almanac 2024 highlights significant progress in the use of internal links, particularly in sites that adhere to structured strategies to drive visitors and improve organic rankings. However, gaps remain in complementary aspects such as the adoption of specific attributes (nofollow, sponsored or UGC) in external links and the strategic management of outbound links. Such shortcomings can limit a site’s ability to maximize the benefits of linking, leading to dispersion of value or inefficiencies in indexing.

Investing in a well-structured link system means building bridges: between different sections of a site, between users and relevant content, and between one’s domain and the rest of the Web ecosystem.

Overview of internal and external links

Internal link management is one of the most powerful tools for improving a site’s information architecture and strategically distributing page value. According to data from Web Almanac 2024, sites are showing increased commitment to optimizing internal linking: 129 links per page on average in the top 1,000 sites analyzed, up from 106 in 2022. This underscores the increasing focus on more effective organization of the navigation flow, particularly for complex portals such as e-commerce and publishing sites.

Another positive finding is the use of descriptive anchor text: in fact, in 91 percent of mobile pages and 84 percent of desktop pages, links include useful descriptions, improving both user experience and semantic understanding by crawlers.

Well-designed internal linking provides numerous benefits:

  • Improves crawling and indexing: It helps crawlers easily navigate through the site, ensuring that the most important pages get the right attention.
  • Distributes internal authority: allows you to “push” strategic pages by increasing the relevance of priority sections.
  • Guides users: simplifies the browsing experience, reducing the bounce rate and increasing the average time spent on the site.

On the side of external links, however, some issues still emerge. The report indicates that only 32.7 percent of pages use the nofollow attribute to handle sponsored or questionable links, while the use of sponsored and UGC link attributes remains below 1 percent. These data indicate little attention to proper categorization of external links, a tool that helps search engines identify the type of relationship between resources and correctly interpret the site’s contribution.
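
For illustration, a sketch of how the three attributes can be applied to outbound links (the URLs are invented):

```html
<!-- Paid placement: mark the relationship as sponsored -->
<a href="https://partner.example.com/offer" rel="sponsored">Partner offer</a>
<!-- User-generated content, e.g. a link left in a comment -->
<a href="https://forum.example.org/thread/123" rel="ugc">Forum discussion</a>
<!-- A link you do not want to endorse for ranking purposes -->
<a href="https://unverified.example.net/" rel="nofollow">External source</a>
```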

[Figure: anchor rel attribute usage]

Another common mistake is the lack of a strategy to avoid broken outbound links or links to content that no longer exists, an oversight that not only harms the user experience but can also technically penalize the domain.

Internal and external linking best practices

To ensure that links are effective both from a user perspective and for indexing, it is necessary to adopt structured and consistent planning. The Web Almanac 2024 highlights how the best results are achieved by following some targeted best practices.

On the internal linking front, the best performing sites adopt strategies such as:

  • Balanced use of relevant anchor text: choosing descriptive and relevant anchors to improve both user experience and understanding by crawlers.
  • Hierarchical categorization: linking strategic pages (e.g., product sections, main categories or in-depth articles) to distribute internal authority in a targeted manner.
  • Continuous monitoring: regularly check for broken links with tools such as Screaming Frog or Google Search Console to avoid value leakage.

In terms of external links, the key is to balance authority with relevance:

  • Label them properly: use attributes such as nofollow, sponsored or UGC to indicate the type of relationship with the target site. For example, links to sponsored articles or user-generated contributions should not convey ranking value but comply with search engine guidelines.
  • Validate links regularly: use tools to monitor outbound links and avoid excessive redirects or 404 errors, which can compromise both usability and ranking.

Finally, it is critical to perceive outbound links as an opportunity to build relationships in the broader Web ecosystem. Linking to high-quality, target-relevant resources offers real value to both search engines and users, enhancing the site’s credibility as a reference point for specific information.

Structured data and SEO internationalization

Structured data and the proper implementation of internationalization techniques, such as the hreflang tag, are key levers for improving organic visibility and optimizing the user experience on a global scale. Through these tools, search engines can more easily interpret a page’s content, enhancing its relevance in search results and encouraging greater geographic or linguistic customization.

Web Almanac 2024 highlights how the adoption of structured data is becoming increasingly common on websites, with 53 percent of mobile sites using at least one type of structured markup. However, large areas of inefficiency remain, especially when getting into the details of advanced markup, such as those for reviews, events or products, which still have limited uptake. At the same time, less than 10 percent of global sites correctly implement the hreflang tag to handle regionalized or multilingual content, a sign of poor SEO investment in internationalization despite its potential.

Overview of adoption of structured data and hreflang

Structured data is based on standards such as Schema.org and is interpreted by search engines to show enhanced content through rich snippets, such as rating stars or product price details. According to the Web Almanac 2024 analysis, structured data is becoming increasingly prevalent in major digital industries, particularly in e-commerce and publishing portals, where it helps improve direct page visibility in search results.

Implementation of structured data is stable, with 53% of mobile pages and 51% of desktop pages integrating structured markup directly into raw HTML. Only 2% use JavaScript to introduce them, a choice not recommended by search engines because it is less reliable.

[Figure: most popular home page schema types]

Despite this prevalence, many sites do not take full advantage of the potential of structured markup, merely implementing basic formats without delving into more sophisticated categories. For example, VideoObject markup , which enriches search results with information related to video content, is adopted by only a small percentage of the pages analyzed.
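
As a hedged sketch (all values are placeholders), VideoObject markup of this kind is typically embedded as JSON-LD in the page:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to configure robots.txt",
  "description": "A short walkthrough of common robots.txt directives.",
  "thumbnailUrl": "https://www.example.com/img/video-thumb.jpg",
  "uploadDate": "2024-06-01",
  "duration": "PT4M30S",
  "contentUrl": "https://www.example.com/videos/robots-txt.mp4"
}
</script>
```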

As for the hreflang tag, used to signal the language and geographic location of a specific page to search engines, the analysis reveals a low uptake and frequent errors in implementation. Fewer than 10 percent of global sites use it correctly, and among those that do implement it, it is not uncommon to find conflicts with canonical tags or incorrect syntax that undermine its effectiveness. These shortcomings, in an increasingly interconnected world, penalize the ability of sites to effectively reach users browsing in different languages or from different regions.

Best practices, critical issues and suggestions

In order to optimize the use of structured data and properly manage internationalization, it is essential to take an informed and methodical approach, avoiding improvised configurations that can lead to technical errors and inefficiencies.

In particular, for structured data it is advisable to:

  • Choose the correct formats: do not limit yourself to basic markup such as Article or Organization, but integrate specific categories such as Product, FAQ or Review, which significantly improve presentation in SERPs.
  • Validate markup on a regular basis: tools such as Google’s Rich Results Test allow you to verify the correctness of the configuration and avoid syntax errors that could make the markup unreadable by search engines.
  • Integrate structured data into sitemaps: the report emphasizes how referencing marked-up content in XML sitemaps as well helps crawlers better understand content structure and identify key pages.

As for handling international pages with hreflang, the main recommendations are the following (a minimal markup sketch follows the list):

  • Avoid conflicts with canonical tags: implement hreflang and canonical consistently, making sure that the signals clearly direct search engines to the right version of the page. In fact, conflicts are still frequently noticed when sites try to direct pages with duplicate content to a single preferred version while language variants are disregarded. The optimal solution is to implement consistent hreflang instances with canonical tags and ensure that search engines can correctly distinguish regional versions as unique resources.
  • Include all language and region combinations: An effective implementation of the hreflang tag requires that each version of a page also reference all other variants, including itself. Omitting combinations or creating circularity errors (loops) can penalize the entire structure.
  • Use validation tools: platforms such as Google Search Console help monitor the accuracy of hreflang implementation and quickly correct any problems.
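
The sketch below (domains and paths are invented) shows a complete hreflang cluster as it would appear on the English version of a page: every variant is listed, including the page itself and an x-default fallback, and the same set must be repeated on each language version so that the references are reciprocal:

```html
<!-- Placed in the <head> of https://www.example.com/en/pricing/ -->
<link rel="alternate" hreflang="en" href="https://www.example.com/en/pricing/">
<link rel="alternate" hreflang="it" href="https://www.example.com/it/prezzi/">
<link rel="alternate" hreflang="de" href="https://www.example.com/de/preise/">
<link rel="alternate" hreflang="x-default" href="https://www.example.com/en/pricing/">
```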

Web Almanac 2024 highlights that the proper use of structured data and hreflang is not just a technical best practice, but a real competitive advantage for websites that want to stand out in complex and competitive markets. Investing in these technologies means not only improving organic visibility, but also offering users a highly personalized and relevant experience, wherever they are in the world.
