Google Lighthouse: Expectation vs. Reality

Make an informed decision before picking up another optimization suggestion

When people start looking into optimizing the performance of their web application, they immediately come across a tool from Google called Lighthouse.

Google Lighthouse is an awesome tool for finding the performance issues in your web application and listing all the action items. This list helps you fix the issues and see a green performance score on your Google Lighthouse report.

Over time, Google Lighthouse has become a de facto standard for web performance measurement. Google is pushing it everywhere, from Chrome DevTools to browser extensions, from PageSpeed Insights to web.dev, and even in Search Console. Wherever performance is discussed, you will see the Google Lighthouse auditing tool.

This article covers the usage of Google Lighthouse, its strengths, and its weaknesses: where to trust it and where not to. Google has advertised all the benefits of the tool and integrated it into its other major tools like Search Console, PageSpeed Insights, and web.dev. This pushes people to improve their score, sometimes at the cost of something important.

Many teams do weird things to see green ticks in their Google Lighthouse report without knowing the exact impact on their conversion and usability.

Issues that need to be tackled

A) CPU power issue

Google Lighthouse has made it very easy to generate a performance report for your site. Open your site, go to DevTools, click the Audits tab, and run the test. Boom, you have your results. But wait: can you trust the score you got? The answer is a big no.

Your results vary a lot between a high-end machine and a low-end machine, because a different number of CPU cycles is available to the Google Lighthouse process. You can check the CPU/memory power available to the Google Lighthouse process during the test at the bottom of the report.


The Google Lighthouse team has done a great job of throttling the CPU to bring computation cycles down to the level of commonly used devices like the Moto G4 or Nexus 5X. But on a very high-end machine, like a new MacBook Pro, throttling does not bring CPU cycles down to the desired level.

For example:

Suppose a high-end processor like an Intel i7 can execute 1200 instructions per second. Throttled 4x, it executes only 300.

Similarly, suppose a low-end processor like an Intel i3 can execute only 400 instructions per second. Throttled 4x, it executes only 100.

This means everything on the Intel i7, or any other higher-end processor, still executes about three times faster and results in much better scores.

One of the critical metrics in Google Lighthouse is TBT (Total Blocking Time), which depends on CPU availability. High CPU availability means fewer long tasks (tasks that take more than 50 ms). The fewer and shorter the long tasks, the lower the TBT value and the higher the performance score.
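To make the arithmetic concrete, here is a minimal sketch of how blocking time adds up and why the same 4x throttle gives different results on different hardware. The task durations are made up for illustration, and the 3x speed gap simply mirrors the 1200 vs. 400 example above. Lighthouse only counts tasks between FCP and TTI; the sketch assumes every listed task falls in that window.

```javascript
// A task blocks the main thread for whatever time it runs beyond 50 ms.
const LONG_TASK_THRESHOLD_MS = 50;

function totalBlockingTime(taskDurationsMs) {
  return taskDurationsMs
    .map((duration) => Math.max(0, duration - LONG_TASK_THRESHOLD_MS))
    .reduce((sum, blocking) => sum + blocking, 0);
}

// Hypothetical task durations (ms) on a fast machine without throttling.
const tasksOnFastMachine = [10, 20, 30, 40];

// The same work after 4x throttling on the fast machine...
const fastThrottled = tasksOnFastMachine.map((d) => d * 4);
// ...and on a machine assumed to be 3x slower, also throttled 4x.
const slowThrottled = tasksOnFastMachine.map((d) => d * 3 * 4);

console.log(totalBlockingTime(fastThrottled)); // 210
console.log(totalBlockingTime(slowThrottled)); // 1000
```

Note that a 3x difference in raw speed turns into almost a 5x difference in TBT here, because slower machines push more of each task past the 50 ms threshold.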

This is not the only problem. Google Lighthouse scores can also differ between multiple executions on the same machine. This is because Google Lighthouse, or in fact any application, cannot control CPU cycles; that is the job of the operating system. The operating system decides which process gets how many computation cycles. It can reduce or increase CPU availability based on many factors like CPU temperature, other high-priority tasks, etc.

Below are the Google Lighthouse scores from the same machine when the test is executed 5 times for housing.com, once serially and once in parallel. The results of the serial runs are completely different from those of the parallel runs.

The operating system distributes the CPU cycles among 5 processes when running in parallel. During serial execution, all available CPU cycles are used by a single process.

When Google Lighthouse is executed 5 times on the housing.com home page serially:


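 const lighthouse = require('lighthouse'); // CommonJS entry point (Lighthouse < 10)
 const chromeLauncher = require('chrome-launcher');
 // Launch Chrome, run Lighthouse against it, and return the report (lhr).
 async function launchChromeAndRunLighthouse(url, opts) {
   const chrome = await chromeLauncher.launch({chromeFlags: opts.chromeFlags});
   const results = await lighthouse(url, {...opts, port: chrome.port});
   await chrome.kill();
   return results.lhr;
 }
 // Median of a list of numbers (middle element of a sorted copy).
 const median = (values) => [...values].sort((a, b) => a - b)[Math.floor(values.length / 2)];
 const numberOfTests = 5;
 const url = 'https://housing.com';
 const opts = {chromeFlags: ['--headless']}; // assumed; the original options are not shown
 const resultsArray = [];
 (async function tests() {
   for (let i = 1; i <= numberOfTests; i++) {
     // await keeps the runs strictly serial: one Lighthouse run at a time.
     const results = await launchChromeAndRunLighthouse(url, opts);
     resultsArray.push(Math.round(results.categories.performance.score * 100));
   }
   console.log(median(resultsArray));
   console.log(resultsArray);
 }());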

Median - 84

[ 83, 83, 84, 84, 85]

Results are pretty much consistent.

When the same test is executed in parallel.


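const {exec} = require('child_process');
const lighthouseCli = require.resolve('lighthouse/lighthouse-cli');

// Median of a list of numbers (middle element of a sorted copy).
const median = (values) => [...values].sort((a, b) => a - b)[Math.floor(values.length / 2)];

const numberOfTests = 5;
const results = [];
for (let i = 0; i < numberOfTests; i++) {
  // exec returns immediately, so all five Lighthouse processes run at once.
  // The JSON report is several MB, so raise maxBuffer above Node's default.
  exec(`node ${lighthouseCli} https://housing.com --output=json`,
    {maxBuffer: 64 * 1024 * 1024}, (e, stdout, stderr) => {
      if (e) throw e;
      results.push(Math.round(JSON.parse(stdout).categories.performance.score * 100));
      if (results.length === numberOfTests) {
        console.log(median(results));
        console.log(results);
      }
    });
}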

Median - 26

[ 22, 25, 26, 36, 36 ]

You can clearly see the difference in scores between the two approaches.

B) Google Lighthouse covers only the most generic issues and does not understand your application's behavior

This is the most complex issue I see with Google Lighthouse reporting. Every application is different and spends the available resources where it sees the best fit.

Gmail is the best example of this. It prioritizes emails over everything else, and mail becomes interactive as soon as the application loads in the browser. Other applications like Calendar, Keep, Chat, and Tasks keep loading in the background.

If you open DevTools while Gmail is loading, you might get a heart attack seeing the number of requests it makes to its servers. Calendar, Chat, Keep, etc. add a lot to the application payload, but Gmail's entire focus is on emails. Google Lighthouse fails to understand that and gives Gmail a very poor score.

There are many similar applications, like Twitter or the revamped version of Facebook. Performance is a core metric for these websites, but they all fail to impress Google Lighthouse.

All these companies have some of the best brains, who understand the limitations of the tool very well. They know which Google Lighthouse suggestions to fix and which to ignore. The problem is with organizations that do not have the resources and time to explore and understand these limitations.

Search Google for “perfect lighthouse score” and you will find a hundred articles explaining how to achieve 100 on Google Lighthouse. Most of them never check other critical metrics like conversion or bounce rate.

The only solution to this issue is to measure more, and regularly. Define the core metrics your organization cares about and prioritize them properly. Performance has no meaning if it comes at the cost of core metrics like conversion.

Solving the score inconsistency issue

Inconsistency in Google Lighthouse scores cannot be eliminated completely, but it can be controlled to a great extent.

A) Using hosted services

Cloud services are an awesome way to test your site quickly and get a basic idea of its performance. Some of Google's implementations, like PageSpeed Insights, try to limit the inconsistency by showing both Google Lighthouse lab data and field data (collected by Chrome from users who have allowed Google to sync their browsing history). WebPageTest queues test requests to control CPU cycles.

But they have their own limitations too.

  • They cannot run all tests serially, as this would increase the waiting time for tests. Running them in parallel on different machines would increase infrastructure cost without bound. Running them in parallel on the same machine results in uneven CPU cycle distribution.
  • Different providers have different throttling settings. For example, some prefer not to throttle the CPU when testing the desktop version of a site, which may or may not be the right setting for most people.
  • Services need servers all around the world (WebPageTest already has this) to capture the latency behavior in your target location.

You will be amazed by the delta between the smallest and largest of ten test runs of a single page on web.dev. Prefer to take the median of all results, or remove the outliers and take the average of the remaining runs.
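Both aggregation strategies are a few lines of code. The sketch below uses made-up scores for ten runs; the function names and the choice of trimming one run from each end are assumptions for illustration.

```javascript
// Median: middle value of the sorted scores (mean of the two middle values for even counts).
function median(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Trimmed mean: drop the `trim` lowest and highest scores, then average the rest.
function trimmedMean(scores, trim = 1) {
  const sorted = [...scores].sort((a, b) => a - b);
  const kept = sorted.slice(trim, sorted.length - trim);
  if (kept.length === 0) throw new Error('Not enough runs to trim');
  return kept.reduce((sum, score) => sum + score, 0) / kept.length;
}

// Hypothetical scores from ten runs of the same page.
const runs = [62, 84, 83, 85, 41, 84, 86, 83, 85, 84];
console.log(median(runs));      // 84
console.log(trimmedMean(runs)); // 81.25
```

The median is barely affected by the two bad runs (41 and 62), while the trimmed mean still feels the second one, which is why the median is usually the safer default for a small number of runs.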

B) Self-hosted Google Lighthouse instance

The Google Lighthouse team has again done a great job here by providing a CI layer for self-hosting. The product is Lighthouse CI.

This is an amazing tool that can be integrated with your CI provider (GitHub Actions, Jenkins, Travis, etc.) and configured to your needs. You can check the performance diff between two commits or trigger a Google Lighthouse test on every new pull request. You can also run it in a Docker container, which lets you control CPU availability to some extent and get consistent results. We are doing this at housing.com and are pretty happy with the consistency of the results.

The only problem I see with this approach at present is that it is too complex to set up. We spent weeks understanding what exactly was going on. The documentation needs a lot of improvement, and the integration process should be simplified.
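For orientation, a minimal Lighthouse CI configuration might look like the sketch below. It is not the housing.com setup: the URL, the number of runs, and the score threshold are placeholders you would replace with your own.

```javascript
// lighthouserc.js (a sketch; URL, run count, and threshold are placeholders)
module.exports = {
  ci: {
    collect: {
      // Pages to audit and how many times to run Lighthouse on each.
      url: ['https://example.com/'],
      numberOfRuns: 5,
    },
    assert: {
      assertions: {
        // Fail the build when the performance category score drops below 0.8.
        'categories:performance': ['error', {minScore: 0.8}],
      },
    },
    upload: {
      // Temporary public report storage; a self-hosted LHCI server is the alternative.
      target: 'temporary-public-storage',
    },
  },
};
```

Collecting several runs per URL is what makes CI results tolerable despite the run-to-run variance described earlier.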

C) Integrating Web Vitals

Web Vitals are core performance metrics exposed through Chrome's performance APIs, and they map clearly to Google Lighthouse metrics. They are used to track field data. Send the tracked data to GA or any other analytics tool you use. We are using perfume.js because it provides additional metrics we are interested in along with all the metrics supported by Web Vitals.

This is the most consistent and reliable of all the approaches because it reflects the average performance across your entire user base. We can make huge progress in optimizing our application by validating against this data.
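A rough sketch of the reporting side is shown below. It is not our production code: the `/analytics` endpoint and the payload fields are assumptions, and the metric object shape (`name`, `value`, `id`) follows what the web-vitals library passes to its callbacks.

```javascript
// Turn a metric object into a JSON payload for an analytics endpoint.
function toPayload(metric, page) {
  return JSON.stringify({
    name: metric.name,   // e.g. 'LCP', 'CLS'
    value: metric.value, // milliseconds for timings, unitless for CLS
    id: metric.id,       // unique per page load, useful for de-duplication
    page,
  });
}

// Send the payload; `/analytics` is a hypothetical endpoint.
function sendToAnalytics(metric, transport = globalThis.navigator) {
  const page = globalThis.location ? globalThis.location.pathname : '';
  // sendBeacon queues the request so it survives page unload.
  return transport.sendBeacon('/analytics', toPayload(metric, page));
}

// In the browser, with the web-vitals library available (assumed):
//   import {onLCP, onCLS} from 'web-vitals';
//   onLCP(sendToAnalytics);
//   onCLS(sendToAnalytics);
// Older releases of the library expose the same callbacks as getLCP / getCLS.
```

The `transport` parameter exists only so the function can be exercised outside a browser; in production the default `navigator` is used.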


[Figure: performance metrics tracked at housing.com]

We worked on improving our Total Blocking Time (TBT) and Largest Contentful Paint (LCP) after identifying problem areas. We improved TBT by at least 60% and LCP by 20%.


[Figure: TBT improvements graph (total blocking time improvement at housing.com)]

[Figure: LCP improvements graph (improvement in largest contentful paint at housing.com)]

The above improvements were only possible because we were measuring things. Measuring your critical metrics is the only way to maintain the right balance between performance, conversion, etc. Measuring will help you know when performance improvement is helping your business and when it is creating problems.

Developers apply all sorts of tricks to improve their Google Lighthouse scores, from lazy loading offscreen content to delaying some critical third-party scripts. In most cases, developers do not measure the impact of their change on user experience or on the users the marketing team loses as a result.

Considering Google lighthouse suggestions

The Lighthouse performance score depends on three parameters:

  1. How fast the page renders (FCP, LCP, Speed Index)
  2. Page interactivity (TBT, TTI)
  3. Stability (CLS)
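The overall score is a weighted average of the individual metric scores. The sketch below uses the weights published for Lighthouse 6, which is an assumption about the version in use; the weights change between releases, so check the scoring documentation for yours. The per-metric scores are made up.

```javascript
// Metric weights as published for Lighthouse 6 (assumed version; they sum to 1).
const WEIGHTS = {FCP: 0.15, SI: 0.15, LCP: 0.25, TTI: 0.15, TBT: 0.25, CLS: 0.05};

// Each metric score is between 0 and 1; the category score is their weighted average.
function performanceScore(metricScores) {
  const total = Object.entries(WEIGHTS)
    .reduce((sum, [metric, weight]) => sum + weight * metricScores[metric], 0);
  return Math.round(total * 100);
}

// Hypothetical per-metric scores: fast rendering, poor interactivity.
const score = performanceScore({FCP: 0.9, SI: 0.9, LCP: 0.8, TTI: 0.5, TBT: 0.3, CLS: 1});
console.log(score); // 67
```

With these weights, interactivity (TBT and TTI) accounts for 40% of the score, which is why CPU availability during the test matters so much.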

To improve your performance score, the Google Lighthouse report provides tons of suggestions. You need to understand each suggestion, check how feasible it is, and estimate how much value it will bring to your website.

Let us take a few suggestions from each category of the Google Lighthouse report and look at the hidden costs of implementing them.

How fast the page renders (FCP, LCP, Speed Index)

Google Lighthouse suggests optimizing images by using modern image formats such as WebP or AVIF and by resizing them to the dimensions of the image container. This is a very cool optimization and can have a huge impact on your LCP score. You can enhance it further by preloading first-fold images.
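On the page side, this usually comes down to generating a `srcset` and, for first-fold images, a preload hint. The sketch below assumes a hypothetical resizing service that accepts a `w` query parameter; the function names and URLs are illustrative only.

```javascript
// Build a srcset string from a base image URL and the widths you generate.
// The `w` query parameter is a hypothetical resizing API; substitute your own.
function buildSrcset(baseUrl, widths) {
  return widths.map((w) => `${baseUrl}?w=${w} ${w}w`).join(', ');
}

// Markup for a preload hint for a first-fold image (imagesrcset mirrors srcset).
function preloadTag(baseUrl, widths, sizes) {
  return `<link rel="preload" as="image" href="${baseUrl}" ` +
    `imagesrcset="${buildSrcset(baseUrl, widths)}" imagesizes="${sizes}">`;
}

console.log(buildSrcset('/img/hero.webp', [320, 640]));
// /img/hero.webp?w=320 320w, /img/hero.webp?w=640 640w
```

Every width in the list is one more variant that has to be generated, stored, and cached.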

Building a system where images are resized on the fly, or pre-resized to many possible dimensions on upload, is a tedious task. Either way, depending on your scale, you may have to take on a large infrastructure burden that needs both investment and ongoing maintenance.

A better approach is to implement it on a single page for a limited set of images and track your most critical metrics like conversion and bounce rate. If you are really happy with the ROI, roll it out for all of your images.

Page interactivity (TBT, TTI)

Google Lighthouse recommends reducing your JavaScript and CSS size as much as possible. JavaScript or CSS execution can choke the main thread, leaving the CPU unavailable for more important work like handling user interaction. This is a fair idea, and most people understand the limitation of JS being single-threaded.

But Google took the wrong path here. In the upcoming version, Google Lighthouse will start suggesting that larger libraries be replaced with smaller alternatives. There are multiple problems with this approach.

  1. Most libraries get larger because they solve more corner cases and feature requests. People say webpack is tough precisely because it handles so many edge cases that no other bundler handles. If webpack did not exist, half of us would be stuck trying to understand the different module systems JS supports.

    Similarly, the popular frontend frameworks are large because they handle a lot, from backward compatibility to bug fixes. Jumping to a new library may bring issues like weak documentation and bugs. So if you plan to pick up this item, be ready to have an expert developer team.

  2. It is highly unlikely that Google will recommend Preact over React because of the emotional attachment the community has to React. Doing this is unprincipled and unfair to the maintainers of projects whose communities are not as aggressive.

  3. Google itself does not follow its own rules. Most Google products load way too much JavaScript.

    A company with some of the best resources in the world has never focused on its own Lighthouse score but wants the entire world to take it seriously. There seems to be a hidden agenda behind this, such as: the faster the web, the better Google's ad revenue.

Google should learn from this famous quote:

“Be the change that you wish to see in the world.”

- Mahatma Gandhi

Before taking any step to reduce JavaScript on your page, like lazy loading off-screen components, please calculate its impact on your primary metrics like conversion and user experience.
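For reference, lazy loading an off-screen component typically looks like the sketch below. The function name, the 200px margin, and the `./Reviews.js` module in the comment are assumptions for illustration. The returned promise is a convenient place to record how long users wait for the component once they scroll to it.

```javascript
// Load a component's code only when its placeholder scrolls near the viewport.
// `loadComponent` is any function returning a promise, e.g. () => import('./Reviews.js').
function lazyLoadWhenVisible(element, loadComponent, Observer = globalThis.IntersectionObserver) {
  if (!Observer) {
    // No IntersectionObserver support: load right away rather than never.
    return loadComponent();
  }
  return new Promise((resolve, reject) => {
    const observer = new Observer((entries) => {
      if (entries.some((entry) => entry.isIntersecting)) {
        observer.disconnect(); // load once, then stop watching
        loadComponent().then(resolve, reject);
      }
    }, {rootMargin: '200px'}); // start loading shortly before the element is visible
    observer.observe(element);
  });
}
```

The JavaScript saved at load time is paid back later as a delay when the user reaches the component, and that delay is exactly what needs to be measured against conversion.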

Stability (CLS)

Every website must try to avoid any layout shift that may hurt user experience. But there will be cases where you do not have many options for avoiding CLS.

Suppose a website wants to promote app downloads to users who have not yet installed the app. Chrome has added support for detecting whether your app is already installed on the device (using the getInstalledRelatedApps API), but this information is not available to the server on the first request.

What the server can do is make a guess and decide whether to include the app download banner on the page. If the server includes it and the app is already present on the device, the banner needs to be removed from the page. Similarly, if the server does not include the banner and the app is not installed on the device, the banner is appended to the DOM on the client. Both cases trigger Cumulative Layout Shift (CLS).
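The client-side reconciliation described above can be sketched as follows. `navigator.getInstalledRelatedApps()` is the real API; the helper names and the 'remove' / 'append' / 'none' return values are assumptions for illustration.

```javascript
// Ask the browser whether a related app is installed. Resolves to true/false,
// or null when the API is unavailable (the server's guess then stands).
async function isRelatedAppInstalled(nav = globalThis.navigator) {
  if (!nav || typeof nav.getInstalledRelatedApps !== 'function') return null;
  const apps = await nav.getInstalledRelatedApps();
  return apps.length > 0;
}

// Reconcile the server's guess with what the client finds.
// Returns the DOM change needed: 'remove', 'append', or 'none'.
async function bannerAction(serverRenderedBanner, nav) {
  const installed = await isRelatedAppInstalled(nav);
  if (installed === null) return 'none';
  if (installed && serverRenderedBanner) return 'remove';   // content shifts up
  if (!installed && !serverRenderedBanner) return 'append'; // content shifts down
  return 'none';
}
```

Only a wrong guess produces a layout shift, so the better the server's guess, the smaller the share of users who see one.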

To avoid CLS, you could remove the banner from the main flow of the page and show it as a modal or floating element, or find some other way to show it. But what if you get the most downloads when the banner is part of your page? Where will you compromise?

On a funny note, most people have already experienced CLS on the Google search results page.


[Figure: GIF of a layout shift on the Google search results page]


Conclusion

  1. Google Lighthouse is an awesome performance tool built by Google and can help you improve your website's performance.
  2. There are many issues related to how Google Lighthouse works and the consistency of its results.
  3. Devices with different configurations can give completely different scores, so it is important to stick to a single device configuration when running Google Lighthouse.
  4. The same device can give different scores depending on how much CPU is available to the Google Lighthouse process during the test.
  5. Using cloud solutions like web.dev is a better way to get consistent results than running Google Lighthouse on your local machine.
  6. Running a self-hosted service is better than using cloud solutions, because results from cloud solutions can become inconsistent depending on the amount of traffic they are handling. Also, Google Lighthouse settings can be controlled better in a self-hosted environment.
  7. A self-hosted environment requires expertise and time because of limited resources and documentation, but it is very scalable and integrates very well with the most popular CI providers.
  8. Tracking real user data is the most reliable approach to tracking web performance. Google's web-vitals and perfume.js are some of the lovely libraries for tracking real user data.
  9. Define the critical metrics for your website, like conversion, bounce rate, and user experience. Adopt an optimization suggestion from Google Lighthouse only after tracking its impact on your critical metrics.
  10. Never do premature optimization for the sake of a high Google Lighthouse score. Even simple lazy loading of offscreen components to reduce JavaScript size can hurt user experience in some cases, so be cautious when making such changes.
WRITTEN BY
Ashutosh Sharma
Engineering Manager @ Housing.com. I love learning and teaching. Web performance enthusiast.