DeveloperJoy Logo DeveloperJoy

How I improved the loading time of a MySQL table with +20M rows

April 5, 2024
How I improved the loading time of a MySQL table with +20M rows

At MetricsWave we had a table in which we recorded all the visits in MySQL. Right now MetricsWave is receiving around 3M visits per month, so this table is growing very quickly lately and it has been noticed in the user experience.

Luckily, the availability of the service has not been affected at any time, but the dashboard loading time could sometimes exceed 30 seconds, and sometimes it even failed and forced you to refresh the screen. page several times.

Furthermore, all of this was being run on a single EC2 instance (a t4g.small, to be exact) and, furthermore, it was all together, both the basic user data, such as visits, the frontend and the backend. This should not be a problem today, but it is a matter of time.

A little context

Before getting to work, let's see how we are saving the data.

Visits Table - Data structure

A visit is, more or less, this in the database.

They are pre-calculated data that allow us, if we want to generate a graph, to do the query in a simple way and directly return the data to the front without any transformation.

Visits Table - Query example

Using the correct indexes, the data would be accessible and to save we simply have to search or create the indicated record and increment the score.

The data structure, in general, was good. There was some index that could be improved, but little else. The problem, having ~25M records in a single table made it very difficult to change the structure and add or remove any indexes.

With this basis, the reading speed had to be improved, without increasing the server. It was very easy to increase the size of the machine, but for now, it is something I want to avoid.

First step, make the data manageable

Looking at the app usage data, I noticed that most of the queries users make are for the most recent data. The last 7 days, 2 weeks or a month, they rarely query data from 1 year ago or earlier.

So the decision was clear, the new data has to be more at hand. We start with a partition.

We are going to put the recent data in a "hot" table and we keep the old data as history. Right now MetricsWave doesn't offer any option to import data from previous visits, so if you think about it, old tables are going to be read only.

In the "hot" table with current data, more will be written and read, but since there is less data, access is instantaneous.

Queries, and endpoints

Another problem that I saw in the app is that, many times, more data than necessary was being consulted.

When the data was requested to see a certain parameter, for example the origin of the visits, other nearby parameters were also returned, such as the country, the browser, or the destination page.

Obviously, this is unnecessary and only slows down data access.

Settings in the app

Obviously, by having the data separated in two different databases, we now have to change how the backend queries this data.

We have to be able, depending on the date range that is being requested, we have to be able to go to one DB or another, or both, and mix the data when giving a response.

There is some extra work in the backend, but right now the project is ready to work and mix visit data in several different databases, so, in a year, if the table continues to grow, scaling is very easy.

← Go back to the blog

Work with me

Do you own a company or need help with your Laravel project? I can help you with that. Check the plans and let me know if you have any questions.

Get 1 month free with yearly plan during checkout.

Technical Advisor

$ 3 ,995

/m

What's included:

  • Lead your team to a better code architecture, testing mentality, clean code, and more.
  • Lead knowledge-sharing sessions depending on current company status and requirements.
  • Help with product and technical decisions.
  • Pause or cancel anytime.

Integrated Senior Developer

$ 5 ,995

/m

What's included:

  • Includes all Technical Advisor services.
  • Engages as a team member in daily tasks.
  • Participates actively in day-to-day development.
  • Direct communication:
    Slack, and meetings included.
  • Pause or cancel anytime.

Want to talk first?

Ok, just book a call now.

FAQ