Infrastructure auto-scaling: a concrete case of Time Series Forecasting

Hey! This is my first article on this blog, and I hope you will like it. If you don’t, please let me know in the comments below what I can improve.

In this article I will talk about time series forecasting and how it can help us auto-scale a web infrastructure. The proposed approach is quite simple, yet we will see that it is an interesting fit for infrastructure auto-scaling.

The first section of this article is a little theoretical: it covers the math behind time series modeling. The second section reviews the usual approaches to time series forecasting, with their pros and cons. The third section is about infrastructure auto-scaling, and the fourth presents the approach I propose: a fairly simple machine learning implementation that forecasts server request hits for infrastructure auto-scaling.

I) Time Series – Mathematical Modeling

First of all, let’s give a simple definition: a time series is a sequence of data points sampled at equally spaced times (Fig. 1).

Figure 1.

This can be, for example, a dataset of subway passengers per hour, a dataset of beer consumption in the neighborhood pub per day, a dataset of server request hits per hour (our case), and so on. When modeling a time series, we consider that it may have a progressive trend over time (increasing or decreasing) and/or a seasonal pattern (the signal seems to repeat every week or every month, for instance), possibly with some noise on top that slightly alters the signal.

Mathematically, a time series can be modeled as follows:

Equation 1.

Where: T is the trend component, S is the seasonal component and N is the noise component.

A time series can be modeled as the addition, the multiplication, or a mix of addition and multiplication of these three components. The trend and seasonal components can be estimated using the method of least squares. The noise can first be attenuated with methods such as a moving average filter, which makes the trend and seasonal components easier to estimate.

More accurate methods exist, such as ARIMA, but they tend to be more difficult to grasp. If you want to know more about time series analysis and forecasting, please have a look at this course [1].

You might wonder: why model a time series? Apart from revealing its structural behavior, a model is a serious help for forecasting upcoming values of the series, and we will see how in the next section.
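The additive, multiplicative and mixed combinations mentioned above can be written as follows. This is only a sketch in assumed notation: x_t stands for the observed value at time t, and the mixed form shown is just one of several possible mixes.

```latex
\begin{align*}
x_t &= T_t + S_t + N_t && \text{(additive)} \\
x_t &= T_t \times S_t \times N_t && \text{(multiplicative)} \\
x_t &= T_t \times S_t + N_t && \text{(one possible mixed form)}
\end{align*}
```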

II) Time Series Forecasting

Forecasting a time series is a good way to know approximately how many passengers there will be in the subway in one hour, or how many pints people are going to drink tonight in your favorite pub. All you need is the right data and a proper model of the time series.

The three most common approaches to time series forecasting are:

  1. Exponential smoothing
  2. Regression techniques
  3. ARMA/ARIMA forecasting

Exponential smoothing is the only approach above that does not require a mathematical model in the first place. All you need is the last k data points and an alpha parameter:

Equation 2.

Where alpha is the smoothing factor, a value between 0 and 1, x are the raw data points, and S are the smoothed values, used as the next values of the time series. Although it does not require a model, we need to spend time estimating the alpha parameter. This is a rather coarse technique that puts more emphasis on the most recent data points. Besides, it cannot generalize to multivariate cases (e.g., if the seasonal component depends on two nested periodicities: weeks and months).
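As a minimal sketch of the idea, here is simple exponential smoothing in plain Python. It assumes the standard recurrence s[t] = alpha * x[t] + (1 - alpha) * s[t-1] with s[0] = x[0]; the function names and the sample values are made up for illustration.

```python
def exponential_smoothing(points, alpha):
    """Return the smoothed series s, with s[0] = x[0] and
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if not points:
        return []
    smoothed = [float(points[0])]
    for x in points[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(points, alpha):
    """One-step-ahead forecast: the last smoothed value."""
    return exponential_smoothing(points, alpha)[-1]


# Made-up hourly request hits, only to show the call.
hits_per_hour = [1000, 1200, 1100, 1500]
print(exponential_smoothing(hits_per_hour, alpha=0.5))
# prints [1000.0, 1100.0, 1100.0, 1300.0]
```

An alpha close to 1 follows the latest observations closely, while an alpha close to 0 produces a much smoother and slower curve.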

With regression, the underlying model is the explicit signal behind the time series (minus the residual noise component). Theoretically, this makes it possible to forecast the t+n value as accurately as the t+1 value. Depending on the signal, T and S may be more or less easy to find, and there may be more or fewer parameters to figure out. In any case, that is a lot of things to determine!

ARMA/ARIMA methods fit time series better and can generalize to multivariate cases. However, these methods are really not intuitive and quite complex to set up.

With the methods mentioned above, you will have to re-estimate the parameters the hard way if the predictions no longer match the behavior of your dataset. This is likely to happen each time the phenomenon you are observing undergoes an abrupt structural change.

Why not let a recurrent neural network learn the pattern of the time series each time we need to? In section IV, that is exactly what I propose to do in order to forecast the number of server requests.
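To illustrate the regression idea, here is a rough sketch in plain Python. It assumes an additive model with a linear trend T and a seasonal component S of known period; fitting the trend first and the seasonal averages second is a simplification, and all names and values are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t, for t = 0, 1, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = (n - 1) / 2.0
    mean_x = sum(values) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_x - slope * mean_t
    return intercept, slope


def fit_seasonal(values, intercept, slope, period):
    """Average detrended value at each position of the period (additive model)."""
    sums = [0.0] * period
    counts = [0] * period
    for t, x in enumerate(values):
        sums[t % period] += x - (intercept + slope * t)
        counts[t % period] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def forecast(t, intercept, slope, seasonal):
    """Forecast at any time index t: trend plus seasonal component."""
    return intercept + slope * t + seasonal[t % len(seasonal)]


# Made-up series: upward trend with a pattern repeating every 2 steps.
series = [110, 92, 114, 96, 118, 100, 122, 104]
a, b = fit_linear_trend(series)
s = fit_seasonal(series, a, b, period=2)
print(forecast(len(series), a, b, s))       # forecast for t+1
print(forecast(len(series) + 9, a, b, s))   # forecast for t+10
```

Once the parameters are fitted, forecasting far ahead costs the same as forecasting the next point, which is the property described above.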

III) Infrastructure autoscaling

Whether you host a RESTful API, a simple website or any other kind of server, you need a proper web-server infrastructure to handle the requests. The “scaling” of this infrastructure depends directly on how many requests per second you expect to process. An oversized infrastructure means throwing money down the drain. Conversely, an undersized infrastructure means your users will be unhappy with the slowness of your service and might stop using it. Auto-scaling the infrastructure when needed saves resources whenever possible while still offering a high-quality web service, and it does so without human intervention. Note that cloud auto-scaling is a green technology [2].

There are two types of infrastructure scaling: vertical scaling and horizontal scaling. The first consists of upgrading the hardware of your server nodes (more CPU and/or more memory), whereas the second consists of adding new server nodes. With multiple nodes, a load balancer is used to distribute the requests evenly among them. Major cloud computing providers such as Amazon Web Services (a.k.a. AWS) and Microsoft Azure already provide auto-scaling solutions. With AWS Auto Scaling you can choose among predefined scaling strategies or define a custom one. With Microsoft Azure you can use built-in rules or define custom rules based on any metric you consider relevant.

In this article I will base my approach on a single metric: the number of request hits, with the following assumption: one request hit = one well-defined amount of resource consumption. You can see it differently and instead define the maximum number of requests a node of your infrastructure can handle in one hour. In practice, I will use the forecasting algorithm to guess, one hour ahead, whether the infrastructure needs to be scaled up or down for the coming hours.

Furthermore, you are probably facing a pattern anomaly if the number of request hits for the current hour is significantly larger than the number of hits forecasted for the same period. In this case, you can let rate limiting handle it and not auto-scale the infrastructure (someone is probably overusing your service!). Note that if too many pattern anomalies occur, it may mean you need to re-train your model with fresher data.

Keep in mind that the general request hit pattern of your service is directly impacted by higher-level changes: a new product release, support for a new country added to your product, and so on. So you need to re-train your model regularly, otherwise you will keep getting anomalies and the auto-scaling won’t do its job correctly. The deviation between forecasts and observations can be measured with the Mean Absolute Percentage Error (MAPE) every week or two to know whether you need to re-train your model urgently. In any case, you should re-train your model at least every month to keep it up to date.
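The decision logic described above could be sketched as follows. Everything here is an assumption made for illustration: the per-node capacity, the 50% anomaly tolerance and the function names are hypothetical and would have to be tuned for a real service.

```python
import math


def nodes_needed(forecast_hits, hits_per_node, min_nodes=1):
    """Number of nodes required to serve the forecasted hits for one hour."""
    if hits_per_node <= 0:
        raise ValueError("hits_per_node must be positive")
    return max(min_nodes, math.ceil(forecast_hits / hits_per_node))


def is_pattern_anomaly(actual_hits, forecast_hits, tolerance=0.5):
    """True when actual traffic exceeds the forecast by more than `tolerance`
    (0.5 means 50% above the forecast; an arbitrary threshold)."""
    return actual_hits > forecast_hits * (1.0 + tolerance)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.
    Hours with zero actual hits are skipped to avoid dividing by zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical numbers: 4500 hits forecasted, each node handles 2000 hits/hour.
print(nodes_needed(4500, hits_per_node=2000))   # prints 3
print(is_pattern_anomaly(3100, 2000))           # prints True
```

A MAPE computed over the last week or two that drifts well above its usual level would then be the signal to re-train the model early.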

As an alternative to what I proposed above, this forecasting algorithm could be used to detect “Slashdot Effects” due to a major news site citing your product, for instance.

IV) Approach and results

The machine learning model tested for this approach is quite simple: it is a neural network made of one LSTM layer followed by one dense layer. As mentioned above, neural networks have one huge advantage: there are no tricky parameters to estimate, such as those of the ARMA or ARIMA models. The only parameters left to tune are the hyperparameters of the network, and they can be estimated through a grid search. This makes updating the model pretty easy and fast. You can find the entire code here [3]. I tested this approach on the NASA HTTP request logs dataset, which you can find here [4]. A first analysis of the request hits pattern on a daily basis gives us the results shown in Fig. 2.

Figure 2: Daily basis request hits analysis.

As you can see, people use the server less in the early morning than during the daytime (right side of Fig. 2): the number of request hits progressively increases from 5 AM to 4 PM, then decreases until it reaches about 1000 request hits on average at 5 AM. A second analysis of the request hits pattern over the whole dataset (1400 hours) gives us the trends shown in Fig. 3.

Figure 3: Broad request hits analysis.

As you can see, clear trend and seasonal components can be observed (left side of Fig. 3). Besides, two cyclic phenomena seem nested: series of high rises and falls are followed by fewer, lower rises and falls. There is a day-of-week effect combined with a daily effect. We cannot see a clear pattern in the amount of transferred data, though (right side of Fig. 3). Still, this feature might provide additional information during training.

These two analyses suggest that we have identified clear seasonal and trend components: a daily and a day-of-week seasonal pattern, as well as a roughly flat trend (neither clearly increasing nor decreasing). The forecasting model has been trained on the first 80% of this dataset and tested on the last 20% (Fig. 4).

I modelled a multivariate time series. This means for each sample there are several features: the day of the week, the amount of transferred data and, of course, the number of request hits per hour. The sample shape is 3×3: there are 3 features for a time window of 3 (the three latest observations are analysed to predict the next observation).
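To make the 3×3 sample shape concrete, here is a small sketch of how such samples could be built and split chronologically. It is not the code of the project: the feature order (hits, transferred bytes, day of week), the choice of the hit count as the target and the helper names are assumptions.

```python
def make_windows(rows, window=3, target_index=0):
    """Turn a list of per-hour feature rows into (samples, targets).

    Each sample holds `window` consecutive rows; the target is the
    `target_index` feature of the row that immediately follows them.
    """
    samples, targets = [], []
    for i in range(len(rows) - window):
        samples.append(rows[i:i + window])
        targets.append(rows[i + window][target_index])
    return samples, targets


def chronological_split(samples, targets, train_ratio=0.8):
    """Keep the first part for training and the last part for testing,
    without shuffling, so that the test set lies in the future."""
    cut = int(len(samples) * train_ratio)
    return (samples[:cut], targets[:cut]), (samples[cut:], targets[cut:])


# Made-up rows: [request hits, transferred bytes, day of week].
rows = [
    [1200, 30000, 0], [1500, 42000, 0], [1700, 39000, 0], [1600, 41000, 0],
    [1400, 35000, 1], [1300, 31000, 1], [1250, 30500, 1], [1100, 28000, 1],
]
samples, targets = make_windows(rows, window=3)
(train_x, train_y), (test_x, test_y) = chronological_split(samples, targets)
print(len(samples[0]), len(samples[0][0]))  # prints 3 3
print(len(train_x), len(test_x))            # prints 4 1
```

Splitting in time order rather than at random matters here, since a forecasting model should always be evaluated on data that comes after its training data.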

Figure 4: Predictions vs ground truth

As you can see in Fig. 4, the prediction curve is quite close to the ground truth curve (the ground truth is in green). The difference between the two curves is relatively small (right side of Fig. 4): the mean of the difference is -47 and the standard deviation is 400. This means that, on average, the model predicts slightly fewer request hits per hour than observed on the test dataset. However, at some points of the difference curve there are pretty bad predictions, which might be because the test dataset contains event-related peaks or “Slashdot Effects”. Nevertheless, the performance of this model could probably be improved with a hyperparameter grid search.
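For reference, the two statistics quoted above can be computed as in the sketch below. The sign convention (prediction minus ground truth, so that a negative mean indicates underestimation) and the use of the population standard deviation are assumptions, and the numbers are made up.

```python
import statistics


def error_summary(predicted, actual):
    """Mean and population standard deviation of (prediction - ground truth)."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    return statistics.mean(diffs), statistics.pstdev(diffs)


# Made-up hourly values, only to show the call.
mean_diff, std_diff = error_summary([950, 1480, 2100], [1000, 1500, 2000])
print(mean_diff, std_diff)
```

A mean close to zero with a large standard deviation would suggest a model that is right on average but unreliable hour by hour.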

V) Conclusion

As you can see, training a model to forecast the number of request hits for the coming hours is pretty easy. As mentioned in this article, this can be useful, for instance, for infrastructure auto-scaling, outlier detection (DoS attacks, scraping and so on) or even “Slashdot Effect” detection. This article covered the theoretical aspects of time series forecasting, with the goal of predicting the number of request hits and auto-scaling accordingly. I did not say much about the different strategies one can think of to auto-scale the infrastructure, nor about how this forecasting algorithm could be integrated into your infrastructure. These more technical topics would need a full article of their own to be covered in more detail. I hope you enjoyed it!

[1]: Time series forecasting course:

[2]: “Model-driven auto-scaling of green cloud computing infrastructure”:

[3]: Project github repository:

[4]: NASA HTTP request logs:

Written by: Paul.B
