Inflation and Inflation Expectations: A Distributional Mapping

Published on September 18, 2023

by R. K. Sinha ^

The article analyses the statistical characteristics of the Consumer Price Index-Combined (CPI-C) based inflation and inflation expectations datasets and identifies suitable statistical distributions for them. Identifying appropriate distributions makes it possible to establish a one-to-one mapping between them, so that a data point from one dataset can be converted to its counterpart in the other. These models have the potential to forecast inflation and to measure Inflation-at-Risk (IaR).

I. Introduction

The CPI-C based inflation data are published by the Ministry of Statistics and Programme Implementation (MoSPI) together with granular-level data. One level of granularity is by product item at the all-India level. Another is by product group and sub-group for States/Union Territories (UTs) and Regions (Rural/Urban). The Reserve Bank conducts the Inflation Expectations Survey of Households (IESH), which provides the expectations of respondents (surveyed households) on inflation for the near term. Such surveys are known internationally for their biases, and accordingly, the levels of inflation expectations often differ from realised inflation. Nevertheless, they have proved very useful for tracking directional changes. Several recent studies (Das et al., 2019; Shaw, 2019; Muduli et al., 2022) have attempted to assess and remove the inherent biases in such surveys to establish a meaningful comparison between inflation and inflation expectations.

In this article, we carry out a comparative study of the statistical characteristics of the entire distributions of actual inflation (MoSPI) and of the inflation expectations ¹ of surveyed households, rather than just modelling and mapping the central tendencies of the two datasets. Comparing and modelling only aggregate inflation/inflation expectation numbers often loses information inherent in the datasets, as the aggregates are merely derived numbers.

The article is divided into five sections. After the introductory section, the datasets of inflation and inflation expectations are described in the second and third sections, respectively. The fourth section connects the findings of these two sections through suitable mappings and suggests possible uses of them. The last section concludes the article.

II. Statistical characteristics of CPI-C based Inflation Dataset

The data on CPI-C based inflation (aggregate as well as granular level) are published by the MoSPI at a monthly frequency. Statistically, the means of the aggregate and granular-level datasets for the same period should match closely, whereas the standard deviation (SD) of the granular data can be expected to be higher than that of the aggregate data, since the aggregate data are, in effect, a distribution of means of the granular data. For the period January 2014 to June 2023, the modal inflation of the aggregate data falls in the band of 5 per cent to 6 per cent, while that of the disaggregate data falls in the band of 4 per cent to 5 per cent. The greater variability in the granular data represents individual product-level shocks, which can be favourable (bringing aggregate inflation towards the target) or adverse (moving aggregate inflation away from the target). The lowest and highest inflation in the aggregate data during January 2014 to June 2023 stand at 1.46 per cent (recorded in June 2017) and 8.60 per cent (recorded in January 2014), respectively (Chart 1).

Chart 1: Probability Density of CPI-C Granular and Aggregate Data

The distribution of inflation at the granular level has varied significantly across months, driven by the relative presence of extreme values. We analyse the statistical properties of the granular dataset ² over the period January 2014 to June 2023.

The disaggregated CPI-C dataset may initially appear to have some characteristics of a normal distribution (bell curve). ³ However, the dataset is found to be highly leptokurtic, i.e., it has a higher peak than the normal distribution, with a kurtosis of 15.856. The distribution appears more-or-less symmetric visually, although it has a mild positive skewness of 0.869. A best-fit normal distribution, viz., N (5.0430, 7.1185), is also plotted; it fits poorly, under-estimating the density around the centre and at the extremes and compensating by over-estimating it in between (Chart 2). The underlying leptokurtic dataset has fat tails, with around 2.5 per cent of observations in each extreme, i.e., inflation lower than -10 per cent in the left tail and higher than 20 per cent in the right tail, representing severe shocks (Chart 1).

As the normal distribution fails to explain characteristics of the dataset, we explore and search for other suitable statistical distributions, which may potentially explain the nature of this dataset. It is observed that no single statistical distribution explains the dataset adequately. Two best-fit distributions were identified as Cauchy (μ = 4.7930 and σ = 2.4758) and Laplace (μ = 5.0430 and λ = 0.1987) though they also do not fit the dataset appropriately (Chart 2 and Table 1).

Chart 2: Probability Density of CPI-C Disaggregate Data

It may be mentioned that the granular-level dataset is composed of various product/sub-product groups across regions (Rural/Urban) and States/UTs, leading to wide heterogeneity. Fitting subsets of the data by product category, which shows larger heterogeneity than regions and States/UTs, indicates that some products can be modelled more precisely. We also observe larger variations in the descriptive statistics of these subsets. For example, inflation in ‘cloth and footwear’ appeared closest to a bell curve (normal), while inflation in ‘housing’ hovered within a narrow range over time (Annex - Chart A1 and Table A1).

Table 1: Fitting of CPI-C Disaggregate Dataset through Suitable Statistical Distributions
Name of the Distribution	Estimated Parameters	Kolmogorov-Smirnov (K-S) Test (Critical Value at 5 per cent = 0.01944)	Rejection at 5 per cent Level of Significance
Normal	µ = 5.0430 σ = 7.1185	K-S Statistics = 0.12413	Yes
Cauchy	µ = 4.7930 σ = 2.4758	K-S Statistics = 0.03687	Yes
Laplace	µ = 5.0430 λ = 0.1987	K-S Statistics = 0.06548	Yes
Source: Author’s calculations.
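The K-S comparison in Table 1 can be illustrated with a minimal sketch. It assumes that the Laplace parameter λ is a rate (inverse scale), which is a reading of Table 1 rather than something the article states, and it uses the common large-sample approximation 1.358/√n for the 5 per cent critical value; that approximation reproduces the critical value of 0.12832 reported for N = 112 in Table 3. The function names are illustrative only.

```python
import math

def laplace_cdf(x, mu, lam):
    """Laplace CDF with location mu and rate lam (assumed parameterisation)."""
    if x < mu:
        return 0.5 * math.exp(lam * (x - mu))
    return 1.0 - 0.5 * math.exp(-lam * (x - mu))

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_critical_5pct(n):
    """Large-sample approximation to the 5 per cent K-S critical value."""
    return 1.358 / math.sqrt(n)

def fitted_laplace(x):
    """Laplace CDF at the parameters reported in Table 1."""
    return laplace_cdf(x, 5.0430, 0.1987)
```

A fit is rejected at the 5 per cent level when `ks_statistic(sample, fitted_laplace)` exceeds `ks_critical_5pct(len(sample))`, which is the comparison summarised in the last column of Table 1.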

If a single distribution fails to fit the underlying dataset appropriately, various studies have explored and demonstrated the use of mixture distributions, having potential to capture the characteristics of the dataset more appropriately. These mixture distributions can be constructed with or without the identification of a threshold, a particular value of the random variable. The threshold approach partitions the dataset into two parts, and the parts are modelled separately using different statistical distributions.

Several studies mix two distributions with a threshold: Cooray and Ananda (2005) and Scollnik (2007) used the Lognormal-Pareto model; Ciumara (2006) and Scollnik and Sun (2012) applied the Weibull-Pareto model; and Nadarajah and Bakar (2014) suggested the Lognormal-Burr model. Other approaches mix two statistical distributions across the entire range without any threshold, using fixed or dynamic weights (mixing parameters). Frigessi et al. (2002) demonstrated a dynamic mixture model for unsupervised tail estimation without estimating the threshold. The study used a Weibull-Pareto pair, assigning the Weibull a weight that starts from one at the left of the distribution and gradually falls, tending to zero at the right tail. All these studies demonstrated their models on the same dataset, viz., the Danish fire loss data, a famous insurance dataset known for its heavy right tail.

Unlike the above dataset, which has only one possible heavy tail loss, as values are bounded at zero, the underlying inflation dataset has two clear tails, which have varied significantly across the months. We split the data into two parts with inflation at 4 per cent, as the threshold. These two parts of the dataset are modelled separately. However, we transform the data before the modelling as detailed below:

Let {x_L} and {x_R} be the data points of the initial granular dataset of inflation covering the distinct ranges (-∞ to 4 per cent) and [4 per cent to ∞), respectively. We define:

$$X_L = 4 - x_L, \qquad X_R = x_R - 4$$

Both X_L and X_R now range from 0 to ∞.
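The split and transformation can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the toy input are hypothetical, and the direction of the transformation (distance from the 4 per cent threshold on either side) is inferred from the back-transformed means and ranges reported in Table 2.

```python
def split_and_transform(inflation, threshold=4.0):
    """Split inflation at the threshold and measure each part as a
    non-negative distance from it (X_L for the left, X_R for the right)."""
    left = [threshold - x for x in inflation if x < threshold]
    right = [x - threshold for x in inflation if x >= threshold]
    n = len(inflation)
    # The weights are the shares of observations falling in each part.
    return left, right, len(left) / n, len(right) / n

def back_transform(value, part, threshold=4.0):
    """Map a transformed value back to the original inflation scale."""
    return threshold - value if part == "L" else threshold + value
```

For instance, `back_transform(4.0977, "L")` returns the left-part mean on the original scale, which Table 2 reports as -0.0977.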

We now fit the data and find that the 3-parameter Burr and 3-parameter Dagum ⁴ distributions both explain the characteristics of the data appropriately for both parts. We use the Burr distribution, and the rest of the analysis is centred on it. The Burr distribution is versatile and has been found suitable for many insurance datasets. Sastry and Sinha (2010) used a 4-parameter Burr distribution to describe the Danish fire loss data and found it competitive with several of the mixture distributions proposed for this dataset in the studies mentioned earlier in this section.

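The probability density function (pdf) of a 3-parameter Burr distribution is defined as:

$$f(x) = \frac{\alpha k \left(\frac{x}{\beta}\right)^{\alpha - 1}}{\beta \left[1 + \left(\frac{x}{\beta}\right)^{\alpha}\right]^{k + 1}}, \qquad x > 0$$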

Where, k (>0) and α (>0) are the first and second shape parameters, respectively; β (>0) is the scale parameter.

The distribution function (df) of a 3-parameter Burr distribution is defined as:

$$F(x) = 1 - \left[1 + \left(\frac{x}{\beta}\right)^{\alpha}\right]^{-k}, \qquad x > 0$$

The descriptive statistics (DS) of each part of the distribution indicate that the Burr fit resembles the observed data (Table 2).

The fit using the mixture distribution is a stark improvement over the initial single-distribution approach. The derived variables (X_L and X_R) are now transformed back to the original variable (X), and the modelled probability density functions are scaled by their respective weights and stitched together. The single pdf derived in this way from the synthetic pair Burr (2.2685, 1.1129, 5.6631) and Burr (2.7037, 1.2135, 7.9923) integrates to unity with appropriate weights ⁵ and explains the data much better (Chart 3).
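The stitching of the two fitted parts can be sketched as below. The sketch assumes the standard 3-parameter Burr form with shape parameters k and α and scale β, reads the parameter triplets as (k, α, β) in line with Table 2, and takes the "Weight in Full Data" row of Table 2 as the mixing weights; the function names are illustrative and the author's exact weighting may differ.

```python
THRESHOLD = 4.0                      # splitting point, per cent
W_LEFT, W_RIGHT = 0.41015, 0.58985   # weights from Table 2 (assumed to be the mixing weights)
LEFT = (2.2685, 1.1129, 5.6631)      # (k, alpha, beta) fitted to X_L
RIGHT = (2.7037, 1.2135, 7.9923)     # (k, alpha, beta) fitted to X_R

def burr_pdf(x, k, alpha, beta):
    """Density of the 3-parameter Burr distribution on x > 0."""
    if x <= 0.0:
        return 0.0
    t = (x / beta) ** alpha
    return alpha * k * t / (x * (1.0 + t) ** (k + 1.0))

def burr_cdf(x, k, alpha, beta):
    """Distribution function of the 3-parameter Burr distribution."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + (x / beta) ** alpha) ** (-k)

def synthetic_pdf(x):
    """Stitched density on the original inflation scale."""
    if x < THRESHOLD:
        return W_LEFT * burr_pdf(THRESHOLD - x, *LEFT)
    return W_RIGHT * burr_pdf(x - THRESHOLD, *RIGHT)

def synthetic_cdf(x):
    """Stitched distribution function on the original inflation scale."""
    if x < THRESHOLD:
        # The left part is mirrored: large X_L means very low inflation.
        return W_LEFT * (1.0 - burr_cdf(THRESHOLD - x, *LEFT))
    return W_LEFT + W_RIGHT * burr_cdf(x - THRESHOLD, *RIGHT)
```

Under these assumptions, `synthetic_cdf(-10.0)` and `1 - synthetic_cdf(20.0)` each come out at roughly 2 per cent, of the same order as the tail shares of around 2.5 per cent noted earlier for the observed data.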

Table 2: Descriptive Statistics of two Parts of the Distribution
DS	Left Part (X_L)	Right Part (X_R)
Weight in Full Data	0.41015	0.58985
Mean	4.0977 (-0.0977 for X)	4.6112 (8.6112 for X)
Range	0 to 46.90 (-42.90 to +4.00 for X)	0 to 89.19 (+4 to +93.19 for X)
SD	5.4349	5.8530
Skewness	2.7690	4.1868
Kurtosis	11.933	31.462
Fitting of Burr Distribution
Parameters	k = 2.2685	k = 2.7037
	α = 1.1129	α = 1.2135
	β = 5.6631	β = 7.9923
K-S Statistics (Critical value at 5 per cent)	0.01839 (0.02611)	0.01381 (0.02384)
Source: Author’s calculations.

Chart 3: Fitting of CPI-C Disaggregate Data - Synthetic Burr

If a single distribution, whether derived on a standalone basis or through the mixing of distributions, fails to capture the characteristics of extreme tails adequately and precisely, its estimates of probabilities in the extreme tails are neither reliable nor usable, as it may be under or over-estimating these consistently. In such cases, the alternative solution is to model the extreme tails exclusively through the Extreme Value Theory (EVT) tools. In the current case, the synthetic Burr appears to fit well the entire curve including the tails (Chart 3). We examine the same statistically and explore if EVT tools would be a valuable addition in this context.

The distribution of excesses over a high threshold, say u, in the right tail of inflation is defined as:

$$F_u(x_R) = P(X - u \le x_R \mid X > u), \qquad x_R \ge 0$$

The distribution of excesses represents the probability that inflation (X) exceeds the threshold u by at most an amount x_R, where x_R = x - u, given that X exceeds the threshold u. In terms of the underlying distribution function, it is:

$$F_u(x_R) = \frac{F(x_R + u) - F(u)}{1 - F(u)}$$

The functions F_u(x_R) and F(x_R + u) are the conditional and unconditional distribution functions, respectively. The function F(x_R + u) is equivalent to F(x), as x_R = x - u, and F(u) is the cumulative probability at the threshold u. The underlying distribution function may have an infinite right endpoint, i.e., it allows for arbitrarily large inflation values with very small probability.

Similarly, the distribution of shortfall below a lower threshold u in the left tail of inflation (denoted here by G_u) is defined as:

$$G_u(x_L) = P(u - X \le x_L \mid X < u), \qquad 0 \le x_L \le u - x_{min}$$

where x_min is the smallest observation.

The distribution of shortfall represents the probability that inflation (X) falls short of the threshold u by at most an amount x_L, where x_L = u - x, given that X falls short of the threshold u. In terms of the underlying distribution function, it is:

$$G_u(x_L) = \frac{F(u) - F(u - x_L)}{F(u)}$$
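The two conditional distributions can be written as small helpers that take any distribution function as input. This is a sketch under the usual definitions for a continuous distribution, namely (F(u + x_R) - F(u)) / (1 - F(u)) for excesses and (F(u) - F(u - x_L)) / F(u) for shortfalls; the exponential and uniform distribution functions are hypothetical stand-ins for a fitted inflation distribution, and all names are illustrative.

```python
import math

def excess_cdf(cdf, u, x_r):
    """P(X - u <= x_r | X > u): distribution of excesses over u."""
    return (cdf(u + x_r) - cdf(u)) / (1.0 - cdf(u))

def shortfall_cdf(cdf, u, x_l):
    """P(u - X <= x_l | X < u): distribution of shortfalls below u."""
    return (cdf(u) - cdf(u - x_l)) / cdf(u)

def conditional_exceedance(cdf, u, level):
    """P(X > level | X > u) for level >= u."""
    return (1.0 - cdf(level)) / (1.0 - cdf(u))

def toy_exponential_cdf(x, rate=0.1):
    """Hypothetical stand-in for a fitted distribution function."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

def toy_uniform_cdf(x):
    """Hypothetical uniform distribution function on [0, 1]."""
    return min(max(x, 0.0), 1.0)
```

With a fitted distribution function in place of the toy ones, `conditional_exceedance(cdf, 10.0, 25.0)` gives the probability of inflation exceeding 25 per cent given that it exceeds 10 per cent.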

EVT essentially considers only the few largest/smallest observations at the extreme ends of the dataset, not the complete dataset. It deals with conditional probabilities, for example, the probability of inflation exceeding 25 per cent given that it is more than 10 per cent. The challenge with EVT is determining the threshold level. Ideally, a higher threshold should be preferred. However, as the threshold increases, the modeller is left with very few observations, which raises questions about the reliability of the probability estimates. Accordingly, there has to be an optimum threshold level. There are a few standard techniques to determine the threshold statistically. One such technique, plotting the ‘Mean Excess Function’, is described below:

If u is the threshold, the mean excess function e(u) can be estimated as the sum of exceedances (or shortfalls, in the case of the left tail of the distribution) over the threshold u divided by the number of data points exceeding the threshold u. In other words, the mean excess function indicates the expected overshoot of the threshold, given that the threshold is exceeded.
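The empirical mean excess function described above can be sketched as follows; the function names and the `tail` argument are illustrative, and the data passed in are assumed to be a plain list of inflation values.

```python
def mean_excess(data, u, tail="right"):
    """Average overshoot of u (right tail) or shortfall below u (left tail)."""
    if tail == "right":
        gaps = [x - u for x in data if x > u]
    else:
        gaps = [u - x for x in data if x < u]
    if not gaps:
        raise ValueError("no observations beyond the threshold")
    return sum(gaps) / len(gaps)

def mean_excess_curve(data, thresholds, tail="right"):
    """Pairs (u, e(u)) for plotting e(u) against candidate thresholds."""
    return [(u, mean_excess(data, u, tail)) for u in thresholds]
```

Plotting the pairs returned by `mean_excess_curve` against u gives the mean excess plot used for threshold selection.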

For the right tail, an upward trend in the mean excess function may indicate heavy-tailed behaviour of the data, a horizontal line may suggest exponentially distributed data, and a downward trend may indicate short-tailed data. The underlying data may follow a Generalised Pareto Distribution (GPD) if the empirical mean excess function shows an upward trend, in particular, a straight line with a positive gradient (McNeil, 1997). In our context, the mean excess plot of the data (Chart 4) is a clear upward-sloping line, indicating the suitability of a GPD at various possible threshold points.

A threshold is chosen by inspecting the mean excess plot. Accordingly, the mean excess function is computed for our dataset ⁶. It is observed that the mean excess function more-or-less maintains linearity and does not diverge across the board. This indicates that the GPD may be an appropriate choice for fitting the exceedances (Chart 4).

We examine the appropriateness of the GPD in our context and compare it with our fitted synthetic Burr distribution. We find that the GPD is inferior to the synthetic Burr: the GPD fits poorly at many thresholds (especially at lower values of u) and has higher K-S statistic values. The GPD fit improves as the threshold increases and converges with the Burr, though it does not improve upon it. We demonstrate this for the right tail of the data (Chart 5).

We now move to the distribution of inflation in the aggregate data. As observed earlier (Chart 1), the distribution of CPI-C based aggregate inflation has much shorter tails than the granular-level inflation distribution. The distribution is found to be almost symmetric and platykurtic (less peaked than normal). The Log-Pearson Type III distribution is identified as the closest representation of the aggregate-level inflation data (Table 3 and Chart 6).

Table 3: Descriptive Statistics and Fitting of CPI-C Aggregate Inflation
DS	Estimates	Fitting
Sample Size (N)	112	3-parameter Log – Pearson Type III α = 6.4118 β = - 0.1371 γ = 2.4545 K-S Statistics = 0.03671 (Critical value at 5% = 0.12832)
Mean	5.1039
Range	7.14 (1.46 to 8.60)
SD	1.5925
Skewness	0.0653
Kurtosis	2.4621
Note: The data period is from January 2014 to June 2023. The data for April 2020 and May 2020, which were imputed and published by MoSPI subsequently, are not included.
Source: Author’s calculations.

Chart 6: Fitting of CPI-C Aggregate Inflation

The quantile-quantile (QQ) plot exhibits a straight line, highlighting the appropriateness of the 3-parameter Log-Pearson Type III distribution for the CPI-C headline distribution (Chart 7).

We also analyse the evolution of inflation distribution with the incoming of each incremental data point for the CPI-C aggregate data. ⁷

Evolution and Stabilisation of Statistical Moments of Inflation (January 2014 to June 2023)

The mean inflation⁸ declined more-or-less consistently from January 2014 to September 2019, touching a trough of 4.54 per cent, and then rose gradually to 5.10 per cent in June 2023. The standard deviation (SD) of the distribution appears to be settling at around 1.6 per cent (Chart 8).

The skewness of the distribution has dipped gradually towards zero in recent years, leading to a symmetric distribution. The distribution, which had remained leptokurtic (more peaked than normal) from December 2017 to January 2020, turned platykurtic (less peaked than normal) again in February 2020, just prior to the emergence of COVID. The unstable values of skewness and kurtosis during 2014 are due to the small sample size. Further, these moments do not appear to have converged precisely at the current sample size (Chart 9).
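The evolution of the moments with each incoming data point can be sketched with an expanding window. The estimators below are assumptions, since the article does not state which ones were used: the sample SD with an n-1 divisor, and moment-based skewness and kurtosis, with kurtosis on the scale where the normal distribution equals 3. The function names are illustrative.

```python
import math

def moments(xs):
    """Mean, sample SD, skewness and (non-excess) kurtosis of a list."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0.0:
        raise ValueError("moments are undefined for a constant series")
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sd = math.sqrt(m2 * n / (n - 1))
    return mean, sd, m3 / m2 ** 1.5, m4 / m2 ** 2

def expanding_moments(series, min_obs=4):
    """Moments recomputed as each new observation is added to the sample."""
    return [moments(series[:t]) for t in range(min_obs, len(series) + 1)]
```

Each tuple returned by `expanding_moments` corresponds to one point on the paths of mean, SD, skewness and kurtosis of the kind plotted in Charts 8 and 9; early tuples are based on few observations and are therefore unstable.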

We now explore the statistical properties of inflation expectations, sourced from the Inflation Expectations Survey of Households (IESH) conducted by the Reserve Bank. There are other sources of inflation expectations/forecasts, such as the Survey of Professional Forecasters (SPF), which is also conducted by the Reserve Bank. We restrict the analysis to the IESH in the current context, as mentioned in the introductory section.

III. Statistical Characteristics of the Inflation Expectations Dataset

The IESH is a bimonthly survey in which qualitative and quantitative expectations on inflation are sought from around 6,000 households ⁹ in select cities in urban areas. Here, we analyse only the quantitative inflation expectations, which are captured for three time points: the current period, the 3-month ahead period and the 12-month ahead period. We consider the IESH dataset from March 2014 (Round 35) to May 2023 (Round 71B), including the two bimonthly surveys conducted every year in addition to the four quarterly surveys. The Reserve Bank releases the granular (unit) level data on inflation expectations in addition to the summary (aggregate) data released on its website. Accordingly, we analyse both datasets, as was done for CPI-C based inflation in the previous section.

Chart 9: Evolution of Skewness and Kurtosis

Households generally report inflation higher than actual inflation. Further, they tend to report higher inflation for the 3-month ahead period, and higher still for the 12-month ahead period, than for the current period (Table 4). We find that the 4-parameter Burr distribution explains the IESH unit-level data appropriately; in the previous section, a mixture of two 3-parameter Burr distributions was found to explain the distribution of CPI-C granular inflation well. In the IESH granular dataset, households show a preference for reporting inflation in round numbers. This preference leads to bunching of frequencies at round numbers and distorts the distribution. Further, unlike CPI-C granular inflation, the IESH granular inflation expectations have only one tail, viz., the right tail. The lowest band (inflation less than one per cent) does not produce an extreme left tail: although theoretically unbounded, it is likely to be read by respondents as zero to one per cent, and it indeed contains a low frequency, except in the 12-month ahead dataset. The descriptive statistics and the fitted distributions are provided in Table 4 and Chart 10, respectively.

Table 4: Descriptive Statistics and Fitting of IESH Granular Data
DS	Dataset
DS	Current Period	3-month ahead	12-month ahead
Mean	11.765	13.150	13.849
Median	8.50	9.50	10.50
Range	99.50 (0.50 to 100.00)	98.50 (0.50 to 99.0)	99.50 (0.50 to 100.0)
SD	11.241	12.689	14.131
Skewness	3.1441	2.9822	2.7168
Kurtosis	15.916	14.001	12.163
Fitting	4-parameter Burr k = 0.2894 α = 8.0122 β = 9.8506 γ = - 4.7867 K-S Statistics = 0.09987 (Critical value at 5% = 0.13675)	4-parameter Burr k = 0.3447 α = 6.3927 β = 10.1440 γ = - 4.2926 K-S Statistics = 0.07008 (Critical value at 5% = 0.13446)	4-parameter Burr k = 0.9789 α = 2.4867 β = 11.8570 γ = - 2.0122 K-S Statistics = 0.08236 (Critical value at 5% = 0.13446)
Source: Author’s calculations.

Chart 10 exhibits the respondents' preference for round numbers, as mentioned, with bunching at multiples of 5, viz., 5, 10, 15, 20 and so on. The distribution is found to be highly (positively) skewed as well as highly leptokurtic for all three datasets (current inflation, 3-month ahead inflation and 12-month ahead inflation). Although the 4-parameter Burr passes the goodness-of-fit test at the 5 per cent level, a superior fit may still be feasible, possibly through a mixture distribution, given the round-number preference of survey respondents. This is not attempted in the current context.

We now move to the fitting of the IESH aggregate data. The summary statistics of the IESH data are released for two measures of central tendency (mean ¹⁰ and median). We find that the 4-parameter Johnson SB distribution ¹¹ tracks the distribution of mean inflation of the IESH well. The descriptive statistics and fitting are exhibited in Table 5 and Chart 11, respectively.

Chart 11: Fitting of IESH Aggregate Datasets (Perception and Expectations)

Table 5: Descriptive Statistics and Fitting of IESH Aggregate Data (Mean)
DS	Dataset
DS	Current Period	3-month ahead	12-month ahead
N	51	51	51
Mean	9.284	9.961	10.051
Range	5.40 (7.30 to 12.70)	4.80 (7.90 to 12.70)	5.20 (8.30 to 13.50)
SD	1.246	1.258	1.171
Skewness	0.7234	0.2401	0.9043
Kurtosis	3.7781	2.2039	4.0463
Fitting	4-parameter Johnson SB γ = 5.0416 δ = 2.8226 λ = 27.7870 ξ = 5.1452 K-S Statistics = 0.08065	4-parameter Johnson SB γ = 0.3289 δ = 0.9778 λ = 6.0331 ξ = 7.3584 K-S Statistics = 0.06640	4-parameter Johnson SB γ = 3.0075 δ = 1.8969 λ = 15.1540 ξ = 7.2878 K-S Statistics = 0.08959
Fitting	Critical value at 5% = 0.18659 (same for all the datasets, as these are ungrouped datasets with same sample size)
Source: Author’s calculations.

The distribution of mean inflation of IESH is found to be (positively) skewed unlike the CPI-C aggregate inflation, which was found to be almost symmetric.

After studying and analysing the statistical properties of inflation and inflation expectations at the granular and aggregate level, we attempt to map these in the following section.

IV. Mapping of Distributions

The findings of Sections II and III reveal that the statistical moments of the distributions of the various datasets analysed differ significantly from each other. The findings are summarised in Chart 12.

The mappings, as collated in Chart 12, establish an equivalence between one distribution and another. For example, a data point of IESH (aggregate) for current inflation following the 4-parameter Johnson SB (γ = 5.0416, δ = 2.8226, λ = 27.7870, ξ = 5.1452) corresponds to a data point of CPI-C (aggregate) following Log-Pearson Type III (α = 6.4118, β = - 0.1371 and γ = 2.4545). The functional relationship between two datasets can be used in many ways. A simple approach is to map through the cumulative distribution function (CDF). Such mappings are potentially useful because IESH is forward-looking, whereas CPI-C realised inflation is post-facto. Accordingly, the mapping has the potential to forecast inflation. CPI-C inflation can be forecast from IESH data under two mappings, direct and indirect, as below:

Direct mapping

It is based on aggregate numbers and does not use granular-level information. Let X and Y be random variables representing realised inflation and 3-month ahead inflation expectations of IESH, respectively, both at an aggregate level. A particular value ‘y’ of the 3-month ahead inflation expectation can be mapped to a value x of realised inflation by equating F_{Johnson SB} (y) with F_{log-Pearson III} (x), i.e., by first computing the cumulative probability p = F_{Johnson SB} (y) and then solving x = F^-1_{log-Pearson III} (p).

For example, the result of the 72nd round of IESH, which was conducted during July 1-10, 2023, indicated a mean of 10.20 per cent for the 3-month ahead inflation (i.e., forecast for October 2023). Using the 4-parameter Johnson SB (γ = 0.3289, δ = 0.9778, λ = 6.0331, ξ = 7.3584) distribution of 3-month ahead inflation in IESH, we compute F_{Johnson SB} (10.20) = 0.58525. We compute x, by solving the equation, x = F^-1 _{log-Pearson III} (0.58525), wherein F^-1 is an inverse CDF. This provides an estimate for x = 5.43 per cent.
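The worked example can be sketched in code using only the standard library. Two parameterisations are assumed rather than taken from the article: the Johnson SB CDF is Φ(γ + δ ln((x - ξ)/(ξ + λ - x))), and the Log-Pearson Type III variable satisfies ln X = γ + βG with G following a gamma distribution with shape α and unit scale. Both readings appear consistent with the quantiles in Table 6. The helper names are illustrative, and the incomplete gamma function is evaluated by its power series with bisection for the inverse.

```python
import math

def johnson_sb_cdf(x, gamma, delta, lam, xi):
    """Johnson SB CDF on (xi, xi + lam) (assumed parameterisation)."""
    if x <= xi:
        return 0.0
    if x >= xi + lam:
        return 1.0
    z = gamma + delta * math.log((x - xi) / (xi + lam - x))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gamma_cdf(shape, x):
    """Regularised lower incomplete gamma function via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))

def gamma_ppf(shape, q):
    """Inverse of gamma_cdf by bisection, for 0 < q < 1."""
    lo, hi = 0.0, max(1.0, shape)
    while gamma_cdf(shape, hi) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(shape, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_pearson3_ppf(p, alpha, beta, gamma):
    """Inverse CDF of Log-Pearson III, with ln X = gamma + beta * G."""
    # A negative beta reverses the ordering, so the gamma quantile flips.
    q = p if beta > 0.0 else 1.0 - p
    return math.exp(gamma + beta * gamma_ppf(alpha, q))

def direct_map(y):
    """Map a 3-month ahead IESH mean to CPI-C inflation via matching CDFs."""
    p = johnson_sb_cdf(y, 0.3289, 0.9778, 6.0331, 7.3584)
    return p, log_pearson3_ppf(p, 6.4118, -0.1371, 2.4545)
```

Calling `direct_map(10.20)` returns the cumulative probability and the mapped inflation value; under the assumed parameterisations these land close to the 0.58525 and 5.43 per cent quoted above, with small differences possible because the published parameters are rounded.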

The above mapping could be an alternative to the traditional econometric models commonly used to forecast inflation from forward-looking inflation expectations. Using this approach with the identified distributions and estimated parameters, we estimate inflation for the months since March 2014, excluding months for which either inflation or inflation expectations are not available. We compare the estimates from this approach and from a simple regression-based (econometric) approach with realised inflation, and observe that the proposed approach is quite competitive and additionally provides valuable insights into the detailed profile of the datasets (Chart 13).

More complex mapping of distributions could be done using Copula ¹² functions, although they are not attempted in this article.

Indirect mapping

In direct mapping, one data point of one distribution is mapped with one data point of the other distribution, which is suitable for the aggregate dataset, as we have only one (aggregate) number per month for inflation expectations and one for the realised inflation. Instead of aggregating single numbers, we can establish a mapping between granular datasets of inflation expectations and realised inflation through many one-to-one mappings. These mappings could include the mapping of inflation at disaggregate level such as urban city (centre) of IESH versus corresponding State of CPI-C, etc. The indirect mapping could be complex and could be done in two stages, as below:

Chart 13: Scatter Plot – Estimated Inflation and Realised Inflation

In the indirect mapping, it is feasible to analyse the distribution of a month's incoming (new) data at a granular level and compare it with the historical distribution (covering several months) to identify changes in the shape of the distribution, in terms of moments such as skewness and kurtosis. This is not possible with the aggregate, which is a single number. We also fit the 4-parameter Burr distribution to each round under study individually to obtain round-wise parameter estimates as well as round-wise descriptive statistics (Annex Table A2).

We find that the mean and standard deviation of the granular level IESH data are positively correlated with the realised CPI-C inflation. The skewness and kurtosis are negatively correlated. Similarly, two parameters (α and β) appeared to be linked positively with the realised inflation, while the other two (viz., k and γ) parameters are inversely correlated. These indicators may play a useful role in econometric models as input variables aiding in forecasting inflation (Annex Table A3).

The identification of one-to-one mappings in stage 1, which exhibits closer co-movement, shall be a useful exercise. The same is not explored in the current article. The second stage of the indirect mapping is expected to be stronger, being part of the same dataset.

Inflation-at-Risk (IaR)

The above mappings may also be useful in assessing Inflation-at-Risk (IaR) ¹³. As lower inflation (left tail) has not been a concern in the Indian context, we compute IaR at 95 per cent and 99 per cent for the CPI-C aggregate data using historical (observed) inflation and using inflation expectations from IESH.

For the estimated distribution of CPI-C inflation, i.e., Log-Pearson Type III (α = 6.4118, β = - 0.1371 and γ = 2.4545), the inverse CDFs F^-1(0.95) and F^-1(0.99) give the Inflation-at-Risk values IaR_0.95 and IaR_0.99 of 7.84 per cent and 8.84 per cent, respectively. For the estimated distribution of IESH 3-month ahead inflation, i.e., 4-parameter Johnson SB (γ = 0.3289, δ = 0.9778, λ = 6.0331, ξ = 7.3584), the inverse CDFs F^-1(0.95) and F^-1(0.99) give IaR_0.95 and IaR_0.99 of 12.15 per cent and 12.70 per cent, respectively. Thus, IaR_0.95 = 12.15 per cent and IaR_0.99 = 12.70 per cent of 3-month ahead inflation expectations are equivalent to IaR_0.95 = 7.84 per cent and IaR_0.99 = 8.84 per cent of realised inflation, respectively. The detailed quantile mapping of IESH current and 3-month ahead inflation along with CPI-C inflation is provided for completeness (Table 6). Similar computations could be carried out using granular-level data under the indirect mapping approach, as discussed earlier.

Table 6: Quantile Mapping of IESH Inflation Expectation and CPI-C Aggregate Inflation
Quantiles	CPI-C Aggregate Inflation	IESH Current Inflation Perception	IESH 3-Month Ahead Inflation Expectation
0.01	1.78	7.05	7.73
0.05	2.55	7.52	8.07
0.10	3.04	7.82	8.33
0.15	3.39	8.04	8.56
0.20	3.69	8.22	8.76
0.25	3.95	8.38	8.95
0.30	4.18	8.54	9.14
0.35	4.41	8.69	9.32
0.40	4.63	8.84	9.50
0.45	4.84	8.98	9.69
0.50	5.06	9.13	9.87
0.55	5.27	9.29	10.06
0.60	5.49	9.45	10.26
0.65	5.72	9.62	10.46
0.70	5.96	9.81	10.68
0.75	6.22	10.02	10.90
0.80	6.51	10.26	11.15
0.85	6.84	10.56	11.42
0.90	7.25	10.95	11.74
0.95	7.84	11.56	12.15
0.99	8.84	12.83	12.70
Source: Author’s calculations.
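The IESH columns of Table 6, including the IaR values, can be approximated with the closed-form Johnson SB quantile function. The sketch assumes the parameterisation in which γ + δ ln((x - ξ)/(ξ + λ - x)) is standard normal, which is not stated in the article but appears consistent with the tabulated quantiles; the function and constant names are illustrative.

```python
import math
from statistics import NormalDist

IESH_CURRENT = (5.0416, 2.8226, 27.7870, 5.1452)   # (gamma, delta, lam, xi), Table 5
IESH_3M_AHEAD = (0.3289, 0.9778, 6.0331, 7.3584)   # (gamma, delta, lam, xi), Table 5

def johnson_sb_ppf(p, gamma, delta, lam, xi):
    """Quantile of the 4-parameter Johnson SB (assumed parameterisation)."""
    z = NormalDist().inv_cdf(p)
    w = math.exp((z - gamma) / delta)
    # w / (1 + w) lies in (0, 1), so the result stays inside (xi, xi + lam).
    return xi + lam * w / (1.0 + w)

def inflation_at_risk(level, params):
    """IaR at the given confidence level for a fitted Johnson SB."""
    return johnson_sb_ppf(level, *params)
```

For example, `inflation_at_risk(0.95, IESH_3M_AHEAD)` and `inflation_at_risk(0.99, IESH_3M_AHEAD)` come out close to the 12.15 per cent and 12.70 per cent shown in the last column of Table 6.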

V. Conclusion

The statistical properties of granular-level inflation and inflation expectation datasets are important and can be analysed through suitable statistical distributions. This article attempts to map the datasets of survey-based inflation expectations and actual inflation through their long-run statistical distributions, which appears to be an unexplored area of research. Variants of the Burr distribution are found to be appropriate in explaining the statistical characteristics of both granular-level datasets, viz., survey-based inflation expectations and realised inflation. The aggregation of these datasets provides useful summary statistics such as headline inflation numbers.

As survey-based inflation expectations are forward-looking and have been useful in forecasting short-term inflation, for which econometric tools are widely used, the functional relationship derived in the article through suitable statistical distributions may facilitate short-term forecasting as a non-econometric tool. Further, the round-wise estimated parameters for the survey-based inflation expectations may also be used as inputs to suitable econometric models. The identified distributions can also be used to measure Inflation-at-Risk for the observed inflation and survey-based inflation expectation datasets.

References

Andrade, P., Ghysels, E. and Idier, J. (2012), “Tails of Inflation Forecasts and Tales of Monetary Policy”. Working Papers, Banque de France.

Carreau, J. and Bengio, Y. (2009), “A hybrid Pareto model for asymmetric fat-tailed data: the univariate case”, Extremes 12, 53-76.

Cooray, K. and Ananda, M. M. A. (2005), “Modelling actuarial data with a composite Lognormal-Pareto model”, Scandinavian Actuarial Journal (5), 321-334.

Das, A., Lahiri, K. and Zhao, Y. (2019), “Inflation expectations in India: learning from household tendency surveys”, International Journal of Forecasting, 35 (3), 980-993.

Frigessi, A., Haug, O. and Rue, A. (2002), “Dynamic mixture model for unsupervised tail estimation without threshold selection”, Extremes, 5, 219-235.

McNeil, A. J. (1997), “Estimating the tails of loss severity distributions using extreme value theory”, ASTIN Bulletin, Vol. 27, No. 1, 117-137.

Muduli, S., Nadhanael, G. V. and Pattanaik, S. (2022), “Assessing inflation expectations adjusting for households’ biases”, Monthly Bulletin, Reserve Bank of India, December.

Nadarajah, S. and Bakar, S. A. A. (2014), “New Composite Models for the Danish Fire Insurance Data”, Scandinavian Actuarial Journal, 2, 180-187.

Salido, D. L. and Loria, F. (2021), “Inflation at Risk”, Federal Reserve Board, September 08.

Sastry, D. V. S. and Sinha, R. K. (2010), “A Revisit to Danish fire loss data”, Conference Proceedings, 12th Global Conference of Actuaries (GCA), Mumbai, India.

Scollnik, D. P. (2007), “On composite Lognormal-Pareto model”, Scandinavian Actuarial Journal, Vol. 2007, Issue 1/2007, 20-33.

Scollnik, D. P. and Sun, C. (2012), “Modelling with Weibull-Pareto models”, North American Actuarial Journal, 16 (2), 260-272.

Shaw, P. (2019), “Using rational expectations to predict inflation”, Reserve Bank of India Occasional Papers, Vol. 40, No. 1.

Sinha, R. K. (2023), “India’s Steady State Equilibrium Inflation: A Revisit”, Monthly Bulletin, Reserve Bank of India, May.

Annex

Chart A1: CPI-C Granular Data (Product Group-Wise Fitting of Inflation)

Table A1: CPI-C Granular Data (Goodness of Fit)
Descriptive Statistics	Value	Best Fit (Estimated Parameters)	Goodness of Fit
1. Food and Beverages
Mean	4.72	Cauchy (μ, σ): μ = 4.455, σ = 3.420	K-S statistic = 0.03111; critical value at 5 per cent = 0.01928
SD	9.46
Range	136.09
Skewness	0.8244
Kurtosis	10.5708
2. Pan and Tobacco
Mean	6.22	Johnson SU (γ, δ, λ, ξ): γ = -0.2939, δ = 1.5054, λ = 5.6234, ξ = 4.8419	K-S statistic = 0.03005; critical value at 5 per cent = 0.03314
SD	4.84
Range	53.85
Skewness	0.6025
Kurtosis	7.3770
3. Cloth and Footwear
Mean	5.65	Generalised Gamma (k, α, β, γ): k = 1.6752, α = 94.813, β = 3.5953, γ = -48.695	K-S statistic = 0.02143; critical value at 5 per cent = 0.0331
SD	3.32
Range	30.16
Skewness	0.0040
Kurtosis	3.5698
4. Housing
Mean	5.44	Burr (k, α, β, γ): k = 0.3489, α = 28.030, β = 20.683, γ = -17.264	K-S statistic = 0.02326; critical value at 5 per cent = 0.04537
SD	2.92
Range	19.38
Skewness	1.7588
Kurtosis	7.5862
5. Fuel and Light
Mean	5.45	Johnson SU (γ, δ, λ, ξ): γ = -0.0883, δ = 1.4241, λ = 6.8561, ξ = 4.9099	K-S statistic = 0.03855; critical value at 5 per cent = 0.02995
SD	6.30
Range	104.16
Skewness	0.2176
Kurtosis	7.8465
6. Miscellaneous
Mean	5.08	Johnson SU (γ, δ, λ, ξ): γ = -0.3685, δ = 1.6384, λ = 4.8154, ξ = 3.7645	K-S statistic = 0.00780; critical value at 5 per cent = 0.02852
SD	3.69
Range	74.42
Skewness	0.5900
Kurtosis	6.3318
Source: Author’s calculations.
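The goodness-of-fit column of Table A1 can be read mechanically: under the usual Kolmogorov-Smirnov decision rule, a fitted distribution is not rejected at the 5 per cent level when the K-S statistic is below the critical value. The Python sketch below applies that rule to the reported figures. It also backs out the sample size that each critical value would imply under the asymptotic approximation of roughly 1.358/√n; that approximation is an assumption made here, since the article does not state how the critical values were computed, and the names used are hypothetical.

```python
# (product group, K-S statistic, 5 per cent critical value), as reported in Table A1
TABLE_A1 = [
    ("Food and Beverages", 0.03111, 0.01928),
    ("Pan and Tobacco", 0.03005, 0.03314),
    ("Cloth and Footwear", 0.02143, 0.0331),
    ("Housing", 0.02326, 0.04537),
    ("Fuel and Light", 0.03855, 0.02995),
    ("Miscellaneous", 0.00780, 0.02852),
]

# Asymptotic one-sample K-S coefficient at the 5 per cent level.
# Using it here is an assumption, not something stated in the article.
KS_COEFF_5PCT = 1.358


def not_rejected(statistic, critical_value):
    """True when the fit is not rejected at the level of the critical value."""
    return statistic < critical_value


def implied_sample_size(critical_value, coeff=KS_COEFF_5PCT):
    """Sample size n implied by critical_value ~ coeff / sqrt(n)."""
    return (coeff / critical_value) ** 2


decisions = {name: not_rejected(stat, crit) for name, stat, crit in TABLE_A1}

for name, stat, crit in TABLE_A1:
    verdict = "not rejected" if decisions[name] else "rejected"
    print(f"{name}: {verdict} at 5 per cent, implied n ~ {implied_sample_size(crit):.0f}")
```

On the reported figures, the statistic exceeds the critical value for Food and Beverages and for Fuel and Light, so the listed distributions for those two groups are best read as the closest candidates rather than as fits that pass the test at the 5 per cent level.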

Table A2: IESH Data (Round-Wise Fitting of 3-Month Ahead Inflation)
Survey Rounds	Period of Survey	4-Parameter Burr Distribution (Estimated parameters)				Descriptive Statistics				Published Prints
Survey Rounds	Period of Survey	k	α	β	γ	Mean*	SD	Skewness	Kurtosis	IESH 3-month ahead Mean Inflation	IESH 3-month ahead Median Inflation	Realised CPI-C Inflation^#
35	Mar-14	0.3890	7.320	18.287	-9.292	17.601	14.993	2.486	10.233	12.3	12.9	6.77
36	Jun-14	0.2289	11.157	23.521	-16.494	20.748	19.468	2.112	7.531	12.5	14.0	5.63
37	Sep-14	1.0350	2.039	15.455	0.324	21.805	19.744	2.004	7.277	12.7	14.6	4.28
38	Dec-14	0.9008	2.974	8.637	-1.012	10.365	9.494	3.827	25.393	8.8	8.3	5.25
39	Mar-15	0.7021	3.756	8.626	-1.498	10.731	9.528	3.882	25.077	9.0	8.5	5.40
40	Jun-15	0.6444	4.283	10.484	-2.526	12.231	10.429	3.580	21.570	10.1	10.0	4.41
41	Sep-15	0.5529	4.307	10.101	-2.216	13.519	11.909	3.255	17.591	10.6	10.4	5.61
42	Dec-15	0.3618	6.313	10.005	-3.071	13.749	12.144	2.983	14.159	10.5	9.9	4.83
43	Mar-16	0.6304	3.272	6.807	-0.547	10.946	11.036	3.730	21.811	8.8	8.1	5.77
44	Jun-16	0.4702	5.158	10.030	-3.533	12.364	11.773	3.231	16.556	9.7	9.2	4.39
45	Sep-16	0.6900	3.805	9.991	-2.241	12.109	11.521	3.573	19.768	9.7	9.5	3.41
45B	Nov-16	0.5190	4.441	8.395	-2.288	11.356	11.504	3.709	21.040	9.1	8.2	3.65
46	Dec-16	0.6270	5.581	10.085	-4.180	8.843	7.692	4.331	32.604	7.9	7.3	3.89
47	Mar-17	0.6111	4.951	9.175	-3.145	9.477	8.723	3.982	24.903	8.2	7.5	1.46
47B	May-17	0.5241	6.050	9.976	-4.386	9.401	9.014	4.182	26.427	8.1	7.3	3.28
48	Jun-17	0.6939	5.146	9.604	-3.202	8.899	7.203	4.481	34.669	8.1	7.5	3.28
49	Sep-17	0.4139	7.261	9.362	-4.067	9.503	8.886	4.255	27.648	8.2	7.2	5.21
49B	Nov-17	0.3860	6.654	8.724	-3.494	10.304	10.210	3.773	21.608	8.4	7.5	4.44
50	Dec-17	0.4357	5.831	8.724	-3.502	10.159	10.146	3.736	21.247	8.4	7.5	4.28
51	Mar-18	0.4533	6.737	10.251	-4.550	9.967	8.848	4.042	26.851	8.6	7.8	4.92
51B	May-18	0.4994	4.642	8.454	-2.125	11.544	10.892	3.553	19.713	9.4	8.7	3.69
52	Jun-18	0.3841	6.238	10.144	-4.293	12.201	11.690	3.122	15.356	9.5	8.9	3.70
53	Sep-18	0.3098	6.523	9.215	-3.723	13.224	12.953	2.779	11.973	9.9	9.4	2.11
53B	Nov-18	0.3102	7.708	10.619	-5.227	12.166	11.315	3.025	14.568	9.7	9.0	2.57
54	Dec-18	0.3726	6.174	8.590	-3.194	11.263	11.273	3.522	18.303	9.1	8.2	2.86
55	Mar-19	0.4732	5.271	8.028	-2.320	10.162	9.210	3.986	25.473	8.7	7.8	3.18
55B	May-19	0.3909	6.603	8.869	-3.649	10.119	9.225	3.642	21.563	8.6	7.6	3.28
56	Jul-19	0.4151	5.723	7.895	-2.590	10.254	9.702	3.646	20.489	8.6	7.6	4.62
57	Sep-19	0.4767	4.941	8.098	-2.364	10.554	9.325	3.119	16.027	8.9	8.0	7.35
57B	Nov-19	0.2914	8.021	10.718	-5.315	12.540	11.751	3.018	14.512	9.8	9.2	6.58
58	Jan-20	0.2456	9.788	11.045	-6.055	12.192	11.575	3.088	15.106	9.5	8.6	NA
59	Mar-20	0.4305	5.954	8.876	-2.782	10.980	9.610	3.510	19.517	9.2	8.5	6.23
59B	May-20	0.1809	16.522	19.864	-14.896	14.411	13.375	2.654	11.462	10.6	10.4	6.69
60	Jul-20	0.1979	15.068	20.132	-14.933	14.735	13.551	2.582	11.019	10.8	10.5	7.61
61	Sep-20	0.1767	16.053	20.088	-15.110	15.261	14.410	2.492	10.252	10.8	10.4	4.59
61B	Nov-20	0.2505	8.788	11.589	-5.986	14.202	13.503	2.653	10.997	10.4	10.1	5.03
62	Jan-21	0.2865	7.708	10.275	-4.722	13.110	12.510	2.869	12.823	10.0	9.3	4.23
63	Mar-21	0.2244	10.097	12.130	-6.629	14.178	13.369	2.689	11.339	10.4	10.1	6.26
63B	May-21	0.1680	15.823	19.861	-14.492	16.561	15.527	2.299	8.642	11.3	10.8	5.30
64	Jul-21	0.1335	23.541	26.553	-21.489	16.904	15.415	2.149	7.879	11.7	11.3	4.48
65	Sep-21	0.1571	16.348	19.905	-14.466	17.231	16.112	2.193	8.020	11.4	10.8	5.66
65B	Nov-21	0.2908	6.865	13.349	-6.371	17.914	15.994	2.141	7.856	11.9	12.3	6.07
66	Jan-22	0.1776	14.816	17.495	-12.027	15.480	14.241	2.474	9.884	11.1	10.6	7.79
67	Mar-22	0.1706	16.596	20.122	-14.744	15.628	14.173	2.312	8.772	11.1	10.7	7.01
67B	May-22	0.1419	23.030	26.925	-21.733	16.545	14.856	2.233	8.514	11.4	10.8	7.00
68	Jul-22	0.3039	7.324	10.986	-4.521	14.302	12.710	2.633	11.188	10.7	10.3	6.77
69	Sep-22	0.2155	11.932	17.219	-11.210	16.070	14.675	2.465	9.847	11.3	10.8	5.72
69B	Nov-22	0.2651	9.224	13.205	-6.982	14.432	12.714	2.691	11.788	10.9	10.4	6.44
70	Jan-23	0.3470	6.865	11.370	-4.566	13.917	11.883	2.747	12.329	10.8	10.5	4.70
71	Mar-23	0.3251	7.186	10.753	-4.299	13.504	11.829	2.937	13.665	10.5	10.2	4.87
71B	May-23	0.4260	5.205	9.225	-2.338	13.186	11.666	2.939	13.570	10.3	10.1	6.83
Notes: *: Computed from the raw data (without any filter/trimming) and unweighted.
#: Realised CPI-C inflation is the print 3 months ahead of the month of the survey.
NA: The CPI-C inflation for April 2020 (3 months ahead of Round 58) was not initially published by MoSPI; it was imputed and published subsequently.
Source: Author’s calculations.
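To make the round-wise parameters in Table A2 easier to interpret, the Python sketch below evaluates a 4-parameter Burr distribution for one round. It assumes the common parameterisation F(x) = 1 − [1 + ((x − γ)/β)^α]^(−k) for x > γ, with k and α as shape parameters, β as scale and γ as location. The article does not spell out its parameterisation in this section, so this form, and the function names, are assumptions.

```python
def burr4_cdf(x, k, alpha, beta, gamma):
    """CDF of the assumed 4-parameter Burr form: 1 - (1 + ((x - gamma)/beta)**alpha)**(-k)."""
    if x <= gamma:
        return 0.0
    z = (x - gamma) / beta
    return 1.0 - (1.0 + z ** alpha) ** (-k)


def burr4_quantile(p, k, alpha, beta, gamma):
    """Inverse of burr4_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return gamma + beta * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / alpha)


# Estimated parameters for Round 71B (May-23), taken from Table A2.
ROUND_71B = {"k": 0.4260, "alpha": 5.205, "beta": 9.225, "gamma": -2.338}

median_71b = burr4_quantile(0.5, **ROUND_71B)
upper_tail_71b = burr4_quantile(0.95, **ROUND_71B)
print(f"Round 71B fitted median: {median_71b:.2f}")
print(f"Round 71B fitted 0.95 quantile: {upper_tail_71b:.2f}")
```

Under this assumed form, the fitted median for Round 71B falls a little below the published median of 10.1, which is in line with the article's caveat that published prints may differ from statistics computed on the granular data. The same quantile function gives a VaR-type tail measure for any round by choosing a high value of p.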

Table A3: Correlation Matrix of IESH Indicators (with Published CPI-C Inflation)
	Realised CPI-C Inflation	k	α	β	γ	Mean*	SD	Skewness	Kurtosis	Published Mean	Published Median
Realised CPI-C Inflation	1.000
k	-0.378	1.000
α	0.406	-0.774	1.000
β	0.433	-0.577	0.894	1.000
γ	-0.414	0.738	-0.967	-0.954	1.000
Mean*	0.424	-0.380	0.514	0.766	-0.600	1.000
SD	0.367	-0.408	0.522	0.749	-0.603	0.982	1.000
Skewness	-0.496	0.591	-0.625	-0.746	0.655	-0.927	-0.919	1.000
Kurtosis	-0.474	0.628	-0.586	-0.672	0.602	-0.878	-0.888	0.983	1.000
Published Mean	0.482	-0.429	0.524	0.751	-0.599	0.976	0.937	-0.940	-0.900	1.000
Published Median	0.426	-0.271	0.396	0.695	-0.502	0.975	0.932	-0.878	-0.825	0.975	1.000
Source: Author’s calculations.

^{^} The author is from the Monetary Policy Department (MPD). The views expressed in this article are those of the author and do not represent the views of the Reserve Bank of India.

¹ The comparative study is also possible using the information on inflation expectations of other respondents such as professional forecasters. The same is not explored in this article.

² The study considers 22 larger States/UTs, each of which has an individual weight of more than 0.25 per cent in the CPI-C basket. These States/UTs collectively cover 98.30 per cent of the CPI-C basket and have a greater level of granularity than the smaller States.

³ The probability density functions of the disaggregated and aggregate inflation data would differ, although the central tendencies derived from the two datasets would be comparable. Other statistical moments of the data (viz., standard deviation, skewness and kurtosis) may, however, differ significantly. For example, the standard deviation of the granular data would be higher than that of the aggregate data.

⁴ The Dagum distribution is the inverse of the Burr distribution, which is used to fit heavy-tailed data.

⁵ The appropriate weights of the left and right tails are derived from the observed data at around 0.41 and 0.59, respectively (Table 2).

⁶ We use probability-weighted counts of observations to reflect the varied and appropriate weights of the products/sub-products. Many other studies instead apply simple counts to derive the mean excess function, which is meaningful for loss distributions such as the Danish fire loss dataset.

⁷ The evolution and stabilisation of granular-level data of CPI-C based inflation were discussed in Sinha (2023), which covered the pre-COVID and post-COVID periods with additions of half-yearly data. The study found a consistent rightward shift in the distribution in the post-COVID period.

⁸ The mean inflation at a month (t) is a simple average of inflation starting from January 2014 to the month (t).

⁹ The number of surveyed households has increased in recent times with the introduction of new cities/centres for the survey.

¹⁰ The published mean of the IESH may not match the mean computed from the granular-level data due to various aggregation and methodological issues.

¹¹ The Johnson System Bounded (SB) distribution is a system of curves for bounded data, which can be transformed to an approximately normal distribution through an appropriate transformation function. Similarly, the Johnson System Unbounded (SU) distribution is a system of curves for unbounded data, which is found to be suitable in this study for select product groups in the CPI-C granular dataset (Annex Table A1).

¹² Copula is a multivariate cumulative distribution function for which the marginal (individual) probability distribution of each variable is uniformly distributed in the interval [0, 1]. It models the dependence (inter-correlation) structure between random variables and is widely used in financial datasets.

¹³ There are different approaches to defining and measuring Inflation-at-Risk (IaR). IaR was originally introduced by Andrade et al. (2012) to assess the risks to the inflation outlook. The study constructed a Value-at-Risk (VaR)-type measure of tail risk for inflation using survey-based conditional density forecasts and observed that the magnitude and the asymmetry of inflation risks varied over time. A recent study by Salido and Loria (2021) used a probability-type measure for IaR and also highlighted the importance of skewness in the IaR. In the current context, we use a simple VaR-type measure for IaR.