DOCUMENT DE TREBALL XREAP2011-10

Mixture of bivariate Poisson regression models with an application to insurance ∗

Lluís Bermúdez (RFA-IREA) (a)† and Dimitris Karlis (b)

July 13, 2011

(a) Risc en Finances i Assegurances-IREA, Universitat de Barcelona, Spain
(b) Athens University of Economics and Business, Greece

Abstract

In a recent paper, Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a 2-finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions depend on covariates, so as to model the way in which each individual belongs to a separate cluster. Finally, an EM algorithm is provided in order to ensure the models' ease of fit. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009], and it is shown that the modelling of the data set can be improved considerably.

JEL classification: C51; IM classification: IM11; IB classification: IB40.

Keywords: Zero-inflation, Overdispersion, EM algorithm, Automobile insurance, A priori ratemaking.

1 Introduction

In a recent paper, Bermúdez [2009] describes bivariate Poisson (BP) regression models for ratemaking in car insurance.
The central idea is that the dependence between two different types of claim must be taken into account to achieve better ratemaking. BP regression models are therefore presented as an instrument that can account for the underlying correlation between two types of claim arising from the same policy (i.e. third-party liability claims and all other automobile insurance claims). The paper concludes that even when the correlations between the claims are small, major differences in ratemaking can nevertheless appear. Thus, using a BP model results in ratemaking that has larger variances and, hence, larger loadings in premiums than those obtained under the independence assumption.

∗ Acknowledgements. Research for this paper was initiated while the second author was visiting the “Risk in Finance and Insurance” Research Group at the University of Barcelona.
† Corresponding author. Departament de Matemàtica Econòmica, Financera i Actuarial, Universitat de Barcelona, Diagonal 690, 08034-Barcelona, Spain. Tel.: +34-93-4034853; fax: +34-93-4034892; e-mail: lbermudez@ub.edu

The paper also includes zero-inflated bivariate Poisson (ZIBP) models so as to inflate the (0,0) cell and to account for the excess of zeros and the overdispersion typically observed in this type of data set. These models produce the best goodness of fit among the bivariate Poisson models considered. In conclusion, the independence assumption should be rejected when using either BP or ZIBP regression models, but one question remains unresolved: do ZIBP models constitute the best option for dealing with overdispersion? The aim of the present paper is to examine this question further by considering alternative bivariate models that might account for these features of the data, i.e. the excess of zeros and overdispersion.

In the univariate case, Lambert [1992] introduced the zero-inflated Poisson regression model.
Since then, there has been a considerable increase in the number of applications of zero-inflated regression models based on several different distributions. A comprehensive discussion of these applications can be found in Winkelmann [2008], and a specific application to insurance ratemaking is addressed in Boucher et al. [2007]. Zero-inflated negative binomial regression models have also been described, for example, in Wang [2003] and Garay et al. [2011]. See again Winkelmann [2008] for a description of a variety of such models and Denuit et al. [2007] for an exhaustive review of the models used in ratemaking systems for automobile insurance.

In the bivariate (or multivariate) case, the literature analysing the excess of zeros and overdispersion is less developed. For example, zero-inflation in the bivariate case is examined in Gurmu and Elder [2008], Karlis and Ntzoufras [2003] and the references therein, while the multivariate case is analysed in Li et al. [1999]. Recently, in the actuarial literature and for ratemaking purposes, Bermúdez [2009] and Bermúdez and Karlis [2011] dealt with the bivariate and multivariate versions of the zero-inflated Poisson regression model, respectively. They tackle overdispersion via the excess of zeros, i.e. through zero-inflated models. A natural alternative for accounting for overdispersion is to consider models with overdispersed marginal distributions, as opposed to bivariate Poisson models, whose marginal distributions are Poisson.

In this paper we consider an m-finite mixture of bivariate Poisson regressions (m-FMBP), extending the no-covariate case presented in Karlis and Meligkotsidou [2007]. This model has a number of interesting features: first, the zero-inflated model is a special case; second, it allows for overdispersion; and, third, it admits an elegant interpretation based on the typical clustering application of finite mixture models.
To the best of our knowledge, this model is new to the literature, so in what follows we seek to explain its properties as well as to discuss appropriate estimation approaches.

The rest of the paper proceeds as follows. The new models are described in the next section, followed by the development of an EM algorithm for parameter estimation. The models are then applied to the same data set as in Bermúdez [2009]. Finally, we conclude with some remarks.

2 The proposed model

2.1 A bivariate Poisson distribution

Consider random variables X_k, k = 1, 2, 3, which follow independent Poisson distributions with parameters λk ≥ 0, respectively. Then the random variables Y1 = X1 + X3 and Y2 = X2 + X3 jointly follow a bivariate Poisson distribution, BP(y1, y2; λ1, λ2, λ3), with joint probability function given by

$$P_{Y_1,Y_2}(y_1,y_2) = P(Y_1=y_1, Y_2=y_2) = e^{-(\lambda_1+\lambda_2+\lambda_3)}\,\frac{\lambda_1^{y_1}}{y_1!}\,\frac{\lambda_2^{y_2}}{y_2!}\sum_{s=0}^{\min(y_1,y_2)}\binom{y_1}{s}\binom{y_2}{s}\,s!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^{s}.$$

The above bivariate distribution allows for dependence between the two random variables. Marginally, each random variable follows a Poisson distribution with E(Y1) = λ1 + λ3 and E(Y2) = λ2 + λ3. Moreover, Cov(Y1, Y2) = λ3, and hence λ3 is a measure of dependence between the two random variables. If λ3 = 0, the two variables are independent and the bivariate Poisson distribution reduces to the product of two independent Poisson distributions (also known as a double Poisson distribution). For a comprehensive treatment of the bivariate Poisson distribution and its multivariate extensions the reader is referred to Kocherlakota and Kocherlakota [1992] and Johnson et al. [1997].

For greater flexibility, we can assume a bivariate Poisson regression model where each of the parameters of the BP is related to some covariates through a log link function, i.e. by assuming

$$\log \lambda_{ki} = \boldsymbol\beta_k^{T}\mathbf{x}_{ki}, \qquad k = 1, 2, 3, \quad i = 1, \dots, n,$$

where x_ki is a vector of covariates for the i-th observation related to the k-th parameter and β_k is the associated vector of regression coefficients. Note that x need not be the same for all the parameters. Indeed, according to Karlis and Ntzoufras [2003], it is perhaps a good idea not to use the same covariates in all the parameters, since this may lead to problems in their interpretation. For example, since the marginal mean for Y1 is λ1 + λ3, using the same covariates in both λ1 and λ3 may create problems of interpretation, especially if the signs of the regression coefficients differ. The R package bivpois can be used to fit this model via an EM algorithm.

In this model, since the marginal distributions are Poisson, the marginal means and variances are equal. Moreover, the correlation is necessarily non-negative, since Cov(Y1, Y2) = λ3 ≥ 0. Therefore, we need to consider extensions that allow for overdispersion (variance greater than the mean) and a possible negative correlation.

2.2 Mixed bivariate Poisson models

A natural way to allow for overdispersion is to consider mixtures of a simpler model. In the univariate setting this is best achieved by moving from the simple Poisson model to the negative binomial model. Such an approach, while applicable in the bivariate setting, is not so widely used there, primarily because there is no single way of doing so and, hence, questions of ease of use and interpretation acquire greater importance. Mixtures of BP distributions can be considered in at least two different ways.

In the first, we start with a BP(aλ1, aλ2, aλ3) distribution, where a follows some distribution. We can assume λ3 = 0, which makes the calculation much easier and implies that all the correlation comes from the common a. If λ3 > 0, the correlation is twofold: it is due both to λ3 (known as intrinsic correlation) and to the common a. This complicates the interpretation of the parameters.
A natural assumption in this case is that E(a) = 1, so that a does not inflate the means. This is a very typical extension of simple mixed Poisson regression models. One drawback, however, is that the model only allows for positive correlation. The literature on this approach includes the works of Stein et al. [1987], Stein and Yuritz [1987] and Kocherlakota [1988] for the case without covariates. Munkin and Trivedi [1999] described multivariate mixed Poisson regression models based on this type of mixing with a gamma mixing distribution. Gurmu and Elder [2000] used an extended gamma density as a mixing distribution. This approach also has a random-effect representation if covariates are used. It assumes that

$$(Y_{1i}, Y_{2i}) \sim BP(\lambda_{1i}, \lambda_{2i}, \lambda_{3i}), \qquad \log\lambda_{ki} = \boldsymbol\beta_k^{T}\mathbf{x}_{ki} + u_i, \quad k = 1, 2, 3,\; i = 1, \dots, n, \qquad u_i \sim G(u),$$

where u_i is the random effect associated with the i-th observation, common to all the parameters. In fact, this approach is equivalent to a frailty model.

In the second case, we start with a BP(a1λ1, a2λ2, a3λ3) distribution, but now the a's are different. We need to assume that they jointly follow a trivariate distribution (or a bivariate one if we assume that λ3 = 0). Clearly such a construction is much more complicated and, in practice, not especially useful. The case λ3 = 0 has received attention primarily because it can induce negative correlation between counts. Steyn [1976] proposed the use of a bivariate normal distribution as the mixing distribution. Some years later, Aitchison and Ho [1989] proposed using the bivariate lognormal distribution instead of the simple bivariate normal distribution. For a Bayesian application of this distribution see Chib and Winkelmann [2001]. In random-effects form, the above model is equivalent to assuming

$$(Y_{1i}, Y_{2i}) \sim BP(\lambda_{1i}, \lambda_{2i}, \lambda_{3i}), \qquad \log\lambda_{ki} = \boldsymbol\beta_k^{T}\mathbf{x}_{ki} + u_{ki}, \quad k = 1, 2, 3,\; i = 1, \dots, n, \qquad (u_{1i}, u_{2i}, u_{3i}) \sim G(\cdot),$$

where now G(·) is a trivariate distribution and, hence, the random effects are different, albeit related, for each parameter. Again, for purposes of identifiability, it must be assumed that the expectation of each multiplicative random effect, exp(u_ki), is 1.

In both of the above models, the random-effects distribution G(·) can be continuous, discrete or finite. In this paper, we consider the latter case, assuming that the joint distribution of the random effects is finite, i.e. only a finite number of points have positive probability. Such an assumption gives rise to finite mixture models, which are very popular in a range of disciplines. Finite mixtures of multivariate Poisson distributions have been described in Karlis and Meligkotsidou [2007]. The novelty of our approach lies in the fact that we assume different regression lines for each component in the mixture, extending the finite mixture Poisson regression model of Wang et al. [1998] (see Grün and Leisch [2007] for the implementation of models of this type) to two dimensions. Thus, in the next section we introduce the finite mixture of bivariate Poisson regressions.

2.3 The finite mixture of bivariate Poisson regressions

Let θ = (λ1, λ2, λ3) denote the vector of parameters. We define an m-finite mixture of bivariate Poisson distributions as the distribution with joint probability function

$$P(y_1, y_2) = \sum_{j=1}^{m} p_j\, BP(y_1, y_2; \boldsymbol\theta_j),$$

where p_j > 0, j = 1, ..., m, are the mixing proportions with p_1 + ... + p_m = 1, and θ_j = (λ1j, λ2j, λ3j) are the component-specific vectors of parameters. In the sequel, the first subscript denotes the parameter and the second the component; if a further subscript is required to indicate the observation, we use a third one.
In this mixture model, the marginal expectations are given by

$$E(Y_k) = \sum_{j=1}^{m} p_j(\lambda_{kj} + \lambda_{3j}), \qquad k = 1, 2,$$

while the variance-covariance matrix of Y = (Y1, Y2)^T is given by

$$\mathrm{Var}(\mathbf{Y}) = A\left[\sum_{j=1}^{m} p_j\Sigma_j - \Big(\sum_{j=1}^{m} p_j\boldsymbol\theta_j\Big)\Big(\sum_{j=1}^{m} p_j\boldsymbol\theta_j\Big)^{T}\right]A^{T},$$

where

$$\Sigma_j = \begin{pmatrix} \lambda_{1j}^2+\lambda_{1j} & \lambda_{1j}\lambda_{2j} & \lambda_{1j}\lambda_{3j} \\ \lambda_{1j}\lambda_{2j} & \lambda_{2j}^2+\lambda_{2j} & \lambda_{2j}\lambda_{3j} \\ \lambda_{1j}\lambda_{3j} & \lambda_{2j}\lambda_{3j} & \lambda_{3j}^2+\lambda_{3j} \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$

This can be written in the following interesting form, Var(Y) = A D(θ) A^T, where

$$D(\boldsymbol\theta) = \begin{pmatrix} \mathrm{Var}(\lambda_1)+E(\lambda_1) & \mathrm{Cov}(\lambda_1,\lambda_2) & \mathrm{Cov}(\lambda_1,\lambda_3) \\ \mathrm{Cov}(\lambda_1,\lambda_2) & \mathrm{Var}(\lambda_2)+E(\lambda_2) & \mathrm{Cov}(\lambda_2,\lambda_3) \\ \mathrm{Cov}(\lambda_1,\lambda_3) & \mathrm{Cov}(\lambda_2,\lambda_3) & \mathrm{Var}(\lambda_3)+E(\lambda_3) \end{pmatrix},$$

with the moments of the λ's taken with respect to the finite mixing distribution, which results in

$$\mathrm{Cov}(Y_1,Y_2) = \mathrm{Cov}(\lambda_1,\lambda_2) + \mathrm{Cov}(\lambda_2,\lambda_3) + \mathrm{Cov}(\lambda_1,\lambda_3) + \mathrm{Var}(\lambda_3) + E(\lambda_3).$$

Thus, if the λ's are sufficiently negatively correlated, we can end up with a negative correlation between Y1 and Y2.

The above model has some interesting properties. First, as shown in Karlis and Meligkotsidou [2007], even if λ3 = 0, i.e. within each component the two variables are uncorrelated, the Y's are correlated because of the correlation induced by the finite distribution of the λ's. Such a model, with λ3 = 0 for all the components, assumes independence within each component, but overall we can still have correlation. Second, the correlation between Y1 and Y2 can be negative, while Y1 and Y2 are overdispersed if m > 1 (provided the components' marginal means differ). Note also that the marginal distributions are finite Poisson mixtures. Finally, as we prove in Appendix A, mixed bivariate Poisson distributions always give equal or greater probability to the (0,0) cell than the corresponding bivariate Poisson distribution with the same marginal means. Furthermore, zero-inflated bivariate Poisson models can be considered a special case of this model, in which the first component has λ1 = λ2 = λ3 = 0 and, hence, all of its probability mass is placed on the (0,0) cell. This also suggests why zero-inflated models are overdispersed and can induce different correlation structures.
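The moment formulas above can be checked numerically. The following Python sketch (the helper names bp_pmf, mix_pmf, moments_formula and moments_by_summation are our own, hypothetical choices) evaluates the BP probability function through its trivariate-reduction convolution, which is algebraically equal to the closed form of Section 2.1 but avoids dividing by λ1λ2, builds an m-finite mixture and compares Var(Y) = A D(θ) A^T with brute-force summation over a truncated grid. The parameter values are arbitrary illustrations, not estimates from the paper.

```python
import math

A = [[1, 0, 1], [0, 1, 1]]  # Y = A X, with X = (X1, X2, X3)


def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via the trivariate reduction Y1 = X1 + X3, Y2 = X2 + X3."""
    total = 0.0
    for s in range(min(y1, y2) + 1):
        # P(X3 = s) * P(X1 = y1 - s) * P(X2 = y2 - s), without the common exp factor
        total += (lam3 ** s / math.factorial(s)
                  * lam1 ** (y1 - s) / math.factorial(y1 - s)
                  * lam2 ** (y2 - s) / math.factorial(y2 - s))
    return math.exp(-(lam1 + lam2 + lam3)) * total


def mix_pmf(y1, y2, probs, thetas):
    """m-finite mixture: sum_j p_j BP(y1, y2; theta_j)."""
    return sum(p * bp_pmf(y1, y2, *th) for p, th in zip(probs, thetas))


def moments_formula(probs, thetas):
    """Mean vector and covariance matrix of Y from E(Y) = A E(theta) and Var(Y) = A D A^T."""
    mbar = [sum(p * th[k] for p, th in zip(probs, thetas)) for k in range(3)]
    # D = sum_j p_j Sigma_j - mbar mbar^T, with Sigma_j = theta_j theta_j^T + diag(theta_j)
    D = [[sum(p * (th[a] * th[b] + (th[a] if a == b else 0.0))
              for p, th in zip(probs, thetas)) - mbar[a] * mbar[b]
          for b in range(3)] for a in range(3)]
    mean = [sum(A[r][k] * mbar[k] for k in range(3)) for r in range(2)]
    cov = [[sum(A[r][a] * D[a][b] * A[c][b] for a in range(3) for b in range(3))
            for c in range(2)] for r in range(2)]
    return mean, cov


def moments_by_summation(probs, thetas, ymax=40):
    """Same moments by summing the mixture pmf over the grid {0, ..., ymax}^2."""
    m1 = m2 = s11 = s22 = s12 = 0.0
    for y1 in range(ymax + 1):
        for y2 in range(ymax + 1):
            f = mix_pmf(y1, y2, probs, thetas)
            m1 += y1 * f
            m2 += y2 * f
            s11 += y1 * y1 * f
            s22 += y2 * y2 * f
            s12 += y1 * y2 * f
    c12 = s12 - m1 * m2
    return [m1, m2], [[s11 - m1 * m1, c12], [c12, s22 - m2 * m2]]


probs = [0.3, 0.7]
thetas = [(1.0, 0.5, 0.2), (0.1, 0.05, 0.0)]  # illustrative (lambda1, lambda2, lambda3) per component
mean_f, cov_f = moments_formula(probs, thetas)
mean_s, cov_s = moments_by_summation(probs, thetas)
```

With two components (2, 0.1, 0) and (0.1, 2, 0) in equal proportions, the formula gives Cov(Y1, Y2) = 0.2 − 1.05² = −0.9025, a negative correlation even though λ3 = 0 in both components. Comparing with the BP whose parameters are the averaged λ's (one BP with the same marginal means), the mixture places at least as much mass on (0,0); in this construction that follows from Jensen's inequality.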
In Appendix B, we summarize some of the moments of the finite mixture of bivariate Poisson distributions. These quantities can be used for actuarial purposes as in Bermúdez [2009] and Bermúdez and Karlis [2011].

In order to include covariates and thus allow for greater flexibility, we assume that each parameter is associated with a vector of regressors. Namely, our model takes the form

$$\begin{aligned} \mathbf{Y}_i = (Y_{1i}, Y_{2i}) &\sim \sum_{j=1}^{m} p_j\, BP(y_1, y_2; \lambda_{1ji}, \lambda_{2ji}, \lambda_{3ji}), && i = 1, \dots, n,\\ \log\lambda_{kji} &= \boldsymbol\beta_{kj}^{T}\mathbf{x}_{kji}, && k = 1, 2, 3,\; j = 1, \dots, m, \end{aligned} \tag{1}$$

where x_kji is a vector of covariates for the i-th observation associated with the k-th parameter of the j-th component of the mixture and β_kj is the corresponding vector of regression coefficients. Clearly, the covariates can differ across parameters. This model extends the finite mixture of Poisson regressions model of Wang et al. [1998]. The model assumes that for each variable we have m distinct Poisson regression models that relate the variable of interest to different covariates. Hence, we assume that the population has several distinct clusters presenting different behaviour. The added feature is that we now model two variables together and so are able to take into account their relationship and correlation. Moreover, since we start from a bivariate Poisson model, we may assume a different correlation structure within each group.

A natural extension of the model is to use covariates also in the mixing proportions, i.e. the vector of probabilities (p1, ..., pm). A typical choice is a multinomial logistic model for the vector of mixing proportions (reducing to a simple logistic regression if only two components are present).

In the next section, we provide an EM algorithm that allows for relatively simple maximum likelihood (ML) estimation of the model. It is based on the standard EM algorithm for finite mixtures but also takes into account the trivariate reduction derivation of the bivariate Poisson model.
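To make model (1) concrete, the sketch below evaluates P(Y1i = y1, Y2i = y2 | covariates) with log links for the λ's and a multinomial logit for the mixing proportions (component 1 as baseline). It is a minimal Python illustration under our own assumptions: the helper names, the choice of baseline and all coefficient values are hypothetical, and for brevity the covariate vectors x_k do not vary with the component j.

```python
import math


def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via the trivariate reduction (convolution form)."""
    total = 0.0
    for s in range(min(y1, y2) + 1):
        total += (lam3 ** s / math.factorial(s)
                  * lam1 ** (y1 - s) / math.factorial(y1 - s)
                  * lam2 ** (y2 - s) / math.factorial(y2 - s))
    return math.exp(-(lam1 + lam2 + lam3)) * total


def dot(b, x):
    return sum(bi * xi for bi, xi in zip(b, x))


def mixing_proportions(gammas, z):
    """Multinomial logit with component 1 as baseline: p_1 ~ 1, p_j ~ exp(gamma_j^T z)."""
    scores = [0.0] + [dot(g, z) for g in gammas]
    top = max(scores)  # subtract the maximum for numerical stability
    ex = [math.exp(s - top) for s in scores]
    total = sum(ex)
    return [e / total for e in ex]


def fmbp_reg_pmf(y1, y2, x, betas, gammas, z):
    """P(Y1i = y1, Y2i = y2 | covariates) under model (1).

    betas[j][k] -- coefficients of lambda_{k+1, j} (log link), hypothetical values
    x[k]        -- covariate vector for lambda_{k+1}; it may differ across parameters
    gammas      -- m - 1 logit coefficient vectors for components 2, ..., m
    z           -- covariates entering the mixing proportions
    """
    probs = mixing_proportions(gammas, z)
    total = 0.0
    for p, comp in zip(probs, betas):
        lam = [math.exp(dot(b, xk)) for b, xk in zip(comp, x)]
        total += p * bp_pmf(y1, y2, *lam)
    return total


# Hypothetical two-component example: intercept plus one binary covariate.
x = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]      # x_1, x_2, x_3 for one policyholder
betas = [
    [[0.1, -0.2], [-1.0, 0.3], [-2.0, 0.0]],  # component 1
    [[-2.0, 0.1], [-3.0, 0.2], [-4.0, 0.0]],  # component 2
]
gammas = [[1.5, -0.5]]                         # logit of component 2 versus component 1
z = [1.0, 1.0]
print(fmbp_reg_pmf(0, 0, x, betas, gammas, z))
```

With two components the multinomial logit reduces to the simple logistic regression mentioned above; with no gammas (m = 1) the model collapses to a single BP regression.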
3 ML estimation via an EM algorithm

In this section we develop an EM algorithm. The parameters to be estimated are the mixing proportions p_j, j = 1, ..., m − 1, and the component-specific vectors of regression coefficients β_kj, k = 1, 2, 3, j = 1, ..., m. Being a finite mixture, the model admits the standard missing-data representation. Let Z_i = (Z_1i, ..., Z_mi) be a vector with Z_ji = 1 if the i-th observation belongs to the j-th group and 0 otherwise. We also introduce component-specific latent variables: for the j-th component we use the unobservable vectors Y_i^{j*} = (T_1ji, T_2ji, S_ji) such that Y1i = T_1ji + S_ji and Y2i = T_2ji + S_ji, as the trivariate reduction derivation implies. The algorithm is similar to that described in Brijs et al. [2004], but here we also have regressors. If Z_i and S_ji were observable, estimation would be a simple task; since they are not, at the E-step we replace them by their conditional expectations given the data and the current parameter values. The algorithm is as follows.

E-step: Given the parameter values after the τ-th iteration, we obtain λ_1ji^(τ), λ_2ji^(τ) and λ_3ji^(τ) from (1) and then calculate the expected values of the unobservables:

$$s_{ji} = E\big(S_{ji}\mid Y_{1i}, Y_{2i}, \lambda_{1ji}^{(\tau)}, \lambda_{2ji}^{(\tau)}, \lambda_{3ji}^{(\tau)}\big) = \begin{cases} \lambda_{3ji}^{(\tau)}\,\dfrac{BP\big(y_{1i}-1, y_{2i}-1 \mid \lambda_{1ji}^{(\tau)}, \lambda_{2ji}^{(\tau)}, \lambda_{3ji}^{(\tau)}\big)}{BP\big(y_{1i}, y_{2i} \mid \lambda_{1ji}^{(\tau)}, \lambda_{2ji}^{(\tau)}, \lambda_{3ji}^{(\tau)}\big)}, & \text{if } y_{1i}y_{2i} > 0,\\[1ex] 0, & \text{if } y_{1i}y_{2i} = 0, \end{cases}$$

and

$$w_{ji} = \frac{p_j^{(\tau)}\, BP\big(y_{1i}, y_{2i}\mid \lambda_{1ji}^{(\tau)}, \lambda_{2ji}^{(\tau)}, \lambda_{3ji}^{(\tau)}\big)}{\sum_{\ell=1}^{m} p_\ell^{(\tau)}\, BP\big(y_{1i}, y_{2i}\mid \lambda_{1\ell i}^{(\tau)}, \lambda_{2\ell i}^{(\tau)}, \lambda_{3\ell i}^{(\tau)}\big)}.$$

M-step: Update the estimates by

$$p_j^{(\tau+1)} = \frac{1}{n}\sum_{i=1}^{n} w_{ji}, \qquad \boldsymbol\beta_{1j}^{(\tau+1)} = \hat{\boldsymbol\beta}(\mathbf{y}_1 - \mathbf{s}_j, \mathbf{x}_1, \mathbf{w}_j), \qquad \boldsymbol\beta_{2j}^{(\tau+1)} = \hat{\boldsymbol\beta}(\mathbf{y}_2 - \mathbf{s}_j, \mathbf{x}_2, \mathbf{w}_j), \qquad \boldsymbol\beta_{3j}^{(\tau+1)} = \hat{\boldsymbol\beta}(\mathbf{s}_j, \mathbf{x}_3, \mathbf{w}_j),$$

where s_j = [s_j1, ..., s_jn]^T is an n × 1 vector, w_j = [w_j1, ..., w_jn]^T, and β̂(y, x, w) denotes the weighted maximum likelihood estimates of a Poisson regression model with response vector y, design (data) matrix x and weights w. Note that different covariates may be used for each λ, i.e. different design matrices.
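The E- and M-steps above can be sketched in Python with NumPy. This is an illustration under simplifying assumptions of our own, not the authors' implementation: λ1 and λ2 share one design matrix X in every component, λ3 has an intercept only, the mixing proportions carry no covariates, and the data are simulated. All function names (bp_pmf, cond_mean_s, weighted_poisson_fit, em_fmbp_reg) are hypothetical. β̂(y, x, w) is computed by Newton-Raphson, which for the log link coincides with iteratively reweighted least squares, and the starting values follow the perturbation idea (factors 0.8 and 1.2) suggested for initialisation.

```python
import math
import numpy as np


def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via the trivariate reduction (convolution form)."""
    total = 0.0
    for s in range(min(y1, y2) + 1):
        total += (lam3 ** s / math.factorial(s)
                  * lam1 ** (y1 - s) / math.factorial(y1 - s)
                  * lam2 ** (y2 - s) / math.factorial(y2 - s))
    return math.exp(-(lam1 + lam2 + lam3)) * total


def cond_mean_s(y1, y2, l1, l2, l3):
    """E-step quantity s_ji = E(S | y1, y2) for one component; 0 if y1 * y2 = 0."""
    if y1 == 0 or y2 == 0:
        return 0.0
    return l3 * bp_pmf(y1 - 1, y2 - 1, l1, l2, l3) / bp_pmf(y1, y2, l1, l2, l3)


def weighted_poisson_fit(y, X, w, beta0, max_iter=100, tol=1e-10):
    """beta_hat(y, X, w): weighted log-link Poisson ML via Newton-Raphson (= IRLS here).

    The response may be non-integer (e.g. y1 - s_j): only the score X^T W (y - mu) is used.
    """
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (X * (w * mu)[:, None])  # X^T diag(w * mu) X
        step = np.linalg.solve(info, X.T @ (w * (y - mu)))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def em_fmbp_reg(y1, y2, X, m=2, max_iter=60, tol=1e-8):
    """EM sketch for model (1): design X for lambda1 and lambda2, intercept-only lambda3."""
    n = len(y1)
    Y1, Y2 = [int(v) for v in y1], [int(v) for v in y2]  # plain ints for the pmf
    X3 = np.ones((n, 1))
    zeros = np.zeros(X.shape[1] - 1)
    # Start: intercept-only Poisson means perturbed by 0.8 and 1.2; lambda3 start is arbitrary.
    factors = np.linspace(0.8, 1.2, m)
    b1 = [np.r_[np.log(f * y1.mean()), zeros] for f in factors]
    b2 = [np.r_[np.log(f * y2.mean()), zeros] for f in factors]
    b3 = [np.array([np.log(0.1 * min(y1.mean(), y2.mean()))]) for _ in factors]
    probs = np.full(m, 1.0 / m)
    path = []
    for _ in range(max_iter):
        lam = [(np.exp(X @ b1[j]), np.exp(X @ b2[j]), np.exp(X3 @ b3[j])) for j in range(m)]
        dens = np.array([[probs[j] * bp_pmf(Y1[i], Y2[i], lam[j][0][i], lam[j][1][i], lam[j][2][i])
                          for i in range(n)] for j in range(m)])
        path.append(float(np.log(dens.sum(axis=0)).sum()))  # log-likelihood at current values
        w = dens / dens.sum(axis=0)                           # E-step: w_ji
        for j in range(m):
            l1, l2, l3 = lam[j]
            s = np.array([cond_mean_s(Y1[i], Y2[i], l1[i], l2[i], l3[i]) for i in range(n)])
            probs[j] = w[j].mean()                            # M-step
            b1[j] = weighted_poisson_fit(y1 - s, X, w[j], b1[j])
            b2[j] = weighted_poisson_fit(y2 - s, X, w[j], b2[j])
            b3[j] = weighted_poisson_fit(s, X3, w[j], b3[j])
        if len(path) > 1 and path[-1] - path[-2] < tol:
            break
    return probs, b1, b2, b3, path


# Hypothetical simulated portfolio: one binary covariate and two latent groups.
rng = np.random.default_rng(2011)
n = 300
x = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), x])
high = rng.random(n) < 0.3
common = rng.poisson(np.where(high, 0.3, 0.01))
y1 = rng.poisson(np.where(high, 1.2, 0.1) * np.exp(0.4 * x)) + common
y2 = rng.poisson(np.where(high, 0.8, 0.05) * np.exp(-0.3 * x)) + common
probs, b1, b2, b3, path = em_fmbp_reg(y1, y2, X)
```

Because each M-step maximises the expected complete-data log-likelihood, the recorded log-likelihoods should never decrease, which is a convenient check on any implementation.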
The above procedure has all the advantages and drawbacks of the EM algorithm (for example, the likelihood never decreases, but convergence can be slow). For this reason, suitable termination conditions should be considered carefully. When covariates are also used for the mixing proportions, the update of p_j in the M-step has to be replaced by one that fits a multinomial logistic regression (or a simple logistic regression if only two components are considered) using w_ji as the response.

Finally, initial values can be obtained by fitting a simple univariate Poisson regression to each variable so as to obtain the fitted values. Then, by simply perturbing them (e.g. multiplying the fitted λ's by 0.8 and 1.2), we can obtain initial values for each component. The initial values for the mixing proportions are less important. Furthermore, as in other finite mixture settings, initial values can be obtained using a standard clustering algorithm. Note that obtaining initial estimates for the w_ji is sufficient to initialise the algorithm.

4 Application

4.1 The data

The original population comprised a ten-percent sample of the 1996 automobile portfolio (only automobiles categorized as being for private use were considered) of a major insurance company operating in Spain and contained information on 80,994 policyholders. The data have also been used previously in Bermúdez [2009], where bivariate Poisson models, including zero-inflated models, were fitted.
The sample is not representative of the company's current portfolio, being drawn from a larger panel of policyholders that had been customers of the company for at least seven years; however, the sample should be helpful here for illustrative purposes. Twelve exogenous variables were considered, plus the annual number of accidents recorded for both types of claim. For each policy, the information at the beginning of the period and the total number of claims from policyholders “at fault” were reported for each year. The exogenous variables, described in Table 1, and this data set have previously been used in Pinquet et al. [2001], Brouhns et al. [2003], Bolancé et al. [2003], Bolancé et al. [2008], Boucher et al. [2007], Boucher and Denuit [2008] and Boucher et al. [2009].

Table 1: Explanatory variables used in the models

Variable   Definition
V1         equals 1 for women and 0 for men
V2         equals 1 when driving in an urban area, 0 otherwise
V3         equals 1 when the zone is medium risk (Madrid and Catalonia)
V4         equals 1 when the zone is high risk (Northern Spain)
V5         equals 1 if the driving license is between 4 and 14 years old
V6         equals 1 if the driving license is 15 or more years old
V7         equals 1 if the client has been with the company between 3 and 5 years
V8         equals 1 if the client has been with the company for more than 5 years
V9         equals 1 if the insured is 30 years old or younger
V10        equals 1 if the policy includes comprehensive coverage (except fire)
V11        equals 1 if the policy includes comprehensive and collision coverage
V12        equals 1 if horsepower is greater than or equal to 5500cc

In this study, all customers had held a policy with the company for at least three years, so that V7 = 1 − V8 and V7 is redundant. Thus, variable V7 was dropped and variable V8 retained, the baseline now being a customer who had been with the company for fewer than five years. The meaning of the variables that refer to the policyholders' coverage should also be clarified.
The classification adopted here reflects the most common types of automobile insurance policy available on the Spanish market. The simplest policy includes only third-party liability (claimed and counted as Y1 type) and a set of basic guarantees such as emergency roadside assistance, legal assistance or insurance covering medical costs (claimed and counted as Y2 type); it does not include comprehensive coverage or collision coverage (which would be claimed and counted as Y2 type). This simplest type of policy makes up the baseline group, while variable V10 denotes policies which, apart from the guarantees contained in the simplest policies, also include comprehensive coverage (except fire), and variable V11 denotes policies which also include fire and collision coverage.

4.2 Results

We fitted a 2-finite mixture of bivariate Poisson regressions to this data set. We avoided fitting a model with three components because the interpretation of such a model would have been more difficult and because a 2-finite mixture allows sufficient interpretation of this particular data set. However, models with more components can easily be fitted via the EM algorithm provided. The estimates of both λ3 parameters were equal to 0, implying that, conditional on the component, no correlation was present. The first model fitted does not have covariates in the mixing proportion p, while the second uses V10 and V11 as covariates in the mixing proportion.

Table 2: Information criteria for selecting the best model for the data

Model                              Log-Lik      Parameters   AIC
Double Poisson                     -48,882.95   24           97,813.90
Bivariate Poisson (BP)             -48,135.98   25           96,321.96
BP (regressors on λ3)              -47,873.37   26           95,798.74
Zero-inflated BP (ZIBP)            -45,435.00   26           90,922.00
ZIBP (regressors on λ3)            -45,414.80   27           90,883.60
2-finite mixture BP (2-FMBP1)      -44,927.01   51           89,956.02
2-FMBP2 (regressors on p)          -44,842.22   53           89,790.44
We used these covariates because, when fitting the first model, we noticed a large difference in the a posteriori probabilities when considering the values 0 or 1 for V10 and V11. We return to this issue later. Bermúdez [2009] also used the V10 covariate to model the λ3 parameter. In the sequel, 2-FMBP1 denotes the first model, without covariates on p, and 2-FMBP2 denotes the second model, with covariates on p. Models were fitted via the EM algorithm provided.

Table 2 presents the results from fitting various models to the data. We fitted models of increasing complexity, starting from a simple independent Poisson regression model. The first five models are the same as those fitted in Bermúdez [2009]. It can be seen that the 2-finite mixtures of bivariate Poisson regressions are by far the best models, especially the one with covariates in the mixing proportion, which has the best AIC.

Table 3 shows the results from fitting the 2-finite mixture of bivariate Poisson regressions with covariates in the mixing proportion. The p-values refer to the likelihood ratio test (LRT) statistic comparing the model with and without the variable. We prefer this approach because standard errors in finite mixtures are not easy to derive. In our case we would need to derive the Hessian of the log-likelihood function, which is particularly time-consuming and vulnerable to overflows as we have 53 parameters (12 regression coefficients for each response variable in each of the two components, plus three coefficients for the mixing proportion and two covariance parameters). Bootstrapping, as an alternative, can also be very slow. We therefore removed each variable in turn and calculated an LRT; the p-values reported correspond to these tests.
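Both the AIC column of Table 2 and the LRT p-values rest on simple formulas. The sketch below, in Python with hypothetical helper names, recomputes AIC = −2 log L + 2k from the reported log-likelihoods and parameter counts and evaluates an LRT against its asymptotic chi-square distribution (closed-form tail probabilities, so no statistics library is needed). Because the reduced-model log-likelihoods behind the per-variable p-values of Table 3 are not reported, the LRT is illustrated on the one nested pair available in Table 2: 2-FMBP1 within 2-FMBP2, which adds two parameters (V10 and V11 in the mixing proportion).

```python
import math

# (log-likelihood, number of parameters) as reported in Table 2
TABLE2 = {
    "Double Poisson": (-48882.95, 24),
    "BP": (-48135.98, 25),
    "BP (regressors on lambda3)": (-47873.37, 26),
    "ZIBP": (-45435.00, 26),
    "ZIBP (regressors on lambda3)": (-45414.80, 27),
    "2-FMBP1": (-44927.01, 51),
    "2-FMBP2 (regressors on p)": (-44842.22, 53),
}


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) for integer df via closed forms."""
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lrt(loglik_full, loglik_reduced, df):
    """LR statistic 2 (l_full - l_reduced) and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


aics = {name: aic(ll, k) for name, (ll, k) in TABLE2.items()}
for name, value in aics.items():
    print(f"{name:30s} AIC = {value:,.2f}")

stat, pval = lrt(TABLE2["2-FMBP2 (regressors on p)"][0], TABLE2["2-FMBP1"][0], df=2)
print(f"LR = {stat:.2f}, p = {pval:.2e}")
```

The chi-square reference is only asymptotic, and it does not apply as stated when the null value lies on the boundary of the parameter space (for example, testing λ3 = 0 in the BP model against the double Poisson model).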
Figures 1 and 2 help illustrate that the 2-finite mixture of bivariate Poisson regressions (with covariates on p) is a good option, and better than a zero-inflated bivariate Poisson regression, for dealing with the overdispersion and the excess of zeros present in the data set.

Figure 1 shows the fitted components. We plotted boxplots for the two components for the two variables under consideration. The boxplots represent the values of λ_kji for k = 1, 2, j = 1, 2, and i = 1, ..., 80,994. From this plot, it can readily be seen that the first component corresponds to policyholders with high rates of claims for both variables, Y1 and Y2, while the second component corresponds to those with small claim rates. In fact, the second component has very small means for the underlying Poisson components, which implies a high probability of zeros. Thus, the second component introduces a large amount of zero inflation into our model. Bermúdez [2009] fitted zero-inflated bivariate Poisson models to account for the excess of zeros found with respect to the simple bivariate Poisson model while, at the same time, allowing for overdispersion. Here, we show that the problem is more than one of simple zero inflation.

Table 3: Results from fitting the 2-FMBP2 model (with regressors on p)

                1st component (j = 1)                  2nd component (j = 2)
                Y1                 Y2                  Y1                 Y2
Variable     Coeff.   p-value   Coeff.   p-value    Coeff.   p-value   Coeff.   p-value
Intercept     0.071   < 0.001   -1.611   < 0.001    -3.118   < 0.001   -6.014   < 0.001
V1           -0.061     0.115    0.032     0.218     0.127     0.059    0.037     0.258
V2           -0.037     0.162    0.008     0.308    -0.076     0.123    0.179     0.001
V3           -0.090     0.027    0.106     0.003     0.197     0.006    0.242   < 0.001
V4            0.129     0.003   -0.043     0.166     0.284   < 0.001   -0.371   < 0.001
V5           -0.132     0.142    0.153     0.111    -0.346     0.016    0.452     0.003
V6           -0.216     0.052    0.027     0.313    -0.524     0.002    0.137     0.215
V8            0.101     0.022    0.135     0.002     0.190     0.013    0.326   < 0.001
V9            0.078     0.145    0.035     0.262     0.193     0.048    0.171     0.024
V10          -0.707   < 0.001    1.622   < 0.001    -2.676   < 0.001    2.953   < 0.001
V11          -0.361   < 0.001    1.069   < 0.001    -0.285     0.002    2.412   < 0.001
V12           0.036     0.224    0.102     0.028     0.079     0.145    0.397   < 0.001
λ3            0.000                                  0.000

Mixing proportion (p)
Intercept    -2.4595  < 0.001
V10           1.447   < 0.001
V11           0.680   < 0.001

Thus, by assuming the existence of two types of policyholder, each described by a separate component in the mixture, we are able to improve the modelling of the data set considerably. Indeed, zero-inflated models are special instances of the finite mixture model presented here, which was considered, at least initially, to account for overdispersion. In the univariate case, Lord et al. [2005] and Lord et al. [2007] criticize zero-inflated models for modelling the number of accidents because of their dual-state process assumption. They argue that zero-inflated models assume two sources of zeros: “true” and “observed”. The existence of “true” zeros may be too strong an assumption in some cases (see also Boucher and Santolino [2010]). However, as Park and Lord [2009] discuss in the univariate case, the two-component mixture model used here does not make this somewhat strict dual-state process assumption and allows mixing with respect to both zeros and positives. This interpretation is more flexible and fits our case better. The groups are separated into those with a low mean and low variance (policyholders considered “good” drivers) and those with a high mean and high variance (policyholders considered “bad” drivers). From Figure 1, it is also interesting to note that third-party liability claims (Y1) present greater separation between the two components than the rest of the automobile claims (Y2).

For each observation, we also calculated the underlying variance and covariance. These are depicted in Figure 2. The horizontal line is the observed quantity and the boxplot refers to the values fitted for each individual based on the second model (the one with covariates on p).
As for the covariance, the model captures it quite well. In the case of the variance, the model's prediction is somewhat smaller than the observed value. This is perhaps an indication that some overdispersion remains uncaptured, either because a third component could be fitted or because we have overlooked some covariates.

Most of the parameters are significant; note, however, that the sample size is very large. Any variable selection technique could have been used to reduce the number of variables, but in this application we preferred to retain all the variables in order to see their effect. Recall that it is not necessary to use the same covariate vector for all the parameters. Only the parameter related to gender (V1) is not significant in any case, i.e. for either component or either response variable. On the other hand, the parameters related to the driving zone (V3), the number of years the customer has been with the company (V8) and the type of coverage (V10 and V11) have significant coefficients for both components and both response variables. It is interesting to note that V10 and V11 have coefficients of opposite sign for the two response variables. For Y1 (third-party liability claims), the more policy guarantees the customers take out, the fewer claims they report; the opposite is the case for Y2 (all other automobile insurance claims). Finally, the parameters for V5 and V9, related to the policyholder's driving experience and age respectively, are significant only for the second component, while the parameter for V12, related to the car's horsepower, is only significant for the second response variable. A more detailed explanation of the coefficients is of interest here to differentiate between the two groups.
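Table 3 can be turned into a priori expected claim frequencies for a given risk profile. The Python sketch below assumes, as the interpretation of V10 and V11 given in the text suggests, that the logistic model gives the probability p of belonging to the first component, and uses the fact that both λ3 estimates are 0, so that E(Yk | x) = p λ_k1 + (1 − p) λ_k2 with λ_kj = exp(β_kj^T x). The helper name expected_claims and the example profiles are hypothetical; the coefficients are copied from Table 3.

```python
import math

VARS = ["V1", "V2", "V3", "V4", "V5", "V6", "V8", "V9", "V10", "V11", "V12"]
# Coefficients from Table 3, ordered as [Intercept] + VARS
COEF = {
    (1, "Y1"): [0.071, -0.061, -0.037, -0.090, 0.129, -0.132, -0.216, 0.101, 0.078, -0.707, -0.361, 0.036],
    (1, "Y2"): [-1.611, 0.032, 0.008, 0.106, -0.043, 0.153, 0.027, 0.135, 0.035, 1.622, 1.069, 0.102],
    (2, "Y1"): [-3.118, 0.127, -0.076, 0.197, 0.284, -0.346, -0.524, 0.190, 0.193, -2.676, -0.285, 0.079],
    (2, "Y2"): [-6.014, 0.037, 0.179, 0.242, -0.371, 0.452, 0.137, 0.326, 0.171, 2.953, 2.412, 0.397],
}
MIX = {"Intercept": -2.4595, "V10": 1.447, "V11": 0.680}


def expected_claims(profile):
    """Return (p, {"Y1": E(Y1|x), "Y2": E(Y2|x)}) for a dict of binary covariates (missing = 0)."""
    x = [1.0] + [float(profile.get(v, 0)) for v in VARS]
    eta = MIX["Intercept"] + MIX["V10"] * profile.get("V10", 0) + MIX["V11"] * profile.get("V11", 0)
    p = 1.0 / (1.0 + math.exp(-eta))  # assumed: probability of the first component
    means = {}
    for resp in ("Y1", "Y2"):
        lam1 = math.exp(sum(b * xi for b, xi in zip(COEF[(1, resp)], x)))
        lam2 = math.exp(sum(b * xi for b, xi in zip(COEF[(2, resp)], x)))
        means[resp] = p * lam1 + (1.0 - p) * lam2  # lambda3 = 0 in both components
    return p, means


p0, base = expected_claims({})           # baseline policyholder
p1, comp = expected_claims({"V10": 1})   # adds comprehensive coverage (except fire)
print(p0, base)
print(p1, comp)
```

Within each component, exp(β) is a multiplicative effect on the mean; for example, exp(−0.707) ≈ 0.49 for V10 on Y1 in the first component. The overall effect of a covariate also passes through p when it enters the mixing proportion.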
Recall that the first component corresponds to the policyholders considered “good” drivers, characterized by a low mean with low variance, and the second component corresponds to the policyholders considered “bad” drivers, characterized by a high mean with high variance. Most of the parameters present the same behaviour for both “good” and “bad” drivers. This is the case for the parameters related to the driving zone (V3 and V4), the type of coverage (V10 and V11) and the car's horsepower (V12). Another example is that the longer the customer has been with the company (V8), the more claims the policyholder reports, regardless of the group to which he or she belongs. By contrast, three parameters are only significant for the second component and, as such, can be used to define a “bad” driver. These are basically the parameters related to a driver's age and driving experience. Being thirty years old or younger (V9) increases the expected number of claims for all types of claim. Driving experience (V5) reduces the expected number of third-party liability claims but increases the expected number of all other automobile insurance claims. Moreover, “bad” drivers in urban areas (V2) only present a larger expected number of claims for Y2-type claims. Finally, V10 and V11 are also highly significant for the mixing proportion, implying that having the coverage denoted by V10 or V11 increases the probability of belonging to the first cluster. Hence, “good” drivers take out more guarantees in their policies than “bad” drivers do.

Figure 1: The fitted components for the two variables analyzed (boxplots for the 1st and 2nd components; panels Y1 and Y2).

Figure 2: Variance and covariance for the fitted model (panels: variance of Y1, variance of Y2, covariance).

Table 4 presents the observed and expected frequencies under the two 2-finite mixtures of bivariate Poisson regressions.
To obtain the expected frequencies, for each observation we calculated the probability table based on the estimated parameters and then summed all these probability tables to obtain the table of expected frequencies. The fit is quite good, although a few cells still have large residuals. The chi-square test shows that only a few cells account for most of the lack of fit; moreover, with a sample this large, the rejection of the null hypothesis is to a large extent an artefact of the sample size. We believe that the fit is, in fact, very good given the size of the data set. Furthermore, note that a zero-inflated model would only correct the (0,0) cell and not the entire probability table.

Finally, we present Figure 3 in an effort to see which variables characterize each cluster and which variables can be included as regressors in the mixing proportion p. Using the posterior probabilities, available on completion of the EM algorithm, we can classify each observation to a cluster, based (as usual) on the maximum posterior probability. Since all the variables are binary, for each cluster we computed, for every variable, the proportion of observations assigned to that cluster that take the value 1. In Figure 3 the profiles of the two clusters are depicted for each fitted model, i.e. the mean of each variable over all the clients assigned to each cluster. The left-hand plot corresponds to the 2-FMBP without covariates in the mixing proportion, while the right-hand plot corresponds to the model with V10 and V11 as covariates in the mixing proportion. The red dotted line represents the first cluster and the solid black line the second. In the left-hand plot, the main differences occur for variables V10 and V11, with a small difference for V3. In simple terms, these variables can be used to distinguish between the two clusters.
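The classification and profiling steps just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names are hypothetical, the mixing proportion p and the component parameters are fixed illustrative values (in the fitted models they depend on covariates), and the two binary covariates stand in for the V variables. Each policy is assigned to the cluster with the larger posterior probability, and a cluster’s profile is the per-variable mean of its members, which for binary variables is a proportion.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k); lam = 0 gives a point mass at 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via Y1 = X1 + X3, Y2 = X2 + X3."""
    return sum(poisson_pmf(y1 - k, lam1) * poisson_pmf(y2 - k, lam2) * poisson_pmf(k, lam3)
               for k in range(min(y1, y2) + 1))

def posterior_first(y1, y2, p, comp1, comp2):
    """Posterior probability that (y1, y2) was generated by the first component."""
    f1 = p * bp_pmf(y1, y2, *comp1)
    f2 = (1 - p) * bp_pmf(y1, y2, *comp2)
    return f1 / (f1 + f2)

def classify(data, p, comp1, comp2):
    """Maximum-posterior rule: label 0 = first cluster, label 1 = second cluster."""
    labels = []
    for y1, y2 in data:
        tau = posterior_first(y1, y2, p, comp1, comp2)
        labels.append(0 if tau >= 1 - tau else 1)
    return labels

def cluster_profiles(X, labels, n_clusters=2):
    """Per-cluster column means; for binary covariates these are proportions of ones."""
    profiles = []
    for c in range(n_clusters):
        rows = [x for x, lab in zip(X, labels) if lab == c]
        profiles.append([sum(col) / len(rows) for col in zip(*rows)] if rows else [])
    return profiles

# Hypothetical values, not the estimates reported in the paper.
p = 0.8
comp1 = (0.05, 0.08, 0.01)   # low-mean component ("good" drivers)
comp2 = (0.6, 0.9, 0.1)      # high-mean component ("bad" drivers)
data = [(0, 0), (0, 1), (1, 2), (2, 3)]   # observed (y1, y2) per policy
X = [(1, 1), (1, 0), (0, 0), (0, 1)]      # two hypothetical binary covariates per policy

labels = classify(data, p, comp1, comp2)
profiles = cluster_profiles(X, labels)
print(labels)    # [0, 0, 1, 1]
print(profiles)  # [[1.0, 0.5], [0.0, 0.5]]
```

A covariate whose proportion differs markedly between the two profiles (the first one here) is the kind of variable that, as in Figure 3, distinguishes the clusters; one with equal proportions (the second one here) does not.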
Interestingly, these variables also have regression coefficients of different signs for the two components (see Table 3). They are also the only variables that are statistically significant for both components and both response variables. For all the other variables the profiles are the same, which indicates that they are unsuitable for characterizing the clusters. After including V10 and V11 as covariates in the mixing proportion, all the information regarding V11 is contained in the mixing proportion p. In other words, V11 no longer distinguishes the two components, but it is significant in selecting the component. Thus, if V11 is not used in p, the differences appear in the component means instead. By contrast, V10 (comprehensive coverage except fire) still characterizes the clusters. Differences exist for the other variables but are smaller. Note also the difference in interpretation afforded by the two models. By using covariates in the mixing proportion, we model the effect of a covariate on the choice of component explicitly, whereas when we use them only in the component means we do so implicitly.
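When covariates enter the mixing proportion, every policy has its own p_i, so the per-policy probability tables that are summed to produce the expected frequencies of Table 4 differ from one policy to the next. The sketch below assumes a logistic link for p_i, a common choice but an assumption here; for brevity it keeps the component parameters fixed rather than modelling them through regressions, and all names and values are hypothetical. It also includes a Pearson chi-square statistic that simply skips cells with zero expected count; in practice sparse cells are usually pooled before testing.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k); lam = 0 gives a point mass at 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via Y1 = X1 + X3, Y2 = X2 + X3."""
    return sum(poisson_pmf(y1 - k, lam1) * poisson_pmf(y2 - k, lam2) * poisson_pmf(k, lam3)
               for k in range(min(y1, y2) + 1))

def mixing_proportion(z, gamma):
    """Assumed logistic link: p_i = 1 / (1 + exp(-(gamma_0 + gamma_1*z_1 + ...)))."""
    eta = gamma[0] + sum(g * zk for g, zk in zip(gamma[1:], z))
    return 1.0 / (1.0 + math.exp(-eta))

def expected_table(policies, gamma, comp1, comp2, max_count):
    """Sum of per-policy probability tables; entry [y1][y2] is an expected frequency."""
    size = max_count + 1
    table = [[0.0] * size for _ in range(size)]
    for z in policies:
        p = mixing_proportion(z, gamma)          # policy-specific mixing proportion
        for y1 in range(size):
            for y2 in range(size):
                table[y1][y2] += p * bp_pmf(y1, y2, *comp1) + (1 - p) * bp_pmf(y1, y2, *comp2)
    return table

def pearson_chi_square(observed, expected):
    """Pearson statistic sum((O - E)^2 / E); cells with zero expected count are skipped."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row) if e > 0)

gamma = (0.5, 1.0, 1.0)       # hypothetical intercept and coefficients of two binary covariates
comp1 = (0.05, 0.08, 0.01)    # hypothetical low-mean component
comp2 = (0.6, 0.9, 0.1)       # hypothetical high-mean component
policies = [(1, 1), (0, 0), (1, 0), (0, 1)]   # hypothetical covariate values per policy

print(round(mixing_proportion((1, 1), gamma), 4), round(mixing_proportion((0, 0), gamma), 4))  # 0.9241 0.6225
table = expected_table(policies, gamma, comp1, comp2, max_count=25)
print(round(sum(map(sum, table)), 6))                 # 4.0
print(pearson_chi_square([[10, 5]], [[8.0, 5.0]]))    # 0.5
```

With positive coefficients on both binary covariates, a policy holding both coverages receives a larger probability of belonging to the first (low-mean) component, the same direction as reported for V10 and V11. Because each per-policy table is a probability distribution, the expected frequencies add up to the number of policies, apart from the negligible truncation at max_count.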
It is also helpful to consider how covariates directly affect the probability of each customer belonging to a group.

Y2  Source          Y1=0         1         2         3         4         5         6         7         8
0   Observed       71087      3022       574       149        29         4         2         1         0
    2-FMBP1     70992.70   3032.90    580.86    107.79     15.41      1.79      0.18      0.02         0
    2-FMBP2     71045.30   3055.16    476.95    117.96     24.48      4.26      0.64      0.08      0.01
1   Observed        3722       686       138        42        15         1         1         0         0
    2-FMBP1      3932.01    753.38    200.93     38.11      5.50      0.65      0.06      0.01         0
    2-FMBP2      3806.84    737.08    217.76     49.36      9.06      1.42      0.20      0.02         0
2   Observed         807       184        55        21         3         0         0         0         1
    2-FMBP1       593.23    253.26     71.19     13.61      1.98      0.23      0.02         0         0
    2-FMBP2       644.29    280.54     77.63     15.41      2.46      0.34      0.04         0         0
3   Observed         219        71        15         6         2         0         1         1         0
    2-FMBP1       161.74     87.06     24.73      4.75      0.69      0.08      0.01         0         0
    2-FMBP2       191.87     92.21     23.73      4.31      0.62      0.08      0.01         0         0
4   Observed          51        26         8         6         1         0         0         0         0
    2-FMBP1        49.25     27.60      7.87      1.52      0.22      0.03         0         0         0
    2-FMBP2        54.33     25.57      6.27      1.07      0.14      0.02         0         0         0
5   Observed          14        10         4         1         1         0         0         0         0
    2-FMBP1        13.93      7.87      2.25      0.44      0.06      0.01         0         0         0
    2-FMBP2        13.26      6.10      1.45      0.24      0.03         0         0         0         0
6   Observed           4         3         1         0         0         2         0         0         0
    2-FMBP1         3.54      2.01      0.58      0.11      0.02         0         0         0         0
    2-FMBP2         2.81      1.28      0.30      0.05      0.01         0         0         0         0
7   Observed           0         1         1         1         0         0         0         0         0
    2-FMBP1         0.80      0.46      0.13      0.03         0         0         0         0         0
    2-FMBP2         0.53      0.24      0.05      0.01         0         0         0         0         0
Table 4: Observed and expected frequencies (rows: Y2 = 0,...,7; columns: Y1 = 0,...,8; for each value of Y2, the observed counts and the expected counts under 2-FMBP1 and 2-FMBP2)

Figure 3: The profiles of the two clusters considered for 2-FMBP1 and 2-FMBP2 models (x-axis: variable; y-axis: proportion)

5 Concluding Remarks

We have proposed a new model, a finite mixture of bivariate Poisson regressions. The idea is that the data consist of subpopulations with different regression structures. A potential use of such a model is to examine the clustering of observations, taking into consideration the effect of certain covariates while also accounting for the dependence between the response variables.
The model corrects for the zero inflation and overdispersion present in the real automobile insurance data set used in the application, and it can also be used to model negative correlation. The AIC reported here indicates that the 2-finite mixture of bivariate Poisson regressions with covariates in the mixing proportion is the best model for describing the data set. This model has a number of interesting features: first, it allows for overdispersion; second, it embraces zero-inflated regression models as a special case; third, it allows for an elegant interpretation based on the typical clustering usage of finite mixture models; and, finally, it can also be used to fit negative correlations.

The problem of overdispersion arises because of the presence of unobserved heterogeneity in many real data sets. In insurance, a company cannot keep track of the many differences between its policyholders. The model proposed in this paper accounts for this unobserved heterogeneity by assuming a finite number of subpopulations; here we assume two types of policyholder, each described by one component of the mixture. The excess of zeros may also be seen as a consequence of this unobserved heterogeneity. As a finite mixture of bivariate Poisson regression models, the model proposed here embraces the zero-inflated bivariate Poisson regression model as a special case. The main difference from zero-inflated models is that the two-component mixture allows mixing with respect to both zeros and positive counts. This interpretation is more flexible and fits our application better. The groups are separated into a low-mean component (policyholders considered “good” drivers) and a high-mean component (policyholders considered “bad” drivers). Moreover, since the data set appears to have been generated by two distinct subpopulations, the model allows each cluster to be interpreted clearly and separately.
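Two of the claims above can be checked numerically. The following Python sketch, with hypothetical names and parameter values, shows (1) that a zero-inflated bivariate Poisson distribution is the special case of a two-component mixture in which one component has all three parameters equal to zero, and (2) the Appendix A result that a mixture assigns at least as much probability to the (0,0) cell as the bivariate Poisson with weighted-average parameters, which has the same marginal means. The bivariate Poisson is again computed through the trivariate reduction Y1 = X1 + X3, Y2 = X2 + X3.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(X = k); lam = 0 gives a point mass at 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bp_pmf(y1, y2, lam1, lam2, lam3):
    """Bivariate Poisson pmf via Y1 = X1 + X3, Y2 = X2 + X3."""
    return sum(poisson_pmf(y1 - k, lam1) * poisson_pmf(y2 - k, lam2) * poisson_pmf(k, lam3)
               for k in range(min(y1, y2) + 1))

def mixture_pmf(y1, y2, p, comp1, comp2):
    """Two-component BP mixture: weight p on comp1 and 1 - p on comp2."""
    return p * bp_pmf(y1, y2, *comp1) + (1 - p) * bp_pmf(y1, y2, *comp2)

def zibp_pmf(y1, y2, w, comp):
    """Zero-inflated BP: extra probability w placed on the (0,0) cell only."""
    point_mass = 1.0 if (y1, y2) == (0, 0) else 0.0
    return w * point_mass + (1 - w) * bp_pmf(y1, y2, *comp)

# (1) ZIBP = mixture whose first component has lambda_1 = lambda_2 = lambda_3 = 0.
w, comp = 0.3, (0.2, 0.3, 0.05)          # hypothetical values
degenerate = (0.0, 0.0, 0.0)             # BP(0, 0, 0) is a point mass at (0, 0)
max_gap = max(abs(mixture_pmf(a, b, w, degenerate, comp) - zibp_pmf(a, b, w, comp))
              for a in range(10) for b in range(10))
print(max_gap)  # 0.0

# (2) Appendix A: compare P(0,0) with the BP whose parameters are the weighted averages.
p, c1, c2 = 0.7, (0.05, 0.08, 0.01), (0.6, 0.9, 0.1)   # hypothetical values
averaged = tuple(p * a + (1 - p) * b for a, b in zip(c1, c2))
mix_00, bp_00 = mixture_pmf(0, 0, p, c1, c2), bp_pmf(0, 0, *averaged)
print(round(mix_00, 3), round(bp_00, 3))  # 0.669 0.561
```

In the second check both distributions have the same marginal means, yet the mixture puts clearly more mass on (0,0), which is the zero inflation produced by unobserved heterogeneity.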
Note that different regression coefficients can be used to account for the “observed” heterogeneity within each subpopulation.

Finally, we would like to mention various ways in which this paper might be extended. Although we limit our analysis to the bivariate case, the model could be extended to higher dimensions. Following the general model presented by Karlis and Meligkotsidou [2007], covariates might be added, and the resulting finite mixture of multivariate Poisson regressions could be used to cluster high-dimensional data. A particularly interesting case arises when there is no dependence within a cluster, so that independent Poisson regressions are used within each cluster. To conclude this section, we should point out that one of the limitations of the bivariate Poisson model is that it allows only for positive dependence within each component, owing to the properties of the bivariate (multivariate) Poisson distribution. To overcome this shortcoming, other bivariate models, such as the copula-based models defined in Nikoloulopoulos and Karlis [2010], might be used as the component-specific bivariate distributions.

References

J. Aitchison and C. Ho. The multivariate Poisson-log normal distribution. Biometrika, 75:621–629, 1989.
L. Bermúdez. A priori ratemaking using bivariate Poisson regression models. Insurance: Mathematics and Economics, 44(1):135–141, 2009.
L. Bermúdez and D. Karlis. Bayesian multivariate Poisson models for insurance ratemaking. Insurance: Mathematics and Economics, 48(2):226–236, 2011.
C. Bolancé, M. Guillén, and J. Pinquet. Time-varying credibility for frequency risk models: Estimation and tests for autoregressive specifications on the random effects. Insurance: Mathematics and Economics, 33(2):273–282, 2003.
C. Bolancé, M. Guillén, and J. Pinquet. On the link between credibility and frequency premium. Insurance: Mathematics and Economics, 43(2):209–213, 2008.
J.-P. Boucher and M. Denuit. Credibility premiums for the zero inflated Poisson model and new hunger for bonus interpretation. Insurance: Mathematics and Economics, 42(2):727–735, 2008.
J.-P. Boucher and M. Santolino. Discrete distributions when modeling the disability severity score of motor victims. Accident Analysis and Prevention, 42(6):2041–2049, 2010.
J.-P. Boucher, M. Denuit, and M. Guillén. Risk classification for claim counts: A comparative analysis of various zero-inflated mixed Poisson and hurdle models. North American Actuarial Journal, 11(4):110–131, 2007.
J.-P. Boucher, M. Denuit, and M. Guillén. Number of accidents or number of claims? An approach with zero-inflated Poisson models for panel data. Journal of Risk and Insurance, 76(4):821–846, 2009.
T. Brijs, D. Karlis, G. Swinnen, K. Vanhoof, G. Wets, and P. Manchanda. A multivariate Poisson mixture model for marketing applications. Statistica Neerlandica, 58(3):322–348, 2004.
N. Brouhns, M. Guillén, M. Denuit, and J. Pinquet. Bonus-malus scales in segmented tariffs with stochastic migration between segments. Journal of Risk and Insurance, 70(4):577–599, 2003.
S. Chib and R. Winkelmann. Markov Chain Monte Carlo Analysis of Correlated Count Data. Journal of Business and Economic Statistics, 19(4):428–435, 2001.
M. Denuit, X. Marechal, S. Pitrebois, and J.-F. Walhin. Actuarial Modelling of Claim Counts: Risk Classification, Credibility and Bonus-Malus Systems. Wiley, New York, 2007.
A.M. Garay, E.M. Hashimoto, E.M.M. Ortega, and V.H. Lachos. On estimation and influence diagnostics for zero-inflated negative binomial regression models. Computational Statistics and Data Analysis, 55(3):1304–1318, 2011.
B. Grün and F. Leisch. Fitting finite mixtures of generalized linear regressions in R. Computational Statistics and Data Analysis, 51(11):5247–5252, 2007.
S. Gurmu and J. Elder. Generalized bivariate count data regression models. Economics Letters, 68(1):31–36, 2000.
S. Gurmu and J. Elder. A bivariate zero-inflated count data regression model with unrestricted correlation. Economics Letters, 100(2):245–248, 2008.
N. Johnson, S. Kotz, and N. Balakrishnan. Multivariate Discrete Distributions. Wiley, New York, 1997.
D. Karlis and L. Meligkotsidou. Finite multivariate Poisson mixtures with applications. Journal of Statistical Planning and Inference, 137:1942–1960, 2007.
D. Karlis and I. Ntzoufras. Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society Series D: The Statistician, 52(3):381–393, 2003.
S. Kocherlakota. On the compounded bivariate Poisson distribution: A unified treatment. Annals of the Institute of Statistical Mathematics, 40:61–76, 1988.
S. Kocherlakota and K. Kocherlakota. Bivariate Discrete Distributions, Statistics: Textbooks and Monographs, volume 132. Marcel Dekker, New York, 1992.
D. Lambert. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34:1–14, 1992.
C.-S. Li, J.-C. Lu, J. Park, K. Kim, P.A. Brinkley, and J.P. Peterson. Multivariate zero-inflated Poisson models and their applications. Technometrics, 41(1):29–38, 1999.
D. Lord, S.P. Washington, and J.N. Ivan. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: Balancing statistical fit and theory. Accident Analysis and Prevention, 37(1):35–46, 2005.
D. Lord, S. Washington, and J.N. Ivan. Further notes on the application of zero-inflated models in highway safety. Accident Analysis and Prevention, 39(1):53–57, 2007.
M.K. Munkin and P.K. Trivedi. Simulated maximum likelihood estimation of multivariate mixed-Poisson regression models, with application. Econometrics Journal, 2:29–48, 1999.
A.K. Nikoloulopoulos and D. Karlis. Regression in a copula model for bivariate count data. Journal of Applied Statistics, 37(9):1555–1568, 2010.
B.-J. Park and D. Lord. Application of finite mixture models for vehicle crash data analysis. Accident Analysis and Prevention, 41(4):683–691, 2009.
J. Pinquet, M. Guillén, and C. Bolancé. Long-range contagion in automobile insurance data: Estimation and implications for experience rating. ASTIN Bulletin, 31:337–348, 2001.
G. Stein and J.M. Juritz. Bivariate compound Poisson distributions. Communications in Statistics: Theory and Methods, 16:3591–3607, 1987.
G.Z. Stein, W. Zucchini, and J.M. Juritz. Parameter estimation for the Sichel distribution and its multivariate extension. Journal of the American Statistical Association, 82:938–944, 1987.
H. Steyn. On the multivariate Poisson normal distribution. Journal of the American Statistical Association, 71:233–236, 1976.
P. Wang. A bivariate zero-inflated negative binomial regression model for count data with excess zeros. Economics Letters, 78(3):373–378, 2003.
P. Wang, I.M. Cockburn, and M.L. Puterman. Analysis of patent data: a mixed Poisson regression model approach. Journal of Business and Economic Statistics, 16:27–36, 1998.
R. Winkelmann. Econometric Analysis of Count Data, 4th edition. Springer, New York, 2008.

A Zero inflation in mixed bivariate Poisson distributions

Lemma: Mixed bivariate Poisson distributions always assign equal or greater probability to the (0,0) cell than the corresponding bivariate Poisson distribution with the same marginal means.

Proof: It is straightforward to see that any mixed bivariate Poisson distribution has an excess of zeros compared to the bivariate Poisson distribution with the same marginal means. This result generalizes the known property in one dimension (Shaked’s Two Crossings Theorem). To demonstrate this, consider for the sake of simplicity the 2-finite bivariate Poisson mixture that assigns probabilities $p$ and $1-p$ to the points $(\lambda_{11}, \lambda_{21}, \lambda_{31})$ and $(\lambda_{12}, \lambda_{22}, \lambda_{32})$. The marginal means are $p(\lambda_{11} + \lambda_{31}) + (1-p)(\lambda_{12} + \lambda_{32})$ and $p(\lambda_{21} + \lambda_{31}) + (1-p)(\lambda_{22} + \lambda_{32})$, respectively. Consider also the bivariate Poisson distribution with the same marginal means.
Under the 2-finite mixture, the (0,0) probability is given by

$$P_2(0,0) = p\exp\{-(\lambda_{11} + \lambda_{21} + \lambda_{31})\} + (1-p)\exp\{-(\lambda_{12} + \lambda_{22} + \lambda_{32})\} = p\,e^{-\Lambda_1} + (1-p)\,e^{-\Lambda_2},$$

where $\Lambda_j = \lambda_{1j} + \lambda_{2j} + \lambda_{3j}$, while for the bivariate Poisson distribution with parameters $\lambda_k = p\lambda_{k1} + (1-p)\lambda_{k2}$, $k = 1, 2, 3$ (which has the same marginal means), we have

$$P_{BP}(0,0) = \exp\{-[p\Lambda_1 + (1-p)\Lambda_2]\}.$$

Consider the random variable $Q$ that takes the values $-\Lambda_1$ and $-\Lambda_2$ with probabilities $p$ and $1-p$. Since the exponential function is convex, Jensen’s inequality gives $E[\exp(Q)] \ge \exp(E[Q])$, and thus $P_2(0,0) \ge P_{BP}(0,0)$. Hence mixing of this kind also results in zero inflation. The above result can readily be generalized to an infinite number of components as well as to more than two dimensions.

B Some of the moments for the 2-finite mixture of bivariate Poisson distribution

For a mixture of $m$ components with weights $p_j$ and component parameters $(\lambda_{1j}, \lambda_{2j}, \lambda_{3j})$, it can readily be obtained that, for $k = 1, 2$,

$$E(Y_k) = \sum_{j=1}^{m} p_j(\lambda_{kj} + \lambda_{3j}),$$
$$E(Y_k^2) = \sum_{j=1}^{m} p_j\left[\lambda_{kj} + \lambda_{3j} + (\lambda_{kj} + \lambda_{3j})^2\right],$$
$$\mathrm{Var}(Y_k) = E(Y_k^2) - [E(Y_k)]^2,$$
$$E(Y_1 Y_2) = \sum_{j=1}^{m} p_j\left[\lambda_{3j} + (\lambda_{1j} + \lambda_{3j})(\lambda_{2j} + \lambda_{3j})\right],$$
$$\mathrm{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2).$$

For actuarial purposes one may well be interested in quantities such as $E(Y_1 + Y_2)$ and/or $\mathrm{Var}(Y_1 + Y_2)$ (see, e.g., Bermúdez [2009]). These are easily obtained from the above formulas; for instance, $\mathrm{Var}(Y_1 + Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + 2\,\mathrm{Cov}(Y_1, Y_2)$. Note that the sum $Y_1 + Y_2$ can be shown to follow a finite mixture of Hermite distributions.

XREAP Working Paper Series: XREAP2011-10, Bermúdez, Ll. (RFA-IREA), Karlis, D. “Mixture of bivariate Poisson regression models with an application to insurance” (July 2011). Contact: xreap@pcb.ub.es