ABSTRACT Small area estimation (SAE) tackles the problem of providing reliable estimates for small areas, i.e., subsets of the population for which sample information is not sufficient to warrant the use of a direct estimator. The hierarchical Bayesian approach to SAE problems offers several advantages over traditional SAE models, including the ability to account appropriately for the type of surveyed variable. In this paper, a number of model specifications for estimating small area counts are discussed and their relative merits are illustrated. We conducted a simulation study by reproducing, in a simplified form, the Italian Labour Force Survey and taking the Local Labor Markets as target areas. Simulated data were generated by assuming that both the population characteristics of interest and the survey sampling design are known. In one set of experiments, employment/unemployment counts from census data were used; in others, population characteristics were varied. Results show persistent model failures for some standard Fay-Herriot specifications and for generalized linear Poisson models with a (log-)normal sampling stage, whereas unmatched models and models with a nonnormal sampling stage perform best in terms of bias, accuracy and reliability. The study also found, however, that any model performs noticeably better when sampling variances are stochastically determined rather than assumed known, as is the general practice. Moreover, we address the issue of model determination to point out the limits and possible deceptions of commonly used criteria for model selection and checking in the SAE context.

In recent years, small area estimation (SAE) has emerged as an important area of statistics as private and public agencies try to extract the maximum information from sample survey data. Sample surveys are generally designed to provide estimates of characteristics of interest for large areas or domains.
However, governments are increasingly interested in obtaining statistics for smaller geographical areas such as counties, districts or census divisions, or for smaller demographic subsets such as specific age-sex-race subgroups. These domains are called small areas. SAE concerns statistical techniques aimed at producing estimates of characteristics of interest for small areas or domains. The simplest approach is to use a direct estimator, that is, to estimate the variable of interest using only the domain-specific sample data. However, it is well known that domain sample sizes are rarely large enough to support reliable and accurate direct estimators, since budget and other constraints usually prevent drawing adequate samples from each of the small areas. When direct estimates are unreliable (or even noncomputable), a major line of work uses explicit small area models that "borrow strength" from related areas across space and/or time, or through auxiliary information that is assumed to be correlated with the variable of interest. Explicit models can be classified into two categories: 1) area level models and 2) unit level models. They can be estimated by several alternative approaches, one of which is the hierarchical Bayesian (HB) paradigm. However, applications of HB models to SAE, though growing, are still few. Moreover, they have mainly focused on continuous variables. To date, there is no thorough discussion of the most appropriate nonlinear specification of area level models when small area estimates are needed for discrete or categorical variables.

In this paper, we focus on HB area level models for producing small area estimates of counts. In the literature, Bayesian specifications commonly derive from classical models for SAE, e.g., the Fay-Herriot model, or, more properly, consider either a generalized linear Poisson model or a multinomial logit model.
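The contrast between a direct estimator and a model that borrows strength can be illustrated with a minimal sketch. It assumes simple random sampling within an area and a Fay-Herriot-type composite with known variances, where a synthetic value stands in for the regression prediction. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def direct_estimate(sample, area_pop_size):
    """Direct (expansion) estimate of an area count from a 0/1 sample.

    Assumes simple random sampling without replacement within the area.
    Returns the estimated total and its estimated design variance.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    total = area_pop_size * p_hat
    fpc = 1 - n / area_pop_size  # finite population correction
    var = area_pop_size ** 2 * fpc * p_hat * (1 - p_hat) / (n - 1)
    return total, var


def shrinkage_estimate(direct, sampling_var, synthetic, model_var):
    """Fay-Herriot-type composite, assuming both variances are known.

    gamma is the weight given to the direct estimate; it shrinks toward 0
    as the sampling variance grows relative to the model variance.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1 - gamma) * synthetic, gamma


# Hypothetical small area: 1000 residents, 20 sampled, 2 of them unemployed.
sample = [1, 1] + [0] * 18
total, var = direct_estimate(sample, area_pop_size=1000)
composite, gamma = shrinkage_estimate(total, var, synthetic=80.0, model_var=400.0)
print(total, var, composite, gamma)
```

With so small a sample the design variance is large, so the composite sits much closer to the synthetic value than to the direct estimate. This normal-normal form is only the simplest matched case; the count-specific specifications discussed in the text replace one or both stages.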
A Normal-logNormal model has been presented within the class of so-called unmatched models. Following the HB way of thinking, we independently proposed a Normal-Poisson-logNormal model, arguing that this unmatched
form could be more appropriate for taking explicitly into account the nature of the variable of interest. An application of this model, originally extended to enable the use of multiple data sources possibly misaligned with small areas, has also been reported. Moreover, we suggested a Gamma-Poisson-logNormal model, which introduces a nonnormal sampling error stage, and advocated a natural extension of the above specifications by letting sampling variances be stochastically determined rather than fixed to design estimates, as is the general practice. For completeness, we mention a comparison of four HB small area models for producing state estimates of proportions: the original proposal consists of a Beta sampling stage with a logit linking model. Still in a Bayesian context, the problem of unknown sampling variances has also been addressed. Under appropriate conditions each of these models may have some merits, and whether it is appropriate depends on various circumstances such as the size of the areas, the availability of good explanatory variables at area level, the accuracy of sampling variance estimates, etc. Practical use of HB models has been boosted by the availability of software that implements Markov chain Monte Carlo (MCMC) simulations, so that model estimation can be straightforward and relatively easy. Room is left for investigating the peculiarities of different specifications and for identifying criteria and guidelines for choosing among alternative Bayesian specifications.

The purpose of the present work is to compare alternative HB area level models for SAE of counts. The comparison is made first on theoretical grounds and then by a simulation study. The latter aims to reproduce one of the most relevant instances where SAE has proven its potential, i.e., estimation of labour force statistics at a local level finer than the survey planned domains. The specific framework for the simulation is estimation of the number of unemployed (employed) within Local Labor Markets (LLMs, i.e.
areas including a group of municipalities which share the same labor market conditions). In most developed countries, the major source of information on the labor market is a Labor Force Survey (LFS). In Italy, the LFS design has been planned so that reliable (design-based) estimates of given precision can be obtained for regional and provincial quantities, quarterly and yearly respectively. LLMs are a finer regional partition, and the sample sizes associated with such minor domains are inadequate to allow for stable (design-based) estimates.

Simulated data were generated by assuming that both the population characteristics of interest and the survey sampling design are known. In one set of experiments, the actual LLM unemployment (employment) figures from census data were used; in others, population characteristics were varied (by changing the type of distribution symmetry). LLM survey sample sizes were likewise either kept fixed at actual LFS values or given different values. The sampling design was kept quite simple across all studies; moreover, synthetic estimates are the sole source of auxiliary information incorporated into the model framework. Although the core of the models is quite basic, it is worth noting that this is the framework actually used in Italy to produce totals of unemployed for LLMs since the late 1990s.

In summary, this paper compares, through a number of HB area level models for SAE of totals, three broad classes: matched, unmatched and nonnormal sampling stage models. A first comparison, on the basis of a design-based simulation from census data, is made by assuming known sampling variances. Second, once specific model failures in terms of bias, accuracy and reliability have been detected, this assumption is abandoned and
further minor refinements are made to the models. The comparison is also repeated while varying the finite-population simulation. Moreover, we address the issue of model determination to point out the limits and possible deceptions of commonly used criteria for model selection and checking in the SAE context, namely the deviance information criterion (DIC) and the posterior predictive p-value (PPp). In what follows, Section 2 presents the alternative HB models under comparison and the motivations behind their introduction; Section 3 describes the simulation study, discusses the results and proposes a number of model refinements; finally, Section 4 contains some concluding remarks.
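The two criteria named above can be sketched for the simplest case. The sketch assumes a normal sampling stage with known sampling variances and a chi-square-type discrepancy for the PPp. The posterior draws come from a toy normal-normal model with known hyperparameters, so that no MCMC software is needed. All names and numbers are hypothetical and do not reproduce the paper's models or results.

```python
import math
import random


def normal_deviance(y, theta, psi):
    """-2 * log-likelihood of y_i ~ N(theta_i, psi_i) with psi_i known."""
    return sum(math.log(2 * math.pi * v) + (yi - t) ** 2 / v
               for yi, t, v in zip(y, theta, psi))


def dic(y, draws, psi):
    """DIC = mean deviance + p_D, where p_D is the mean deviance minus
    the deviance evaluated at the posterior mean of theta."""
    mean_dev = sum(normal_deviance(y, d, psi) for d in draws) / len(draws)
    theta_bar = [sum(col) / len(draws) for col in zip(*draws)]
    p_d = mean_dev - normal_deviance(y, theta_bar, psi)
    return mean_dev + p_d, p_d


def posterior_predictive_pvalue(y, draws, psi, rng):
    """Share of posterior draws for which the discrepancy of replicated
    data is at least as large as that of the observed data."""
    def discrepancy(data, theta):
        return sum((d - t) ** 2 / v for d, t, v in zip(data, theta, psi))

    exceed = 0
    for theta in draws:
        y_rep = [rng.gauss(t, math.sqrt(v)) for t, v in zip(theta, psi)]
        if discrepancy(y_rep, theta) >= discrepancy(y, theta):
            exceed += 1
    return exceed / len(draws)


# Toy data: three areas with direct estimates, synthetic values and
# sampling variances (all hypothetical).
rng = random.Random(0)
y = [100.0, 60.0, 35.0]
synthetic = [80.0, 70.0, 30.0]
psi = [4600.0, 900.0, 250.0]
model_var = 400.0

# Exact posterior draws under the normal-normal model with known variances:
# theta_i | y_i ~ N(gamma_i * y_i + (1 - gamma_i) * synthetic_i, gamma_i * psi_i).
draws = []
for _ in range(2000):
    draw = []
    for yi, si, v in zip(y, synthetic, psi):
        gamma = model_var / (model_var + v)
        draw.append(rng.gauss(gamma * yi + (1 - gamma) * si, math.sqrt(gamma * v)))
    draws.append(draw)

dic_value, p_d = dic(y, draws, psi)
ppp = posterior_predictive_pvalue(y, draws, psi, rng)
print(dic_value, p_d, ppp)
```

Lower DIC values are conventionally read as favouring a model, and PPp values near 0 or 1 as signalling misfit. Both summaries depend on which likelihood stage and which discrepancy are chosen, which is one reason such criteria can mislead when comparing models with different sampling stages.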