#####################################################################
#####################################################################
###
### Practice num. 2: TRADITIONAL SMALL AREA ESTIMATORS
### August 2009
###
### Instructions: To follow this practice, first go to the website:
## http://www.uc3m.es/portal/page/portal/dpto_estadistica/home/members/isabel_molina_peralta
## and, after clicking on "Course material", download:
## - The Spanish data set: "silc0106_ISI09.txt"
## - The description of the data set: "Description_silc0106_ISI09.doc"
## - The population sizes of provinces:
##     "PopnSizeProv.txt",
##     "PopnSizeProvByAge.txt",
##     "PopnSizeProvByEdu.txt",
##     "PopnSizeProvByNat.txt",
##     "PopnSizeProvBySit.txt".
## - The values of auxiliary variables for a group of provinces:
##     "X_Albacete.txt"
##     "X_Avila.txt"
##     "X_Cuenca.txt"
##     "X_Guadalajara.txt"
##     "X_Huelva.txt"
##     "X_Lerida.txt"
##     "X_Palencia.txt"
##     "X_Segovia.txt"
##     "X_Soria.txt"
##     "X_Teruel.txt"
#####################################################################
# Notation: D will denote the number of areas and d=1,...,D will be the area index
#####################################################################

### 1. READING DATA AND OBTAINING SAMPLE AND POPULATION SIZES

# 1.1. Read the data file silc0106_ISI09.txt
data<-read.table("silc0106_ISI09.txt",header=TRUE)
data[1:10,]
provl
attach(data)
provl

# 1.2. Sample size = first dimension of the data file
dim(data)
n<-dim(data)[1]
n
#[1] 34389

# 1.3. Number of provinces (areas or domains) in the data file
unique(prov)
D<-length(unique(prov))

# 1.4. Province sample sizes
nd<-rep(0,D)
for (d in 1:D) {
  nd[d]<-sum(prov==d)
}

# 1.5. Read the population sizes of the provinces
PopnSizes<-read.table("PopnSizeProv.txt",header=TRUE)
attach(PopnSizes)
Nd
provlab
data.frame(provlab,nd,Nd,10000*nd/Nd)

# 1.6. Population size
N<-sum(Nd)
N;n;10000*n/N
#[1] 43586848
#[1] 34389
#[1] 7.889765

### 2. DESCRIPTION OF INCOME VARIABLE AND CONSTRUCTION OF POVERTY VARIABLES

# Poverty line
z<-6557.143 # It is obtained as 0.6*median(true norminc)

# 2.1. Distribution of normalized income
summary(norminc)
hist(norminc)
abline(v=z,col=2,lwd=2)

# Task: Do a boxplot of norminc and include the poverty line

# 2.2. Construct the variable "poor": Indicator of people under the poverty line
poor<-rep(0,n)
poor[norminc