Saturday, September 15, 2012

The Realty: contd...

This is a continuation of my previous blog post, "The Realty".

In my previous article, I barely touched upon the "other" factors. I lumped all of them into a single subjective variable, which, as many pointed out, is wrong. Here, I will try to include those other factors in as objective a way as possible. This way, I intend to end up with a multi-variable regression with 5 to 10 independent variables.
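Written out, the regression I have in mind would look roughly like the following. The symbols are my own shorthand and not from the previous post: y_i is the dependent variable for city i (whatever was regressed on population density earlier), x_{ik} is the value of factor k for that city, and K is the number of factors.

```latex
y_i = \beta_0 + \sum_{k=1}^{K} \beta_k \, x_{ik} + \varepsilon_i,
\qquad 5 \le K \le 10
```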

This analysis is certainly fraught with dangers, the biggest one being that the independent variables can be correlated with one another (multicollinearity). I do not have the tools and knowledge (or, more importantly, the enthusiasm) to correct for that.
Besides, I am using data from only 10-odd cities, which is clearly a very small sample size.
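Even without correcting for it, the correlation problem can at least be spotted. Below is a minimal sketch in plain Python that computes the Pearson correlation for every pair of independent variables and flags the pairs that move together strongly. The function names and the 0.8 cut-off are my own assumptions, not a standard rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of variables with |r| >= threshold.

    `columns` maps a variable name to its list of per-city values.
    The threshold of 0.8 is an arbitrary assumption for illustration.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged
```

With one list of city values per variable, any pair returned by correlated_pairs is a candidate for dropping or merging before running the regression. With only 10-odd cities, even these correlations are noisy, so the output is a hint rather than a verdict.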

Including these other variables independently should also probably reduce the share of the R-squared attributable to population density. But let's keep the conclusions for later.

Let's list those other factors, how each is constructed, and the data source I'll use for each:

1) Infrastructure and Transportation: This is a tough one. Here, I intend to measure the existing level of infrastructure in the city, and in particular the transportation facilities, on a per capita basis. Ideally, it should include factors like road length (road area might be better), rail length, number of buses, number of train bogies, niche transport facilities (metros, availability of air transport) etc. Since most of these are hard to measure, the proxy that can be chosen for this variable is
[Total Road Length/Population]

2) Education opportunities per capita: This can be a bit tricky. I can't simply use the total number of educational institutions per capita, because it grossly understates the value of higher education opportunities and also doesn't take into account the fact that institutions can have different intake capacities. The best way is to use the total intake of the city's colleges, needless to say, on a per capita basis of the existing population. The value of this variable is
[Total Intake in all the city's colleges/Population]

3) Income per capita and job opportunities: This one is straightforward. I'll just use the per capita income of the city. This variable is also a great indicator of the existing job opportunities for two reasons: first, a high per capita income means there are opportunities waiting to be seized in the city, and second, a high per capita income in turn creates a lot of trickle-down job opportunities (by the way, I am a firm believer in trickle-down economics. There's too much regulation and too many taxes in this world at the moment). To reiterate, this variable is Per Capita Income

4) Climate and Environment: There are two dimensions to this factor. The first is how the mean temperature in the city compares to the country average. The second is how much the temperature deviates over the whole year (month-on-month standard deviation). I am tempted to include precipitation here, but the effect of the amount of rainfall is uncertain, as there's no preferred range of precipitation, or at least it's not a narrow one. Things like proximity to beaches or hill stations will add to this factor, but that is best incorporated by adjusting the variable value in a subjective manner. Thus, this variable is (before adjustments) (|xyz| means absolute value of xyz):
|Mean City temp. - Mean India temp.| x Std. Dev. in the city's temp

5) Safety of Living: For this, I have to pick a type of incident which best represents how unsafe a place is and for which the stats are also widely published. This variable can best be proxied by the number of rape cases in the city, per capita.

6) Taxes, Regulatory and Legal issues: This variable should measure how convenient the city's legal framework is for someone settling there. This is, by definition, a subjective variable. Still, I think it can be proxied by the road tax percentage in the city. Thus, a proxy for this variable is
Average Road tax rate %

7) Herding behavior, Metro Premium: A premium can be assumed for cities that are already developed, esp. if it's a metro. Thus, this is a binary variable indicating whether the city is a metro.
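The proxies listed above can be computed from a handful of raw numbers per city. Here is a minimal sketch, not a finished implementation: the function and key names are hypothetical, the climate value is the unadjusted formula (no subjective tweak for beaches or hill stations), and I am assuming the population standard deviation over twelve monthly mean temperatures, which the formula above does not pin down.

```python
from statistics import mean, pstdev


def climate_proxy(city_monthly_temps, india_mean_temp):
    """|Mean city temp - Mean India temp| x Std. Dev. of the city's temp.

    Assumes `city_monthly_temps` holds monthly mean temperatures and uses
    the population standard deviation (an assumption of this sketch).
    """
    gap = abs(mean(city_monthly_temps) - india_mean_temp)
    return gap * pstdev(city_monthly_temps)


def city_features(population, road_length, college_intake, per_capita_income,
                  monthly_temps, india_mean_temp, rape_cases, road_tax_pct,
                  is_metro):
    """Build one row of independent variables for a city (names are hypothetical)."""
    return {
        "infrastructure": road_length / population,   # road length per capita
        "education": college_intake / population,     # college intake per capita
        "income": per_capita_income,                  # used as-is
        "climate": climate_proxy(monthly_temps, india_mean_temp),
        "safety": rape_cases / population,            # reported cases per capita
        "road_tax": road_tax_pct,                     # average road tax rate, in %
        "metro": 1 if is_metro else 0,                # binary metro flag
    }
```

Calling city_features once per city gives the rows of the regression's input table. Units need to be kept consistent across cities (for example, road length always in km), otherwise the per capita values are not comparable.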

As expected, collecting data at this level is a daunting task. I tried collecting it, but ran into too many difficulties to finish. This exercise has to be left incomplete.