Income Disparities Between Asian Indians and Other Asian Groups

Completed Under Professor Gordon Dahl at the University of California, San Diego

Honors Econometrics

Introduction

Income disparities across racial groups are a central and long-standing question in economics. It has long been documented that individuals of Asian descent often earn lower mean wages than a control group of whites. A natural follow-up question is how large the disparities are among individuals of different Asian descent. In an initial regression of income on race, the coefficient for every East Asian group is negative, while individuals of Indian descent stand alone with a positive coefficient. I would like to understand why we see such a disparity in expected earnings when comparing Asian Indians with other Asian subgroups.

Relevant Economic Theory

A large body of prior work in economics studies the effect of race on expected income. The natural difficulty is that individuals differ, and a summary of a person’s surface-level characteristics is often not enough to explain the variance in income. In particular, we cannot accurately measure a person’s innate ability, so omitted variable bias is a persistent problem and it is difficult to reach firm conclusions.

In “Are OLS Estimates of the Return to Schooling Biased Downward?”, McKinley Blackburn and David Neumark address the lack of a meaningful variable for ability. They augment their regressions on the returns to education with test scores for each individual, which serve as a proxy for ability. This is one method for reducing the bias created by omitting ability. In general, they found that the coefficients on work experience and education were markedly smaller after including the proxy.
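The direction of that change follows from the textbook omitted variable bias formula. As a sketch, assume log wages depend on education and a single unobserved ability term $A_i$ (the symbols $\gamma$, $\delta_1$ and $A_i$ are illustrative and not taken from the paper):

```latex
\begin{align*}
\ln w_i &= \beta_0 + \beta_1\,\mathrm{Educ}_i + \gamma A_i + \varepsilon_i \\
A_i &= \delta_0 + \delta_1\,\mathrm{Educ}_i + u_i \\
\operatorname{plim}\,\hat{\beta}_1^{\text{short}} &= \beta_1 + \gamma\,\delta_1
\end{align*}
```

If ability raises wages ($\gamma > 0$) and is positively correlated with schooling ($\delta_1 > 0$), the regression that omits ability overstates $\beta_1$, which is consistent with the education coefficient shrinking once a test-score proxy is included.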

Still, we can attempt to understand the disparities in expected income using an augmented Mincer equation with controls for each Asian subgroup. We choose the Mincer equation because years of education and years of experience are both highly relevant predictors of income, and omitting them would only increase the bias. It is also one of the most influential equations in all of economics.

Data Description

Data are drawn from IPUMS CPS, which harmonizes microdata from the U.S. Current Population Survey. We make heavy use of the “asian” variable, so we describe it in further detail. Figure 1.1 summarizes the relative frequency of each Asian subgroup in our sample. Since we are comparing those of Indian descent to all other subgroups in the study, it is important that this subgroup has a large sample size. Fortunately, Asian Indians are one of the largest subgroups within the variable, with over 15,000 observations to work from.

Figure 1.1 (Asian Subgroups)

It is possible that the surprising initial regression result was driven by outliers in the Asian Indian subgroup. To reduce the influence of these outliers, we remove individuals whose reported income has a z-score greater than 4.0 relative to the overall income distribution. We also remove individuals with missing educational attainment, as well as those who report 0 years of education.
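The filtering rules above can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: the function name, the use of NaN to encode missing education, and the choice of the population standard deviation are all assumptions.

```python
import numpy as np

def filter_sample(income, educ_years, z_max=4.0):
    """Return a boolean mask marking the rows that survive the sample filters.

    Assumptions (hypothetical data layout): both inputs are 1-D and
    missing education is encoded as NaN.
    """
    income = np.asarray(income, dtype=float)
    educ_years = np.asarray(educ_years, dtype=float)

    # z-score of each income against the full-sample mean and std deviation
    z = (income - np.nanmean(income)) / np.nanstd(income)

    # Drop missing or zero years of education (NaN > 0 evaluates to False)
    has_educ = ~np.isnan(educ_years) & (educ_years > 0)

    # Drop incomes with z > z_max; rows with NaN income are dropped as well
    not_outlier = z <= z_max
    return has_educ & not_outlier
```

The mask can then be applied to every column of the sample at once, so that the wage, education, and subgroup variables stay aligned.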

Figure 1.2 shows the relative frequency of educational attainment; the spikes correspond to individuals who finished with a high school education, an undergraduate education, and a postgraduate education.

Figure 1.2 (Frequency of Educational Attainment Levels)

Empirical Results

We will estimate using the following regression equation:

Where: Educ = educational attainment (years); Age = age of the individual (years); Experience = difference between age and education (years); and α = vector of racial control variables for each Asian subgroup.
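In its textbook form, an augmented Mincer equation built from these variables can be written as below. This is a sketch under stated assumptions: the quadratic experience term comes from the standard form of the equation and is not confirmed by the regression tables, and the coefficient symbols $\beta_k$ and the subgroup indicators $D_{ij}$ are notation introduced here for illustration.

```latex
\begin{align*}
\ln(\mathrm{wage}_i) &= \beta_0 + \beta_1\,\mathrm{Educ}_i + \beta_2\,\mathrm{Experience}_i
  + \beta_3\,\mathrm{Experience}_i^{2} + \sum_{j} \alpha_j D_{ij} + \varepsilon_i \\
\mathrm{Experience}_i &= \mathrm{Age}_i - \mathrm{Educ}_i
\end{align*}
```

Here $D_{ij}$ equals 1 if individual $i$ belongs to Asian subgroup $j$ and 0 otherwise, so each $\alpha_j$ measures the log-wage gap of subgroup $j$ relative to the reference group, holding education and experience fixed.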

Please refer to the regression results (1.3, 1.4, 1.5) below for the following analyses:

Figure 1.3 (Regression Results w/ no random term)

Figure 1.4 (Regression Results w/ random term)

Figure 1.5 (Initial Regression w/ no experience or education controls)

As expected, Figure 1.3 shows that the coefficients on education and work experience are highly significant determinants of log wages. After controlling for education and work experience, all of the coefficients on the Asian subgroups are positive, in contrast to regression (1.5), which includes no controls for education or experience.

Individuals of Indian descent stand out: their coefficient is both far more statistically significant (t-stat ≈ 22) and much larger (0.24) than that of any other Asian subgroup. While the signs changed in (1.3) relative to (1.5), the relative ordering of the subgroups is the same, and individuals of Indian descent can still expect a significantly higher wage, with a regression coefficient nearly five times the size of the next subgroup's.
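Because the dependent variable is log wages, a coefficient on a subgroup indicator converts into a proportional wage gap through exp(β) − 1. A quick check of the 0.24 coefficient reported in the text (the helper name is illustrative):

```python
import math

def log_coef_to_pct(beta):
    """Convert a log-wage coefficient on a dummy variable into a proportional gap."""
    return math.exp(beta) - 1.0

gap = log_coef_to_pct(0.24)
print(f"{gap:.1%}")  # prints 27.1%
```

Read this way, the estimate corresponds to wages roughly 27% higher than the regression's reference group at the same education and experience, slightly more than the 24% a direct reading of the coefficient would suggest.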

As a sanity check, we also add a randomly generated variable to the regression in (1.4). Because this variable is unrelated to wages by construction, its coefficient should be statistically insignificant if the model is behaving sensibly. This is what we find: the coefficient on the random term is not statistically significant.
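The placebo check can be sketched on synthetic data. Everything below is an assumption for illustration: the data are simulated rather than drawn from IPUMS CPS, the wage equation is deliberately simplified to a single education term, and the t-statistics use classical (homoskedastic) standard errors.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients and classical t-statistics for y = X b + e."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # unbiased error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 5000
educ = rng.integers(8, 21, size=n).astype(float)  # synthetic years of schooling
placebo = rng.normal(size=n)                      # random regressor, unrelated to wages
log_wage = 1.5 + 0.08 * educ + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), educ, placebo])
beta, t = ols_t_stats(X, log_wage)
print("t-stat on education:", t[1])
print("t-stat on placebo:  ", t[2])
```

In a setup like this the education t-statistic is large while the placebo t-statistic behaves like a draw from a standard normal, so it will exceed conventional critical values only occasionally by chance.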

As for potential threats to the validity of our results, some degree of omitted variable bias is almost certainly present. Even in the fully controlled regression (1.3), we only reach an R-squared of about 0.2, so only roughly 20% of the variance in log wages is explained by our regressors. It is likely that we are not capturing the causal effect because of these omitted variables. For example, we have no measure of long-term family wealth. It could be the case that individuals of Asian Indian descent simply have more generational wealth than their counterparts, causing the disparity. We also have no proxy for individual ability, which will bias our estimates.

Conclusion

After controlling for education and years of experience, the result of our initial regression no longer holds, and the coefficients on every Asian subgroup become positive. Still, the ordering of the results is the same, with a large gap between individuals of Asian Indian descent and other Asian subgroups. The reason for this gap is still ultimately unclear, but there is a significant difference in expected wages for those of Asian Indian descent relative to other subgroups. Our analysis is limited by a lack of controls, especially for individual ability and access to intergenerational wealth, so there is a strong possibility that our estimates suffer from omitted variable bias.
