A unified framework for constructing, tuning and assessing photometric redshift density estimates in a selection bias setting


Photometric redshift estimation is an indispensable tool of precision cosmology. One problem that plagues the use of this tool in the era of large-scale sky surveys is that the bright galaxies that are selected for spectroscopic observation do not have properties that match those of (far more numerous) dimmer galaxies; thus, ill-designed empirical methods that produce accurate and precise redshift estimates for the former generally will not produce good estimates for the latter. In this paper, we provide a principled framework for generating conditional density estimates (i.e. photometric redshift PDFs) that takes into account selection bias and the covariate shift that this bias induces. We base our approach on the assumption that the probability that astronomers label a galaxy (i.e. determine its spectroscopic redshift) depends only on its measured (photometric and perhaps other) properties x and not on its true redshift. With this assumption, we can explicitly write down risk functions that allow us to both tune and compare methods for estimating importance weights (i.e. the ratio of densities of unlabelled and labelled galaxies for different values of x) and conditional densities. We also provide a method for combining multiple conditional density estimates for the same galaxy into a single estimate with better properties. We apply our risk functions to an analysis of ≈106 galaxies, mostly observed by Sloan Digital Sky Survey, and demonstrate through multiple diagnostic tests that our method achieves good conditional density estimates for the unlabelled galaxies.

Monthly Notices of the Royal Astronomical Society