NASA's Earth Science Information Partnership (ESIP) Federation is an experiment funded to assess the ability of a group of widely heterogeneous earth science data and service providers to self-organize and provide improved, affordable access to an expanding earth science user community. Because it is self-organizing, the Federation is mandated to put in place an evaluation methodology and to collect metrics reflecting its outcomes and benefits. This paper describes the challenges of organizing such a federated partnership self-evaluation and discusses the issues encountered during the metrics definition phase and the early data collection. Our experience indicates that a large number of metrics is needed to fully represent the activities and strengths of all partners, but that, because of the heterogeneity of the ESIPs, the qualitative data (comments accompanying the metric data and success stories) become the most useful information. Another lesson learned is the absolute need for online browsing tools to accompany data collection tools. Finally, our experience confirms the effect of evaluation as an agent of change: the best example is the high level of collaboration among the ESIPs, which can be attributed in part to the initial identification of collaboration as one of the Federation's important evaluation factors.