Reconstruction of the surrounding 3D world is of particular interest for mapping, civil applications, and entertainment. The wide availability of smartphones with cameras and wireless networking capabilities makes it easy to collect 2D images of a particular scene. In contrast to the client-server architecture adopted by most mobile services, we propose a decentralized architecture in which the participating devices collaboratively share data, computations, and results. Camera calibration and pose parameters are estimated using classical image-based methods. The reconstruction is based on interactively selected, arbitrary planar regions, which makes it especially suitable for objects with large (near-)planar surfaces, as often found in urban scenes (e.g., building facades and windows). The perspective distortion of a planar region between two views makes it possible to compute the normal and distance of the region with respect to the world coordinate system. Thus, a fairly precise 3D model can be built by reconstructing a set of planar regions with different orientations. We also show how visualization, data sharing, and communication can be addressed. The applicability of the method is demonstrated by reconstructing real urban scenes.
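The key geometric step, recovering a plane's normal and distance from its perspective distortion in two views, can be illustrated with the standard plane-induced homography. The following is a minimal sketch, not the paper's implementation. It assumes that camera 1 defines the world frame, camera 2 observes X2 = R X1 + t, the plane is {X : n . X = d} with unit normal n, and K, R and t are already known from calibration and pose estimation. All function names and the example numbers are hypothetical. The code builds H = K (R + t n^T / d) K^-1, checks that H maps the image of a point on the plane consistently, and then inverts the relation to recover n and d.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inv_intrinsics(K):
    """Closed-form inverse of an upper-triangular intrinsic matrix."""
    f, s, cx = K[0]
    fy, cy = K[1][1], K[1][2]
    return [[1 / f, -s / (f * fy), (s * cy - cx * fy) / (f * fy)],
            [0.0, 1 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def plane_homography(K, R, t, n, d):
    """H = K (R + t n^T / d) K^-1 for the plane {X : n . X = d} (camera-1 frame)."""
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K, M), inv_intrinsics(K))


def plane_from_homography(K, R, t, H):
    """Recover (n, d) from a metrically scaled H, assuming K, R, t are known."""
    A = matmul(matmul(inv_intrinsics(K), H), K)                    # R + t n^T / d
    A = [[A[i][j] - R[i][j] for j in range(3)] for i in range(3)]  # t n^T / d
    tt = sum(ti * ti for ti in t)                                  # needs t != 0
    m = [sum(t[i] * A[i][j] for i in range(3)) / tt for j in range(3)]  # n / d
    d = 1 / math.sqrt(sum(mj * mj for mj in m))
    return [mj * d for mj in m], d


def project(K, X):
    """Pinhole projection of a 3D point (camera frame) to pixel coordinates."""
    x = matvec(K, X)
    return [x[0] / x[2], x[1] / x[2]]


def apply_homography(H, p):
    """Map a pixel through H using homogeneous coordinates."""
    q = matvec(H, [p[0], p[1], 1.0])
    return [q[0] / q[2], q[1] / q[2]]


# Hypothetical example: a facade plane z = 5 seen by two cameras.
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
th = 0.1  # rotation about the y axis, in radians
R = [[math.cos(th), 0.0, math.sin(th)],
     [0.0, 1.0, 0.0],
     [-math.sin(th), 0.0, math.cos(th)]]
t = [0.5, 0.0, 0.0]
n, d = [0.0, 0.0, 1.0], 5.0

H = plane_homography(K, R, t, n, d)
X = [1.0, -0.5, 5.0]  # a point lying on the plane
X2 = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
print(apply_homography(H, project(K, X)), project(K, X2))  # the two should agree
print(plane_from_homography(K, R, t, H))
```

The recovery step assumes H carries its metric scale. A homography estimated from image correspondences is known only up to scale, so in practice it must first be normalized (for example, so that the middle singular value of K^-1 H K equals 1). The translation must also be nonzero, since a pure rotation reveals nothing about the plane, and d comes out in the units of t, so with a scale-free pose estimate the plane distance is likewise defined only up to the global scale.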