Previously, we put forth the concept of Cartesian abstraction and argued that it can yield ‘cognitive maps’. We suggested a general mechanism and presented deep learning-based numerical simulations: an observed factor (head direction) was non-linearly projected to form a discretized representation (head direction cells). That representation, in turn, enabled the development of a complementary factor (place cells) from high-dimensional (visual) inputs. It has been shown that a related metric, in the form of oriented hexagonal grids, can also be derived. Elements of the algorithms were connected to the entorhinal-hippocampal complex (EHC loop). Here, we take the mapping to the neural substrate one step further. We consider (i) the features of signals arriving at deep and superficial CA1 pyramidal cells, (ii) the interplay between lateral and medial entorhinal cortex efferents, and (iii) the nature of ‘instructive’ input timing-dependent plasticity, a feature of the loop. We suggest that the circuitry corresponds to a special form of Residual Network that we call the Sparsified and Twisted Residual Autoencoder (ST-RAE). We argue that ST-RAEs can learn Cartesian Factors and fit the structure and functioning of the entorhinal-hippocampal complex to a reasonable extent, including certain oscillatory properties. We put forth the idea that the factor-learning architecture of ST-RAEs plays a dual role in serving goal-oriented behavior: (a) lowering the dimensionality of the task and (b) mitigating the problem of partial observation.
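To make the architectural notion concrete, the listing below gives a minimal sketch of a residual autoencoder with a sparsified code. It illustrates only the two ingredients named in the abstract, an identity skip connection and an enforced-sparsity representation; the class names, layer sizes, and the k-winners-take-all sparsifier are illustrative assumptions, and the ‘twist’ of the actual ST-RAE is not specified here.

# Minimal sketch of a sparsified residual autoencoder.
# All names, sizes, and the top-k sparsifier are assumptions for
# illustration; they are not the paper's ST-RAE specification.
import torch
import torch.nn as nn

def k_winners_take_all(z: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest activations per sample, zero the rest
    (a simple stand-in for a 'sparsified' representation)."""
    topk = torch.topk(z, k, dim=-1)
    mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
    return z * mask

class ResidualBlock(nn.Module):
    """One fully connected layer with an identity skip connection,
    as in standard Residual Networks."""
    def __init__(self, d: int):
        super().__init__()
        self.fc = nn.Linear(d, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.relu(self.fc(x))

class SparseResidualAE(nn.Module):  # hypothetical stand-in, not ST-RAE itself
    def __init__(self, d_in: int = 64, d_hidden: int = 256, k: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), ResidualBlock(d_hidden)
        )
        self.decoder = nn.Linear(d_hidden, d_in)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = k_winners_take_all(self.encoder(x), self.k)  # sparse code
        return self.decoder(z)  # reconstruction from the sparse code

x = torch.randn(8, 64)
model = SparseResidualAE()
loss = nn.functional.mse_loss(model(x), x)  # autoencoding objective

In this reading, the sparsified code plays the role of the discretized factor representation, while the residual pathway carries the remaining, complementary information to be explained by the decoder.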