Abstract: This thesis addresses the problem of indoor environment modelling using depth cameras. We propose a system that lets a user traverse an environment with a handheld camera, and no other sensor, and that computes a dense, textured 3D geometric reconstruction. In the front-end, camera motion is computed by detecting interest points in the images and matching them across frames. We propose a loop closing, or place recognition, algorithm that is robust over time and thus allows the system to reconsider past loop closing decisions once additional information becomes available. The back-end of the system relies on the g2o graph optimization framework. We test our system both in simulation and with real data from a Kinect sensor. Results show that the system is viable and precise. Our future goal is to incorporate robust outlier detection algorithms that will allow the system to ignore dynamic objects, such as people, and so keep these elements out of the model.