Systems and methods for multi-modal sensing of three-dimensional position information of the surface of an object are disclosed. In particular, multiple visualization modalities are each used to collect distinctive positional information of a surface of an object. Each of the computed positional information is combined using weighting factors to compute a final, weighted three-dimensional position. In various embodiments, a first depth may be recorded using fiducial markers, a second depth may be recorded using a structured light pattern, and a third depth may be recorded using a light-field camera. Weighting factors may be applied to each of the recorded depths and a final, weighted depth may be computed.