piu58 has it right, though I would use the central (symmetric) finite difference rather than the forward finite difference.
Given a two-variable function f(x,y), ∂f/∂x ≈ (f(x+h,y) - f(x-h,y)) / (2h), and ∂f/∂y ≈ (f(x,y+h) - f(x,y-h)) / (2h). h should be small relative to the variable against which you are differentiating, but not so small that you run into round-off (cancellation) error when the two nearly equal function values are subtracted to form the numerator. This extends to any number of function arguments. You can see these equations printed nicely at https://en.wikipedia...veral_variables
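As a sketch of the idea, here is a minimal central-difference gradient in Python. The function name and the step-size heuristic (scaling h by the cube root of machine epsilon times the variable's magnitude, a common rule of thumb for central differences) are my choices, not anything from the original post:

```python
def grad_central(f, x, h=None):
    """Approximate the gradient of f at point x (a list of floats)
    using the central difference (f(x+h) - f(x-h)) / (2h) per variable."""
    eps = 2.220446049250313e-16  # double-precision machine epsilon
    g = []
    for i in range(len(x)):
        # Scale h to the magnitude of x[i]; eps**(1/3) balances the
        # O(h^2) truncation error against round-off in the subtraction.
        hi = h if h is not None else eps ** (1.0 / 3.0) * max(abs(x[i]), 1.0)
        xp = list(x); xp[i] += hi
        xm = list(x); xm[i] -= hi
        g.append((f(xp) - f(xm)) / (2.0 * hi))
    return g

# Example: f(x, y) = x^2 * y, so df/dx = 2xy and df/dy = x^2.
f = lambda v: v[0] ** 2 * v[1]
g = grad_central(f, [3.0, 2.0])  # roughly [12.0, 9.0]
```

The same loop works for any number of variables, which is why finite differences are so convenient when an analytic Jacobian is tedious to derive.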
If accuracy becomes an issue with finite differences, you can use Richardson Extrapolation to improve the accuracy.
If you have the time and the interest, I recommend using either the Levenberg–Marquardt algorithm or Trond Steihaug's Conjugate Gradient method (http://www.ii.uib.no...apers/trust.pdf) and solving the update step with a QR decomposition instead of using a straight inverse or applying a Cholesky decomposition to the normal equations. I have had great success with both of these in my previous job, which largely involved programming and optimizing nonlinear solvers.
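To show what the QR route looks like in practice, here is a hypothetical single Gauss–Newton step (the undamped core of Levenberg–Marquardt) for fitting y = exp(a·t) to data. The model, data, and variable names are made up for illustration; the point is that we solve J·p = -r via QR rather than forming the normal equations (JᵀJ)·p = -Jᵀr, which would square the condition number of the problem:

```python
import numpy as np

# Synthetic data from y = exp(0.7 * t); we start from a wrong guess a = 0.5.
t = np.array([0.0, 0.5, 1.0, 1.5])
y = np.exp(0.7 * t)
a = 0.5

r = np.exp(a * t) - y                    # residual vector
J = (t * np.exp(a * t)).reshape(-1, 1)   # Jacobian dr/da, one column

# QR route: factor J = QR, then solve the triangular system R p = -Q^T r.
# This avoids explicitly forming J^T J, so the conditioning of J is
# not squared the way it is in the Cholesky/normal-equations approach.
Q, R = np.linalg.qr(J)
p = np.linalg.solve(R, -Q.T @ r)

a_new = a + p[0]  # one step moves a from 0.5 to near the true 0.7
```

In a full Levenberg–Marquardt implementation you would also add the damping term, but the linear-algebra choice — QR on J versus Cholesky on JᵀJ — is the same at every iteration.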