### Soil Data Fitting

By H. D. Scott and D. L. Baker  Ó 2000

Topics

Introduction  (top)

The first step in calibrating a simulation model of saturated and unsaturated water flow in soil must be to take field and laboratory measurements of the soil physical and hydraulic properties. In this case, Thiesse (1984) performed standard constant-head saturated conductivity and pressure plate saturation measurements on core samples in the lab. In the field, he used the zero-flux method (Arya, et al., 1975) to generate data for the unsaturated conductivity relation. One can either accept his results or refit the data to relations that seem more convenient. It takes less time and effort to fit a model to relations that use the least number of parameters. Depending on the calibration method chosen, such as simulated annealing, adding just one more parameter can increase the effort by as much as the multiple of the number of trial values for that parameter. The soil hydraulic relations chosen here may not be the best from other standpoints, but are chosen with that in mind.

Also, conventional symbols for soil hydraulic parameters are not used here. Instead, the Fortran variables from the computer program POULIT are used, in an effort to reduce confusion for users wanting to become familiar with the model.

The Pressure-Saturation Relation  (top)

Table 1 shows the soil saturation data, with depth (cm), from Table 8 in Thiesse (1984), with data added by H.D. Scott (Table 2). Table 1 converts matric suction, -p, from kbar in Thiesse to cm of suction head, at the rate of (1020.4 cm)/(1 bar). These data were first addressed by the current authors in 1997. For reasons lost in that time, the 100 kPa data were not used.

 Table 1: Volumetric water content, th (cm3/cm3), versus depth, z (cm), and negative hydraulic soil water pressure, -p (cm). z (cm): 0 15 30 45 61 76 91 106 122 137 -p (cm) Volumetric water content, th (cm3/cm3) 0 0.537 0.479 0.461 0.462 0.449 0.447 0.458 0.434 0.422 0.465 25.5 0.458 0.401 0.372 0.375 0.379 0.378 0.385 0.374 0.367 0.396 51.0 0.450 0.381 0.348 0.352 0.362 0.362 0.372 0.366 0.363 0.383 75.5 0.417 0.359 0.329 0.331 0.340 0.345 0.359 0.358 0.356 0.381 102.0 0.406 0.350 0.320 0.325 0.340 0.344 0.358 0.355 0.350 0.372 153.1 0.393 0.336 0.308 0.316 0.331 0.338 0.352 0.349 0.346 0.363 204.0 0.363 0.314 0.292 0.301 0.320 0.330 0.344 0.344 0.340 0.356 255.1 0.357 0.307 0.286 0.295 0.316 0.326 0.340 0.341 0.339 0.348 306.1 0.333 0.291 0.276 0.285 0.307 0.318 0.333 0.335 0.331 0.344 510.0 0.306 0.263 0.254 0.267 0.294 0.305 0.321 0.326 0.322 0.329 765.3 0.286 0.239 0.233 0.248 0.281 0.295 0.311 0.316 0.312 0.318 1020.4 0.266 0.225 0.218 0.236 0.270 0.287 0.304 0.308 0.304 0.310
 Table 2: Separate bulk density, rob (g/cm3), total porosity and water content, th (cm3/cm3) data, versus soil depth, z (cm). water retained at 1000 kPa water retained at 1500 kPa z rob (g/g) th (g/g) th 5 1.390 0.150 0.2085 0.049 0.0681 15 1.444 0.126 0.1819 0.051 0.0736 25 1.417 0.147 0.2083 0.072 0.1020 35 1.417 0.160 0.2267 0.092 0.1304 45 1.454 0.163 0.2370 0.116 0.1687 55 1.485 0.170 0.2524 0.135 0.2005 65 1.495 0.181 0.2706 0.142 0.2123 75 1.470 0.191 0.2808 0.134 0.1970 85 1.462 0.198 0.2895 0.182 0.2661 95 1.462 0.201 0.2939 0.116 0.1696

Notice that the 1500kPa (or 15.306 m or 15 bar) data in Table 2 were not developed either at the same depths in the soil column, or to the deepest depth in Table 1. The data were fitted (Figure 1) with a Gompertz function , and extended to 137 cm (Table 3). Unfortunately, it is apparent in later discussion that the 15-bar data do not always seem to be in the same data set as the pressure plate data.  Figure 1: Plot of 15 bar water content (cm3/cm3) data (points, Table 2) and fitted equation (line, Table 3) versus depth (cm)
 Table 3: 15-bar Water contents, th (cm3/cm3), fitted to depth, z (cm), by  z th z th z th 0 0.0670 61 0.2083 122 0.2297 15 0.0719 76 0.2220 137 0.2299 30 0.1144 91 0.2271 45 0.1725 106 0.2290

It should also be noted that the water content data for z = 0 cm in Table 1 was not developed on a pressure plate apparatus. It was calculated from the bulk and particle densities, because of the difficulty of getting the samples fully saturated. This means that there is an unknown systematic error between this saturated water content, ths, and the remaining unsaturated water contents, th(z) in the table. The unsaturated water contents involve some unknown residual air content in the finest pores that did not exist for the saturated water content, ths. As the water is withdrawn from the soil in the pressure plate apparatus, these fine pores may or may not be eventually exposed to the contiguous air phase. There is no way to know where in the data this may happen. It also seems likely the that 10-bar and 15-bar measurements may have exposed these fine pores, where the lower suction measurements did not.

Figure 2a-j show the measured saturation relation data from Table 1 in log-log and semi-log plots, along with a line demonstrating a fitting equation , using the parameters in Table 4. Notice that for this equation, th can go to zero. The last column in Table 4 shows the matric suctions for this case. One could also choose to use the Brooks-Corey relation  or the van Genuchten-style relation . But the approach for modeling the hydraulic properties of the soil column in this experiment and calibrating the model requires that the saturation relation be of the same form down the entire column. Only one of these can be chosen. ,

where th is water content (cm3/cm3) for p < 0, ths is saturated water content, vn is a dimensionless parameter, p is hydraulic pressure head (cm) and at is the "displacement pressure" (cm). , where thr is the residual water content (cm3/cm3), and vn is a dimensionless parameter , p < 0, vm and vn are dimensionless parameters.  Figure 2a Figure 2b  Figure 2c Figure 2d  Figure 2e Figure 2f  Figure 2g Figure 2h  Figure 2i Figure 2j Figure 3: Soil water content, fitted by  with the Table 4 parameters, versus the measured water content for all the points in Table 1, plus the 15-bar points in Table 3.

 Table 4: Fitted parameters for equation  used in Figures 2a-j. The matric suction, y = -p, at which th = 0 is listed in the last column Depth set, z (cm) ths (cm3/cm3) at (cm) vn rms error (cm3/cm3) y(th=0) (m) 0 0.537 20.563 0.071067 8.127e-3 393 15 0.479 11.4081 0.056512 4.490e-3 547 30 0.461 3.4968 0.041388 3.085e-3 2404 45 0.462 2.7316 0.037250 9.450e-3 6650 61 0.449 2.4079 0.028916 4.404e-3 0.133e+6 76 0.447 1.6668 0.024722 2.421e-3 1.19e+6 91 0.458 1.4599 0.023469 4.581e-3 4.36e+6 106 0.434 1.8074 0.019503 8.382e-3 83.5e+6 122 0.422 2.6180 0.019363 7.464e-3 76.4e+6 137 0.465 3.9693 0.027913 4.558e-3 0.682e+6

Since some of the log-log plots of measured data in Figures 2a-j can be well fitted with a straight line, the Brooks-Corey could be the best in those cases. But others have an obvious curvature in the log-log plots that would seem to demand a van Genuchten-style relation. Table 5 shows the fitted parameters for the standard van Genuchten relation . Figure 4 shows the comparison of the fitted values to the measured values. Notice that the standard van Genuchten fit does very well except for saturated porosity, ths. In many cases it produces a lower value. There are duplicate values in Table 5 to demonstrate the pseudo-random nature of the simulated annealing method to make the fits (explained later). ,

m and n are non-dimensional fitting parameters.

 Table 5: Fitting parameters for the van Genuchten saturation relation  and the measured data in Table 1. Fitted by simulated annealing with duplicate fits for some depths. Last column is the mean absolute error (MAE) of the fit to the measured data. depth set z (cm) ths (cm3/cm3) thr (cm3/cm3) at (cm) m n MAE (cm3/cm3) 0 0.4707 0.0000 122.83 0.21847 1.27954 0.0094 0 0.4711 0.0000 119.72 0.21623 1.27589 0.0094 15 0.4168 0.0000 77.57 0.19237 1.23819 0.0054 15 0.4107 0.0000 97.01 0.20320 1.25502 0.0054 30 0.3745 0.0000 89.29 0.17928 1.21845 0.0032 30 0.3745 0.0000 89.09 0.17919 1.21831 0.0032 45 0.4029 0.0731 33.56 0.16321 1.19504 0.0025 45 0.3891 0.0887 48.15 0.18067 1.22050 0.0025 61 0.4093 0.0261 20.83 0.10146 1.11291 0.0017 61 0.4041 0.0651 27.62 0.12032 1.13678 0.0018 76 0.3867 0.0000 45.40 0.08704 1.09534 0.0017 91 0.3864 0.0000 96.80 0.09503 1.10501 0.0021 91 0.3861 0.0000 99.32 0.09534 1.10539 0.0021 106 0.3670 0.0000 242.73 0.10202 1.11360 0.0019 106 0.3670 0.0001 243.08 0.10207 1.11367 0.0019 122 0.3662 0.0000 203.70 0.09707 1.10750 0.0015 122 0.3662 0.0019 204.30 0.09769 1.10826 0.0015 137 0.3982 0.0000 102.94 0.09884 1.10968 0.0012 137 0.3982 0.0005 103.25 0.09903 1.10991 0.0012 Figure 4: Van Genuchten fits  generated for data in Table 1 by the parameters in Table 5, versus the measured data in Table 1.

All of the semi-log plots in Figures 2a-j demonstrate that the simpler log relation  can fit the data in Table 1 as well as the van Genuchten relation , with all of the same values of ths. It has the added advantage that it does not require a parameter for the residual saturation, thr, making it the least number of parameters to fit. It is also reasonable to assume that if the saturation in  goes to zero, then so does the hydraulic conductivity, sc (cm/hr). This can produce some difficulties in the simulation model with stability and convergence. If the displacement pressure parameter, at, is not changed in the model calibration, then Table 4 shows that equation  fits the entire range of the measured tensiometer readings for this experiment, at each depth (not shown here). Thus, any adverse effects from using equation  may not appear, or if they do, not in all the ranges of the possible parameter values (the parameter space).

Calibrating a model to a field experiment (Scott, et al., 1995) over a nominal growing season of more than 200 days is computationally intensive. Consider simulated annealing (Goffe, et al., 1994), with eight fitting parameters in a model that takes about 2 minutes to simulate 1-D infiltration and evapotransipration in a 225-day growing season. It takes about 131 steps to go from an annealing temperature of 10 to 10-5, with a reduction factor of 0.9. Suppose that the model is run 20 times with random parameter settings, before the step size is change, and that 10 such sets are run before the temperature is reduced. This means that it will take 132*8*10*20*(2(min))/(60*24) = 293.3 days of computer time to reach a resolution, 68 days longer than the growing season being simulated.

This makes choosing a saturation relation with the least number of fitted parameters, along with some other optimization method, a reasonable first approach. In addition, the curvature in the log-log plots of measured saturation data is slight, and there are not enough data points in the low and high matric suction ranges. That makes it difficult to clearly and independently identify the at, vm and vn parameters in the van Genuchten-style relation , which in the fits tried, behaved as an equation with too many variables.

Fitting Saturation Relations with Simulated Annealing  (top)

While the computational overhead of simulated annealing (Goffe, et al., 1994) is burdensome for model calibration, it works quite well for fitting saturation relations. For those wishing a better understanding of the process, additional readings may include Corana, et al. (1987), and Kirkpatrick, et al. (1983). A public-domain Fortran program, simann (http://netlib.bell-labs.com/netlib/opt/simann.f.gz), associated with Goffe, et al., was modified to perform the fits presented in Tables 4 and 5.

Simulated annealing is a numerical optimization process that imitates the cooling of metal from high to low temperature. As metal cools, its atoms vibrate less and less, eventually "freezing" into their final positions. If the metal is quickly quenched, its atoms may be caught in positions that are not optimal in terms of the lowest possible energy. It may thus contain stresses and coarse grain structures, literally frozen into position. But if the metal is cooled very slowly, its atoms may naturally find the positions of lowest energy and stress.

Numerical simulated annealing is a pseudo-random process. Say that the "energy" of a data system is represented by a measure called an objective function. In this case, it is the sum of the absolute errors of the fitted saturation relation relative to the measured volumetric water content. The fitting parameters are always given a starting point, a first estimate. In this case, the program allows the user to make the first estimate, or will make the estimation itself.

The program is allowed a certain range of movement in the parameter space, a step length, within fixed limits. From the first point, it will take the each parameter and make a step away from it a random distance that is less than one-half the step length. If the objective function is lower, this will be the new best point and new starting point for the next step. If the objective function is higher, it may or may not accept the new point as the starting point for the next step, depending on the annealing temperature, which starts high.

The annealing temperature is used in an second pseudo-random process that accepts larger increases in the objective function for larger annealing temperatures, with an exponential probability. The higher the annealing temperature, the more likely a parameter point with a "worse" objective function value will be accepted as the start of the next step. Between random step lengths, and the random acceptance of worse values, the process has a good but finite chance of stepping out of local minima of the objective function to find the universal best minimum, the global minimum.

There are other details of the method that are left to the references. The step length is adjusted after 10 steps of each parameter to eventually become very small, and thus specify the resolution of the method. After each 200 steps of each parameter, the annealing temperature is reduced, decreasing the probability that a worse point will be accepted. The overall best point in each temperature cycle is retained as the starting point for the next cycle. Eventually, when both the temperature and the step sizes are very small (10-6), the best overall point is taken as the final result.

As the data in Table 5 demonstrate, the pseudo-random nature of the process does not guarantee the same answer every time. This is almost guaranteed in the program discussed here, satfitsa.exe, by using the computer clock time to seed the random number generator. Thus, the answers will most likely not be exactly the same in every run, even with the same input data and first estimates. The advantage of this is that a slightly better estimate might be found with more runs.

The computer program, satfitsa, will fit three types of saturation relations, the five-parameter van Genuchten-style relation , the three-parameter log relation  and the four-parameter standard van Genuchten relation . It will allow the user to select the initial starting estimates for the parameters, or make its own estimates from the data. If the program is allowed to make its own estimates, it is wise to have a full range of data, and ten points or more. Please note that the program runs only under MS- or other compatible DOS, or in a DOS window in Windows. The input and output files are DOS ascii files.

The program will begin by asking for three filenames, one input and two output files. They must all be in DOS 8+3 filename format, with 8 or less characters, optionally followed by a dot and three or less characters. Table 6 shows the format of the input file. Except for the "series" identifier, all of the data is unformated, and must be set up on each line as shown. The x1 to x5 variables are the input estimates for the fitting parameters, designated under usage, and must all contain values, even if they are unused dummy variables.

 Table 6: Format for the input file to satfitsa. line variable(s) usage 1 ny the number of points (y, th) following 2 series a 15-character alphanumeric identifier for the data set, starts with the first character of the line 2, ... ny+2 y, th y = matric suction (m), th = volumetric water content (cm3/cm3), one set per line, separated by a comma. ny+3 me method = 1 for van Genuchten-style equation  with user initial estimates, = 10 for  with program estimates, = 2 for ln-equation  with user estimates, =20 for  with program estimates, = 3 for standard van Genuchten equation  with user estimates, = 30 for  with program estimates 1 or 10 2 or 20 3 or 30 ny+4 x1 ths ths ths ny+5 x2 thr dummy value thr ny+6 x3 at at at ny+7 x4 vm vn n ny+8 x5 vn dummy value dummy value ny+9 f dummy value, placeholder for output file ny+10 ny ny < 0, terminates the program ny > 0, begins the next data set, with all the above lines repeated.

Table 7 shows the output format for the long output file, of the second name requested by the program at start. It adds to the basic input data. The names of the input and output files are added at the beginning to help in spreadsheets. The values the water content calculated by the fitting equation, th-fit, are added to the input (y, th) points on the same line, to use in spreadsheet plots. The input estimates of the parameter values are replaced by the program optimal values, followed by the final step sizes, res1 to res5, used in searching for the optimum. The magnitudes of these step sizes are a reliable indicator of the resolution of the optimal parameter values. Finally, the objective function, f, the summed-absolute-error of the fitted points, shows the overall error of the fit.

 Table 7: Format for the long out file from satfitsa. line variable(s) usage -2 the filename of the input file in quotes -1 the filename of this file in quotes 0 blank line 1 ny the number of points (y, th) following, repeated from input file 2 series a 15-character alphanumeric identifier for the data set, repeated from input file 2, ... ny+2 y, th, th-fit y = matric suction (m), th = volumetric water content (cm3/cm3), th-fit = the fitted water content ny+3 me, ier method, repeated from input file, and the error number, ier = 0 for no error, = 1 if the program exceeds 500,000 evaluations of the objective function 1 or 10 2 or 20 3 or 30 ny+4 x1, res1 ths, res ths, res ths, res ny+5 x2, res2 thr, res 0, 0 thr, res ny+6 x3, res3 at, res at, res at, res ny+7 x4, res4 vm, res vn, res n, res ny+8 x5, res5 vn, res 1, 0 m, 0 ny+9 fopt objective function, sum of the absolute error between th and th-fit ny+10 ny ny < 0, terminates the program repeated from the input file ny > 0, begins the next data set

The short output file, of the third filename requested by the program at start, tabulates the results without the data points. It contains one line for each input data set. If the fitting method is 1 or 10 (equation ), the line is:

series, ny, ier, me, th(1), th(2), ths, thr, at, vm, vn, fopt, res(ths), res(thr), res(at), res(vm), res(vn)

If the method is 2 or 20 (equation ), the line is:

series, ny, ier, me, th(1), th(2), ths, 0, at, vn, 0, fopt, res(ths), res(at), res(vn), 0, 0

If the method is 3 or 30 (equation ), the line is:

series, ny, ier, me, th(1), th(2), ths, thr, at, m, n, fopt, res(ths), res(thr), res(at), res(n), 0

The variables th(1) and th(2) are the first and second water contents of the input (y, th) points, for comparison to the ths value.

Table 8 shows the upper and lower limits allowed for the parameters by the program. Any user estimates outside these limits will be adjusted inside. Please note that the matric suction, y, is in the dimensions of meters. Please make any changes in scale necessary before and after the program runs.

 Table 8: Upper and lower limits for fitted parameters Method 1 or 10 Method 2 or 20 Method 3 or 30 , y(1) = 0 is taken to be a saturation measurement 0 £ thr £ 0.999 th(ny) --- 0 £ thr £ 0.999 th(ny) 10-4 £ at £ 5 10-4 £ at £ 5 10-4 £ at £ 10 1 £ vm £ 15 10-3 £ vn £ 10 --- 0.1 £ vn £ 15 --- 0.1 £ n £ 20

1. satfitsa.exe - MS-DOS executable program to fit three saturation relations (73k).
2. se030001.dat - example DOS ascii input file for satfitsa; contains the data from Table 1 twice, set up for program estimates and fitting to the standard van Genuchten function  (the parameter estimates in the file are not used, and a good thing, too) (7k).
3. se030002.prn - example long output file generated by satfitsa from se030001.dat (14k).
4. se030003.prn - example short output file (4k).
5. se030002.123 - example Lotus 1-2-3 97 spreadsheet file with a graph (Figure 4) demonstrating the data (30k).
6. se010001.123 - example Lotus 1-2-3 spreadsheet file with the data and graphs used to generate Figures 2 and 3 (98k).
7. satfitsa.zip - a zipped file with all of the files above (60k).

Conductivity Relation Fits  (top)

It is not entirely clear from Theisse's thesis (1984) and notes how he used the experimental data from the plane of zero flux method (Arya, et al., 1975) to generate his unsaturated conductivity relationships. He gives the unsaturated conductivity relation  and the tabular values of A and B (Table 9) from an unspecified averaging method over 12 plots at the given depths. Equation  can also be expressed as , where the saturated conductivity, sc (cm/h), and the saturated water content, ths (cm3/cm3), are explicitly parameters. If one substitutes the log saturation relation  into , one obtains , which makes the unsaturated conductivity, ck, an explicit function of the hydraulic pressure, p (cm). It produces two fitting parameters, sc and bc (dimensionless) that are independent of the saturation relation. , where k = unsaturated hydraulic conductivity after Thiesse (cm/h), q = volumetric water content (cm3/cm3). , where ck is the model variable for unsaturated conductivity (cm/h), sc = saturated conductivity (cm/h), and ths = saturated water content (cm3/cm3). Table 9: Unsaturated conductivity fitting parameters from Thiesse (1984). Saturated porosity, ths, averaged from depths in Table 1. Depth, z (cm) ln A A B avg ths sc 7.5 -11.35 1.17e-5 25.0 0.5080 3.81904 22.5 -12.82 2.71e-6 34.0 0.4700 23.5706 37.5 -14.90 3.38e-7 40.3 0.4615 40.3847 53.0 -15.00 3.06e-7 39.0 0.4555 15.8711 68.5 -15.50 1.86e-7 38.1 0.4480 4.80088 83.5 -13.00 2.26e-6 30.3 0.4525 2.03552 98.5 -13.30 1.67e-6 30.6 0.4460 1.41567 114 -14.00 8.32e-7 33.3 0.428 1.28711

Notice that in order to develop equation  for each of the soil depths, one must have values of ths, A and B at the same depths. Unfortunately, the depths in Table 9 do not match the depths in Table 1 or Table 4. But the two sets of depths are positioned midway between each other, making simple averaging a reasonable approximation. For values outside the available ranges, such as z = 0 for A and B, and z > 114 for A and B, and z > 137 for ths and vn, let us simply extend the nearest value that is available, and produce a set from z = 0 to z = 201 cm.

Figure 5a shows the parameters B and vn cross-interpolated to the same z-points. Figure 5b shows the points generated from B× vn, and a smoothed line of bc through the points. The smoothed line was generated by TableCurve 2-D (SPSS, Inc), using a 21.1% Loess smoothing spline method. Table 10 shows the points taken from the curve to use in the model.  Figure 5a: Cross-interpolated B and vn Figure 5b: bc = B*vn, and and spline-smoothed

 Table 10: Spline-smoothed bc(z) z (cm) bc z (cm) bc z (cm) bc 0 1.746 60 1.156 120 0.652 10 1.662 70 0.971 130 0.789 20 1.634 80 0.807 140 0.893 30 1.587 90 0.711 160 0.924 40 1.512 100 0.660 201 0.930 50 1.355 110 0.640

References  (top)

Arya, L.M., D.A. Farrell and G.R. Blake. 1975. A field study of soil water depletion patterns in the presence of growing soybean roots: 1. Determination of hydraulic properties of the soil. Soil Science Society of America Proceedings 39:424-430.

Corana, A., M. Marchesi, C. Martin and S. Ridella. 1987. Minimizing multimodal functions of continuous variables with the "simulated annealing" algorithm, ACM Transactions on Mathematical Software, 13(3):262-280.

Goffe, W.L., G.D. Ferrier and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing, Journal of Econometrics 60:65-99.

Kirkpatrick, S., C.D. Gelatt, Jr. and M.P. Vecchi. 1983. Optimization by simulated annealing, Science, 220(4598):671-680.

Scott, H.D., A. Mauromoustakos and J.T. Gilmour. 1995. Fate of inorganic nitrogen and phosphorous in broiler litter applied to tall fescue, Bulletin 947, Arkansas Agricultural Experiment Station, University of Arkansas, Fayetteville, AR 72701, 55p.

Thiesse, B.R., 1984, Variability of the physical properties of the Captina soils, M.S. Thesis, Department of Agronomy, University of Arkansas, Fayetteville, AR USA, 91 p.

Questions? mailto:dscott@cleora.uark.edu (no spam please)