XSPECtor

Thursday, August 08, 2013

XSPEC v12.8.1 released as part of HEAsoft 6.14

The latest release of XSPEC is now available from the usual place. The release notes are as follows :

New features:

- New models:
     cpflux  - a variant of cflux for photon flux.
     heilin  - Voigt absorption profiles for the HeI series
     lyman   - Voigt absorption profiles for the HI or HeII series
     zbabs   - EUV ISM attenuation

- A new statistic, pgstat, has been added for the case of Poisson-distributed
  data
  with a Gaussian-distributed background. The whittle statistic can
  now be used when fitting averaged power density functions by
  appending an integer (so eg whittle5 is the statistic to use when
  fitting a pdf constructed by averaging those from 5 observations).

- The old CERN Minuit library, which is used for the migrad, minim, monte,
  and simplex fitting methods and the improve command, has been
  replaced by the new version. The minim and monte methods are no
  longer supported and the new version does not include an improve command.
  The output from the migrad and simplex fitting methods now looks the
  same as that from the leven method. Note however that the rules for
  when to write intermediate fit results are not directly comparable
  so do not provide a measure of the relative speed of the methods.

- Fakeit now has a 'nowrite' option to generate fake spectra without
  producing output files.  This is also now available in the
  multifake.tcl script command.

- Parallel processing capability has been added to the steppar command
  and can be invoked using the parallel command.

- Markov Chain Monte Carlo (the chain command) now uses the
  Goodman-Weare algorithm by default. Previously the default was
  Metropolis-Hastings.

- After a chain run, the best-fit parameters and statistic are now
  displayed with "chain info", and are available through the "tclout
  chain" option.

- The default atom_db version used in apec models may now be modified
  with the ATOMDB_VERSION keyword in the user's Xspec.init file.

- Steppar now has a 'delta' option for performing grids centered on
  the best-fit parameters.

- The 'setplot delete' option has been enhanced to allow removal of all
  or a range of commands.

- For external programs calling XSPEC, new wrapper functions have been
  added for retrieving XFLT keywords from data files.

- Norm parameters are now set with a default 'soft' upper limit below
  their 'hard' upper limit.

- In PyXspec, the Fit.statMethod and statTest attributes can now be set
  for individual spectra.

Enhancements previously released as patches to 12.8.0:

- AtomDB has been upgraded to version 2.0.2.

- The tclout 'stat' and 'statmethod' options can now retrieve the test
  statistic as well as the fit statistic.

- The simftest Tcl script command now takes an optional filename
  argument for output.

- Attributes added to PyXspec classes: Xset.parallel, Fit.statTest.

All bug fixes to v12.8.0 released as patches are included in v12.8.1.
In addition the following problems have been corrected:

- The command history file xspec.hty (in the user's ~/.xspec directory)
  is now updated when exiting XSPEC with the 'quit' command.  Previously
  it was only updated when exiting with 'exit'.

- The 'chain' command can now read/write files in ASCII format when
  running in the default Goodman-Weare mode.  Previously this feature
  was only available for Metropolis-Hastings chains.

- Fix to an array access error in the nthcomp model.

- PyXspec fix removes error messages generated when accessing response
  parameters in Python versions 2.6.x.

Friday, March 22, 2013

XSPEC bug patches 12.8.0e-j

We have made available a number of XSPEC bug patches/enhancements over the last couple of months and these are summarized below. They can be downloaded from the usual site.

12.8.0e	This adds a couple of new features for PyXspec: interfaces for the parallel and statistic test commands which were introduced in standard XSPEC 12.8.0. The parallel get/set options are accessed through PyXspec's Xset.parallel object, ie: Xset.parallel.leven = 4 # Set parallel processes for Lev-Marq fitting. Xset.parallel.error = 2 # Set parallel processes for error command. Xset.parallel.reset() # Restore single-process mode for all contexts The statistic test command option is accessed in PyXspec through the new Fit.statTest attribute: Fit.statTest = "ad" Fit.statTest = "chi"
12.8.0f	An array access error in the plot goodness command can cause a crash in some cases. Our thanks to Stefano Bianchi for pointing this out.
12.8.0g	Improvements made for running the Goodman-Weare algorithm in the chain command. It now writes keywords CHAINTYP and NWALKERS to the output chain file to distinguish it from those run with the Metropolis-Hastings algorithm. It also fixes bugs that prevented both appending to and re-loading pre-existing Goodman-Weare chain files. Our thanks to Giacomo Vianello for bringing this to our attention.
12.8.0h	When the lrt Tcl command script is run with multiple spectra and background files loaded, the first spectrum is wrongly assigned the background file of the last spectrum. Our thanks to Ed Kellogg for pointing this out.
12.8.0i	This updates the AtomDB to version 2.0.2 (used by the apec models). It also fixes a bug in the identify command, which was preventing the changing of the AtomDB version by way of the APECROOT variable setting.
12.8.0j	This adds a new test option to the tclout statistic and tclout statmethod commands: tclout statistic test tclout statmethod test These will output the current test statistic value and test method respectively. If the 2nd tclout argument is missing or is fit, these will output the fit statistic and fit method (as before).

Tuesday, February 12, 2013

XSPEC v12.8 - Markov Chain Monte Carlo

The Metropolis-Hastings algorithm used by XSPEC for Markov Chain Monte Carlo requires a choice of proposal distribution. Finding the best distribution can be difficult and makes MCMC harder to use. A new algorithm Goodman-Weare is included in v12.8. This algorithm does not require a choice of proposal distribution. It works by holding multiple sets of parameters, called walkers, for each step in the chain and generating walkers for the next step using those from the current step. This appears to work very well in many cases. The same algorithm is the basis of the emcee python package.

To use the new option in XSPEC do the following

XSPEC12> chain type gw
XSPEC12> chain walkers 8
XSPEC12> chain length 10000

The actual number of steps performed will be length/walkers so length should be chosen as a multiple of the number of walkers. The output chain combines all the walkers so in this case the first eight rows will be the walkers for the first step, the next eight for the second step and so on. A similar rule holds for the burn-in length set by chain burn.

Note that the output from running a chain is fully integrated into XSPEC. So, if a chain is loaded then tclout simpars will draw from the distribution defined by the chain, the error command will be based on the chain, as will the error options on flux and eqwidth.

Another new feature related to MCMC is the thin option on plot chain. It can be difficult to see anything on a plot with many thousands of points so this option applies a thinning factor to only show every Nth step in the chain. For instance:

XSPEC12> plot chain thin 100 1

will plot the values of parameter 1 every 100th step.

Thursday, January 24, 2013

XSPEC v12.8 - modifications to fit command

The new XSPEC release includes some improvements to the fit command. The most visible is an additional column in the output during the fit. |beta|/N is the norm of the vector of derivatives of the statistic with respect to the parameters divided by the number of parameters. At the best fit this should be zero so provides another measure of how well the fit is converging. |beta|/N can also be used as the criterion to stop the fit instead of the statistic delta although this is still considered experimental. This feature is activated by an additional argument on the fit command. For instance, fit 50 1.0e-3 20.0 will run the fit until either 50 iterations are completed, the statistic difference between successive iterations is less than 1.0e-3 or |beta|/N is less than 20.0. Setting either the statistic difference or |beta|/N stopping criteria to a negative number means they will not be used. These stopping criteria can also be set using the method command.

Another internal change is the option to use delayed gratification. This is turned on by giving the argument delay to the fit or method leven commands and turned off by giving the argument nodelay. Delayed gratification is a modification to the choice of the Levenberg-Marquardt damping factor and has been shown to speed convergence in many cases. The default is nodelay.

One other change included in the release was for the first iteration to only modify normalization parameters. We have subsequently determined that this introduces instabilities in some cases so have removed it in patch 12.8.0c. This patch also fixes an infinite loop case which has shown up in a few cases when running the error command.

Error in simftest.tcl

Dacheng Lin has pointed out that the Tcl script simftest.tcl which can be run from the XSPEC command line using the command simftest does not work correctly. The call to the lrt.tcl script within simftest switched the order of the names of the model command files. This has been fixed in the 12.8.0d patch available from the usual site. This patch also adds options to simftest and lrt to specify a filename into which the simulation results are written. Note that documentation on simftest and lrt can be obtained within XSPEC by typing the command name.

Thursday, January 17, 2013

XSPEC v12.8 - new statistics

XSPEC uses statistics for two purposes: 1) determining the best-fit parameter and its uncertainties; and 2) checking whether the model is a good fit to the data. In the past these have both been performed using the same statistic. There are two problems with this. Firstly, there are many goodness-of-fit statistical tests which do not use a statistic appropriate for parameter estimation. Secondly, using the same statistic for both parameter estimation and goodness of fit affects the properties of the statistic when used for the latter purpose.

XSPEC v12.8 now allows separate statistics to be defined for parameter estimation and goodness of fit. There are no changes to the statistic command for parameter estimation however to use a new goodness-of-fit statistic the command becomes statistic test whatever. At the end of a fit both statistic values are now written out. The goodness command now uses the test statistic. There is a new plot option to help visualizing the result of running the goodness command. The command plot goodness after running goodness generates a histogram plot of the test statistic values from the simulations as well as a line indicating the value for the actual data.

The new test statistics added include the familiar Kolmogorov-Smirnov along with the related Anderson-Darling and Cramer-von Mises. The runs statistic tests for runs of consecutive positive or negative residuals. Also added is the original Pearson chi-square statistic for Poisson data.

One new fit statistic has been added: the Whittle statistic for fitting power density spectra.

All these statistics are described in the manual in the extended appendix on statistics.

Wednesday, January 02, 2013

XSPEC v12.8 - Parallelization

XSPEC v12.8 is now available as part of HEAsoft 6.13.

This is the first of several posts on new features in v12.8. The parallel command tells XSPEC to use multiple cores on your machine. At the moment we have parallel processing in two places: in the fit command and in the error command. The standard Levenberg-Marquardt algorithm requires the calculation of the numerical derivative of the statistic with respect to each of the parameters. This is an obvious place to split the calculation among multiple cores and we do this if the command parallel leven N is given, where N is the number of cores to use. The more compute-intensive the model, the more time will be saved. In general however, this does not produce improvements linear in the number of cores.

The error command can be run on multiple parameters at the same time and using the parallel command here with parallel error N does produce a linear gain in N provided there are at least N parameter errors being calculated. The one current drawback with the parallel error calculation comes if the error finds a new minimum. At present all that will happen is that XSPEC will write out that a new error has been found but not update the best fit. To actually find the new best fit the error command must be repeated on individual parameters. We do intend to fix this.

The next command for which we intend to add parallel processing is steppar. We welcome other suggestions and feedback.

Thursday, June 28, 2012

WebSpec - security

I recently installed a new version of WebSpec which fixes security holes and adds some new features.

Since the security fixes might be helpful for others supporting websites I will make a few comments about them. WebSpec operates mainly using Perl CGI scripts which accept information from the user entered in various boxes, process it to create an XSPEC script, run the script, and process the output to generate HTML for the next page. Any instance where information is accepted from the user then either passed to a program or echo'ed back in HTML is a potential serious security risk. The safest course is to check that all input contains only the characters which are expected.

Perl has a paranoid TAINT mode invoked using -T. In this mode Perl will generate an error if any user input is passed to a potentially dangerous command (eg system) without having first been run through a regex which limits the acceptable characters. To illustrate how this works lets look at some WebSpec code. A typical WebSpec Perl script reads from STDIN or the environment variable QUERY_STRING looking for units of form "name = value". Thus


if ($ENV{'REQUEST_METHOD'} eq "POST") {
   read (STDIN, $_, $ENV{'CONTENT_LENGTH'});
} elsif ($ENV{'REQUEST_METHOD'} eq "GET") {
   $_ = $ENV{'QUERY_STRING'};
}


s/^\s*//g;                                  # remove leading whitespace
foreach $_ (split (/&/)) {              # & delimits name/value pairs
   s/\+/ /g;                               # .. which have '=' inside.
   s/\%(..)/sprintf("%c",hex($1)-'0')/geo;
   ($name, $value) = split (/=/, $_, 2);
   $entries{$name} .= $value;
}

This somewhat obscure piece of code puts all the information into an associate array called entries.

Now consider the variable "backsys" which is the systematic error on the background normalization. This must be a positive decimal number so we process it as follows:


($backsys = $entries{"backsys"}) =~ /([0-9\.]+)/;
$backsys = $1;

The term on the right hand side of the first line is a regex expression which says extract a string containing only the characters 0,1,2,..9 and "." and place the result in $1. The square brackets contain the list of allowed characters and the "+" indicates that multiple instances of each character are allowed.

A slightly more complicated case is that for a filename:


($rsp = $entries{"rsp"}) =~ /(\w{1}[\w\_\.]+)/;
$rsp = $1;

Here "\w" means any alphanumeric character so I allow a filename to consist of alphanumeric characters as well as "_" and ".". I also require the first character in the name to be alphanumeric. The importance of these restrictions is to eliminate the possibility of the user giving a filename starting with multiple "../" which means they could get anywhere in the directory tree.

Every member of the entries array is treated in a similar fashion. This is relatively little inconvenience and is worth doing even if you don't think any particular entry is a security threat. It may save you down the line when the script is changed to use the entry in a different fashion.

I'm not aware that other programming languages have anything exactly like the Perl TAINT mode however they can have features which help restrict what might be dangerous code eg Python has RExec.