Tuesday, February 12, 2013

XSPEC v12.8 - Markov Chain Monte Carlo

The Metropolis-Hastings algorithm used by XSPEC for Markov Chain Monte Carlo requires a choice of proposal distribution. Finding the best distribution can be difficult and makes MCMC harder to use. A new algorithm Goodman-Weare is included in v12.8. This algorithm does not require a choice of proposal distribution. It works by holding multiple sets of parameters, called walkers, for each step in the chain and generating walkers for the next step using those from the current step. This appears to work very well in many cases. The same algorithm is the basis of the emcee python package.

To use the new option in XSPEC do the following

XSPEC12> chain type gw
XSPEC12> chain walkers 8
XSPEC12> chain length 10000

The actual number of steps performed will be length/walkers so length should be chosen as a multiple of the number of walkers. The output chain combines all the walkers so in this case the first eight rows will be the walkers for the first step, the next eight for the second step and so on. A similar rule holds for the burn-in length set by chain burn.

Note that the output from running a chain is fully integrated into XSPEC. So, if a chain is loaded then tclout simpars will draw from the distribution defined by the chain, the error command will be based on the chain, as will the error options on flux and eqwidth.

Another new feature related to MCMC is the thin option on plot chain. It can be difficult to see anything on a plot with many thousands of points so this option applies a thinning factor to only show every Nth step in the chain. For instance:

XSPEC12> plot chain thin 100 1

will plot the values of parameter 1 every 100th step.


Thursday, January 24, 2013

XSPEC v12.8 - modifications to fit command

The new XSPEC release includes some improvements to the fit command. The most visible is an additional column in the output during the fit. |beta|/N is the norm of the vector of derivatives of the statistic with respect to the parameters divided by the number of parameters. At the best fit this should be zero so provides another measure of how well the fit is converging. |beta|/N can also be used as the criterion to stop the fit instead of the statistic delta although this is still considered experimental. This feature is activated by an additional argument on the fit command. For instance, fit 50 1.0e-3 20.0 will run the fit until either 50 iterations are completed, the statistic difference between successive iterations is less than 1.0e-3 or |beta|/N is less than 20.0. Setting either the statistic difference or |beta|/N stopping criteria to a negative number means they will not be used. These stopping criteria can also be set using the method command.

Another internal change is the option to use delayed gratification. This is turned on by giving the argument delay to the fit or method leven commands and turned off by giving the argument nodelay. Delayed gratification is a modification to the choice of the Levenberg-Marquardt damping factor and has been shown to speed convergence in many cases. The default is nodelay.


One other change included in the release was for the first iteration to only modify normalization parameters. We have subsequently determined that this introduces instabilities in some cases so have removed it in patch 12.8.0c. This patch also fixes an infinite loop case which has shown up in a few cases when running the error command.

Error in simftest.tcl

Dacheng Lin has pointed out that the Tcl script simftest.tcl which can be run from the XSPEC command line using the command simftest does not work correctly. The call to the lrt.tcl script within simftest switched the order of the names of the model command files. This has been fixed in the 12.8.0d patch available from the usual site. This patch also adds options to simftest and lrt to specify a filename into which the simulation results are written. Note that documentation on simftest and lrt can be obtained within XSPEC by typing the command name.

Thursday, January 17, 2013

XSPEC v12.8 - new statistics

XSPEC uses statistics for two purposes: 1) determining the best-fit parameter and its uncertainties; and 2) checking whether the model is a good fit to the data. In the past these have both been performed using the same statistic. There are two problems with this. Firstly, there are many goodness-of-fit statistical tests which do not use a statistic appropriate for parameter estimation. Secondly, using the same statistic for both parameter estimation and goodness of fit affects the properties of the statistic when used for the latter purpose.

XSPEC v12.8 now allows separate statistics to be defined for parameter estimation and goodness of fit. There are no changes to the statistic command for parameter estimation however to use a new goodness-of-fit statistic the command becomes statistic test whatever. At the end of a fit both statistic values are now written out. The goodness command now uses the test statistic. There is a new plot option to help visualizing the result of running the goodness command. The command plot goodness after running goodness generates a histogram plot of the test statistic values from the simulations as well as a line indicating the value for the actual data.

The new test statistics added include the familiar Kolmogorov-Smirnov along with the related Anderson-Darling and Cramer-von Mises. The runs statistic tests for runs of consecutive positive or negative residuals. Also added is the original Pearson chi-square statistic for Poisson data.

One new fit statistic has been added: the Whittle statistic for fitting power density spectra.

All these statistics are described in the manual in the extended appendix on statistics.

Wednesday, January 02, 2013

XSPEC v12.8 - Parallelization

XSPEC v12.8 is now available as part of HEAsoft 6.13.

This is the first of several posts on new features in v12.8. The parallel command tells XSPEC to use multiple cores on your machine. At the moment we have parallel processing in two places: in the fit command and in the error command. The standard Levenberg-Marquardt algorithm requires the calculation of the numerical derivative of the statistic with respect to each of the parameters. This is an obvious place to split the calculation among multiple cores and we do this if the command parallel leven N is given, where N is the number of cores to use. The more compute-intensive the model, the more time will be saved. In general however, this does not produce improvements linear in the number of cores.

The error command can be run on multiple parameters at the same time and using the parallel command here with parallel error N does produce a linear gain in N provided there are at least N parameter errors being calculated. The one current drawback with the parallel error calculation comes if the error finds a new minimum. At present all that will happen is that XSPEC will write out that a new error has been found but not update the best fit. To actually find the new best fit the error command must be repeated on individual parameters. We do intend to fix this.

The next command for which we intend to add parallel processing is steppar. We welcome other suggestions and feedback.

Thursday, June 28, 2012

WebSpec - security

I recently installed a new version of WebSpec which fixes security holes and adds some new features.

Since the security fixes might be helpful for others supporting websites I will make a few comments about them. WebSpec operates mainly using Perl CGI scripts which accept information from the user entered in various boxes, process it to create an XSPEC script, run the script, and process the output to generate HTML for the next page. Any instance where information is accepted from the user then either passed to a program or echo'ed back in HTML is a potential serious security risk. The safest course is to check that all input contains only the characters which are expected.

Perl has a paranoid TAINT mode invoked using -T. In this mode Perl will generate an error if any user input is passed to a potentially dangerous command (eg system) without having first been run through a regex which limits the acceptable characters. To illustrate how this works lets look at some WebSpec code. A typical WebSpec Perl script reads from STDIN or the environment variable QUERY_STRING looking for units of form "name = value". Thus


if ($ENV{'REQUEST_METHOD'} eq "POST") {
read (STDIN, $_, $ENV{'CONTENT_LENGTH'});
} elsif ($ENV{'REQUEST_METHOD'} eq "GET") {
$_ = $ENV{'QUERY_STRING'};
}


s/^\s*//g; # remove leading whitespace
foreach $_ (split (/&/)) { # & delimits name/value pairs
s/\+/ /g; # .. which have '=' inside.
s/\%(..)/sprintf("%c",hex($1)-'0')/geo;
($name, $value) = split (/=/, $_, 2);
$entries{$name} .= $value;
}


This somewhat obscure piece of code puts all the information into an associate array called entries.

Now consider the variable "backsys" which is the systematic error on the background normalization. This must be a positive decimal number so we process it as follows:

($backsys = $entries{"backsys"}) =~ /([0-9\.]+)/;
$backsys = $1;

The term on the right hand side of the first line is a regex expression which says extract a string containing only the characters 0,1,2,..9 and "." and place the result in $1. The square brackets contain the list of allowed characters and the "+" indicates that multiple instances of each character are allowed.

A slightly more complicated case is that for a filename:

($rsp = $entries{"rsp"}) =~ /(\w{1}[\w\_\.]+)/;
$rsp = $1;

Here "\w" means any alphanumeric character so I allow a filename to consist of alphanumeric characters as well as "_" and ".". I also require the first character in the name to be alphanumeric. The importance of these restrictions is to eliminate the possibility of the user giving a filename starting with multiple "../" which means they could get anywhere in the directory tree.

Every member of the entries array is treated in a similar fashion. This is relatively little inconvenience and is worth doing even if you don't think any particular entry is a security threat. It may save you down the line when the script is changed to use the entry in a different fashion.

I'm not aware that other programming languages have anything exactly like the Perl TAINT mode however they can have features which help restrict what might be dangerous code eg Python has RExec.

Monday, March 19, 2012

A couple of useful PHP functions

I've recently added to my home page a count of the number of papers which include the string "xspec". This accesses the new ADSlabs full text search. The html instruction is



where the getXSPECpapers.php file contains

1 - 20 of (.+?)%';
preg_match($regex,$data,$match);
echo $match[1];
?>

This can be modified for any other string by changing the q=xspec section of the URL.

I've also added to my CV page so that it updates the number of citations to each of my papers. The HTML to do this is eg:



where the bibcode argument is the standard ADS bibcode. The getCitations.php file contains

(.+?) abstracts.%';
preg_match($regex,$data,$match);
echo $match[1];
?>

New HEAsoft release: eqpair and related models

The new XSPEC (v12.7.1) has updated eqpair, eqtherm, compth, compps and ntee models. These now all use the same Compton reflection code as reflect and ireflct. This may change results slightly for ionized reflectors because the new code uses the input spectrum to calculate ionization fractions while the old code assumed a power-law.

Note however that these models are still not using a realistic physical model for the ionized reflector.

Friday, March 16, 2012

Release of HEAsoft 6.12/Xspec 12.7.1/PyXspec 1.0

The latest release of HEAsoft (6.12) is available from the usual place. This includes a new release of XSPEC, with the changes being primarily new and modified models, and our first full release of PyXspec (v1.0).

Monday, March 12, 2012

New HEAsoft release: PyXspec non-backwards compatibility

The next HEAsoft release should occur before the end of the week. This is the first of a few notes on some of the more important changes.

This is the release of v1.0 of PyXspec. Our thanks to everyone who has provided feedback on the pre-v1 releases, the comments have been most helpful. Based on this feedback and some further thought by ourselves, we have made a couple of changes which are not backwards compatible. We hope this will not be too inconvenient and believe that the new syntax is clearer. These changes are:

(1) When using multiple data groups, the Model objects assigned to the higher-numbered groups now all have their parameters indexed from 1 to nPar. For example, with a 3 parameter model applied to 2 data groups, you would now access the first parameter in the 2nd model object with "mod2(1)" rather than "mod2(4)".

(2) The Model.setPars() function (introduced with patch 12.7.0f) used the p# keyword argument syntax to set non-consecutive parameters. This has been replaced by the use of Python dictionaries. For example, m.setPars(p2=.3, p4=1.1) should now be m.setPars({2:.3, 4:1.1}).

rfxconv model

Chris Done's convolution model which combines the reflionx table model with the Magdziarz & Zdziarski angle-dependent Compton reflection code is now available from the extra models webpage. This model is described in Kolehmainen, Done & Diaz Trigo (2011).

Tuesday, February 14, 2012

coplrefl table model file

There is a new table model available from the extra models page. This table model by David Ballantyne and collaborators is intended mainly for observations of X-ray pulsars and describes the reflection of a power-law (with variable high-energy cutoff) from a constant density ionized accretion disk.

Friday, January 20, 2012

Trapping a crash when plotting and a new tclout option

12.7.0t Fix for a crash which may occur when attempting to plot data for a spectrum which contains no noticed channels. Our thanks to Eduardo Ojero Pascual for pointing this out. Report added on Jan 03, 2012.

12.7.0u This adds a new tclout option, tclout rerror, which is necessary for retrieving the results of an rerror command run on gain parameters. The syntax for this is:

tclout rerror [< source number>:]< gain par number>

Our thanks to Matteo Guainazzi for bringing this to our attention. Report added on Jan 18, 2012.

Saturday, December 24, 2011

A couple of patches to relatively obscure xspec bugs

12.7.0r When using C-stat, if spectra with no backgrounds and different exposures and/or areascales are collected into the same data group, the renorming operation will miscalculate and can disrupt the fit. Our thanks to Jeremy Sanders for pointing this out. Report added on Dec 06, 2011.

12.7.0s Fix for bug which caused errors in the apec model when running v1.x of AtomDB and the wilm abundance option. Report added on Dec 08, 2011.

Thursday, November 17, 2011

xspec patches 12.7.0p and q

12.7.0p Fix for the obscure case of a model containing both 1) frozen norm parameters AND 2) an unfrozen norm for an additive component to the left of a multiplicative component. When these conditions exist the renorm operation may break, causing a fit to go astray. Our thanks to Hauke Worpel for pointing this out. Report added on Oct 28, 2011.

12.7.0q Enhancements for PyXspec. This adds the object attribute Model.expression which stores the model expression string. It also adds AllModels.sources to the global AllModels container. This is a dictionary object containing the current source numbers with their assigned active models. Its key:value pairs are: [source number]:[model name]. Our thanks to Adam Mantz for these suggestions. Report added on Oct 28, 2011.

Tuesday, October 18, 2011

xspec patches 12.7.0l-o

12.7.0l A crash can occur when trying to plot data while using a dummy response with no channel array. Our thanks to Randall Smith for pointing this out. Report added on Oct 05, 2011.

12.7.0m An enhancement for PyXspec. This adds the new function AllModels.simpars(), which provides the same functionality as standard XSPEC's tclout simpars command. The simulated parameter values are returned as a tuple of floats. Our thanks to Martin Sparre for this suggestion. Report added on Oct 05, 2011.

12.7.0n A fix for PyXspec. When a Parameter object's frozen attribute is toggled (to either freeze or thaw the parameter), the fit statistic's degrees-of-freedom doesn't get updated until a new fit is performed. It ought to be updated immediately. Our thanks to Sebastien Guillot for pointing this out. Report added on Oct 05, 2011.

12.7.0o An enhancement for PyXspec. This adds the new attribute Fit.covariance, which holds (in a tuple) the covariance matrix values from the most recent fit. Our thanks to Sebastien Guillot for this suggestion. Report added on Oct 11, 2011.

Tuesday, September 13, 2011

xspec patches 12.7.0i,j,k

12.7.0i This fixes an obscure case where erroneous usage of "?" for command summary display is causing a crash. Our thanks to Sebastien Guillot for pointing this out. Report added on Sep 07, 2011.

12.7.0j Fix needed to allow gain fit parameters to be used along with custom energy arrays supplied by the energies command. (This includes the energies extend option.) Our thanks to Dominic Walton for pointing this out. Report added on Sep 07, 2011.

12.7.0k For PyXspec only. This improves the output of the Model.folded() class method for the case where there are ignored channels. Previously, bins in the folded model array were set to zero when they corresponded to ignored channels. Now they are removed from the array altogether. This ensures that the length of the folded model array always matches the number of noticed channels. Our thanks to Nakisa Nooraee for pointing this out. Report added on Sep 08, 2011.

Friday, September 02, 2011

Two PyXspec patches

12.7.0g Minor fix for when retrieving plot arrays (using tclout plot in standard XSPEC, or Plot.x() and Plot.y() in PyXspec). This removes the trailing zeros that padded the arrays when plot points were removed with the ignore or setplot rebin commands. Our thanks to Alexander Gewering-Peine for pointing this out. Report added on Aug 12, 2011.

12.7.0h Fix for PyXspec only. Earlier patch 12.7.0f introduced a bug which occurs when a non-norm parameter is modified and set to a parameter link. In this case it does not perform the necessary automatic model recalculation. Our thanks to Adam Mantz for pointing this out. Report added on Aug 25, 2011.

Thursday, August 04, 2011

A model for fitting power spectra in xspec

A model from Adam Ingram and Chris Done for the power spectra from black hole binaries is now available from the new models page.

I've added a note based on their paper to the XSPEC wiki which explains how to read power spectra into XSPEC.

Thursday, July 28, 2011

New models available

Two new sets of models are now available through the XSPEC extra models page (http://heasarc.gsfc.nasa.gov/docs/xanadu/xspec/newmodels.html)

logpar and eplogpar are power-laws whose indices have a log parabolic energy dependence. These models are widely used in blazar research.

optxagnf and optxagn are models for AGN which combine a colour temperature corrected disc and energetically coupled Comptonisation.