edges_cal.xrfi.xrfi_model_sweep

edges_cal.xrfi.xrfi_model_sweep(spectrum: ndarray, *, freq: ndarray | None = None, flags: ndarray | None = None, weights: ndarray | None = None, model: Model = Polynomial(parameters=None, n_terms=3, transform=IdentityTransform(), offset=0.0), window_width: int = 100, use_median: bool = True, n_bootstrap: int = 20, threshold: float | None = 3.0, which_bin: str = 'last', watershed: int = 0, max_iter: int = 1) → tuple[numpy.ndarray, dict]

Flag RFI by using a moving window and a low-order polynomial to detrend.

This is similar to xrfi_medfilt(), except that within each sliding window, a low-order polynomial is fit, and the std dev of the residuals is used as the underlying distribution width at which to clip RFI.

Parameters:
  • spectrum (array-like) – The data measured at each frequency. A 1D or 2D array, where the last axis corresponds to frequency.

  • flags (array-like) – The boolean array of flags.

  • weights (array-like) – The weights associated with the data (same shape as spectrum).

  • model – The kind of model to use to fit each window. If a string, it must be the name of a Model.

  • window_width (int, optional) – The width of the moving window in number of channels.

  • use_median (bool, optional) – Instead of using bootstrap for the initial window, use Median Absolute Deviation. If True, n_bootstrap is not used. Note that this is typically more robust than bootstrap.

  • n_bootstrap (int, optional) – Number of bootstrap samples to take to estimate the standard deviation of the data without RFI.

  • n_terms – The number of terms in the model (if applicable).

  • threshold – The number of sigma a data point must be away from the fitted model before it is flagged. Higher numbers give fewer false positives, but may miss some true RFI.

  • which_bin – Which bin to flag in each window. May be “last” (default) or “all”. In each window, only this bin will be flagged (or all bins will be, if “all”).

  • watershed – The number of bins beside each flagged RFI that are assumed to also be RFI.

  • max_iter – The maximum number of iterations to use before determining the flags in a particular window.
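As an illustration of the watershed parameter, the following is a minimal sketch of how flags could be spread to neighbouring bins. It assumes that "beside" means on both sides of a flagged bin; the helper name apply_watershed is hypothetical and is not part of edges_cal.

```python
import numpy as np


def apply_watershed(flags, watershed=1):
    """Also flag `watershed` bins on either side of every flagged bin.

    Hypothetical helper for illustration only (not the edges_cal implementation).
    """
    out = flags.copy()
    for shift in range(1, watershed + 1):
        out[shift:] |= flags[:-shift]  # spread each flag to the right
        out[:-shift] |= flags[shift:]  # spread each flag to the left
    return out


flags = np.array([False, False, True, False, False, False])
expanded = apply_watershed(flags, watershed=1)
```

With watershed=0 the loop body never runs, so the flags are returned unchanged, which matches the default of no spreading.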

Returns:

  • flags (array-like) – Boolean array of the same shape as spectrum indicating which channels/times have flagged RFI.

  • info (dict) – A dictionary of info about the fit, that can be used to inspect what happened.

Notes

The basic idea is that a window of a given width is moved across the spectrum, and within that window a model is fit to the data. The residuals of that fit are used to calculate the standard deviation (or the ‘noise-level’), which sets the scale against which outliers are identified. This standard deviation may be found either by bootstrap sampling, or by using the Median Absolute Deviation (MAD). Both of these account to some extent for RFI that’s still in the residuals, but the MAD is typically a bit more robust.

NOTE: getting the estimate of the standard deviation wrong is one of the easiest ways for this algorithm to fail. It relies on a few assumptions. Firstly, the window can’t be too large, or else the residuals within the window aren’t stationary. Secondly, while previously-defined flags are used to mask out what might be RFI, so that those data are NOT used in estimating the standard deviation, any remaining RFI will severely bias the estimate. If RFI remains in the data, the model itself might not be very accurate either.
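To see why a robust estimator matters here, the following minimal sketch compares a plain standard deviation with a MAD-based estimate on synthetic residuals containing a few unflagged RFI spikes. The data and the helper name mad_sigma are made up for illustration; the factor 1.4826 is the usual scaling that makes the MAD consistent with sigma for Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic residuals of a model fit: unit-variance Gaussian noise...
residuals = rng.normal(0.0, 1.0, size=100)
# ...plus three strong RFI spikes that have not been flagged yet.
residuals[[10, 40, 70]] += 50.0


def mad_sigma(x):
    """Estimate sigma from the Median Absolute Deviation (illustrative helper)."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))


naive_sigma = np.std(residuals)       # strongly inflated by the spikes
robust_sigma = mad_sigma(residuals)   # stays close to the true noise level of 1
```

With the inflated estimate, a 3-sigma threshold would sit well above the true noise level and weaker RFI would be missed, whereas the MAD-based estimate stays close to the underlying noise.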

Note that for each window, at first the RFI in that window will likely be unflagged, and the std will be computed with all the channels, RFI included. This is why using the MAD or bootstrapping is required. Even if the std is predicted robustly via this method (i.e. there are more good bins than bad in the window), the model itself may not be very good, and so the resulting flags may not be very good. This is where using the option of max_iter>1 is useful – in this case, the model is fit to the same window repeatedly until the flags in the window don’t change between iterations (note this is NOT cumulative).
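The per-window iteration can be sketched roughly as follows. This is a simplified illustration using numpy.polyfit and a MAD noise estimate, not the edges_cal implementation; the function name iterative_window_flags and the synthetic data are assumptions. Each pass recomputes the flags from scratch (they are not accumulated), and the loop stops once they no longer change.

```python
import numpy as np


def iterative_window_flags(x, y, n_terms=3, threshold=3.0, max_iter=10):
    """Refit a polynomial to one window until its flags stop changing.

    Illustrative sketch only; names and details are assumptions.
    """
    flags = np.zeros(y.size, dtype=bool)
    for _ in range(max_iter):
        good = ~flags
        # Fit the model using only the currently unflagged channels.
        coeffs = np.polyfit(x[good], y[good], deg=n_terms - 1)
        resid = y - np.polyval(coeffs, x)
        # Robust noise estimate (MAD) from the unflagged residuals.
        r = resid[good]
        sigma = 1.4826 * np.median(np.abs(r - np.median(r)))
        # Flags are recomputed from scratch, not accumulated.
        new_flags = np.abs(resid) > threshold * sigma
        if np.array_equal(new_flags, flags):
            break
        flags = new_flags
    return flags


rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
y = 2.0 + 0.5 * x - 1.5 * x**2 + rng.normal(0.0, 0.01, size=x.size)
y[[20, 55]] += 1.0  # injected RFI
flags = iterative_window_flags(x, y)
```

On the first pass the fit is pulled slightly towards the spikes. Once they are flagged, the second pass fits only clean channels and yields a better model and noise estimate.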

In the end, by default, only a single channel is actually flagged per window. Although any number of flags can be set inside the iterative loop (in order to make a better prediction of the model and std), only the first, last or central pixel is actually flagged and used for the next window. This can be changed by setting which_bin='all'.
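The sliding-window behaviour with which_bin='last' can be sketched as below. This is a simplified single-pass illustration (one fit per window, MAD noise estimate), and the function name sweep_flags is hypothetical. As an assumption of this sketch, the very first window is flagged in full so that the leading channels are also tested; the library may handle the first window differently.

```python
import numpy as np


def sweep_flags(y, window_width=50, n_terms=3, threshold=3.0, which_bin="last"):
    """Slide a window over y, fit a polynomial per window and flag outliers.

    Illustrative sketch only; not the edges_cal implementation.
    """
    flags = np.zeros(y.size, dtype=bool)
    x = np.linspace(-1.0, 1.0, window_width)
    for start in range(y.size - window_width + 1):
        sl = slice(start, start + window_width)
        good = ~flags[sl]  # channels flagged by earlier windows are excluded
        coeffs = np.polyfit(x[good], y[sl][good], deg=n_terms - 1)
        resid = y[sl] - np.polyval(coeffs, x)
        r = resid[good]
        sigma = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD estimate
        outliers = np.abs(resid) > threshold * sigma
        if which_bin == "all" or start == 0:
            # Assumption of this sketch: the first window is flagged in full.
            flags[sl] |= outliers
        else:
            # Only the newest channel in the window receives a decision.
            flags[start + window_width - 1] = outliers[-1]
    return flags


rng = np.random.default_rng(2)
chan = np.linspace(-1.0, 1.0, 200)
spectrum = 100.0 + 10.0 * chan - 5.0 * chan**2 + rng.normal(0.0, 0.1, size=chan.size)
spectrum[[120, 160]] += 20.0  # injected RFI
flags = sweep_flags(spectrum, window_width=50, threshold=5.0)
```

Because each flagged channel is excluded from the fits of later windows, a spike that was caught as the last bin of one window no longer biases the model or the noise estimate in the windows that follow.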