EOVSA Implementation of a Spectral Kurtosis Correlator for Transient Detection and Classification

We describe in general terms the practical use in astronomy of a higher-order statistical quantity called Spectral Kurtosis (SK), and describe the first implementation of SK-enabled firmware in the F-engine (Fourier transform-engine) of a digital FX correlator for Expanded Owens Valley Solar Array (EOVSA). The development of the theory for SK is summarized, leading to an expression for generalized SK that is applicable to both SK spectrometers and those not specifically designed for SK. We also give the means for computing both the SK estimator and thresholds for its application as a discriminator of RFI contamination. Tests of the performance of EOVSA as an SK spectrometer are shown to agree precisely with theoretical expectations, and the methods for configuring the correlator for correct SK operation are described.


Introduction
The quality of radio astronomy scientific data can be greatly affected by radio frequency interference (RFI). With the increased use of wireless communication systems operating in frequency bands of scientific interest, the RFI environment becomes ever more hostile. At the same time, astronomers seek to observe over an ever broader part of the radio spectrum, both for increased sensitivity and to exploit spectral continuum diagnostics. These trends make it necessary to find efficient and robust methods that are able to discriminate and excise the RFI contamination, while preserving as much of the underlying useful information as possible.
There is no universal solution for the problem of RFI mitigation, and one must find or develop methods that are most suitable for a particular instrument and the RFI environment in which it operates (Fridman, P. A. & Baan, W. A., 2001). Such methods include active sampling and subtraction of RFI signals, detection and excision (blanking), and spatial nulling using interferometric techniques (van Ardenne et al., 2000). Among these methods, RFI mitigation algorithms based on statistical discrimination of signals in both the time domain (Ruf et al., 2006) and the frequency domain Fridman, P. A. & Baan, W. A., 2001) have become increasingly popular in recent years, due to the feasibility of their implementation using field-programmable gate arrays (FPGAs) or graphical programming units (GPUs), equipped on many modern digital instruments (Baan et al., 2004;Gary et al., 2010;Ford & Buch, 2014).
Spectral Kurtosis (SK) is a higher-order statistical tool, applicable to accumulated Power Spectral Density (PSD) estimates, that has become increasingly popular over the last several decades in digital signal processing applications for automatic detection of non-Gaussian signals embedded in Gaussian background noise (Dwyer, 1983;Ottonello & Pagnan, 1994;Vrabie et al., 2003;Antoni, 2006;Antoni & Randall, 2006;Antoni, 2007;Dion et al., 2012). More recently, Nita et al. (2007) demonstrated that SK may be effective for automatic detection of RFI in astronomical signals, and proposed a simple design for a wide-band SK spectrometer based on its implementation and testing in a software correlator for a prototype instrument .
As extensively developed in our preceding studies Nita & Gary, 2010a;Gary et al., 2010;Nita & Gary, 2010b), what makes an SK spectrometer distinct from a traditional one is the fact that, for each frequency channel, it accumulates not only a set of M instantaneous power spectral density (PSD) estimates, S 1 = M i=1 P i , but also the squared instantaneous spectral power, S 2 = M i=1 P 2 i . A remarkable property of the spectral kurtosis estimator ( SK) computed from these parallel accumulations is that, in the case of a pure Gaussian time domain signal, its statistical expectation is unity for each frequency channel and time, no matter how the power level may vary in spectral shape or time. This property gives the SK estimator the ability to discriminate signals that deviate from Gaussian time domain statistics against arbitrarily shaped astronomical backgrounds, thus making it sensitive to many man-made signals that produce unwanted RFI contamination of the astronomical signals of interest.
Moreover, the SK estimator is expected to deviate below unity if the RFI signal is continuous or acting for more than half of the accumulation time, while it is expected to deviate above unity if the RFI signal has a less than 50% duty-cycle. Thus, the SK estimator also provides the means to characterize the duration of the RFI signals relative to the instrumental accumulation time. Nevertheless, as shown by Nita et al. (2007), the SK estimator also deviates above unity in the case of a Gaussian transient lasting shorter than the integration time, which may pose a real-time decision challenge if such Gaussian transients represent wanted astronomical signals. However, as recently suggested by Nita (2016) in a comprehensive theoretical study, and experimentally demonstrated by Nita & Gary (2016), this sensitivity of the SK estimator to transient astronomical signals, if properly taken advantage of, may provide an experimental means for obtaining accurate signal-to-noise ratio and duration estimates for both Gaussian and non-Gaussian transients, and thus extends the potential uses of the SK spectrometer well-beyond its originally intended scope.
Developed for the Korean Solar Radio Burst Locator telescope (KSRBL, Dou et al., 2009), the first hardware implementation of a wide-band spectrometer with SK capabilities, utilizing FPGA architecture, served as a demonstration (Gary et al., 2010) of the effectiveness of real-time RFI detection and excision via the SK algorithm. In addition, the quantitative examination of the resulting high-precision probability density function (PDF) also revealed the need for a more accurate calculation of the theoretical RFI detection thresholds than the traditional, symmetrical standard deviation thresholds employed by Nita et al. (2007) in the original software-based SK spectrometer prototype.
Consequently, Nita & Gary (2010a) proposed a more theoretically well-founded spectral kurtosis estimator ( SK ), which applies to the case of a normally distributed time domain signal whose PSD estimates are obtained by means of a Fast Fourier Transform (FFT). It is an unbiased estimator of the spectral variability, which is defined as the ratio between the variance and the squared mean of the PSD estimates. For this particular case, Nita & Gary (2010a) derived exact analytical expressions for the infinite series of statistical moments of the SK estimator, and assigned to it a Pearson Type IV probability curve (Pearson, 1895), which was shown to be in very good agreement with numerically simulated SK PDFs, as well as with the SK distributions derived from direct experimental observations made with KSRBL (Gary et al., 2010).
Given its conceptual simplicity, which makes it ideally suited for a real-time FPGA or GPU implementation, the original SK spectrometer design proposed by Nita et al. (2007) has begun to be adopted as a standard component in the analysis pipelines of a new generation of radio telescopes designed not only by us, but also by other independent instrument design teams (Deller et al., 2011;Anderson et al., 2012;Bassa et al., 2016).
However, one drawback of the SK spectrometer is the need to record both power and power-squared. Despite rapidly increasing computational and storage capabilities and decreasing associated costs, the overhead brought by the doubling of real-time data volume and rates relative to that required by a traditional spectrometer design may still negatively impact the larger adoption of such instruments, especially in the case of radio interferometry arrays involving a large number of individual antennas. One solution to this is to compute SK in the firmware and either record only the real-time SK flags, which can be represented by single bits, or even apply them in the firmware and dispense with recording power-squared entirely.
An alternative solution, which also broadens the use of the SK estimator to existing spectrometers not specifically designed for SK, was developed by Nita & Gary (2010b)-the concept of a generalized SK estimator that, with a trade-off of decreased temporal resolution, may be computed solely from the output of a standard spectrometer (one that provides only accumulated PSD estimates, s 1 = Σ N j=1 P j ). As defined by Nita & Gary (2010b) in terms of the accumulations S 1 = Σ M i=1 (s 1 ) i and S 2 = Σ M i=1 (s 1 ) 2 i , and the shape parameter d, the generalized spectral estimator is where N is the number of internal accumulations performed to produce the instrumental output s 1 , and M is the number of such consecutive accumulations used to compute the above sums S 1 and S 2 . This generalized form of SK has unity expectation not only in the particular case of accumulations involving raw PSD estimates obtained via FFT, which have a Gamma probability distribution of shape parameter d = 1 (exponential distribution), but also for PSD estimates characterized by Gamma probability distributions of arbitrary shape parameters d, particularly the PSD estimates obtained by means of narrow-band filtering of wide-band time domain signals, which obey a chi-square statistics corresponding to a Gamma distribution with shape parameter d = 0.5. Therefore, unlike its original counterpart, the generalized SK estimator defined by Equation 1 is no longer limited to a particular instrument design or method used to obtain the PSD estimates. It may be helpful to give some specific use cases: (i) For a spectrometer specifically designed for SK, which produces S 1 and S 2 in firmware, one would use N = 1, d = 1 in Equation 1. (ii) For a typical FFT-based spectrometer that outputs only accumulated power s 1 , one would set N to the number of on-board accumulations, and then perform offline accumulation of S 1 and S 2 , and use d = 1. (iii) For a spectrometer that outputs only a band-limited time-domain signal, one would set N to the number of on-board accumulations of data samples, perform offline accumulation of S 1 and S 2 , and use d = 0.5. (iv) Finally, one may sometimes have data from a spectrometer whose internal characteristics, and hence number of on-board accumulations, are not known. In this case, it is often possible to calculate the partial estimate for large M , SK ′ = (M S 2 /S 2 1 − 1) and observe its value in RFI-free regions, to estimate N d ≈ (M S 2 /S 2 1 − 1) −1 . Although Equation 1 suffices to calculate the expected SK for different use cases, it is also necessary to know the SK PDF for each case, so that thresholds can be computed beyond which one can be confident, at some probability level, that the signal is non-Gaussian. Following the same mathematical framework used to obtain the SK PDF approximation corresponding to the particular case {N = 1, d = 1} (Nita & Gary, 2010a), Nita & Gary (2010b) derived the infinite series of statistical moments of the generalized SK estimator defined by Equation 1 and demonstrated that, for different regions of the parameter space {M, N d}, the SK PDF may be approximated by one of a Pearson Type I, Type III, or Type IV probability distribution, any of which can provide known probability of false alarm (PFA) detection thresholds accurate enough for practical applications. In Appendix 1 we provide an example procedure written in Interactive Data Language (IDL) that may be used to compute such thresholds.
Just as KSRBL (Dou et al., 2009) was the first implementation of an SK spectrometer, we report here on the first implementation of SK in a multi-antenna, interferometric array, the Expanded Owens Valley Solar Array (EOVSA). In doing so, we extend the previous work by demonstrating the use of both the traditional SK estimator and the generalized SK estimator for data recorded with the system. In §2 we describe the design of the EOVSA correlator, which implements the SK function. In §3 we discuss specific tests that demonstrate the performance. We conclude with some general remarks in the final §4.

EOVSA SK Correlator Design
Following our success with the KSRBL spectrometer design described by Dou et al. (2009), we have implemented a similar design in the Fourier transform engine (F-engine) of the 16-antenna, dual-polarization FX correlator for the Expanded Owens Valley Solar Array (EOVSA). EOVSA is a solar-dedicated array of 13 small 2.1-m-diameter dishes equipped with broadband (1-18 GHz) front-end systems with crossed-linearpolarization feeds. A pair of 20 GHz optical links transmit the entire RF band from the two polarizations to a central control building, where they are downconverted via a tunable LO to sample an instantaneous bandwidth of 600 MHz. The system is capable of tuning to a new IF band in under 1 ms, and the correlator is designed to accumulate for 19 ms, thus providing a 20 ms sample time. The frequency-agile system is capable of tuning in arbitrary order, and typically tunes to 50 IF frequencies in a fixed 1 s cycle. The correlator also accepts input from two 27-m antennas equipped with He-cooled receivers, which are used for observation of celestial sources for calibration of the small dishes.
The digital part of the system, comprising the ADCs, the F-engine, and the cross-correlation engine (X-engine), is based on the hardware and firmware of the Collaboration for Astronomical Signal Processing and Electronics Research (CASPER), and is implemented on second-generation Reconfigurable Open Architecture Computing Hardware (ROACH-2) boards equipped with first-generation, dual-input Karoo Array Telescope analog-to-digital converter (KatADC) boards. Each KatADC board accepts input from the two polarizations of a single antenna, and each ROACH-2 has two such boards, for four inputs in total. The entire 16-antenna correlator thus requires 8 ROACH-2 boards. The firmware for each ROACH-2 is running four parallel, identical F-engines and a single X-engine that handles 1/8th of the total IF bandwidth on a single Xilinx Virtex-6 Field Programmable Gate Array (FPGA). The remaining 7/8ths of the band is sent via 10-gigabit ethernet packets to the other 7 ROACH-2 boards. The F-engine and X-engine data are accumulated over 19 ms on each board, and then sent via 10-gigabit ethernet to a processing computer. Figure 2 shows a simplified block diagram of the digital hardware. At present (August 2016), the functional design of the digital correlator is final, but the design is operating at an FPGA clock speed of 200 MHz, lower than the design goal of 300 MHz. Small design tweaks are being applied in order to bring the timing to the design goal of 300 MHz, which is expected soon. Meanwhile, the tests described in the next section were taken with the lower clock speed, which limits the effective IF bandwidth to 400 MHz and causes an overlap of the lower half of the band, so that it is effectively double-sideband. The upper half of the 400 MHz band is the preferred single-sideband. However, these complications in no way affect the operation of the system for the purposes of evaluating SK performance. Fig. 1. Block diagram of EOVSA correlator. Quantities with no bit-truncation are shown without an underbar, while quantities with potential truncation are shown with an underbar. Truncation is performed via the "Cast" blocks, which change the output bit-width for practical purposes of data volume and compatibility with standard computational data types, e.g. 32-bit data for accumulated power ( P ) and cross-power ( X 2 ), and 64-bit data for power-squared ( P 2 ). Figure 2 gives the bit-width of the digital representation of numerical values at various points in the design. To properly determine SK from the quantities output by the EOVSA correlator, care must be taken to keep the precision of the approximate output values ( P ) and ( P 2 ) as high as possible without overflow. The critical locations where inappropriate truncation could occur are the "Cast" block just after the accumulator for the accumulated power, and the "Cast" block just before the accumulator for the individual power-squared samples. Since these blocks merely take the most significant bits in both cases, this amounts to ensuring that the power be as high as possible without overflowing. There are two adjustments that must act together to keep the power at the desired level. First, the overall analog system gain must be adjusted to keep the digitized ADC signal in an appropriate range, which anyway is needed for optimum performance of the digital system, and second, a quantity called the FFT shift can be manipulated to keep the output of the FFT in the appropriate range for good SK performance. EOVSA's 4096 frequency channels require a 13-stage FFT, and at each stage it is possible with the CASPER firmware to shift the output by one bit to lower precision. With no shifts at all, FFT overflows are highly likely, so shifting the output of some FFT stages is often needed. Each shift of a stage results in the x, y quantities being reduced by a factor of 2, and after squaring to form the power P , the power reduces by a factor of 4 (and power-squared by a factor of 16). The procedure we use to determine the optimum FFT shift is to simply try different values and form the SK estimator, which will be unity for non-RFI affected channels for a fairly wide range of FFT shift values, but will differ from unity when the FFT shift is too small, so that P starts to overflow, or too large, so that P 2 starts to fall to low precision.
In practice, EOVSA's design provides such a high precision (64 bits) for P 2 that SK is well behaved over a wide range of FFT shifts. When only the first 3 or fewer stages are shifted, the SK values fall below unity for some non-RFI channels, but they are well behaved when at least 4, and up to 10, stages are shifted. This range of 7 bits in x, y corresponds to a range of 14 bits in P and 28 bits in P 2 , yet the SK performance is unchanged over this range. For our operational setting, we have settled on shifting the first 5 stages, which gives some headroom in case of flares while keeping the precision high. Although not relevant to SK operation, we note that once the FFT shift is set, it is a more delicate matter to keep the X data in range due to its cast to only 4 bits prior to cross-correlation, via the "Cast" block after the equalizer block (labeled "EQ multiply" in Figure 2). This is done by applying a 34 × 128 array of equalizer coefficients (yellow block in the figure), which can vary by band and channel, with 128 coefficients per band so that each applies to 32 of the 4096 channels in each band. The equalizer coefficients are adjusted by a calibration procedure that seeks to make the rms of the 4-bit autocorrelations lie in the range 2 to 3.

EOVSA SK Performance Tests
As described in §2, the normal mode of operation of EOVSA consists of cycling a user-defined tuning sequence composed from a set of 34 predefined 400 MHz-wide IF bands (will be 600 MHz-wide when timing is met), some of them repeated, such that the entire (1-18 GHz) frequency range is covered in 1 s by 50 independent tunings, resulting in 19-ms-long power and power-square accumulations (M = 1792 raw FFT spectra), which are divided into 4096 frequency channels (∼ 0.098 MHz resolution). If the realtime RFI-excision algorithm is activated, the high-frequency-resolution S 1 and S 2 accumulations are then used to compute SK estimators according to equation 1 (N = 1, d = 1). SK values that fall outside the thresholds are interpreted as RFI-contaminated channels, which are excluded from a subsequent frequency integration performed in order to produce the final 1 s time-resolution dynamic spectrum characterized by, typically, ∼ 500 "science channels" (with variable frequency resolution up to ∼ 40 MHz). However, for the purpose of evaluating the performance of the hardware-embedded SK engine, we first analyze a data segment obtained by dwelling on a fixed 400 MHz IF band (9.6-10 GHz), which happens to be virtually free of RFI.
The results of statistical analysis displayed in Table 1, which was performed for the 13 EOVSA antennas, 2 linear polarizations labeled XX and YY, 4096 frequency channels, and 1000 contiguous accumulations of M = 1792 raw FFT spectra each, demonstrates a full agreement of results with the theoretical expectations of Nita & Gary (2010a,b). Indeed, despite the variations of bandpass shapes of individual antennas and polarizations, the distribution of the observed 4,096,000/antenna/polarization SK samples displayed in Table 1 have frequency-and polarization-independent means close to unity, and sample variances consistent with the theoretical expected variance σ SK = 2/ √ M = 0.047245 corresponding to M = 1792. The accuracy of the Pearson Type IV approximation of the SK PDF derived by Nita & Gary (2010a) is consistently confirmed by the observed percentages of flagged data by a pair of asymmetric thresholds, SK ≤ 0.87145 and SK ≥ 1.15800, chosen to correspond to an intended 0.13499% PFA. In the next performance test, we tuned the EOVSA instrument to a fixed 400 MHz IF band prone to RFI contamination (3.6-4.0 GHz), and then commanded the antennas to point from an on-Sun position to off-Sun, to simulate a change in total power level that might be seen during the evolution of a solar burst. Although S 1 thus varies by about a factor of 3, the value of the SK estimator is expected to be unchanged due to its strict lack of dependence on changes of the background power level. Figure 2a displays the XX-polarization dynamic power spectrum for one of the antennas obtained in the 3.6 − 4 GHz frequency range, with frequency resolution ∆f = 0.097 MHz, and time resolution ∆t = 0.02 s, the latter corresponding to an accumulation length of M = 1792 raw FFT spectra. The change of the background on a time scale of about 2 s during the on-off Sun transition is evident. The sharp transition to higher power level at about 17:32:19 UT is due to the automatic gain control switching out 2 dB of attenuation to compensate for the decreased radio frequency (RF) level, which affords us the opportunity to characterize the SK performance for both gradual and abrupt power-level changes. Also apparent from Figure 2a is the nonuniform bandpass shape reflecting the (temporary) overlap of the lower half of the band mentioned in §2. Relatively faint RFI contamination may be observed superposed on the background spectrum in some time-frequency bins that are crossed by the vertical dotted line and one of the two horizontal dotted lines that mark one instant of time and two frequency channels selected for a more detailed analysis presented in the subsequent panels of Figure 2. Figure 2b displays a color-coded dynamic spectrum of SK flags corresponding to the power spectrum shown in panel 2a. The green uniform background indicates the RFI-free time-frequency bins where SK is within the calculated thresholds. The black and yellow regions indicate the presumably RFI-contaminated time-frequency bins characterized by SK ≤ 0.87145 and SK ≥ 1.15800, respectively, which represent the theoretical M = 1792 RFI thresholds corresponding to an intended 0.13499% PFA. The clustered yellow SK ≥ 1.15800 flags clearly seen in the SK flag spectrum indicate the presence of RFI signals characterized by < 50% duty-cycle relative to the integration time, while the more uniformly scattered SK ≤ 0.87145 and SK ≥ 1.15800 flags mostly correspond to RFI-free SK statistical fluctuations, as we confirm by comparing the actual percentage of the SK ≤ 0.87145 flags, i.e. 0.20093%, with the theoretically expected 0.13499% PFA, as opposed to the statistically significantly larger 0.50789% percentage of SK ≥ 1.15800 flags. Figure 2c displays the accumulated power spectrum corresponding to the vertical dotted-line marker shown in Figure 2a and 2b. Superposed on the variable-shape bandpass spectrum, relatively strong RFI contamination is clearly seen in the 3.71 − 3.74 GHz frequency range. The corresponding SK spectrum displayed in Figure 2d generally reveals a flat unity mean SK distribution bounded by the computed 0.13499% PFA thresholds (horizontal red lines). The RFI contamination in the 3.71 − 3.74 GHz range is clearly flagged by larger than unity SK values, which are indicative of RFI contamination with < 50% duty-cycle relative to the integration time (Nita, 2016;Nita & Gary, 2016). Figure 2e displays the evolution of the integrated power at the upper of the two horizontal dotted-line markers, corresponding to 3.75 GHz. The progressive drop of intensity from on-to off-Sun levels is evident. The abrupt increase of the power level at 17:32:19 UT due to the automatic gain control is also evident. The corresponding SK lightcurve displayed in Figure 2f, which fluctuates within the statistical 0.13499% PFA thresholds (horizontal red lines), is remarkably independent of the variations seen in the background power level, which indicates no RFI contamination in the selected frequency channel. Figure 2g displays the evolution of the integrated power at the lower horizontal marker, 3.73 GHz, which, in contrast with the time evolution seen in the almost adjacent 3.75 GHz channel, features strong signal variations superimposed on a background level similar to the adjacent RFI-free frequency channels. The corresponding SK evolution at 3.73 GHz is shown in Figure 2h. The varying SK ≥ 1.15800 values indicate intermittent RFI contamination with evolving < 50% duty-cycles relative to the integration time. Nevertheless, the SK time profile clearly demonstrates the ability of the SK-based RFI excision algorithm to adaptively select contiguous RFI-free time segments suitable for science investigations, as opposed to a non-adaptive procedure that would discard entirely such heavily RFI-contaminated channels.
In a final test of the SK performance, in Figure 3 we present a data segment recorded by the EOVSA instrument during the GOES C8.6 flare of 2016-07-10 00:53:00 UT. However, since at the time of this event the real-time RFI excision algorithm was turned off, we do not have the full-resolution SK data, but rather only S 1 , S 2 and M accumulated over the wider, and variable-width, science channels. The dynamic spectrum ranges from 2.86-17.98 GHz, and the time resolution is 1 s, which, as explained in §2, are a result of the tuning cycle of 1 s and a frequency-domain integration resulting in 134 science channels. Moreover, prior to the science-band integration, the S 1 and S 2 accumulations corresponding to any repeated IF bands during the 1 s cycle were added together. As the result of these operations, the reduced-resolution S 1 and S 2 corresponding to the science channels are characterized by different accumulation lengths M . For the data set analyzed in Figure 3, the number of accumulations range from M = 164, 864 = 90 × M 0 for the narrower science bands at lower frequencies, to M = 1, 924, 608 = 1074×M 0 for the broader (and repeated) science bands at higher frequencies, where M 0 = 1792 is the output accumulation length of the EOVSA correlator. Figure 3a displays the raw S 1 spectrum that results from this procedure. Except for a strong RFIcontaminated channel, the raw solar spectrum appears almost featureless due uncalibrated, frequencydependent gain variations, and to the strong gradient of the quiet Sun signal, which reaches more than 500 solar flux units (sfu; 1 sfu = 10 −22 W m −2 Hz −1 = 10000 Jy) at high frequencies. Only after subtracting this background signal and applying calibration does the < 50 sfu solar burst become clearly visible. The calibrated power spectrum derived from S 1 , which is displayed in Figure 3b, reveals the flare spectrum composed of a more or less smooth envelope, on top of which a set of clustered fine spectral features are clearly visible. However, except for the clearly RFI-contaminated channel near 6 GHz, it is not immediately evident whether other frequency-time bins are affected by RFI. Although the frequency-time integration is an irreversible process, the availability of the S 1 , S 2 and M still allows the computation of an SK estimator for each time-frequency bin, and thus discrimination of those bins affected by RFI signals.
Similar to the procedure used to obtain the RFI flags displayed in Figure 2b, we show in Figure 3c the RFI flag dynamic spectrum obtained by assigning a flat unity value to all spectral bins characterized by SK that fluctuate within the 1 ± 6/ √ M range (black background), while saturating to these Mdependent values all spectral bins characterized by SK values ranging below or above these limits. The choice of these 1 ± 3σ SK symmetric thresholds, instead of the exact 0.13499% PFA thresholds used in Figure 2b, is motivated by the extremely large M values, for which the Gaussian approximation of the true SK PDF is fully justified (Nita & Gary, 2010a). Figure 3c reveals that the RFI contamination is stronger than suggested by the visual analysis of the dynamic spectra shown in Figure 3a,b. Indeed, a quantitative analysis demonstrates that 12.57% of the time-frequency bins feature SK values above the upper 1+6/ √ M threshold, while no spectral bin is found to have SK ≤ 1 − 6/ √ M . Remarkably, the SK flagged power spectrum shown in Figure 3d, demonstrates that, with a few noticeable exceptions discussed below, the vast majority of the SK flags are not related to the flare evolution. This demonstrates the ability of the SK estimator to automatically discriminate RFI contamination against natural spectral features, even in the case of fine spectral structures as those seen superposed on the flare background. Nevertheless, a few flagged spectral bins coincide with some of the fine spectral features, which may be just simple coincidence, or the result of a fast evolution of the such spectral features at a time scale shorter than the integration time, as theoretically expected by Nita (2016) and experimentally demonstrated by Nita & Gary (2016) using data recorded by an EOVSA prototype instrument during a flare featuring solar radio spikes. Therefore, aside from these few exceptions, the SK flags may be safely attributed to RFI contamination, and corresponding data may be safely discarded from scientific analysis of the event, particularly imaging when available, given the fact that EOVSA is capable designed to produce a full Sun image for very time-frequency bin of the dynamic spectrum.
We also find noteworthy the fact that none of the SK flags are found to correspond to SK values less than unity, which would be the case for an RFI signal lasting longer than the instrumental integration time (Nita, 2016;Nita & Gary, 2016), which in the particular case of the EOVSA tuning sequence may range anywhere between 20 ms (for non-repeated bands) and 60 ms (for bands repeated 3 times). However, rather than asserting that no such RFI signals were mixed in the data, we point out that, even for a purely continuous, narrowband RFI signal, which would be detected as less than unity by the real-time RFI excision algorithm, the integration across frequency channels to form the wider-band science-frequency channels effectively reduces the contribution of RFI to the integrated signal to a percentage much smaller 50%, and thus results in SK values larger than unity.
We conclude this analysis by pointing out that, had the automatic RFI excision algorithm been active at the time of the flare, most of the 12.57% bins contaminated by RFI could have been saved from being fully discarded, and the scientific return of the instrument significantly increased. .98] GHz frequency range. a) Raw (uncalibrated, not background-subtracted) S 1 spectrum. b) Calibrated and backgroundsubtracted power spectrum that reveals a more or less smooth flare envelope, on top of which a set of clustered fine spectral features are clearly visible. The RFI contaminated channel around 6 GHz is also visible in this panel. c) RFI flag spectrum corresponding to variable 1 + 6/ √ M thresholds, as explained in the main text. A much larger percent of RFI contaminated channel than suggested by the plots shown in a and b is revealed. d) RFI-flagged dynamic spectrum. Remarkably, almost none of the observed fine structures are flagged, which confirms their solar origin.

Discussion and Conclusion
Although not a test of EOVSA's SK performance per se, we can use EOVSA data to better illustrate in Figure 4 the contrast between using its output as an SK spectrometer versus its output as a traditional spectrometer lacking specially designed SK capabilities.
We already showed an example in Figure 2d of the SK estimator for a single, 20 ms snapshot, where the lower and upper thresholds for M = 1792, N = 1 and d = 1 are 0.87145 and 1.15800, respectively. Figure 4a,c are the same as Figure 2c,d, to aid comparison. In addition, Figure 4b also shows the powersquared accumulation corresponding to the same 20 ms time slot. As indicated in the Figure 4c inset, the 0.13499% PFA SK thresholds (red horizontal lines) flag 2.17% of the 4096 frequency channels as being contaminated by RFI. For later comparison, the inset also gives the mean and standard deviation of RFI occupancy (1.72±0.44%) for 50 consecutive 20 ms samples. When compared with the much lower 0.13499% PFA corresponding to the RFI thresholds, the 0.44% standard deviation corresponding to the analyzed sequence clearly indicates a true change in RFI occupancy from one accumulation to another. We consider this finding to be fully consistent with the observed, larger-than-unity SK values, which can only arise as the result of RFI signals that change their dynamics at time scales shorter than the integration time Nita, 2016), and thus may not be necessarily active, or detectable, in all consecutive accumulations. Now imagine that the EOVSA correlator does not provide power-squared information, but instead provides only accumulated power, s 1 . We can still calculate SK using the generalized SK formula (Equation 1), where S 1 and S 2 are accumulated from sums of s 1 and s 2 1 over some time interval. The middle row of Figure 4 illustrates this approach for a sum of 50 consecutive spectral samples (1 s) of the same data as in Figure 2a, where now the generalized SK is computed using M = 50, N = 1792, and d = 1. Figure 4d displays the 1 s accumulated power, while Figure 4e displays the sum of the squares of the same sequence of accumulated power spectra. Based on these inputs, we calculate the generalized SK spectrum displayed in Figure 4f, which, despite being computed from a much larger data set, is affected by much larger statistical fluctuations than those corresponding to the SK plot displayed in Figure 4c. However, this is an expected result (Nita & Gary, 2010b), since the statistical fluctuations affecting the generalized SK estimator are ultimately dictated by the much smaller number of terms entering the outer sum, i.e M = 50 versus M = 1792.
Nevertheless, the confidence level of the RFI flags is not dictated by the absolute value of the SK statistical fluctuations, but by the accuracy with which the RFI threshold are computed for a predefined PFA (Nita & Gary, 2010a,b). Indeed, the lower and upper thresholds, shown by red horizontal lines in Figure 4c, which are now 0.50077 and 1.71615, respectively, do not appear to be crossed by more than the expected 0.13499% PFA of SK values, except for the clearly RFI contaminated spectral region flagged in both (c) and (f) panels.
From this perspective, the significantly larger percentage of RFI occupancy, 7.06%, produced by the generalized SK analysis over the same data segment on which the sequence of 50 SK estimators indicates only 1.72 ± 0.44% RFI occupancy, cannot be attributed to a less accurate generalized SK estimation, but to the dynamical nature of the RFI signals themselves, which during a longer time window contaminate more frequency channels than the number contaminated in a single 20 ms period. This conclusion is also supported by much larger generalized SK values, ( SK max ∼ 86.5 vs. SK max ∼ 2.9), which is a immediate consequence of the RFI signals not being continuously active in each frequency bin (Nita, 2016).
To conclude our comparison between the SK and generalized SK approaches, we display on the bottom row of Figure 4 the result of the SK analysis involving both outputs of the EOVSA spectrometer over the entire 1 s time window, from which accumulations of M = 50×1792 power and power-squared samples may be generated (panels g and h, respectively). To facilitate a direct comparison of the SK spectra displayed on panels (f) and (i), the vertical scale of the SK plot shown in panel (i) has been zoomed-in by a ∼ 30 : 1 factor, which demonstrates their statistical similarity. Remarkably, when compared with the 0.13499% PFA, the difference between the 7.13% and 7.06% percentages of RFI flags is not statistically significant, which quantitatively demonstrates the statistical equivalence of the SK and generalized SK estimators in terms of their RFI sensitivity, when computed based on data spanning the same time window. Although, in the light of this comparison, it may be concluded that the same RFI detection performance is obtained by generalized SK without the need of doubling the data output of a spectrometer, our comparison also demonstrates that this may come at the cost of flagging and discarding potentially good data (of order 5% in this example), and on a time scale much larger than the native resolution of the instrument. Moreover, if the astronomical signal of interest is expected to vary on a time scale shorter than the analysis time window, as noted in Nita et al. (2007) (see also Nita, 2016;Nita & Gary, 2016), flagging of such astronomical transients will occur. Therefore, all these aspects should be considered when designing a new instrument.
In any case, as we already pointed out in the introductory section, a native SK design may extend the potential uses of the SK spectrometer well-beyond its originally intended scope by providing an experimental means for obtaining accurate signal-to-noise-ratio and duration estimates for both Gaussian and non-Gaussian transients (Nita, 2016;Nita & Gary, 2016). Moreover, due to the additive property of both its outputs in both frequency and time, a native SK spectrometer may also offer versatile means of employing, in an automated manner, the concept of multi-scale SK analysis (Gary et al., 2010), not only for improving the sensitivity of RFI excision (Gary et al., 2010), but also for the purpose of automatic detection and classification of the statistical nature of signals (Nita, 2016). A direct illustration of the multi-scale SK analysis concept is provided by the right column of Figure 4, where the maximum values of SK vary from SK max ∼ 2.9 in panel (c), to SK max ∼ 86.5 in panel (f), and SK max ∼ 3.1 in panel (i), as different duty cycles of RFI are generated for the same data.
We have described the hardware implementation of a correlator for EOVSA whose F-engine is designed to accumulate the sums of power and power-squared needed to compute the SK estimator defined in Equation 1. Using actual data from EOVSA, we have demonstrated that it performs exactly as expected in a series of stringent tests, for RFI-free data (Table 1), data with RFI ( Figure 2), and data containing both RFI and solar radio bursts (Figure 3).
While the SK concept is simple and easy to apply, the computation of the correct thresholds is more involved in the general case. To help with this, we include some example code written in the IDL language in Appendix 1. We note that the thresholds are typically just constants for data from a given instrumental setup. For example, for the EOVSA case the M = 1792, N = 1 and d = 1 values are fixed, hence the thresholds do not vary for the real-time SK calculation. The accompanying 20 ms accumulated power-squared spectrum output. c) Computed SK spectrum (black), and 0.13499% PFA threshold levels (red horizontal lines) from Figure 2d. d) Sum of 50 consecutive 20-ms power accumulations covering 1 s of data, starting with the time frame illustrated on the top panels. e) Sum of the squares of the same 50 consecutive 20 ms power accumulations. f) Computed generalized SK spectrum for the band (black) and the threshold levels (red horizontal lines). The same vertical scale as in panel (c) has been imposed to facilitate comparison of the SK and generalized SK fluctuations. g) The same 1 s accumulated power spectrum shown in panel (d). h) The 1 s accumulated power-squared spectrum directly obtained by summing the 50 consecutive 20 ms power squared accumulations natively produced by the EOVSA spectrometer. i) The SK spectrum computed based on the 1 s power and power-squared accumulations. The vertical scale of the SK plot shown in panel (i) has been zoomed-in by a ∼ 30 : 1 factor to illustrate the practical statistical equivalence of the the 1 s SK and generalized SK spectra, despite the ∼ 30 : 1 ratio of their absolute statistical fluctuations. In each panel on the right column we indicate the actual M , N and d values entered in Equation 1 to compute the SK spectra. The percentages of RFI flagged channels are displayed for each SK plot in the right column. In panel (c), the mean and standard deviation of the RFI percentages corresponding to each of the 50 consecutive time frames are also displayed.