Overview & Recommendations
This document outlines the research plan for identifying a robust, production-ready toolchain for wavelet analysis of biological signals. The plan has four goals: find libraries for multi-signal analysis, identify suitable validation datasets, confirm methods for testing statistical significance, and select a framework for interactive visualization. The chart below summarizes the top candidates against key project requirements.
[Figure: Top Candidate Feature Matrix]
RQ1: Multi-Signal Wavelet Analysis Libraries
The primary goal is to identify Python libraries supporting wavelet coherence and cross-wavelet analysis, which are essential for studying interactions between biological time series. The focus is on production-ready libraries with strong support for the Morlet wavelet, good documentation, and active maintenance.
Library Feature Comparison
Comparison of the leading candidates on core wavelet analysis features.
| Feature | pycwt | PyWavelets | ssqueezepy |
|---|---|---|---|
| Cross-Wavelet / Coherence | ✔ Yes | ✘ No | ✔ Yes |
| Morlet Wavelet Support | ✔ Yes | ✔ Yes | ✔ Yes |
| Statistical Significance | ✔ Built-in | ✘ Manual | Partial |
| Active Maintenance (2023+) | ✔ Yes | ✔ Yes | ✔ Yes |
| License | BSD-3-Clause | MIT | MIT |
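To make concrete what "cross-wavelet analysis" asks of a library, the following is a minimal sketch of a Morlet continuous wavelet transform and the cross-wavelet transform W_xy = W_x · conj(W_y). It is not the implementation of any library in the table: it uses a slow direct sum and only the Python standard library, assumes the conventional Morlet parameter ω0 = 6, and omits the smoothing step that wavelet coherence additionally requires. The function names and the two test signals are hypothetical.

```python
import cmath
import math

OMEGA0 = 6.0  # conventional Morlet centre frequency (an assumption of this sketch)


def fourier_factor(omega0=OMEGA0):
    """Ratio of Fourier period to wavelet scale for the Morlet wavelet."""
    return 4 * math.pi / (omega0 + math.sqrt(2 + omega0 ** 2))


def morlet_cwt(signal, periods, dt=1.0, omega0=OMEGA0):
    """Direct-sum Morlet CWT: one list of complex coefficients per period."""
    n = len(signal)
    rows = []
    for period in periods:
        scale = period / fourier_factor(omega0)
        half_width = int(5 * scale / dt)  # envelope is negligible past 5 scales
        norm = math.pi ** -0.25 * math.sqrt(dt / scale)
        row = []
        for b in range(n):
            acc = 0j
            for t in range(max(0, b - half_width), min(n, b + half_width + 1)):
                eta = (t - b) * dt / scale
                # complex-conjugated Morlet wavelet evaluated at eta
                acc += signal[t] * cmath.exp(-1j * omega0 * eta - 0.5 * eta * eta)
            row.append(norm * acc)
        rows.append(row)
    return rows


def cross_wavelet(wx, wy):
    """Cross-wavelet transform W_xy = W_x * conj(W_y), element-wise."""
    return [[a * b.conjugate() for a, b in zip(rx, ry)] for rx, ry in zip(wx, wy)]


# Two hypothetical signals sharing a 16-sample period; y lags x by a quarter cycle.
n = 256
x = [math.cos(2 * math.pi * t / 16) for t in range(n)]
y = [math.cos(2 * math.pi * t / 16 - math.pi / 2) for t in range(n)]
periods = [4.0, 8.0, 16.0, 32.0]

wxy = cross_wavelet(morlet_cwt(x, periods), morlet_cwt(y, periods))
mid = n // 2
power_at_mid = [abs(row[mid]) for row in wxy]
best = power_at_mid.index(max(power_at_mid))
phase = cmath.phase(wxy[best][mid])
print(periods[best], phase)
```

In this toy case the cross-wavelet power at the centre of the series is largest at the shared 16-sample period, and the phase there is close to π/2, reflecting the quarter-cycle lead of x over y. A candidate library should reproduce this kind of result far faster, typically via FFT-based convolution.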
RQ2: Validation Datasets
To ensure the reliability of the chosen tools, we must validate them against datasets with known, ground-truth periodicities. This research question focuses on identifying publicly available biological time series, such as circadian rhythm or cell cycle data, that are standard benchmarks in the field.
Public Data Sources
- CircaDB: A comprehensive database for circadian gene expression data.
- BioClock Database: Curated datasets from biological clock research.
- Published Papers: Supplementary data from key publications in the field.
- Tool Repositories: Example and test datasets bundled with existing libraries.
Validation Criteria
- Verified Ground Truth: Data must have confirmed periodicities (e.g., 24h for circadian).
- Clear Protocols: Use validation methods from published studies.
- Error Margins: Define acceptable error for period detection.
- Synthetic Data: Use signal generators for controlled testing scenarios.
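The synthetic-data and error-margin criteria above can be sketched as a small test harness. This is an illustration under stated assumptions: the signal is a noisy 24 h sinusoid sampled hourly, the period estimator is a plain periodogram standing in for whichever wavelet-based estimator is eventually selected, and the 5% relative tolerance is a placeholder rather than an agreed project threshold. All function names are hypothetical.

```python
import math
import random


def synthetic_rhythm(n_samples, period_h=24.0, amplitude=1.0,
                     noise_sd=0.3, dt_h=1.0, seed=0):
    """Noisy sinusoid with a known period, for controlled testing."""
    rng = random.Random(seed)
    return [
        amplitude * math.sin(2 * math.pi * i * dt_h / period_h)
        + rng.gauss(0.0, noise_sd)
        for i in range(n_samples)
    ]


def estimate_period(signal, candidate_periods, dt_h=1.0):
    """Return the candidate period with the largest periodogram power."""
    mean = sum(signal) / len(signal)
    best_period, best_power = None, -1.0
    for period in candidate_periods:
        w = 2 * math.pi * dt_h / period  # angular frequency per sample
        re = sum((v - mean) * math.cos(w * i) for i, v in enumerate(signal))
        im = sum((v - mean) * math.sin(w * i) for i, v in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_period, best_power = period, power
    return best_period


def within_margin(estimate, truth, rel_tol=0.05):
    """True if the estimate is within a relative tolerance of the ground truth."""
    return abs(estimate - truth) <= rel_tol * truth


# Ten days of hourly samples; candidates from 16 h to 32 h in 0.25 h steps.
signal = synthetic_rhythm(240)
candidates = [16.0 + 0.25 * k for k in range(65)]
estimate = estimate_period(signal, candidates)
print(estimate, within_margin(estimate, 24.0))
```

The same harness shape applies to real benchmark data: swap `synthetic_rhythm` for a loader and `estimate_period` for the library under test, and keep the tolerance check unchanged so results stay comparable across tools.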
RQ3: Statistical Significance Testing
A critical step in wavelet analysis is determining whether observed periodicities are statistically significant or could plausibly arise from background noise alone. This involves testing against a null hypothesis, often a red-noise process modeled as a first-order autoregressive process, AR(1). We need to identify libraries with built-in or easily adaptable methods for these tests.
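As a reference point for evaluating library support, the red-noise test can be sketched in a few lines. The sketch follows the commonly used formulation of Torrence and Compo (1998): estimate the lag-1 autocorrelation α, evaluate the theoretical AR(1) spectrum P = (1 − α²) / (1 + α² − 2α cos(2π·dt/period)), and scale it by a chi-square quantile with 2 degrees of freedom (appropriate for a complex wavelet such as the Morlet). Whether a given library uses exactly this formulation is an assumption to verify; the function names here are hypothetical.

```python
import math
import random


def simulate_ar1(n, alpha, seed=0):
    """Generate an AR(1) series x[t] = alpha * x[t-1] + white noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = alpha * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used as the AR(1) coefficient estimate."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def red_noise_spectrum(alpha, period, dt=1.0):
    """Normalized theoretical AR(1) power at a given Fourier period."""
    return (1 - alpha ** 2) / (
        1 + alpha ** 2 - 2 * alpha * math.cos(2 * math.pi * dt / period)
    )


def significance_level(variance, alpha, period, dt=1.0, confidence=0.95):
    """Wavelet power a peak must exceed to reject the red-noise null."""
    # For 2 degrees of freedom the chi-square quantile has a closed form.
    chi2_2dof = -2.0 * math.log(1.0 - confidence)
    return variance * red_noise_spectrum(alpha, period, dt) * chi2_2dof / 2.0


series = simulate_ar1(5000, alpha=0.7)
alpha_hat = lag1_autocorr(series)
mean = sum(series) / len(series)
variance = sum((v - mean) ** 2 for v in series) / len(series)
for period in (4.0, 16.0, 64.0):
    print(period, significance_level(variance, alpha_hat, period))
```

Because red noise concentrates power at long periods, the threshold rises with period; a peak at 64 samples must be much stronger than one at 4 samples to count as significant.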
[Figure: Significance Testing Methods in Libraries, comparing built-in support for standard null-hypothesis testing methods.]
RQ4: Interactive Visualization Libraries
The output of a wavelet analysis is often a 2D scalogram (a heatmap of power across time and frequency). For effective exploration, these visualizations must be interactive. This research area focuses on identifying JavaScript or Python libraries capable of producing responsive scalograms with features like tooltips, overlays for significance, and the Cone of Influence (COI).
[Figure: Example Interactive Scalogram (Plotly.js). An interactive heatmap with hover tooltips showing the value of each cell. The shaded area marks a simulated Cone of Influence (COI), the region near the ends of the series where edge effects make results less reliable.]
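Whichever plotting library is chosen, the COI overlay needs a boundary curve to draw. The sketch below computes one for the Morlet wavelet under two stated assumptions: ω0 = 6, and the standard e-folding time of √2 × scale, so that at a distance d from the nearest edge the largest period free of edge effects is (Fourier factor / √2) × d. The function names are hypothetical, and the result is a plain list that can be passed to any heatmap library as an overlay line.

```python
import math


def morlet_fourier_factor(omega0=6.0):
    """Ratio of Fourier period to wavelet scale for the Morlet wavelet."""
    return 4 * math.pi / (omega0 + math.sqrt(2 + omega0 ** 2))


def cone_of_influence(n_samples, dt=1.0, omega0=6.0):
    """Largest Fourier period unaffected by edge effects at each time index."""
    factor = morlet_fourier_factor(omega0) / math.sqrt(2)
    return [
        factor * dt * min(i, n_samples - 1 - i)  # distance to the nearest edge
        for i in range(n_samples)
    ]


def is_reliable(period, time_index, coi):
    """True if a scalogram cell lies outside the edge-affected region."""
    return period <= coi[time_index]


coi = cone_of_influence(101)
print(coi[0], coi[50], is_reliable(24.0, 50, coi), is_reliable(24.0, 10, coi))
```

For a 101-sample series the boundary peaks at the centre and falls to zero at both ends, so a 24-sample period is trustworthy mid-series but not 10 samples from the edge. In the scalogram, cells with periods above the curve are the ones to shade.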