
# SymTime (NeurIPS 2025)

This repository contains the official PyTorch implementation of our NeurIPS 2025 paper, *Synthetic Series-Symbol Data Generation for Time Series Foundation Models*.

## Introduction

Owing to data privacy concerns and acquisition difficulties, existing large-scale time series datasets suffer from severe data shortages and imbalanced distributions compared with image and natural-language corpora. Foundation models pre-trained on such datasets inherit prediction biases that reduce their generalization and robustness.

Inspired by theories of complex dynamic systems, we design a series-symbol data generation mechanism that enables the unrestricted creation of high-quality time series data paired with corresponding symbolic expressions. To leverage these strongly correlated series-symbol data pairs, we develop SymTime, a pre-trained foundation model that enhances time series representations with symbolic information. SymTime demonstrates competitive performance across five major time series analysis (TSA) tasks, rivaling foundation models pre-trained on real-world datasets.
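The cross-modal pairing of series and symbols lends itself to a contrastive pre-training objective. As an illustration only, a symmetric InfoNCE loss over a batch of paired embeddings can be sketched in plain NumPy (the function and variable names below are our own, not taken from the SymTime codebase):

```python
import numpy as np

def cross_modal_info_nce(series_emb, symbol_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Illustrative sketch: row i of each array is assumed to be a matched
    series/symbol pair, so pair (i, i) is the positive for both directions.
    """
    # L2-normalize each modality's embeddings
    s = series_emb / np.linalg.norm(series_emb, axis=1, keepdims=True)
    y = symbol_emb / np.linalg.norm(symbol_emb, axis=1, keepdims=True)
    logits = s @ y.T / temperature            # (B, B) similarity matrix
    labels = np.arange(len(s))                # positives sit on the diagonal

    def nll(m):
        m = m - m.max(axis=1, keepdims=True)  # numerical stability
        log_p = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # Average the series->symbol and symbol->series directions
    return (nll(logits) + nll(logits.T)) / 2

rng = np.random.default_rng(0)
loss = cross_modal_info_nce(rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
```

In practice the embeddings would come from the two encoders and the loss would be backpropagated in PyTorch; the NumPy version only shows the shape of the objective.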

*(figure: SymTime)*

## 🧭 Quickstart

### Installation

First, create a Python virtual environment (Python 3.10.15 is recommended), then install the required dependencies:

```shell
pip install -r requirements.txt
```
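For reference, creating and activating the environment before installing might look as follows (a sketch assuming `python3` is on your PATH; the environment name `symtime-env` is our own choice, not from the repository):

```shell
# Create and activate a virtual environment, then install dependencies
# from the repository root ("symtime-env" is an arbitrary name).
python3 -m venv symtime-env
source symtime-env/bin/activate
```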

### Data Preparation

SymTime is pre-trained on a large-scale series-symbol bimodal dataset generated by S2Generator. You can generate the data required for pre-training by running:

```shell
bash ./scripts/s2generator.sh
```

For the fine-tuning datasets, you can download them from OneDrive or BaiduCloud. Then place the downloaded datasets under the folder `./datasets`.

### Model Pre-Training and Fine-Tuning

Once you have generated enough time series data using S2Generator, you can pre-train SymTime by executing our pre-training script:

```shell
bash ./scripts/SymTime_pretrain.sh
```
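The masked-modeling half of the pre-training objective can be sketched as follows (illustrative only; the random masking scheme, the mask ratio, and the MSE choice here are our assumptions, not the SymTime implementation):

```python
import numpy as np

def masked_reconstruction_loss(series, recon, mask_ratio=0.25, seed=0):
    """Hide a random subset of time steps and score reconstruction (MSE)
    only on the hidden positions. Sketch only; names are our own."""
    rng = np.random.default_rng(seed)
    mask = rng.random(series.shape) < mask_ratio   # True = masked step
    return ((series - recon) ** 2)[mask].mean()

# Toy batch: one sine series of length 96, "reconstructed" as all zeros
t = np.linspace(0.0, 2 * np.pi, 96)
x = np.sin(t)[None, :]
loss = masked_reconstruction_loss(x, np.zeros_like(x))
```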

If you want to skip the time-consuming pre-training phase, you can directly download our pre-trained model parameters from OneDrive or BaiduCloud and put them under ./models/params/ for fine-tuning on downstream tasks:

```shell
# Long-term time series forecasting
bash ./scripts/long_term_forecasting/SymTime_ECL.sh

# Short-term time series forecasting
bash ./scripts/short_term_forecasting/SymTime_M4.sh

# Time series classification
bash ./scripts/classification/SymTime_EthanolConcentration.sh

# Time series imputation
bash ./scripts/imputation/SymTime_ECL.sh

# Time series anomaly detection
bash ./scripts/anomaly_detection/MSL.sh
```

## 📊 Results

### Main Results

Compared with other models for general time series analysis, SymTime, pre-trained with masked modeling and cross-modal contrastive learning, achieves state-of-the-art results when fine-tuned on downstream tasks while maintaining lower model complexity.

*(figure: main results)*

### Benchmark Results

We report experimental results on the TimesNet benchmark. Compared with current state-of-the-art models, SymTime achieves better results.

*(figure: benchmark results)*

### Dataset and Representation Learning

We generate a large amount of series-symbol bimodal data with S2Generator for masked-modeling and contrastive-learning pre-training. We therefore first verify the representation coverage of the synthetic data against real-world time series datasets.

*(figure: representation coverage)*

Then, we visualize the representation spaces of the time series encoder (a)(b) and the symbolic expression encoder (c)(d) in SymTime before and after pre-training. Paired time series and symbolic expressions form distinct clusters, demonstrating the effectiveness of our pre-training paradigm.

*(figure: representation space visualization)*
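Such visualizations boil down to projecting encoder outputs into 2-D and plotting them. As a dependency-free stand-in (the paper's figures may well use t-SNE or a similar method; `project_2d` is our own helper, not from the codebase), a PCA projection can be sketched as:

```python
import numpy as np

def project_2d(embeddings):
    """Project high-dimensional embeddings to 2-D via PCA for plotting.
    Sketch only: the top-2 right singular vectors of the centered data
    give the two principal axes."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy stand-in for encoder outputs: 200 embeddings of dimension 64
emb = np.random.default_rng(1).normal(size=(200, 64))
coords = project_2d(emb)   # (200, 2) points ready for a scatter plot
```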

## 🎓 Citation

If you find this code useful, please cite our paper.

```bibtex
@misc{wang2025syntheticseriessymboldatageneration,
      title={Synthetic Series-Symbol Data Generation for Time Series Foundation Models},
      author={Wenxuan Wang and Kai Wu and Yujian Betterest Li and Dan Wang and Xiaoyu Zhang},
      year={2025},
      eprint={2510.08445},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.08445},
}
```

## 🎖️ Acknowledgement

We greatly appreciate the following GitHub repositories for their valuable code and efforts.

## 🤗 Contact

If you have any questions or are interested in our view on the complex dynamics of time series, feel free to contact: