This study introduces SST, a low-rank optimization method that achieves near full-rank performance in AI model training while drastically reducing trainable parameters. Tested on the OPT language model and Hyperbolic Graph Neural Networks (HGNNs), SST outperformed LoRA and ReLoRA across multiple benchmarks — from zero-shot NLP evaluations to node classification and link prediction. The results show that SST offers a more efficient and scalable alternative for training large models without sacrificing accuracy or generalization.

SST vs LoRA: A Leaner, Smarter Way to Train AI Models


Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, Vᵀ with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

5.2 Natural Language Generation

We utilize the OPT [9] architecture as the baseline for our language generation experiments. All models are pre-trained on OpenWebText [39], an open-source reproduction of OpenAI’s WebText. To facilitate fair comparisons across different OPT model sizes, we standardize the total training tokens for all models at 19.7 billion. A consistent rank (r = 64) is applied for all low-rank methods.
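For reference, the sketch below shows the rank-r spectral parameterization W ≈ U diag(s) Vᵀ that SST operates on, written in PyTorch. It is a minimal illustration, not the paper's implementation: SST additionally initializes U, s, V from an SVD of the full weight (Section 4.3) and updates only a sampled subset of spectral components per round (Section 4.4), both of which are omitted here.

```python
import torch
import torch.nn as nn

class SpectralLowRankLinear(nn.Module):
    """Minimal sketch of a rank-r spectral layer, W ~= U @ diag(s) @ V.T.

    Illustrative only: SST also uses SVD initialization (Section 4.3) and
    sparse sampling of spectral components (Section 4.4), omitted here.
    """

    def __init__(self, in_features: int, out_features: int, r: int = 64):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, r) / r ** 0.5)
        self.s = nn.Parameter(torch.ones(r))           # singular values (Sigma)
        self.V = nn.Parameter(torch.randn(in_features, r) / r ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project onto the r spectral directions, scale by s, map back out.
        return ((x @ self.V) * self.s) @ self.U.T

layer = SpectralLowRankLinear(768, 768, r=64)
print(sum(p.numel() for p in layer.parameters()))  # 98,368 vs. 589,824 full-rank
```

At r = 64, a 768 × 768 layer carries about 98K trainable parameters instead of roughly 590K, which is the kind of saving quantified in Table 3.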

Table 3 displays the validation perplexity results on the OpenWebText dataset across different sizes of OPT models. The results indicate that SST not only achieves lower perplexity than LoRA and ReLoRA* but also approximates the performance of full-rank training with significantly fewer trainable parameters.
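For readers interpreting Table 3: perplexity is the exponentiated mean per-token cross-entropy, so lower is better. A minimal helper, assuming the loss is measured in nats:

```python
import math

def perplexity(mean_nll_nats: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll_nats)

print(round(perplexity(3.0), 2))  # a mean loss of 3.0 nats/token -> 20.09
```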

Figure 2: Comparison of performance on effective steps between SST and full-rank training. Effective steps are quantified by multiplying the number of trainable parameters by the number of steps taken. All methods and model sizes utilize the same number of tokens in each step.

Figure 2 illustrates a comparison of effective steps across training methods. The effective step metric, which accounts for both the number of trainable parameters and the number of training steps, demonstrates that SST trains more efficiently than the full-rank method.
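The metric itself is a simple product. The sketch below uses hypothetical parameter and step counts (the paper's exact counts appear in Table 3) purely to show how it is computed:

```python
def effective_steps(trainable_params: int, steps: int) -> int:
    # Effective steps, as plotted in Figure 2: trainable parameters x steps.
    return trainable_params * steps

# Hypothetical counts for illustration only.
full_rank = effective_steps(125_000_000, 40_000)
sst = effective_steps(20_000_000, 40_000)  # same steps, far fewer parameters
print(f"SST consumes {sst / full_rank:.0%} of the full-rank effective steps")
```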

Each pretrained model undergoes zero-shot evaluation on all 16 NLP tasks used in the OPT paper [9], including ARC Easy and Challenge [40], HellaSwag [41], OpenBookQA [42], PIQA [43], StoryCloze [44], SuperGLUE [45], Winograd [46], and WinoGrande [47]. Evaluations are conducted using the LM Evaluation Harness framework [48]. Except for the ReCoRD task, which uses the F1 score, all tasks are evaluated by accuracy.
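For reproduction, a zero-shot run with the LM Evaluation Harness [48] looks roughly like the sketch below. The API and task names follow recent harness releases (v0.4.x) and may differ from the version used in the paper; the checkpoint path is a placeholder, and only a subset of the 16 tasks is shown.

```python
import lm_eval  # pip install lm-eval

# "my-sst-opt-125m" is a hypothetical checkpoint path, not from the paper.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=my-sst-opt-125m",
    tasks=["arc_easy", "arc_challenge", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,  # zero-shot, matching the OPT evaluation protocol
)
for task, metrics in results["results"].items():
    print(task, metrics)
```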

Table 4 details the zero-shot evaluation results across the 16 NLP tasks. SST consistently performs comparably to or better than the other low-rank methods and shows competitive performance against the full-rank models.

We further analyze inference efficiency by applying post-training singular value pruning to the SST model (see Appendix G).
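For intuition, singular value pruning truncates the SVD of a trained weight to its largest components. The sketch below shows the generic operation with an illustrative keep count; the actual pruning criterion is the one described in Appendix G.

```python
import torch

def prune_singular_values(W: torch.Tensor, keep: int) -> torch.Tensor:
    """Rank-`keep` approximation of W by discarding small singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :keep] @ torch.diag(S[:keep]) @ Vh[:keep, :]

W = torch.randn(768, 768)                     # stand-in for a trained weight
W_pruned = prune_singular_values(W, keep=64)  # keep the top-64 components
```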


5.3 Hyperbolic Graph Neural Networks

Hyperbolic Graph Neural Networks (HGNNs) [11, 12] capitalize on the expansive and hierarchical nature of hyperbolic space to efficiently manage and analyze graph-structured data. This geometric space is particularly suitable for graphs due to its ability to closely mimic the underlying data structures with minimal distortion, offering a substantial improvement over traditional Euclidean methods.

Table 3: Validation perplexity on OpenWebText across various OPT model sizes, along with the number of trainable parameters of each method. Rank r = 64. Values in bold highlight the best performance among the low-rank methods.

We evaluated the effectiveness of SST on the HyboNet [12] version of HGNN for node classification and link prediction across four distinct datasets: Airport [11], Cora [49], Disease [50], and PubMed [51]. Each experiment was conducted with three random seeds.

Table 4: Zero-shot evaluations on the same 16 NLP tasks featured in the OPT paper [9]. Except for the ReCoRD task, which uses the F1 score, all tasks are evaluated by accuracy, with values presented as percentages. Mean scores in bold represent the best performance among the low-rank methods. We also include the win percentage (counting ties) of each low-rank method against full-rank training.

Table 5: Node classification and link prediction results. Model dimension d = 16. Results are reported as test F1 scores for node classification and test precision for link prediction, expressed in percentages. Values highlighted in bold represent the best performance among the low-rank methods, while those marked with an "*" exceed the full-rank variants.

The results, detailed in Table 5, demonstrate strong performance in both node classification and link prediction tasks. SST not only matches full-rank training (exceeding it on the Disease link prediction task) but also significantly outperforms LoRA at equivalent ranks. Notably, SST's advantage over LoRA is larger at r = 1 than at r = 2, likely because SST's sampling strategy is particularly effective in sparser scenarios.

:::info Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering, Tsinghua University, Beijing, China.

:::


:::info This paper is available on arXiv under a CC BY 4.0 Deed (Attribution 4.0 International) license.

:::

