Microsoft has revamped Bing’s search technology by integrating advanced language models, promising cost savings alongside quicker and more accurate search outcomes.
The latest updates feature a combination of large language models (LLMs), small language models (SLMs), and cutting-edge optimization strategies aimed at refining search performance.
Microsoft revealed these enhancements in a recent announcement, emphasizing its commitment to improving search. According to the company:
“At Bing, innovation drives our approach to search. By utilizing both Large Language Models (LLMs) and Small Language Models (SLMs), we’ve reached a pivotal moment in boosting search efficiency. While transformer models have been effective, evolving user demands require even more capable systems.”
Deploying LLMs at search scale raises challenges around speed and operational cost. To address this, Bing trained SLMs that it says deliver roughly 100 times the throughput of the LLMs they replace. The company elaborated:
“LLMs are resource-intensive and slow to execute. To enhance performance, we’ve trained SLMs, achieving approximately 100x throughput improvement over LLMs, resulting in faster and more precise query processing.”
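The scale of that gap is easier to appreciate with a rough benchmark. The sketch below, which assumes the Hugging Face transformers library and uses two arbitrary open models as stand-ins for Bing's proprietary ones (Microsoft has not disclosed which models it runs), times a 20-query batch through a small and a large causal language model:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def measure_throughput(model_name: str, prompts: list[str], max_new_tokens: int = 32) -> float:
    """Return queries per second for one batch of prompts on one model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # Llama-family tokenizers ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )
    batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    start = time.perf_counter()
    model.generate(**batch, max_new_tokens=max_new_tokens)
    return len(prompts) / (time.perf_counter() - start)

# One 20-query batch, mirroring the batch size Bing reports below.
prompts = ["best hiking trails near seattle"] * 20

# Hypothetical stand-ins: any small/large causal-LM pair shows the same trend.
slm_qps = measure_throughput("TinyLlama/TinyLlama-1.1B-Chat-v1.0", prompts)
llm_qps = measure_throughput("meta-llama/Llama-2-70b-hf", prompts)
print(f"SLM: {slm_qps:.1f} qps | LLM: {llm_qps:.1f} qps | ratio: {slm_qps / llm_qps:.0f}x")
```

The exact ratio depends heavily on hardware, batch size, and output length; the point is simply that a model one or two orders of magnitude smaller can serve far more queries per second on the same GPU.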
Additionally, Bing employs NVIDIA’s TensorRT-LLM technology to optimize the performance of these models. TensorRT-LLM enables faster and more cost-effective execution of large models on NVIDIA GPUs.
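For context, TensorRT-LLM exposes a high-level Python API that compiles a model into an optimized engine for NVIDIA GPUs. The sketch below follows the pattern in the library's published quickstart; the model name is a hypothetical stand-in, and the exact API surface varies by release:

```python
# pip install tensorrt-llm  (requires a supported NVIDIA GPU and CUDA toolchain)
from tensorrt_llm import LLM, SamplingParams

def main() -> None:
    # On first load, the checkpoint is compiled into an optimized TensorRT engine.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical stand-in model
    sampling = SamplingParams(max_tokens=64, temperature=0.0)  # greedy decoding, search-style answer
    prompts = ["Summarize the health benefits of intermittent fasting."]
    for output in llm.generate(prompts, sampling):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```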
Microsoft’s technical documentation highlights how integrating TensorRT-LLM has transformed Bing’s “Deep Search” functionality, which uses SLMs to deliver relevant results in real time.
Previously, Bing’s transformer model exhibited a latency of 4.76 seconds per batch (20 queries) and a throughput of 4.2 queries per second per instance. With TensorRT-LLM, latency has dropped to 3.03 seconds per batch, while throughput has increased to 6.6 queries per second per instance: a 36% reduction in latency and a 57% increase in throughput.
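Those percentages follow directly from the raw figures, since per-instance throughput is just batch size divided by batch latency. A few lines of arithmetic reproduce them:

```python
BATCH_SIZE = 20  # queries per batch, as reported

old_latency, new_latency = 4.76, 3.03  # seconds per batch
old_qps = BATCH_SIZE / old_latency     # ≈ 4.2 queries/sec per instance
new_qps = BATCH_SIZE / new_latency     # ≈ 6.6 queries/sec per instance

latency_reduction = (old_latency - new_latency) / old_latency  # ≈ 0.36 → 36%
throughput_gain = (new_qps - old_qps) / old_qps                # ≈ 0.57 → 57%

print(f"latency: -{latency_reduction:.0%}, throughput: +{throughput_gain:.0%}")
```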
Microsoft affirmed:
“Our mission is to deliver the best search experience without compromising quality. TensorRT-LLM allows us to reduce inference time, enhancing overall response speed while maintaining top-notch results.”
These advancements translate into several concrete benefits: faster delivery of search results, more accurate responses to complex queries, and lower operating costs on Microsoft’s side.
By adopting a hybrid approach that combines LLMs, SLMs, and TensorRT-LLM optimization, Bing is positioning itself as a leader in handling increasingly complex user queries.
As search engines strive to meet growing expectations, Bing’s use of smaller, highly optimized models demonstrates how modern search can balance speed, precision, and efficiency.