Retrieval-Augmented Generation (RAG) – Introduction

It gave me a good answer, and then it started to hallucinate. We have all heard about it, or experienced it ourselves.
Generative natural-language models sometimes hallucinate: they start to generate text that is not faithful to the prompt they were given. In layman's terms, they begin to make things up, producing content that has nothing to do with the given context or that is plainly inaccurate. Some hallucinations are understandable, for example mentioning related but not entirely relevant topics; other times the output looks like legitimate information but is simply incorrect, it has been fabricated.
This is obviously a problem when we start using generative models to accomplish tasks and intend to consume the information they generate in order to make decisions.
This problem is not necessarily related to how the model generates text, but to the information it uses to generate the response. Once an LLM is trained, the information encoded in its training data crystallizes; it becomes a static representation of everything the model knows up to that point in time. To update the model's worldview, or its knowledge base, it needs to be retrained. However, training large language models takes time and money.
One of the main motivations for developing RAG is the growing demand for factually accurate, context-sensitive, and up-to-date generated content.[1]
While looking for ways to make generative models aware of the new, information-rich content created every day, researchers began to explore effective methods to keep these models up to date without continuously retraining them.
They came up with a hybrid model: a generative model with a way to obtain external information, which can supplement the data the LLM was trained on. These models have an information retrieval component that gives them access to up-to-date data, in addition to their already well-known generation capabilities. The goal is to ensure both fluency and factual correctness when generating text.
This hybrid model architecture is called Retrieval-Augmented Generation, or RAG for short.
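As a sketch, the two components can be wired together in a few lines. The `retrieve` and `generate` functions below are toy stand-ins I made up for illustration (word-overlap retrieval and a stubbed model call), not a real retriever or LLM:

```python
# Minimal sketch of a RAG loop. `retrieve` and `generate` are hypothetical
# stand-ins: a real system would use a trained retriever and an actual LLM call.

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the query (toy retriever)."""
    query_terms = set(query.lower().split())
    return max(documents, key=lambda d: len(query_terms & set(d.lower().split())))

def generate(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would send `prompt` to a model."""
    return f"[LLM answer grounded in]: {prompt}"

documents = [
    "The Brooklyn Bridge opened in 1883 and spans the East River.",
    "The Golden Gate Bridge is in San Francisco.",
]

query = "When did the Brooklyn Bridge open?"
context = retrieve(query, documents)          # retrieval step
answer = generate(f"Context: {context}\nQuestion: {query}")  # generation step
print(answer)
```

The key design point is that the retrieved context is injected into the prompt at query time, so the knowledge base can be updated by simply changing the document collection, with no retraining.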
The era of RAG
Given the pressing need to keep models up to date in a time- and cost-effective way, RAG has become an increasingly popular architecture.
Its retrieval mechanism fetches information from external sources that are not encoded in the LLM. For example, when you ask Gemini about the Brooklyn Bridge, you can see RAG in the real world: at the bottom of the answer, you will see the external sources the information was retrieved from.
By grounding the final output in the information obtained by the retrieval module, these generative AI applications are less likely to propagate the outdated, point-in-time views embedded in their training data.
The second part of the RAG architecture is the one most visible to us consumers: the generation model. This is usually an LLM that processes the retrieved information and generates human-like text.
RAG combines retrieval mechanisms with generative language models to improve output accuracy[1]
As for its internal architecture, the retrieval module relies on dense vector representations to identify relevant documents, while the generation model uses a typical Transformer-based LLM architecture.

This architecture addresses important pain points of generative models, but it is not a silver bullet. It also presents some challenges and limitations.
The retrieval module may struggle to fetch the most relevant, up-to-date documents.
This part of the architecture relies on Dense Passage Retrieval (DPR)[2, 3]. DPR does a better job of finding semantic similarities between queries and documents than TF-IDF-based techniques such as BM25. It is particularly useful in open-domain applications that rely on semantic meaning rather than simple keyword matching; think of tools like Gemini or ChatGPT, which are not necessarily experts in a specific domain, but know a little bit about everything.
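The difference between keyword matching and dense retrieval can be illustrated with a toy example. The three-dimensional "embeddings" below are hand-made for illustration; a real DPR model produces high-dimensional vectors from a trained encoder:

```python
# Toy contrast between keyword matching and dense (embedding-based) retrieval.
# The embeddings are hand-crafted for illustration, not from a real DPR model.
import math

def keyword_score(query: str, doc: str) -> int:
    """Keyword matching reduced to raw term overlap."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical 3-d embeddings; dimensions roughly mean "medicine", "shopping", "misc".
embeddings = {
    "How do I treat a fever?":        [0.9, 0.1, 0.0],  # query
    "Reducing high body temperature": [0.8, 0.2, 0.1],  # same meaning, no shared words
    "Treat yourself to a new car":    [0.1, 0.9, 0.2],  # shares "treat", different meaning
}

query = "How do I treat a fever?"
for doc in ("Reducing high body temperature", "Treat yourself to a new car"):
    print(doc, keyword_score(query, doc),
          round(cosine(embeddings[query], embeddings[doc]), 2))
```

Keyword overlap favors the irrelevant document (it shares the word "treat"), while the cosine similarity of the dense vectors favors the document that actually answers the question, which is the behavior DPR is designed for.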
However, DPR also has its drawbacks. Dense vector representations can lead to the retrieval of irrelevant or off-topic documents. DPR models appear to retrieve information based on knowledge already present in their parameters, i.e., facts must already be encoded in order to be accessible via retrieval.[2]
[…] If we expand the definition of retrieval to also include the ability to navigate and elucidate concepts previously unknown or unseen by the model (similar to how humans can research and retrieve information), our findings suggest that DPR models fall short of this mark.[2]
To alleviate these challenges, researchers have considered more sophisticated query expansion and contextual disambiguation. Query expansion is a set of techniques that modify the original user query by adding relevant terms, with the purpose of closing the gap between the intent of the user's query and the relevant documents.[4]
In some cases, the generation module fails to fully ground its answer in the information collected during the retrieval phase. To address this, improvements such as enhanced attention mechanisms and hierarchical fusion techniques have been proposed [5].
Model performance is an important metric, especially when the goal of these applications is to become a seamless part of our daily lives and make the most mundane tasks nearly effortless. However, running RAG end to end can be computationally expensive: each user query requires one step for information retrieval and another for text generation. This is where techniques such as model pruning [6] and knowledge distillation [7] come in, to ensure the overall system remains performant despite the additional step of retrieving up-to-date information outside the model's training data.
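For intuition, magnitude pruning, the idea behind [6], can be sketched in a few lines: zero out the weights with the smallest absolute values. The toy list below stands in for a real model's weight tensors:

```python
# Magnitude-pruning sketch: zero the `fraction` of weights closest to zero.
# A toy flat list stands in for a real network's weight tensors.

def prune(weights: list[float], fraction: float) -> list[float]:
    """Zero out roughly the given fraction of smallest-magnitude weights.

    Ties at the threshold may prune slightly more than `fraction`.
    """
    k = int(len(weights) * fraction)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.02, -0.9, 0.001, 0.5, -0.04, 0.3]
print(prune(weights, 0.5))  # half the weights become exactly zero
```

Zeroed weights can be skipped during inference (or stored sparsely), which is what makes the pruned network cheaper to run at serving time.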
Finally, while the information retrieval module in the RAG architecture is designed to mitigate bias by accessing external sources that are more recent than the model's training data, it may not eliminate bias completely. Without careful curation of the external sources, they can perpetuate bias and even amplify the biases already present in the training data.
Conclusion
Using RAG in a generative application can significantly improve how up to date the model is and provide more accurate results to its users.
Its potential is even clearer in domain-specific applications. With a narrower scope and an external document library relevant only to a specific domain, these models can retrieve new information more effectively.
However, keeping generative models continuously up to date is far from a solved problem.
Technical challenges, such as processing unstructured data or ensuring model performance, remain active research topics.
I hope you enjoyed learning more about RAG and the role this type of architecture plays in keeping generative applications up to date without retraining the model.
Thank you for reading!
- Gupta, S., Ranjan, R., & Singh, S. N. (2024). A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. (arXiv)
- Reichman, B., & Heck, L. (2024). Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving? (arXiv)
- Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W.-t. (2020). Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6769–6781). (arXiv)
- Koo, H., Kim, M., & Hwang, S. J. (2024). Optimizing Query Generation for Enhanced Document Retrieval in RAG. (arXiv)
- Izacard, G., & Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 874–880). (arXiv)
- Han, S., Pool, J., Tran, J., & Dally, W. J. (2015). Learning both Weights and Connections for Efficient Neural Networks. In Advances in Neural Information Processing Systems (pp. 1135–1143). (arXiv)
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108. (arXiv)