Extracting information from unstructured data is a common use case for generative AI. While this may seem straightforward at first glance, it poses several challenges related to data security, accuracy, and overall compliance. To show how these challenges can be addressed, we built a demo chatbot with multi-level information access that demonstrates strategies for finding information within complex reports.
Corporate knowledge bases are typically built from large amounts of unstructured data: files, reports, notes, and so on. Most methods of searching these documents are inefficient because they require substantial manual work.
Depending on a company's internal policies, different departments, or even specific roles, may be authorized to access certain documents. When implementing generative AI to help users search the knowledge base, it is crucial to enforce which pieces of information each user is authorized to access, and to develop a chatbot with multi-level information access.
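As a minimal sketch of what such multi-level access could look like, the snippet below tags each document chunk with a clearance level and filters out anything the user is not cleared for before retrieval even happens. The `Chunk` type, the level names, and the `visible_chunks` helper are illustrative assumptions, not the demo's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical clearance hierarchy; a real deployment would map these
# to the company's own roles and policies.
ACCESS_LEVELS = {"public": 0, "internal": 1, "confidential": 2}

@dataclass
class Chunk:
    text: str
    source: str
    access_level: str  # tag assigned when the document is ingested

def visible_chunks(chunks: list[Chunk], user_level: str) -> list[Chunk]:
    """Keep only the chunks the user is cleared to see."""
    clearance = ACCESS_LEVELS[user_level]
    return [c for c in chunks if ACCESS_LEVELS[c.access_level] <= clearance]
```

Filtering before retrieval, rather than censoring the model's output afterwards, means restricted text never enters the prompt in the first place.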
In general, large language models generate answers that are fluent and grammatically correct. When they don't know the right answer, they return one that merely sounds probable. In other words, they hallucinate. However, if we want to rely on the model's responses, it needs to be trustworthy.
Hallucinations are such a well-known problem with LLMs that simply eliminating them was not enough: users would never be sure whether the model's response was correct or merely probable.
Retrieval-augmented generation (RAG) is an AI framework that grounds LLMs by retrieving information from external knowledge sources. Thanks to this, the model can provide users with accurate, up-to-date information that comes from the designated data sources (e.g., the reports), with no need for additional training.
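To illustrate the idea, here is a stripped-down retrieval-and-grounding step, reusing the illustrative `Chunk` type from the access-control sketch above. The word-overlap scoring is a toy stand-in for the embedding similarity search a production RAG pipeline would run against a vector store.

```python
def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Rank chunks by word overlap with the query and keep the top k.

    A toy stand-in for embedding similarity search over a vector store.
    """
    q_words = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, context: list[Chunk]) -> str:
    """Ground the model: it may answer only from the retrieved excerpts."""
    excerpts = "\n\n".join(f"[{c.source}]\n{c.text}" for c in context)
    return (
        "Answer the question using ONLY the excerpts below.\n\n"
        f"{excerpts}\n\nQuestion: {query}\nAnswer:"
    )
```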
If the answer was not found in the database, the model was explicitly instructed to say so rather than make one up. Additionally, whenever the model was able to provide an answer, it added a footnote indicating the information source. That way, users could be confident that the answer was not a hallucination.
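The prompt wording below is illustrative rather than the demo's exact text, but it captures both behaviors: refusing when the retrieved context lacks the answer, and appending a source footnote when it contains one.

```python
# Illustrative system prompt; the demo's actual wording may differ.
SYSTEM_PROMPT = """\
You answer questions about internal reports.

Rules:
1. Use ONLY the excerpts provided in the context.
2. If the excerpts do not contain the answer, reply exactly:
   "I could not find this information in the available documents."
   Never guess or invent an answer.
3. When you do answer, end with a footnote naming the source,
   e.g. [source: annual_report_2023.pdf, p. 12].
"""
```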
For this demo, integrating an extensive knowledge base into the model would have been overkill. As our main goal was to validate the technical feasibility of the designed solution, we used a sample document, RAG, and a simple chatbot interface. Additionally, we added a multi-level access feature that manages users' access to specific parts of the report.
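Putting the pieces together, the PoC's answer flow can be summarized as the sketch below, building on the illustrative helpers defined earlier; `llm_complete` is a hypothetical stand-in for whatever model endpoint the chatbot calls.

```python
def answer(question: str, user_level: str, knowledge_base: list[Chunk]) -> str:
    """End-to-end flow: enforce access, retrieve, ground, generate."""
    allowed = visible_chunks(knowledge_base, user_level)  # access control first
    context = retrieve(question, allowed)                 # then retrieval
    prompt = build_prompt(question, context)
    # llm_complete is a hypothetical stand-in for the model endpoint
    # (e.g., a hosted or self-hosted LLM API call).
    return llm_complete(system=SYSTEM_PROMPT, user=prompt)
```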
The PoC proved the technical feasibility of the designed solution. Implemented at full scale, such a system offers three main benefits.
One of the objectives of adopting new tools is to make work more efficient. To become a true alternative to previously used solutions, a tool needs to be intuitive, and its onboarding time should be as short as possible. The familiar chatbot interface means any employee can use it from day one.
Instead of searching for the right report and then combing through it for the information, users simply tell the Gen AI-powered chatbot what they need to know, and the model returns an answer with a reference to the original document, saving hours of research work.
Assigning different levels of authorization to different roles or departments makes it easy to manage access to confidential information and comply with corporate policies.