This article presents the development of an application capable of converting meetings into visual diagrams in real time using generative AI. Through the integration of audio capture, automatic transcription, and structured generation with LLMs, conversations were transformed into dynamic graphical representations in SVG format. This case demonstrates how combining multiple technologies can structure complex information and improve comprehension in collaborative settings.

Meetings are often spaces where many ideas are generated, but it's not always easy to follow the thread of the conversation or remember the key points afterward. In many cases, the information becomes abstract, disordered, or difficult to synthesize.
In this context, an interesting opportunity arises: using generative AI not only to process text but also to structure and visualize thinking in real time. This article explores the development of an application that automatically converts conversations into visual diagrams, combining audio capture, transcription, and language models to make meetings clearer, more dynamic, and easier to follow.

The system was designed as a continuous pipeline that turns audio into visualizations without interruption. Its stages run in real time: audio is first captured from the microphone, then transcribed, and finally transformed into a visual diagram.
This approach allows the diagram to evolve as the meeting progresses, without needing to reprocess everything from scratch.
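The incremental flow described above can be sketched as follows. This is a minimal illustration, not the application's actual code: `DiagramState`, `run_pipeline`, and the two stub functions are hypothetical names, and the stubs stand in for a real speech-to-text service and a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class DiagramState:
    """Accumulated diagram content; it grows as the meeting progresses."""
    nodes: List[str] = field(default_factory=list)

    def merge(self, new_nodes: Iterable[str]) -> None:
        # Append only unseen nodes, so earlier results are never recomputed.
        for node in new_nodes:
            if node not in self.nodes:
                self.nodes.append(node)


def run_pipeline(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    extract_nodes: Callable[[str, List[str]], List[str]],
) -> DiagramState:
    """Process audio chunk by chunk: capture -> transcript -> diagram update."""
    state = DiagramState()
    for chunk in audio_chunks:
        text = transcribe(chunk)
        if not text.strip():
            continue  # silence or noise: nothing to add to the diagram
        # The extractor sees only the new text plus the existing nodes.
        state.merge(extract_nodes(text, list(state.nodes)))
    return state


# Hypothetical stubs: a real system would call a transcription
# service and a language model here.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_extract_nodes(text: str, existing: List[str]) -> List[str]:
    return [part.strip() for part in text.split(",") if part.strip()]


state = run_pipeline(
    [b"budget, timeline", b"", b"timeline, hiring"],
    fake_transcribe,
    fake_extract_nodes,
)
print(state.nodes)
```

Because each step receives only the newest transcript fragment and the current state, the cost per update stays roughly constant however long the meeting runs.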
One of the biggest challenges was deciding how to represent the information visually. Several options were evaluated, but many had significant limitations. AI-generated images, for example, were attractive but inconsistent and difficult to update.

Using SVG solved these problems, which makes it an ideal option for dynamic, real-time visualizations.
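To illustrate why a text-based vector format is easy to update, the sketch below builds a small SVG from a list of labels using Python's standard `xml.etree.ElementTree` module. The vertical box layout, the sizes, and the `render_svg` name are assumptions made for the example; the article does not describe the layout the application actually used.

```python
import xml.etree.ElementTree as ET
from typing import List

BOX_W, BOX_H, GAP = 160, 40, 20  # illustrative sizes, in pixels


def render_svg(labels: List[str]) -> str:
    """Render one labeled box per entry, stacked vertically."""
    height = max(len(labels), 1) * (BOX_H + GAP) + GAP
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(BOX_W + 2 * GAP),
        "height": str(height),
    })
    for i, label in enumerate(labels):
        y = GAP + i * (BOX_H + GAP)
        ET.SubElement(svg, "rect", {
            "x": str(GAP), "y": str(y),
            "width": str(BOX_W), "height": str(BOX_H),
            "rx": "6", "fill": "#eef", "stroke": "#336",
        })
        text = ET.SubElement(svg, "text", {
            "x": str(GAP + BOX_W // 2), "y": str(y + BOX_H // 2),
            "text-anchor": "middle", "dominant-baseline": "middle",
        })
        text.text = label  # ElementTree escapes &, <, > on serialization
    return ET.tostring(svg, encoding="unicode")


print(render_svg(["Budget", "Timeline"]))
```

Adding a topic only means appending another `rect` and `text` pair. The existing elements are untouched, and the labels remain real text rather than pixels.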
The language model was used not only to generate content but also to structure it logically and visually. To achieve this, a prompt with strict rules was designed to guide the model's behavior.
This approach allows the system to build a coherent, evolving diagram rather than producing isolated results.
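A rule-based prompt of this kind might be assembled and checked as in the sketch below. The rules shown are illustrative placeholders, not the prompt used in the application, and `build_prompt`, `is_valid_svg`, and `next_diagram` are hypothetical helpers. The model call itself is left out.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Illustrative rules only; the real prompt is not reproduced in the article.
RULES = (
    "Return a single <svg> element and nothing else.",
    "Keep every element already present in the current diagram.",
    "Add new nodes only for ideas that appear in the new transcript.",
)


def build_prompt(current_svg: str, new_transcript: str) -> str:
    """Combine the fixed rules, the current diagram, and the newest text."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(RULES, start=1))
    return (
        f"Rules:\n{rules}\n\n"
        f"Current diagram:\n{current_svg}\n\n"
        f"New transcript:\n{new_transcript}\n"
    )


def is_valid_svg(response: str) -> bool:
    """Accept a response only if it parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(response.strip())
    except ET.ParseError:
        return False
    return root.tag in ("svg", SVG_NS + "svg")


def next_diagram(current_svg: str, response: str) -> str:
    # Fall back to the previous diagram when the model breaks the format,
    # so one bad response never blanks the screen.
    return response.strip() if is_valid_svg(response) else current_svg
```

Passing the current diagram back in with each request is one way to get an evolving result, and validating the output before display guards against responses that ignore the rules.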

In addition to structured text, the goal was to enrich diagrams with complementary visual elements. Initially, attempts were made to generate images with AI, but response times were too long for a real-time system.
This approach maintained system speed without sacrificing visual quality, combining generative AI with efficient retrieval techniques.
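The article does not say which retrieval technique was used. As one possible reading, the sketch below replaces image generation with a keyword lookup against a small local catalog, which returns in well under a millisecond. The catalog contents, file paths, and the `find_icon` name are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical catalog: image path -> descriptive tags.
ICON_CATALOG: Dict[str, Set[str]] = {
    "icons/calendar.svg": {"deadline", "schedule", "date", "timeline"},
    "icons/money.svg": {"budget", "cost", "price", "funding"},
    "icons/team.svg": {"team", "hiring", "people", "staff"},
}


def find_icon(label: str) -> Optional[str]:
    """Return the catalog image sharing the most words with the label."""
    words = set(label.lower().split())
    best_path: Optional[str] = None
    best_score = 0
    for path, tags in ICON_CATALOG.items():
        score = len(words & tags)
        if score > best_score:  # ties keep the earlier catalog entry
            best_path, best_score = path, score
    return best_path  # None when no tag matches


print(find_icon("Budget review"))
print(find_icon("Random chatter"))
```

In this arrangement the language model only needs to produce a short label for each node; picking the matching image is a cheap local operation, so it adds no noticeable delay to the diagram update.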

The development of this application demonstrates that generative AI can go beyond text generation, enabling the structuring of ideas and their real-time visualization. By combining audio capture, automatic transcription, and structured generation with LLMs using formats like SVG, it is possible to transform meetings into clearer and more comprehensible experiences. These kinds of solutions open new possibilities to improve communication, decision-making, and the way we interact with information in collaborative environments.