The Forge – Voice Agent Spike

As you may have read in our previous article, at Arionkoder we have reinvented our approach to innovation through The Forge. Traditional hackathons are a thing of the past; we now run short, well‑defined challenges that allow us to:

  • Tackle real customer or internal project issues with innovative, AI‑first solutions.
  • Craft reusable, scalable assets that emphasize impact and ingenuity.
  • Validate ideas quickly through rapid prototyping and proof‑of‑concept (POC) work in 1–2 weeks, reducing long‑term risk.

Episode 25.3: Spike – Voice Agent Personality

In the first two editions of The Forge, we tried two different challenge formats: Spike and Prove the Value, respectively. Based on those results, we decided to repeat the Spike approach for our third edition while refining parts of the process, and narrowed the challenge scope back to AI agents. The Spike format runs for one full week in which teams focus on a specific issue and build a proof of concept (POC) that tackles it. The POC needn’t be detailed or elaborate; the main goal is to build something quickly and test it against a pain point we had previously experienced on a project.

In the case of The Forge 25.3, the main challenge was embedding personality into voice agents handling real-life conversations in real time. The challenge was divided into sub-challenges, and each team chose exactly one of the following:

  • Emotional Intelligence – Detect user emotion and respond empathetically.
  • Conversation Dynamics – Pause naturally, avoid interruptions, and let dialogue flow.
  • Contextual Awareness – Adapt tone and style to the topic at hand.
  • Personality Consistency – Remain coherent and dependable throughout.
  • Real‑time Usability – Use tools (e.g., web search) mid‑conversation without adding noticeable latency (see the sketch after this list).
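
To make the last two ideas concrete, here is a minimal sketch, entirely our own and not any team's code, of how a fixed persona prompt can stay pinned across turns while the agent calls a tool mid-conversation. The "TOOL:" reply convention and every name below are hypothetical assumptions:

    # Minimal, hypothetical sketch: a persona-pinned agent turn with optional tool use.
    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class PersonaAgent:
        persona: str                                  # fixed system prompt, never mutated
        tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
        history: list[dict] = field(default_factory=list)

        def turn(self, user_text: str, llm: Callable[[list[dict]], str]) -> str:
            self.history.append({"role": "user", "content": user_text})
            messages = [{"role": "system", "content": self.persona}, *self.history]
            reply = llm(messages)
            # Hypothetical convention: the model requests a tool as "TOOL:<name>:<args>".
            if reply.startswith("TOOL:"):
                _, name, args = reply.split(":", 2)
                result = self.tools.get(name, lambda _: "tool unavailable")(args)
                messages += [{"role": "assistant", "content": reply},
                             {"role": "user", "content": f"[{name} result] {result}"}]
                reply = llm(messages)                 # second pass with the observation
            self.history.append({"role": "assistant", "content": reply})
            return reply

A stubbed llm callable makes the sketch runnable without any external service; in a real voice stack the tool call would need to overlap with speech synthesis and playback to keep perceived latency low.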

Spike Format Refinement

Challenge context: Compared to the previous Spike challenge, we provided more context for each of the five sub-challenges, with real-world benchmarks and references, so we built a guideline with examples and a set of research resources (papers and articles) teams could use as starting points. Another change was that a machine learning engineer or data scientist was designated as lead for each team, bringing more expertise to an AI-focused challenge. This was especially helpful for team members who were new to The Forge and hadn’t participated in AI projects before.

Diversity & topic relevance: The challenge topic was highly relevant to our innovation area, addressing a real-world pain point: we were exploring personality integration for conversational AI agents at that very moment. Given the rapid advances AI has made in this domain over the past couple of years, we aimed for a deep dive into the subject. To incorporate diverse perspectives, particularly from UX designers and product managers who may not regularly work with AI agents, the challenge was opened to all Arionics; the aim was diverse, cross-functional teams working toward a common goal.

Key Learnings

Timeframe: One week was too tight for ideation, building, and thorough documentation, especially for teammates carrying full client workloads.

Role mix: Despite outreach, UX and Product talent remained under‑represented. Solutions were technically solid but weaker in business value and usability.

Storytelling: Registration hit a record high, yet demo narratives lacked depth and creativity. Tight timing and limited UX roles likely contributed.

Challenge Outcomes

From the outside it might seem the challenge wasn’t successful, but it was. We had a blast with the demo presentations; every one of them was creative and innovative in its own way. Three teams meant three different approaches and three different journeys, partly because each team had a different mix of roles. One team, for example, had no real experience with conversational agents at all, and came up with a novel way of treating the problem that we didn’t expect.

Senti built its solution around retrieval-augmented generation (RAG) rather than relying on an LLM alone, as the other two teams did, and it solved the issue at hand. Senti also used colors to surface detected emotions (yes, much like Inside Out!), which made for a great end-user experience.
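
We can’t share Senti’s code, but the core idea is easy to picture. Below is a minimal sketch of emotion-to-color mapping, with a keyword stub standing in for whatever detector the team actually used; all names are hypothetical:

    # Hypothetical illustration of emotion-to-color mapping; not Senti's actual code.
    EMOTION_COLORS = {
        "joy": "#FFD700",      # gold
        "sadness": "#4169E1",  # royal blue
        "anger": "#DC143C",    # crimson
        "neutral": "#9E9E9E",  # gray
    }

    def classify_emotion(utterance: str) -> str:
        """Placeholder: a real system would call a trained classifier or RAG pipeline."""
        lowered = utterance.lower()
        if any(w in lowered for w in ("thanks", "great", "awesome")):
            return "joy"
        if any(w in lowered for w in ("angry", "unacceptable", "furious")):
            return "anger"
        return "neutral"

    def color_for(utterance: str) -> str:
        """Pick the UI accent color for the detected emotion."""
        return EMOTION_COLORS.get(classify_emotion(utterance), EMOTION_COLORS["neutral"])

    print(color_for("Thanks, that was awesome!"))  # "#FFD700"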

TARS ran a great benchmarking process and sourced an off-the-shelf tool with strong API integration into existing solutions. This simple yet very effective approach ensured the solution is genuinely reusable for any upcoming project that requires conversational agents; Hume.ai is now part of our AI toolkit.
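
What makes that kind of approach reusable is worth spelling out: the vendor sits behind a thin interface, so any project can swap providers without touching the conversation logic. A sketch of that pattern with invented names (this is not TARS’s code, nor Hume’s real SDK surface):

    # Hypothetical adapter pattern; interface and class names are illustrative only.
    from typing import Protocol

    class EmotionProvider(Protocol):
        def top_emotion(self, audio_bytes: bytes) -> str: ...

    class HumeProvider:
        """Would wrap the vendor's SDK/HTTP API behind the shared interface."""
        def top_emotion(self, audio_bytes: bytes) -> str:
            raise NotImplementedError("call the vendor API here")

    class StubProvider:
        """Deterministic stand-in for tests and demos."""
        def top_emotion(self, audio_bytes: bytes) -> str:
            return "neutral"

    def respond_with_empathy(provider: EmotionProvider, audio: bytes) -> str:
        return f"(detected {provider.top_emotion(audio)}) I hear you; let's fix this."

    print(respond_with_empathy(StubProvider(), b"\x00\x01"))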

Lastly, our challenge winners, the AI Musketeers, also chose the LLM-tool approach; however, they embarked on building a custom agent to tackle almost all five sub-challenges. They provided a roadmap for growing and scaling the solution into an actual product, keeping reusability in focus through MCP servers (the Model Context Protocol, an open standard for connecting agents to external tools and platforms).
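
For readers who haven’t met MCP yet: a server exposes tools once, and any MCP-capable agent can call them. A minimal sketch using the official Python SDK (pip install mcp); the web_search tool body is a hypothetical stub, and only the FastMCP scaffolding reflects the real library:

    # Minimal MCP server using the official Python SDK; the tool itself is a stub.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("voice-agent-tools")

    @mcp.tool()
    def web_search(query: str) -> str:
        """Hypothetical tool an agent could call mid-conversation."""
        return f"Top result for '{query}' (stubbed for this sketch)"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default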

Next Steps & Innovations

After analyzing results and gathering feedback and learnings from the teams and judges, we are now ready to take on our next edition, The Forge 25.4: Model Performance Evaluation. As next steps and innovations for this new challenge, we made sure to:

  • Extend the Spike timeframe to two weeks to relieve pressure and improve documentation quality.
  • Proactively recruit UX/Product participants with personal outreach and a promise of hands-on AI experience.
  • Double down on storytelling, asking teams to highlight journeys and learnings, not just the end product.

We’re excited to see what Forge 25.4 brings and to keep growing participation (currently ~10% of Arionics). Our North Star remains: build real solutions fast, share what we learn, and give every Arionic a stake in innovation. If you’re a company eager to co‑create AI‑powered products, or a professional who wants to shape them, let’s talk: [email protected]