A Hackathon Tale: Turning Internet Text into Visual Stories with Text-to-Image-Phi2

Enrique Gamboa
4 min read · Feb 15, 2024

Phi-2 Hackathon 2024 — Cover Image

1. Introduction 🚀

In the ever-evolving landscape of technology, hackathons serve as a crucible for innovation, challenging bright minds to push the boundaries of what’s possible. On February 1, 2024, my team and I embarked on a thrilling 24-hour journey at the Phi-2 Technology: 24 Hours Challenge, an arena for harnessing the power of Phi-2, Microsoft’s latest language model. This article tells the story of how we melded creativity with technology to build a browser extension: Text-to-Image-Phi2.

Project submission: Link

GitHub: Link

2. The Team: Metaverse Professional 🌐

Behind every great innovation is a team of relentless dreamers. Ours is Enrique Gamboa and Elias Ordaz Sanchez, a duo bonded by a shared passion for tech and a vision to redefine the digital experience. With backgrounds spanning software development and AI, we dove into the hackathon with a blend of expertise and enthusiasm, aiming to bridge the gap between textual content and visual imagination.

Phi-2 Hackathon presentation introduction, with Enrique and Elias pictured

3. The Problem We Solved 🔍

In the vast expanse of the internet, being able to instantly turn a passage of text into an image is a real advantage for digital creators and everyday users alike. Our browser extension, which works across a wide range of browsers, makes this possible by letting users effortlessly convert any selected text into compelling images. This enriches the digital content experience and makes visual creation accessible and intuitive. Whether for education, content creation, or simply adding a visual dimension to reading, our tool puts visual storytelling within everyone’s reach, making it a staple in the digital creator’s toolkit.

Text-to-Image-Phi2 — Browser Extension

4. Architecture 🏗️

The architecture of our extension favors simplicity and efficiency. A context menu entry appears when the user selects text, and the extension sends that selection to Microsoft Phi-2 for a crisp summary. The summary then serves as the prompt for DALL·E 3, which generates a unique illustration. This streamlined process embeds a layer of creativity directly into the user’s browsing experience.

Architecture of Text-to-Image-Phi2 Browser Extension
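
As a rough illustration of that flow, here is a minimal TypeScript sketch. It is not the code from our repository: the function names, the menu id, and the placeholder stages are all hypothetical, and it assumes DALL·E 3 is reached through the OpenAI Images API. The `chrome.contextMenus` and `chrome.tabs` calls are standard extension APIs, and the wiring only runs when the script is loaded inside an extension.

```typescript
// Illustrative sketch; none of these names come from the project's repository.

type Summarize = (text: string) => Promise<string>;       // stage 1: Phi-2
type GenerateImage = (prompt: string) => Promise<string>; // stage 2: DALL·E 3, resolves to an image URL

// Selected text -> Phi-2 summary -> DALL·E 3 image URL.
async function textToImage(
  selectedText: string,
  summarize: Summarize,
  generateImage: GenerateImage,
): Promise<string> {
  const text = selectedText.trim();
  if (text.length === 0) {
    throw new Error("No text selected");
  }
  const prompt = await summarize(text);
  return generateImage(prompt);
}

// Request shape for the OpenAI Images API (assumed backend for DALL·E 3).
// A real generateImage stage would POST this and read data[0].url from the response.
function buildImageRequest(prompt: string, apiKey: string) {
  return {
    url: "https://api.openai.com/v1/images/generations",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: "dall-e-3", prompt, n: 1, size: "1024x1024" }),
  };
}

const MENU_ID = "text-to-image-phi2"; // hypothetical menu item id

// Wire the context menu only when running inside an extension.
const chromeApi = (globalThis as any).chrome;
if (chromeApi?.contextMenus) {
  chromeApi.runtime.onInstalled.addListener(() => {
    chromeApi.contextMenus.create({
      id: MENU_ID,
      title: "Illustrate selected text",
      contexts: ["selection"], // show the entry only when text is selected
    });
  });

  chromeApi.contextMenus.onClicked.addListener((info: any) => {
    if (info.menuItemId !== MENU_ID || !info.selectionText) return;
    // Placeholder stages; a real build would call Phi-2 and DALL·E 3 here.
    const summarize: Summarize = async (text) => text.slice(0, 200);
    const generateImage: GenerateImage = async (prompt) =>
      `about:blank#${encodeURIComponent(prompt)}`;
    textToImage(info.selectionText, summarize, generateImage)
      .then((imageUrl) => chromeApi.tabs.create({ url: imageUrl }))
      .catch((err: unknown) => console.error(err));
  });
}
```

Passing the two stages in as functions keeps the pipeline itself independent of how Phi-2 and DALL·E 3 are hosted, so either one can be swapped or stubbed out during testing.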

5. Phi-2 Usage 🧠

At the core of our extension is Microsoft Phi-2, tasked with transforming internet text into crisp, creative prompts for DALL·E 3. The model’s strength lies in summarizing and refining text into a form that is both faithful to the source and conducive to visual generation. By condensing complex ideas into short, illustrative prompts, Phi-2 helps ensure that each image is a meaningful visual reflection of the original text, streamlining the bridge from text to visual art.

Here’s a simplified breakdown of the process:

  1. Capture the user-selected text via the browser’s context menu.
  2. Construct a pre-prompt that sets the stage for Microsoft Phi-2, indicating the need for a short, illustrative prompt based on the text.
  3. Call Phi-2 with the pre-prompt and the selected text, and receive in return a prompt crafted to inspire a visually engaging illustration from DALL·E 3.

Microsoft Phi-2 specifications to summarize text from the internet
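
The three steps can be sketched in TypeScript along these lines. Treat it as an assumption-laden illustration rather than our exact code: the pre-prompt wording, the 2000-character cap, and every function name are hypothetical. The `Instruct: ... Output:` wrapper is the question-answering format described on the Phi-2 model card, and `complete` stands in for whatever endpoint actually hosts the model.

```typescript
// Illustrative sketch; wording, limits, and names are assumptions.

const MAX_SELECTION_CHARS = 2000; // hypothetical cap to keep the prompt short

const PRE_PROMPT =
  "Summarize the following text as one short, vivid prompt for an illustration.";

// Steps 1 and 2: normalize the selection and wrap it in the pre-prompt.
function buildPhi2Prompt(selectedText: string): string {
  const text = selectedText
    .replace(/\s+/g, " ") // collapse line breaks and repeated spaces
    .trim()
    .slice(0, MAX_SELECTION_CHARS);
  return `Instruct: ${PRE_PROMPT}\n\n${text}\nOutput:`;
}

// Keep only the first line after "Output:" and strip surrounding quotes,
// since the model may echo the prompt or keep generating past its answer.
function extractImagePrompt(completion: string): string {
  const marker = "Output:";
  const idx = completion.indexOf(marker);
  const answer = idx >= 0 ? completion.slice(idx + marker.length) : completion;
  const firstLine = answer.trim().split("\n")[0] ?? "";
  return firstLine.trim().replace(/^["']+|["']+$/g, "");
}

// Stand-in for the model call; how Phi-2 is hosted is left to the caller.
type Complete = (prompt: string) => Promise<string>;

// Step 3: call Phi-2 and return a prompt ready for the image model.
async function summarizeForImage(
  selectedText: string,
  complete: Complete,
): Promise<string> {
  const completion = await complete(buildPhi2Prompt(selectedText));
  return extractImagePrompt(completion);
}
```

Cleaning the completion before handing it on matters because a stray second line or a pair of quotes would otherwise end up in the image prompt.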

6. GitHub Project and How to Install 🛠️

Embark on your visual journey with Text-to-Image-Phi2 by visiting our GitHub repository. Installation is a breeze: download the extension, open Chrome’s Extensions page (chrome://extensions), enable Developer mode, click “Load unpacked”, and select the extension folder. From then on you can transform web text into illustrations right from your browser. Dive into a new era of internet browsing where text becomes a canvas for creativity.
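
If loading the unpacked folder fails, the manifest is the first thing to check. The sketch below is only an illustration written as a TypeScript object: the version number and background file name are hypothetical and are not copied from our repository. What is certain is that Chrome expects `manifest_version`, `name`, and `version` in every `manifest.json`, and that adding a right-click entry requires the `contextMenus` permission.

```typescript
// Illustrative only: field values are assumptions, not the project's actual manifest.json.
type Manifest = Record<string, unknown>;

const exampleManifest: Manifest = {
  manifest_version: 3,
  name: "Text-to-Image-Phi2",
  version: "0.1.0",                                // hypothetical version
  permissions: ["contextMenus"],                   // needed to add a right-click entry
  background: { service_worker: "background.js" }, // hypothetical file name
};

// Chrome requires these three keys in every manifest.
const REQUIRED_KEYS = ["manifest_version", "name", "version"];

// Return the required keys that are absent from a manifest.
function missingManifestKeys(manifest: Manifest): string[] {
  return REQUIRED_KEYS.filter((key) => !(key in manifest));
}

console.log(missingManifestKeys(exampleManifest)); // prints []
```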

7. Hackathon Conclusion 🏁

The Phi-2 Technology: 24 Hours Challenge was not just a competition; it was a celebration of innovation, a testament to what can be achieved when technology meets creativity. Our project, Text-to-Image-Phi2, stands as a bridge between the textual and visual realms, offering a fresh perspective on digital content interaction. As we reflect on this journey, we’re reminded of the power of collaboration, the thrill of innovation, and the endless possibilities that lie ahead.

Project video submission

Hackathon project submission cover
