Captions GPT: From Idea to Launch With Vibe Coding β
"Every app you see started with someone who had a problem to solve." - Anonymous
Captions GPT is a mobile app that generates creative captions for social media images and videos using AI. This case study shows how the app was built from initial concept to deployment using Vibe Coding - an iterative approach powered by AI agents like Cursor.
GitHub Repository: cursor-captionsgpt
AI Tools Used in This Project β
AI Tool | Role in Development | Key Contributions |
---|---|---|
Cursor | Primary Development Agent | Code generation, debugging, API integration, UI component creation |
ChatGPT | Planning & Research Tool | System prompt design, implementation research, documentation creation |
Replicate | Core AI Engine | Image & video caption generation models, API for media analysis |
Ideation & Problem Discovery β
Identifying a Problem Worth Solving β
π The Problem β
As an avid traveler and social media enthusiast, I love sharing my experiences online. However, I often struggled to create creative and engaging post captions.
While this may seem like a small inconvenience, it's a frustration shared by millions of social media users who want their posts to stand out. A great caption can drive engagement, but not everyone has the time or creativity to craft one.
I knew this was a problem worth solving because:
- β Personal Experience: Being the target user, I have firsthand insight into the pain points and requirements
- β Skill Alignment: My background in product development equips me to design an intuitive and effective solution
- β Passion for AI: This project aligns with my interest in experimenting with Gen AI to solve real-world problems
π Understanding the Ideal Customer Profile (ICP) β
To validate the idea, I outlined the Ideal Customer Profile (ICP):
- πΉ Who they are: Social media users, influencers, travelers, and content creators, in short consumers (B2C Persona)
- πΉ Pain point: Struggle to come up with unique, engaging, and on-brand captions consistently
- πΉ Current alternatives: Existing Apps, Googling captions, using basic caption generators, or asking friends for ideas
- πΉ Behavior: Frequently post on Instagram, Twitter, TikTok, or Facebook and care about engagement
π Market & Competitive Research β
To validate the problem and identify opportunities, I analyzed existing caption generator applications:
Feature | CapGen β AI Captions & Hashtags | Captions for Instagram & FB | AI Captions & Hashtags Generator | Caption Generator for Instagram |
---|---|---|---|---|
Personalised using AI | Yes | No | Yes | No |
Customization Options | Yes (tone, style, emojis, hashtags) | No | Limited | No |
Media | Only Images | NA | Only Images | NA |
User Interface | Modern and user-friendly | Basic | Outdated | Simple |
Monetization Model | Free with ads | Free | Free with ads | Free |
App Store Rating | 4.5/5 | 4.0/5 | 3.8/5 | 3.5/5 |
π‘ Key insight β
Many people already use caption generators, but existing solutions are either expensive, lack personalized caption generation or have a cluttered, complex UX.
π‘ Competitive Edge for Captions GPT β
- πΉ AI-Powered Captions for Photos & Videos β Unlike competitors that focus only on images, Captions GPT generates creative captions for both photos and videos using Gen AI.
- πΉ Clutter-Free, Ad-Free UX β No login is required, no ads, and a simple, intuitive flow for a seamless experience.
- πΉ Competitive Pricing β More affordable than ChatGPT, making it a cost-effective alternative for social media users.
Building Your Product β
β Product Vision Recap β
Captions GPT aims to help social media users quickly generate creative, AI-powered captions for their images (and eventually videos), making their posts more engaging and effortless.
After validating the user problem and designing the initial solution, the next step was to define clear, actionable requirements β both product and technical β and break them into manageable releases for iterative delivery.
π± Product Requirements β
- Image Upload β Users can upload images from their devices
- AI Caption Generation β Captions are generated using AI based on the uploaded image
- Caption Display β A clean UI to display captions
- Copy to Clipboard β Users can copy captions easily
- History β Track previously generated captions
- Terms & Privacy β Include links to legal docs
- Welcome Animation β Engaging intro animation on the first launch
- Dark Mode β For user preferences and accessibility
Caveat: Instead of detailing the entire user experience using mockups or wireframes before development, the approach for this app is "vibe coding" β iteratively building and improving the product based on prompts and an evolving understanding of user needs. This ensures faster experimentation and real-time course correction without getting locked into rigid designs upfront.
βοΈ Technical Requirements β
- Use Expo for cross-platform development (iOS & Android)
- Replicate API for AI-powered caption generation
- Streaming Mechanism to stream responses from Replicate API for real-time caption display
- Cursor as the AI Agent for development
Product Development β Iterative Releases β
π’ Release 1 β Core Captioning App (MVP) β
Prompt to Cursor AI Agent: β
Build a mobile app that:
{Product Vision}
{Product Requirements}
{Technical Requirements}
Development Journey: β
Initial challenges:
- Replicate API integration caused app crashes
- Cursor helped debug and fix the initial API wiring
- Faced streaming issue β captions appeared only at the end instead of progressively. Cursor adjusted API call handling to enable real-time display
- Formatting issue: Each word appeared on a new line. After several prompts, Cursor reformatted captions into full sentences
β Outcome: A functional MVP that allowed users to upload an image, generate an AI caption, and copy it β with real-time streaming captions.
π‘ Release 2 β Context-Aware Captions with System Prompt β
To make captions more relevant and creative, we added a system prompt and user context input:
System Prompt to AI:
You are my social media manager. Write me an Instagram caption based on the image. Add relevant hashtags. Additional context: ${context}
New feature:
- A text box for users to add context (e.g., "Birthday party in Paris")
- Cursor modified the API integration to pass user context dynamically
β Outcome: Personalized captions that understood the image's purpose, mood, and social context.
π΅ Release 3 β Video Caption Generation β
Expanding functionality to videos required additional AI model handling:
Challenge:
- Used Replicate's video captioning model:
lucataco/qwen2-vl-7b-instruct:bf57361c75677fc33d480d0c5f02926e621b2caa2000347cb74aeae9d2ca07ee
- Cursor initially failed to upload videos properly
- After debugging, switched to Base64 encoding for video uploads to ensure compatibility
β Outcome:
- Users can now upload both images and videos and receive AI-generated captions
- A video preview player was added for better UX
π£ Release 4 β Usability Enhancements & Cancel Feature β
Final polish to make the product robust and user-friendly:
- Cancel ongoing generation to give users control
- Handled errors for interrupted API calls
- Progress indicators during long video processing
- Final UX fixes like dark mode, history tracking, and welcome animation
β Outcome: A production-ready, cross-platform mobile app supporting real-time, AI-powered caption generation for images and videos, with thoughtful user experience design.
Monetization & Analytics β
π£ Release 5 β Analytics Integration β
Context: As we plan for the public launch of the product, it's critical to measure user engagement and feature usage to evaluate product success. To achieve this, we are integrating PostHog, a product analytics tool that will help us track key user actions and flows inside the app.
β Requirements
- Integrate PostHog SDK into the application to monitor user actions
- Track the following user actions:
- Screen views β Which screen/page the user is on
- Button clicks β Which button is clicked
- Action outcomes β What happens after a button click (e.g., successful image generation, error messages)
- Keep API keys secure β Store API keys and sensitive configs in .env files
- Do NOT commit .env files or secrets to GitHub
Post initial integration, we faced minor issues which were later fixed using Cursor AI code suggestions.
π£ Release 6 β Payment Wall β
β οΈ Context: To monetize the app we would need to have Payment Wall. Revenue Cat provides an easy way to implement payment walls. The payment wall can be dynamically controlled through the back-office of Revenue Cat with minimal code changes. Note that the Payment Wall can only be implemented post-production deployment.
Implementation Strategy: β
Research & Planning:
- Used ChatGPT to outline implementation requirements
- Prompt: "I am building a mobile app using Expo and I want to incorporate payment wall using Revenue Cat. Write step-by-step instructions I need to follow..."
Core Logic:
- Redirect to payment wall when user clicks "Generate Captions"
- First-day free trial with no payment wall
- Check subscription status before caption generation
Integration Details:
- API Keys stored in .env file
- Same user ID for PostHog and Revenue Cat for unified analytics
Bonus: Authentication Implementation β
Though not required for the initial version, we prepared authentication infrastructure:
- Researched Supabase implementation for email/password login
- Created secure credential storage using environment variables
- Designed login/signup flow that integrates with existing UI
Deployment & GitHub β
π£ GitHub Setup β
- Repository created: github.com/angshu-min-js/cursor-captionsgpt
- Code pushed directly to the repo
- .env files and sensitive information excluded from repository
App Store Deployment β
π‘ Note: For detailed instructions on deploying your own mobile app to the App Store and Google Play Store, check out Chapter 8: Mobile App Deployment which covers the entire process from build generation to store submission.
Tech Stack Summary β
The final application uses:
- React Native: Cross-platform mobile development
- Expo: Simplified React Native development and deployment
- TypeScript: Type-safe JavaScript
- React Navigation: Navigation between screens
- Replicate API: AI-powered caption generation with real-time updates
- Expo SecureStore: Local storage for caption history and settings
- React Native Reanimated: Animations
- PostHog: Analytics for tracking user engagement and app performance
- Revenue Cat: Subscription management and payment processing
Key Lessons from Vibe Coding with AI β
π What Worked Well β
- Rapid Prototyping: Using Cursor AI allowed for quick development cycles, cutting weeks off traditional development time
- Problem Solving: AI helped overcome technical challenges like streaming API responses and Base64 encoding issues
- Iterative Development: "Vibe coding" enabled functionality to evolve naturally based on testing and feedback
- Full Stack Capabilities: AI handled both frontend (UI components) and backend (API integration) equally well
π§ Challenges & Limitations β
- Integration Complexity: Some third-party services required manual troubleshooting beyond AI's initial solutions
- Context Limitation: The AI sometimes forgot previous implementation details, requiring reminders
- Testing Gaps: Manual testing was still essential, especially for edge cases
- API Updates: When APIs changed, AI guidance needed to be supplemented with documentation research
π‘ Key Takeaways β
- Start Simple: Begin with a clear MVP and expand features incrementally
- Work With AI: Treat AI as a collaborative junior developer - guide it with clear requirements
- Verify Implementation: Always check the generated code against requirements
- Keep Learning: Understanding the basics of your tech stack improves AI collaboration
- Document Everything: Keep track of key decisions and solutions for future reference
Next Steps β
Next I will cover:
- Landing page development
- App Store assets creation
- App Store Optimization (ASO)
- Promotional video production
- Marketing strategy
Stay tuned for updates on the full launch story!
Stay Updated π¬ β
Check back soon for the complete case study on Captions GPT's public launch, or contribute to this section if you have insights about launching AI-powered mobile apps.
Comments
Share your thoughts and questions about this page. Sign in with GitHub to comment.