[Day 5] Testing Strategic Thinking with a Real Work Challenge
What happens when you give three AI tools the same complex virtual training scheduling problem and one of them walks straight into a cultural landmine? I ran an experiment with Claude, ChatGPT, and Gemini: identical prompts asking each to recommend the best approach for scheduling a 100-person virtual training across six APAC countries during Lunar New Year season. One tool recommended a culturally tone-deaf date to save $2,500, and one framed the time zone constraint as a "Golden Window" that actually helped me think through the problem differently. This is what AI collaboration looks like in practice and why the human still needs to be in the loop.
When I ask AI to help with a genuinely complex business problem, I'm not just looking for information retrieval. I want to see strategic thinking. The kind of synthesis that requires holding multiple constraints in mind, making trade-offs, and arriving at a clear recommendation.
So I designed a prompt that would force exactly that: a multi-country virtual training scheduling problem with competing considerations around public holidays, time zones, cost trade-offs, and executive-ready formatting. This is the kind of real work professionals in Training or Learning & Development face regularly. It turns out to be a much harder test of AI capability than simply asking for a list.
Today's Experiment
I gave Claude, ChatGPT, and Gemini identical prompts for a training scheduling challenge: recommend the best approach for delivering a 3-session virtual training to 100 people across six APAC countries (India, Philippines, Singapore, Australia, South Korea, and Japan) in February 2026. The twist? Two pricing options with different trade-offs, a minefield of public holidays including Lunar New Year, and a request for a half-page executive summary format.
The prompt was deliberately designed to test whether AI tools can prioritize and synthesize rather than exhaustively list everything, which makes it a harder test of strategic thinking.
The Prompt
Here's exactly what I used:
Context: I'm organizing a virtual training event for a global team of 100 people across 6 countries: India, Philippines, Singapore, Australia, South Korea, and Japan. The training requires 3 sessions total (approximately 2 hours each) and must be delivered in February 2026 by an external instructor.
Pricing Structure:
Option A: All 3 sessions delivered in a single full day = $5,000 total
Option B: 3 sessions delivered across 3 separate days = $2,500 per session ($7,500 total)
Constraints:
Sessions must avoid public holidays in all 6 countries
Session times should be reasonable working hours (ideally 8am-6pm local time) for the majority of participants
We want to maximize live attendance across time zones
Your Task:
Research: Identify all public holidays in February 2026 for India, Philippines, Singapore, Australia, South Korea, and Japan.
Time Zone Analysis: Map the time zones for each country and identify optimal session windows where the most countries have reasonable working hours overlap.
Options Comparison: Compare Option A (single day) vs Option B (3 separate days), including cost difference, cost per attendee, and scheduling tradeoffs.
Recommendation: Provide your recommended approach with specific proposed date(s), session time(s) in UTC, and a table showing local times for each country.
Format: Present this as a half-page executive summary for a senior leader. Lead with your recommendation, follow with key supporting points, and include a brief cost comparison and proposed schedule. Be concise and decision-ready.
The Process
I ran the identical prompt through three AI tools:
Claude (Anthropic)
ChatGPT (OpenAI)
Google Gemini
Then I evaluated each response against five criteria that matter for real-world HR decision-making:
Holiday accuracy — Did they get all the public holidays right?
Math correctness — Is the $2,500 cost difference stated? Cost per person calculated?
Time zone realism — Did they actually check overlaps or just guess?
Recommendation clarity — Is there a clear "I recommend X because Y" statement?
Appropriate caveats — Did they note limitations or suggest verification?
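The math criterion is simple enough to check by hand, but here's what "correct" looks like, sketched in Python with the figures from the prompt:

```python
# Cost comparison for the two pricing options in the prompt.
ATTENDEES = 100

option_a = 5_000      # 3 sessions delivered in a single full day
option_b = 3 * 2_500  # 3 sessions across 3 separate days

print(f"Option A: ${option_a:,} (${option_a / ATTENDEES:.0f} per person)")
print(f"Option B: ${option_b:,} (${option_b / ATTENDEES:.0f} per person)")
print(f"Option B premium: ${option_b - option_a:,}")
# → Option A: $5,000 ($50 per person)
# → Option B: $7,500 ($75 per person)
# → Option B premium: $2,500
```

Any response that misses the $2,500 difference or the $50 vs $75 per-person split fails this criterion.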
Outputs: What Each Tool Delivered
Claude's Response
Recommendation: Option B (3 separate days) at $7,500 total
Proposed dates: February 5, 12, and 26 (Thursdays at 03:00 UTC)
Recommendation from Claude. It impressed me by scheduling the sessions on Thursdays to avoid weekends and support mid-week engagement, and by nailing the cultural nuance of skipping CNY week. 🤯
Key strengths:
Correctly identified the major holiday conflicts: Japan's National Foundation Day (Feb 11), Chinese New Year across Singapore/Philippines/Korea (Feb 16-18), and Japan's Emperor's Birthday (Feb 23)
Provided a clean local time table showing all six countries within working hours
Clear cost comparison with per-attendee breakdown ($50 vs $75)
Explicitly noted why Option A doesn't work: time zone constraints make a 6+ hour single-day session impractical
Notable caveat: Listed alternative dates (Feb 3, 4, 9, 10, 24, 25, 26, 27) as backups if primary dates don't work.
Format observation: Claude automatically produced a downloadable Word document with no extra steps required.
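I spot-checked Claude's local-time table myself with a few lines of Python. The city choice for each country (Sydney for Australia) is my assumption:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Claude's proposed slot: Thursday, February 5, 2026 at 03:00 UTC
session = datetime(2026, 2, 5, 3, 0, tzinfo=timezone.utc)

zones = {
    "India": "Asia/Kolkata",
    "Philippines": "Asia/Manila",
    "Singapore": "Asia/Singapore",
    "Australia": "Australia/Sydney",  # AEDT (UTC+11) in February
    "South Korea": "Asia/Seoul",
    "Japan": "Asia/Tokyo",
}

for country, tz in zones.items():
    local = session.astimezone(ZoneInfo(tz))
    print(f"{country:<12} {local:%H:%M} local")
# India lands at 08:30 and Sydney at 14:00; everyone sits inside 8am-6pm.
```

Claude's table checked out, which is exactly the kind of verification worth doing before a schedule goes to stakeholders.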
ChatGPT's Response
Recommendation: Option A (single day) at $5,000 total
Proposed date: February 19, 2026
Recommendation from ChatGPT
Key strengths:
Comprehensive holiday research with specific dates
Detailed time zone analysis with a proper table
Clear math: stated the $2,500 savings explicitly
Critical weakness: ChatGPT initially recommended February 19th, which falls on the third day of Lunar New Year. When I followed up asking about cultural sensitivity, ChatGPT acknowledged this was problematic and revised the recommendation to February 24-26 instead.
What this reveals: ChatGPT got the "technical" holiday dates right but missed the cultural nuance that many employees in Singapore, Korea, and Chinese-influenced markets treat Day 3 of Lunar New Year as a "soft holiday" even when it's technically a working day. The tool needed human prompting to catch this.
Format observation: ChatGPT was the only tool where I couldn't easily export the output. I had to copy and paste the content elsewhere—there was no offer to create a document or artifact directly. Both Gemini and Claude handled this better.
Gemini's Response
Recommendation: Option B (3 separate days)
Proposed dates: February 3, 4, and 5, 2026 (first week of February)
Recommendation from Gemini. While the formatting is not as polished as Claude's, the logic of completing the training in the first week of February is sound and practical.
Key strengths:
Correctly identified Week 1 as the "only window that avoids all public holidays across the 6 nations"
Strong articulation of why Option A fails: "would force Australian participants to stay until ~11:00 PM or Indian participants to start at 4:00 AM"
Used visual formatting (color-coded weeks, checkmarks) for scanability
Explicitly labeled the 3.5-hour overlap window as the "Golden Window"
Unique insight: Gemini framed the time zone constraint as "rigid"—emphasizing that extending beyond 2 hours per session would push Australia into evening hours. This kind of practical constraint recognition is exactly what a human training professional would flag.
Format observation: Gemini offers options to convert the output into a Google Doc, which integrates well with typical workplace workflows.
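Out of curiosity, I recomputed the shared window myself. This is a sketch under my own assumptions (8am-6pm working hours, Sydney time for Australia); the width you get depends on those choices, so don't expect it to match Gemini's figure exactly:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ZONES = ["Asia/Kolkata", "Asia/Manila", "Asia/Singapore",
         "Australia/Sydney", "Asia/Seoul", "Asia/Tokyo"]
DAY = datetime(2026, 2, 4)    # any holiday-free midweek date
WORK_START, WORK_END = 8, 18  # local working hours (my assumption)

def offset(tz: str) -> float:
    # UTC offset in hours on DAY (handles Sydney's summer time)
    return ZoneInfo(tz).utcoffset(DAY).total_seconds() / 3600

# Express each country's working hours as a UTC interval, then intersect.
start = max(WORK_START - offset(tz) for tz in ZONES)
end = min(WORK_END - offset(tz) for tz in ZONES)

def hhmm(h: float) -> str:
    return f"{int(h):02d}:{int(round(h % 1 * 60)):02d}"

print(f"Shared window: {hhmm(start)}-{hhmm(end)} UTC ({end - start:.1f} hours)")
```

With my assumptions the window comes out a bit wider than Gemini's 3.5 hours, which is exactly the kind of discrepancy worth probing with a follow-up prompt rather than taking either number on faith.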
Comparison Summary
| Criteria | Claude | ChatGPT | Gemini |
| --- | --- | --- | --- |
| Holiday accuracy | ✅ Correct | ⚠️ Missed cultural nuance | ✅ Correct |
| Math correctness | ✅ $2,500 difference, $50/$75 per person | ✅ $2,500 difference, $50/$75 per person | ✅ $2,500 difference, $50/$75 per person |
| Time zone realism | ✅ Verified overlap | ✅ Verified overlap | ✅ "Golden Window" framing |
| Recommendation clarity | ✅ Clear with rationale | ⚠️ Required follow-up to correct | ✅ Clear with rationale |
| Appropriate caveats | ✅ Alternative dates provided | ✅ After prompting | ✅ Notes on rigid constraints |
| Output format | ✅ Auto-generated Word doc | ❌ Copy/paste required | ✅ Offered Google Doc |
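All three tools converged on the same underlying holiday conflicts, and once the dates are known, the avoidance step is simple set logic. Here's a sketch using the holidays from their research; treat the list as unverified and check official calendars before booking anything:

```python
from datetime import date, timedelta

# Holiday conflicts surfaced by the tools for February 2026 (unverified).
holidays = {
    date(2026, 2, 11),                          # Japan: National Foundation Day
    *(date(2026, 2, d) for d in (16, 17, 18)),  # Lunar New Year (SG/PH/KR)
    date(2026, 2, 23),                          # Japan: Emperor's Birthday
}

# Weekdays in February 2026 that clear every listed holiday.
day, candidates = date(2026, 2, 1), []
while day.month == 2:
    if day.weekday() < 5 and day not in holidays:
        candidates.append(day)
    day += timedelta(days=1)

print(", ".join(d.strftime("%b %d") for d in candidates))
```

Note that February 19-20 survive this filter even though they fall inside the "soft holiday" tail of Lunar New Year, which is precisely the trap ChatGPT walked into: the set logic is easy; the cultural judgment is not.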
What I Learned Today
1. Cultural context requires human judgment (still)
ChatGPT's initial recommendation technically avoided public holidays but would have created a cultural misstep. This is a perfect example of where AI needs human oversight. The tool can gather the data, but the human brings the contextual wisdom about "soft holidays" and cultural sensitivities that don't appear in official calendars.
2. Constraint framing matters for strategic output
Gemini's "Golden Window" framing and explicit statement that the time zone overlap is "rigid" demonstrates a level of strategic synthesis that's genuinely useful for decision-making. The best AI responses don't just list constraints. They articulate why those constraints matter.
3. Output format is part of the deliverable
One underrated difference: Claude produced a downloadable Word document automatically, Gemini offered Google Docs integration, but ChatGPT required manual copy-paste. When you're trying to move fast and deliver recommendations to stakeholders, this friction adds up. The tool that reduces steps between "AI output" and "usable deliverable" wins.
4. The "recommend Option A to save money" trap
ChatGPT defaulted to recommending the cheaper option, even when it created significant practical problems. Both Claude and Gemini correctly identified that the cost savings weren't worth the compromises: a six-hour single-day training across these time zones simply doesn't work. This is the kind of prioritization that separates useful AI assistance from naive optimization.
Try It Yourself
Want to test this with your own scheduling challenge? Here's how to adapt this experiment:
Pick a real multi-constraint problem — The more competing factors (cost, timing, geography, cultural considerations), the better the test of AI synthesis.
Ask for a specific format — "Half-page executive summary" or "bullet points for a team email" forces the AI to prioritize rather than dump everything it knows.
Follow up on cultural nuance — If your scenario involves multiple countries or cultures, explicitly ask: "Are there any cultural considerations I should know about for these dates?"
Compare outputs across tools — Different tools have different strengths. Running the same prompt through 2-3 options often reveals which one handles your specific use case best.
Check the export options — Before you commit to a tool, verify how easily you can get the output into your actual workflow. A brilliant recommendation that's trapped in a chat interface is only half as useful.
The real skill isn't just prompting AI. It's knowing which tool works best for the problem you're trying to solve, when to trust the output, and when to probe deeper. That's the human in the loop that makes AI collaboration actually work.