🎯 Learning Goals
- Explain the purpose of prompt engineering and its use cases
- Describe the limitations of large language models like ChatGPT
- Design prompts with specificity and technical language to improve responses
📗 Technical Vocabulary
- Prompt Engineering
- Artificial Intelligence
- Large Language Models
- Tokenization
- Hallucinations
What is Prompt Engineering?
Prompt Engineering is the art of crafting precise, clear, and specific instructions for AI tools. It is a critical skill for guiding AI models, such as ChatGPT, to produce accurate and useful responses. In the context of Data Science, prompt engineering can help you with everything from data cleaning and exploration to generating SQL queries.
Introduction
- AI is changing the way we program. Tools like ChatGPT can help with debugging, code generation, and understanding complex programming concepts.
- Large Language Models (LLMs) like ChatGPT, Google Gemini, and Claude are powerful coding assistants when used effectively.
- In this lesson, we’ll explore how to craft high-quality prompts to get accurate and useful responses. Why? Because the way you talk to these tools makes all the difference in how useful they are! We’ll share tips and tricks to make sure your AI game is on point.
Demystifying LLMs
- LLMs are essentially very fancy autocomplete systems. They rely on patterns, the context you provide, and huge amounts of training data to predict the next word in a sentence.
- LLMs break down text into smaller pieces called tokens, which the model uses to understand and generate language. Tokens can be as small as individual characters, but are sometimes whole words! For example, the sentence “I love coding!” might be split into tokens like “I”, “ love”, “ coding”, and “!” (you can explore tokenization yourself with the sketch at the end of this list).

- LLMs like ChatGPT process and analyze text at the token level, using patterns in these tokens to predict the next value in a sequence. For example, if you write “The best name for a dog is”, the LLM predicts the most likely tokens to come next, based on its model of human language. The model might select “Fido” to complete the sentence, even though other tokens, such as “R”, “B”, “Spot”, or “Max”, were also likely. It’s important to note that LLMs are generally set to include some randomness, so they don’t always select the most likely token! This is why, when you ask an LLM the same question twice, you will likely get slightly different responses.

- Once the LLM selects the next token, it doesn’t stop there! Now, it uses the entire original sentence plus the new token to predict the token after that, and so on. This leads to a butterfly effect, where small differences in the starting state of a system lead to very different outcomes. If the sentence was completed with “Fido,” the remaining tokens follow that idea.

- However, if the model had completed the sentence with “Spot” instead, the rest of the response would have been different as well.

- These examples begin to illuminate some of the limitations of AI tools like Large Language Models. The AI is simply guessing the next word based on statistical patterns in its training data, but it can’t really “think” like humans do.
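You can peek at tokenization yourself. Below is a minimal sketch using OpenAI's open-source tiktoken library (installed with pip install tiktoken). The exact splits and IDs depend on which encoding a model uses, so treat the outputs as illustrative, not exact.

```python
# A minimal sketch of tokenization, assuming the `tiktoken` package is installed.
import tiktoken

# cl100k_base is one of the encodings used by GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("I love coding!")
print(token_ids)                             # a short list of integer token IDs
print([enc.decode([t]) for t in token_ids])  # the text piece each ID stands for
```

Notice that the model works with the integer IDs, not the characters; the text pieces only reappear when the IDs are decoded back into text.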
Think About It
Now that you know how LLMs work, why do you think ChatGPT miscounts the number of R’s when asked how many times the letter appears in the word “strawberry”?

From the AI's perspective, "strawberry" is not a sequence of individual letters, but a sequence of token IDs.
Miscounting the R’s in “strawberry” shows how humans and AI see text differently. We read text naturally, letter by letter, but AI tools like ChatGPT chunk text into tokens that combine multiple letters or even entire words.
Knowing this helps you use AI better! It’s a reminder that while these models are super smart, they don’t "think" like we do. That’s why they sometimes mess up even on easy stuff.
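In fact, you can see the problem directly with the same hedged tiktoken sketch from above: the word reaches the model as a handful of token IDs, not ten letters.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
print(ids)                            # just a few token IDs for the whole word
print([enc.decode([t]) for t in ids]) # multi-letter chunks, not individual letters
```

Counting the R’s would require reasoning about letters hidden inside those chunks, which is exactly what the token-level view makes hard.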
Other Limitations of AI
- Beware of biases! These models are trained on large datasets that reflect societal biases.
- Watch out for hallucinations. LLMs sometimes make stuff up! Because they generate text by predicting likely next tokens, LLMs sometimes produce text that sounds realistic but is actually inaccurate or misleading.
- Think of it like a friend who’s super confident about random facts but sometimes just makes stuff up when they don’t know the answer. For example, if you asked, "Who invented pizza?" and the AI said, "Pizza was invented by aliens in 1850"—that’s a hallucination. It’s not lying on purpose; it just doesn’t know the answer and guesses based on patterns in its training data.
- These hallucinations happen because AI doesn’t "know" things the way humans do. It doesn’t have facts stored like a library—it predicts what sounds right based on the data it’s been trained on and the previous context. Sometimes those predictions are spot on, but other times... not so much!
- Limited knowledge. LLMs are essentially a snapshot of the world’s knowledge at the moment of their training. For GPT-4 Turbo, the training data cut-off was December 2023. This means the model typically does not have knowledge of recent events unless specifically enabled with a "browse the internet" feature.
- AI tools are fantastic at generating ideas and speeding up workflows, but they’re not magic. They’re super advanced word machines that still rely on your input. That’s why learning to craft effective prompts is such a big deal—it helps you get the most accurate and useful results from AI.
Privacy Considerations
While the model itself doesn’t “remember” things from your conversation, the platform through which you access that model might! For example, ChatGPT is a platform through which you access and interact with various large language models developed by OpenAI, like GPT-4o. As part of this platform (ChatGPT), OpenAI does implement some memory systems to store previous messages and conversations. As with any modern web application, they do collect user data, including the messages you send and any data you share with the application. This means OpenAI could store this data and use it in the training of future models!
For this reason, it’s important to be mindful of the information you share on platforms like ChatGPT. Never share personal or sensitive information, such as names, addresses, passwords, or financial details, in a chat with an LLM. Because chatbots can feel personable, it’s an easy mistake to make, so stay on guard.
Did You Know?
You can easily exclude your data from future training by changing the settings in ChatGPT. Click your user icon in the top right corner and select Settings. From there, select Data Controls and then turn Off the option to Improve the model for everyone.
During this course, we’ll use ChatGPT as a helpful coding buddy. Ready to unlock the power of AI for Data Science? Let’s get started! 💻
Writing Effective Prompts
Remember, Large Language Models are designed to include some randomness. This means you might get a different response than the ones shown in this lesson! You’ll probably get similar responses, but not exactly the same.
General Techniques
- ✅ Be Specific — Specify the problem in detail
- ✅ Use Technical Language — Include relevant SQL terms and concepts
- ✅ Provide Context — Describe the goal and constraints
Example: SQL Query Generation
Here’s an example of a poorly-crafted prompt.
"How do I find the mean?"
This is too vague. The AI will need to ask follow-up questions to help you craft a query for your specific dataset. Try refining the prompt to be more specific, use technical language, and provide context. For example: "I have a SQLite table called sales with the columns order_id, region, and amount. How do I write a query that finds the mean of the amount column for each region?" (The table here is just an invented example.)
It takes a little more time upfront to craft a detailed prompt, but you’re much more likely to get an accurate response!
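A well-crafted prompt like the refined one above is likely to produce a query along these lines. The sketch below runs it against a tiny in-memory SQLite database; the sales table and its values are hypothetical, invented purely for illustration.

```python
import sqlite3

# Build a tiny, hypothetical sales table in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "East", 80.0), (3, "West", 200.0)],
)

# The kind of query a detailed prompt might yield:
# the mean (AVG) of amount for each region.
for row in conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region"):
    print(row)  # e.g. ('East', 100.0) and ('West', 200.0)

conn.close()
```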
Try-It | Craft Detailed Prompts
- Enter this prompt into ChatGPT or an LLM of your choice: "How do I select records?" Read the response carefully and consider how you might improve the prompt.
- Enter this revised prompt into ChatGPT or an LLM of your choice: “How do I select records which meet some criteria in SQLite?” Read the response and determine if it is accurate (a sketch of what an accurate answer covers follows this list).
- Why did the second prompt result in a more accurate response?
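For reference, an accurate response to the revised prompt should center on a SELECT ... WHERE statement. Here’s a minimal sketch, reusing the same hypothetical sales table as before:

```python
import sqlite3

# The same hypothetical sales table, rebuilt in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "East", 80.0), (3, "West", 200.0)],
)

# In SQLite, WHERE filters records to those meeting your criteria.
for row in conn.execute("SELECT * FROM sales WHERE amount > 100"):
    print(row)  # (1, 'East', 120.0) then (3, 'West', 200.0)

conn.close()
```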
Did You Know?
LLMs like ChatGPT actually do respond better to kindness. “Using polite prompts can produce higher-quality responses,” according to a study by a team at Waseda University and the RIKEN Center for Advanced Intelligence Project. But don’t overdo it! Excessive flattery can result in poorer performance.
Iterative Prompting
So what if you craft or engineer your prompt, but it still doesn’t produce exactly what you were looking for? You can always try again! Generative AI like ChatGPT is like having an infinitely patient friend. You can follow up and ask it to revise the results 10 times, and it will never get annoyed with you.
Iterative prompting is the process of prompting, evaluating the response, and then revising your prompt to clarify what you want before prompting again.
- Prompt
- Evaluate
- Revise/Follow-Up
Potential Pitfalls
AI is a great tool for supercharging your coding, but it’s not magic. It’s important to know when to step away from using AI:
- You aren’t getting the result you want after several revisions.
- You no longer understand the code.
- You are caught in a Copy, Paste, and Cross Your Fingers loop.
‼️ The “Copy, Paste, and Cross Your Fingers” Loop
AI generates some code, but it’s not perfect—it’s got some bugs. So, you ask it to fix those bugs, and sure, it does... but surprise! The new code comes with fresh problems. You go back and ask it to fix those, and now you’re stuck in this endless cycle: copying, pasting, and hoping for the best, all without really understanding what the code is doing. It’s like trying to patch a sinking boat without knowing where the holes are!
AI is a tool — not a replacement for your own logic and problem-solving skills. Take the time to read your code and know when it’s time to walk away from the AI and use your own brain!
Controlling Response Format
You can also control the format of an LLM’s response. Try prompts like these:
- Summarize the purpose of SQL joins in 3 bullet points.
- List the steps to process CSV files.
- Create a table comparing SQL and Tableau.
- Explain how Tableau and SQL work together using an analogy.
- Describe hallucinations in the context of large language models in 30 words or less.
- Give me an acronym to help me remember the structure of a case statement in SQL.
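As a reference for that last prompt, here’s the structure of a SQL CASE statement in a minimal sqlite3 sketch (the table and labels are invented for illustration):

```python
import sqlite3

# A tiny invented table to demonstrate CASE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 50.0), (2, 250.0)])

# CASE structure: CASE, one or more WHEN ... THEN pairs,
# an optional ELSE, and END.
query = """
SELECT order_id,
       CASE
           WHEN amount >= 100 THEN 'large'
           ELSE 'small'
       END AS order_size
FROM sales
"""
for row in conn.execute(query):
    print(row)  # (1, 'small') then (2, 'large')

conn.close()
```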
Try-It | Use ChatGPT as a Helpful Tutor
Think of a topic you’re unfamiliar with. Ask ChatGPT to explain it to you in whichever format you think would be helpful!
Other Tips
- Don’t overthink it! Just get started. You can always redirect after getting an initial response. That’s the great thing about using a language model compared to other tools — it’s a conversation!
- If your question is complex, break it into smaller steps! The AI will be more accurate if you break up complex tasks into more manageable steps.
- Ask for explanations. If AI generates code, request a line-by-line breakdown as comments within the code block.
- Verify AI-generated code. AI solutions are not always correct. Read the output carefully and test it before using.
Now, you might be wondering: will AI take over developers’ jobs? 🤔 Honestly, we can’t predict the future (if we could, we’d totally share those lottery numbers with you). But here’s the deal: being a great developer is so much more than just writing code. It’s about thinking critically, solving real-world problems, and creating things with empathy—skills that AI doesn’t quite have.
Practice
Open up the Women on High Courts Replit.
- Write a prompt that introduces an AI to your dataset. Be sure to share some context about it, the columns and datatypes, the fact that you are using SQL to explore the dataset, and any other information that is helpful.
- Write a prompt asking the AI to guide you through one of the code-along exercises. Be sure to specify that you want help, not answers. Read through the response and determine whether this is a helpful answer.
- Write a prompt asking the AI to help you understand one of the SQL queries that you find complicated or confusing. Read through the response and determine whether it is a helpful answer. Be sure to ask follow-up questions if you have any.
- Come up with a question about the dataset that might require a complicated SQL query to answer. Example questions are below. Ask the AI to guide you through creating the SQL query step by step. Does the final query actually answer the question you came up with?
Example questions:
- What were the first ten countries to appoint a woman to a high court?
- What is a country whose percentage of women on high courts has declined over time?
- What is the average percentage of women on high courts for each region? (One possible query for this question is sketched after this list.)
- Discuss: Where was the AI most helpful? Where did it need additional context? Was there anything that the AI did not take into account? Were its answers accurate and useful?
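For reference, a query answering that third example question might look something like the sketch below. The table and column names here are hypothetical; check the actual schema in the Replit before prompting.

```python
import sqlite3

# Hypothetical schema: the real Women on High Courts data may use
# different table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE high_courts (country TEXT, region TEXT, pct_women REAL)")
conn.executemany(
    "INSERT INTO high_courts VALUES (?, ?, ?)",
    [("A", "Africa", 30.0), ("B", "Africa", 50.0), ("C", "Europe", 45.0)],
)

# Average percentage of women on high courts for each region.
query = "SELECT region, AVG(pct_women) FROM high_courts GROUP BY region"
for row in conn.execute(query):
    print(row)  # e.g. ('Africa', 40.0) and ('Europe', 45.0)

conn.close()
```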
💼 Takeaways
- Prompt engineering is the art of crafting precise, clear, and specific instructions for AI tools
- LLMs use tokenization for processing text and predicting the next word in the sequence
- Use specificity, technical language, and context to improve responses
Next Steps
In this lesson, you practiced writing detailed prompts that give you awesome results. Now, use those skills throughout the remainder of camp! Check it out: there's a Gemini button in the top right corner of any Colab playground. Click it to open a Gemini chat right there in the same window – it's like having a coding buddy built into the Colab notebook!
For a summary of this lesson, check out the 7. AI-Assisted Coding One-Pager!