[00:00:00] Will AI Take Data Analyst Jobs? An Exploration with ChatGPT's Code Interpreter
[00:00:00] Luke Barousse: Data nerds, ChatGPT's new code interpreter plugin can do some pretty advanced problem solving for my job, like analyzing this data set and showing me some pretty incredible insights in a matter of seconds.
[00:00:12] Luke Barousse: And a lot of people are claiming that this is going to take away data analyst jobs.
[00:00:16] Luke Barousse: So I've been testing this bad boy non stop since its release, and I have some prompts to see if it's true.
[00:00:21] Luke Barousse: First test up is some EDA or just exploring a data set.
[00:00:25] Luke Barousse: Let's see what type of files this tool can take, and it looks like it takes a number of files.
[00:00:29] Luke Barousse: Let's go with CSV.
[00:00:30] Luke Barousse: Providing this data set with no prompt, it takes the initiative to start diving into this and exploring what this CSV is even about.
[00:00:38] Luke Barousse: It shows a sample of the first four rows of the data set, along with some of the columns.
[00:00:42] Luke Barousse: And all of this is done with Python code, which you can easily see if you want to, and that's not too bad.
[00:00:48] Luke Barousse: Let's move into a harder question, asking it what this data set is even about.
[00:00:53] Luke Barousse: It pretty impressively identifies that this is a collection of job postings related to data science roles, where each row in the data set represents a different job posting.
[00:01:01] Luke Barousse: It then goes as far as to highlight some of the key columns in the data set with a description.
[00:01:05] Luke Barousse: Next, I prompt it, "With your knowledge of the data set, perform exploratory data analysis," and it starts by identifying five main steps that it's going to take.
[00:01:14] Luke Barousse: First, it shows the data types of this, and it looks like every one's an object, so that's not really useful.
[00:01:19] Luke Barousse: From there, it even identifies there are missing values with a summary, clarifying which columns have a significant number missing.
[00:01:26] Luke Barousse: Next, it provides some summary statistics about the numerical columns.
[00:01:29] Luke Barousse: It even goes as far as visualizing these distributions on its own.
[00:01:33] Luke Barousse: And then, providing an analysis of it, it points out that the salary data is skewed to the right, which is typical for salary data.
[00:01:41] Luke Barousse: That's pretty neat that it provides this analysis of how this compares to expected values.
[00:01:45] Luke Barousse: Finally, it identifies some columns of interest that it wants to dive into further.
[00:01:49] Luke Barousse: And it provides these visualizations showing what are the top ten job titles and what are the top ten companies.
[00:01:56] Luke Barousse: And it looks like data engineers and data scientists are beating out data analysts.
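The EDA steps described above (missing values, summary statistics, a skew check, top job titles) can be sketched in a few lines of Python. This is a minimal, dependency-free illustration and not the code that code interpreter generated; the column names (`title`, `company`, `salary`) and the sample rows are hypothetical.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample standing in for the job postings CSV.
SAMPLE_CSV = """title,company,salary
Data Engineer,Acme,120000
Data Scientist,Acme,130000
Data Analyst,Globex,
Data Engineer,Initech,110000
Data Scientist,Globex,250000
Data Engineer,Acme,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Missing values per column (empty strings count as missing).
missing = {col: sum(1 for r in rows if not r[col]) for col in rows[0]}

# Summary statistics for the one numeric column.
salaries = [float(r["salary"]) for r in rows if r["salary"]]
mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# A mean above the median is a rough sign of right skew.
right_skewed = mean_salary > median_salary

# Most common job titles.
top_titles = Counter(r["title"] for r in rows).most_common(3)

print(missing)
print(mean_salary, median_salary, right_skewed)
print(top_titles)
```

Comparing the mean with the median is only a quick heuristic for skew; a histogram, like the one shown in the video, gives a fuller picture.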
[00:02:00] Luke Barousse: And so if I want to export this code, all I have to do is prompt it.
[00:02:04] Luke Barousse: And this is pretty revolutionary.
[00:02:06] Luke Barousse: So I think it's settled.
[00:02:08] Luke Barousse: Code Interpreter, it's taking my job.
[00:02:14] Luke Barousse: Well, not so fast.
[00:02:16] Luke Barousse: To answer that, we need to look at this.
[00:02:18] Luke Barousse: Yes, it's a blank Excel spreadsheet.
[00:02:20] Luke Barousse: But hear me out.
[00:02:21] Luke Barousse: In order to look into the future of what AI holds for our jobs, we need to look at the past, what previous tools have done to transform our jobs.
[00:02:28] Luke Barousse: Before spreadsheets migrated to computers, they started out being physical papers that accountants would use in order to calculate finances.
[00:02:36] Luke Barousse: Yeah, a physical paper.
[00:02:38] Luke Barousse: My hands hurt just thinking about this.
[00:02:40] Luke Barousse: There were entire departments with numerous accountants with the sole purpose of updating these paper spreadsheets.
[00:02:46] Luke Barousse: Then in comes the invention of the personal computer and these dudes want to revolutionize the way we work.
[00:02:52] Luke Barousse: In this video, you're going to see the future.
[00:02:54] Luke Barousse: And so the electronic spreadsheet was conceived.
[00:02:57] Luke Barousse: Marketers everywhere began to overdramatize just how powerful this tool was going to be, with some ads claiming it would take up to 150 jobs.
[00:03:05] Luke Barousse: But you know what happened to accountants' jobs after this?
[00:03:07] Luke Barousse: Well, they actually increased.
[00:03:09] Luke Barousse: Accountants could now refocus their time from tallying numbers and focus on more important things like building those mind numbing PowerPoints.
[00:03:17] Luke Barousse: But seriously, their attention now shifted into performing deeper analysis.
[00:03:21] Luke Barousse: They could now use these tools and provide higher value with their jobs.
[00:03:25] Luke Barousse: And with that history lesson, maybe we can also infer where we'll go next with these AI tools.
[00:03:31] Exploring the Limitations of Code Interpreter in Data Analysis
[00:03:31] Luke Barousse: Alright, so it's been a few days and I've been using code interpreter for my job, analyzing and trying to find any limitations.
[00:03:38] Luke Barousse: Surprisingly, I found quite a few.
[00:03:40] Luke Barousse: Also during this, OpenAI released a new feature for custom instructions.
[00:03:44] Luke Barousse: So I've customized my graphs a little bit and they're going to look a little bit different.
[00:03:47] Luke Barousse: So let's dive into some of those limitations by doing a deeper dive of that data set that we were exploring before.
[00:03:53] Luke Barousse: We're going to start with a new chat since my last one timed out.
[00:03:55] Luke Barousse: Here I have a folder with the Python file we exported last time, the data set, and also a text file that I had ChatGPT output last time that summarized all our analysis.
[00:04:04] Luke Barousse: I compress this into a zip file for upload and then prompt ChatGPT to familiarize itself with the contents of this file.
[00:04:11] Luke Barousse: And it looks like it knows where we left off.
[00:04:13] Luke Barousse: So we're going to dive into exploring the skills from these job postings.
[00:04:17] Luke Barousse: Specifically, I want to see what is the most common skill requested.
[00:04:21] Luke Barousse: And conveniently, it gives me this graph which looks pretty good at first sight.
[00:04:24] Luke Barousse: But after diving into it, I find that the highest number of occurrences of the keyword Python is at 56,000, which is not possible because there are only 50,000 job postings.
[00:04:35] Luke Barousse: And ChatGPT should know this.
[00:04:37] Luke Barousse: It was in the summary.
[00:04:37] Luke Barousse: This type of mistake is something I would expect a data analyst to pick up on, and yet ChatGPT doesn't.
[00:04:43] Luke Barousse: So I reprompt ChatGPT to fix this error, and then I take it a step further by having it display these keywords as a percentage vice as a count of occurrences.
[00:04:51] Luke Barousse: And we finally get this visualization showing the top 20 keywords in data science job postings.
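The check described above (a keyword cannot appear in more postings than exist) can be built directly into the counting step. The video does not show what caused the 56,000 figure; the sketch below assumes one plausible cause, counting repeated mentions inside a single posting, and avoids it by counting each keyword once per posting. The `postings` data and the `keyword_share` helper are hypothetical.

```python
from collections import Counter

# Hypothetical postings; each entry lists the skill keywords found in one posting.
postings = [
    ["python", "sql", "python"],  # "python" is mentioned twice in one posting
    ["sql", "excel"],
    ["python", "tableau"],
    ["sql"],
]

def keyword_share(postings):
    """Return {keyword: percent of postings mentioning it at least once}."""
    counts = Counter()
    for keywords in postings:
        counts.update(set(keywords))  # set() so a posting counts once per keyword
    total = len(postings)
    # Sanity check: no keyword can appear in more postings than exist.
    assert all(n <= total for n in counts.values())
    return {kw: 100 * n / total for kw, n in counts.items()}

shares = keyword_share(postings)
for kw, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{kw}: {pct:.0f}%")
```

Expressing the result as a percentage of postings also makes this kind of error easier to spot, since any value above 100% is immediately suspicious.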
[00:04:56] Luke Barousse: Now, I did this same analysis a couple of days ago and I ran into even more issues.
[00:05:01] Luke Barousse: The first time I asked for this, it just gave me blank graphs, and then it had the audacity to start hallucinating what the top skills were in this data set.
[00:05:09] Luke Barousse: It basically reverted back to what it knew as a large language model, vice actually using the data provided.
[00:05:15] Luke Barousse: I then reprompted telling it that the graphs were blank, that it needed to fix this error.
[00:05:19] Luke Barousse: And it didn't really seem to fix it.
[00:05:21] Luke Barousse: Eventually, I just prompted it to print after every step in the code, and it somehow worked itself out and ended up getting this final visualization.
[00:05:28] Luke Barousse: So, going back to that Excel history lesson, yes, this is a powerful tool, but it still takes some sort of human operator to help guide and steer this tool on where it actually needs to go and make sure that it's staying on track.
[00:05:42] Luke Barousse: This is especially true when we're diving into deeper, more complicated subject areas.
[00:05:47] Luke Barousse: But what happens if I need a quick ad hoc analysis of maybe a subject area that I'm not familiar with, like something that's not data science job postings?
[00:05:59] Using OpenAI's Code Interpreter for Speaker Amplification Analysis
[00:05:59] Luke Barousse: All right, so I think I have a unique use case for code interpreter, and it involves this.
[00:06:07] Luke Barousse: Alexa, play Spotify.
[00:06:09] Luke Barousse: Sure.
[00:06:09] Luke Barousse: Here's Spotify.
[00:06:13] Luke Barousse: So, I have two outdoor speakers here, and if you can't tell, it's really not that loud.
[00:06:19] Luke Barousse: There's a big problem that I'm having.
[00:06:20] Luke Barousse: These speakers themselves are meant to be actually connected into some sort of amplifier, and right now, we just have them going right into Alexa.
[00:06:30] Luke Barousse: When I go to Amazon and look for an outdoor amplifier, I get a shit whack of results.
[00:06:35] Luke Barousse: I'm not really sure which amplifier to choose, so I searched through quite a few forums trying to find out what size amplifier I needed to get, and it looks like it's really math based.
[00:06:43] Luke Barousse: So that's why I think code interpreter is gonna be perfect for this.
[00:06:45] Luke Barousse: So I looked up the model number online and found the different specs that I think I needed, and I gave this information to it.
[00:06:51] Luke Barousse: So this is pretty crazy.
[00:06:52] Luke Barousse: ChatGPT went through and knew it needed to calculate both the impedance and maximum power, and it used some Python code to actually calculate both of those things and determine what they need to be.
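The video does not show the generated code or the speaker specs, so the following is only a sketch of what an impedance and power calculation for two speakers might look like. The 8 ohm and 60 watt figures are hypothetical, as are the helper names; the formulas are the standard ones for combining resistive loads in parallel or in series.

```python
def parallel_impedance(impedances):
    """Combined impedance of loads wired in parallel: 1/Z = sum(1/Zi)."""
    return 1 / sum(1 / z for z in impedances)

def series_impedance(impedances):
    """Combined impedance of loads wired in series: Z = sum(Zi)."""
    return sum(impedances)

# Hypothetical specs for two identical outdoor speakers.
speakers = [
    {"impedance_ohms": 8.0, "max_power_watts": 60.0},
    {"impedance_ohms": 8.0, "max_power_watts": 60.0},
]

impedances = [s["impedance_ohms"] for s in speakers]
z_parallel = parallel_impedance(impedances)
z_series = series_impedance(impedances)

# Upper bound on the power the pair can handle together.
total_max_power = sum(s["max_power_watts"] for s in speakers)

print(f"Parallel: {z_parallel} ohms, series: {z_series} ohms")
print(f"Combined maximum power: {total_max_power} W")
```

Whether the parallel or series figure applies depends on how the speakers are wired to the amplifier, which the video does not specify.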
[00:07:04] Luke Barousse: So I think I found the perfect amplifier from this company called Xerong.
[00:07:08] Luke Barousse: So now we just got to wait for that amplifier to come in, and we'll test to see if ChatGPT was right.
[00:07:13] Luke Barousse: So I've tested code interpreter to its limits, and I think I've reached them.
[00:07:17] Luke Barousse: Now I have some bad news.
[00:07:19] Luke Barousse: Code interpreter is probably not going to take away my job with these limitations.
[00:07:23] Luke Barousse: On second thought, it's actually good news.
[00:07:25] Luke Barousse: So, let's say we have some data online, like this Google Sheet full of data.
[00:07:28] Luke Barousse: In the past, I've used Python to connect to this.
[00:07:30] Luke Barousse: However, it tells me it doesn't have the ability to access the internet. Prompting it further, I ask if I can just pip install libraries.
[00:07:36] Luke Barousse: It tells me this is prevented as a security feature designed to protect user data and privacy.
[00:07:41] Luke Barousse: I kind of get this because ChatGPT's previous plugins had issues when accessing the internet.
[00:07:47] Luke Barousse: Anyway, because of all of this, I now have to take an extra step of downloading the data that I need and then uploading it to ChatGPT.
[00:07:54] Luke Barousse: And my data is spread all over the place.
[00:07:56] Luke Barousse: I don't just have it in Google Sheets, but also have it in things like databases.
[00:08:00] Luke Barousse: So I downloaded one of my databases to a CSV for analysis, which at about a million rows was a pretty big data set.
[00:08:07] Luke Barousse: When I tried to upload it, it gave me this warning that it has a small ass file limit and an environment limit of only two gigs.
[00:08:14] Luke Barousse: So I can't even get all the data that I have here into it to analyze, which is like the first step of my job.
[00:08:20] Luke Barousse: I found that the most I could get in was 200,000 rows of data, so I was super disappointed in this.
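One possible way to fit a large export under a row limit like this is to upload a uniform random sample instead of the full file. This workaround is not described in the video; the sketch below uses reservoir sampling so the CSV is read once without loading it all into memory. The `sample_csv_rows` helper and the generated demo data are hypothetical.

```python
import csv
import io
import random

def sample_csv_rows(src, dst, max_rows, seed=0):
    """Copy the header plus a uniform random sample of at most max_rows data rows."""
    rng = random.Random(seed)
    reader = csv.reader(src)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < max_rows:
            reservoir.append(row)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_rows:
                reservoir[j] = row  # replace an existing row with decreasing odds
    writer = csv.writer(dst)
    writer.writerow(header)
    writer.writerows(reservoir)
    return len(reservoir)

# Demo with in-memory files; real use would pass open file handles instead.
src = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000)))
dst = io.StringIO()
kept = sample_csv_rows(src, dst, max_rows=200)
print(f"Kept {kept} of 1000 rows")
```

A sample changes what the analysis can claim (rare categories may drop out), so it suits exploration better than exact reporting.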
[00:08:25] Luke Barousse: Now there are workarounds in ChatGPT for connecting to an external data source, like the plugin Notable, and I have a whole video on it.
[00:08:32] Luke Barousse: But comparing both of these tools, although Notable excels in some areas like data connections, I find code interpreter performs a much more thorough analysis with less prompting.
[00:08:42] Luke Barousse: Anyway, these internet issues and file limits aren't even the most detrimental issues to code interpreter.
[00:08:48] Luke Barousse: So I polled my subscribers on LinkedIn and Twitter, I mean X, and asked them what is stopping them from implementing this tool in their jobs.
[00:08:56] Luke Barousse: And it was a resounding consensus that they had concerns with security issues.
[00:09:00] Luke Barousse: You see, these chatbots take the prompts and also the data that you give them, which can then be used to build on and improve these chatbots.
[00:09:07] Luke Barousse: The problem is, if it's confidential data, it could be seen by the reviewers of this chatbot, or even worse, it could be fed back into this chatbot and potentially be prompted by another user and seen by them.
[00:09:20] Luke Barousse: It's kind of a big deal.
[00:09:21] Luke Barousse: So big, in fact, that Google has told its own employees not to put confidential data into their very own chatbot, Bard.
[00:09:28] Luke Barousse: That's like telling Meta employees they can't use Facebook or Instagram.
[00:09:32] Luke Barousse: That's a pretty big red flag.
[00:09:33] Luke Barousse: However, with all these limitations, I think there's hope for the future.
[00:09:37] Luke Barousse: Take Midjourney for example, an AI tool to generate art content.
[00:09:41] Luke Barousse: Look how far this tool has come in as little as a year.
[00:09:44] Luke Barousse: So imagine where we'll be in the future with these types of tools once we get through all these different limitations.
[00:09:50] Luke Barousse: Anyway, the amp arrived today and we're going to go install it.
[00:09:54] Luke Barousse: I hope it doesn't go x wrong.
[00:10:02] Luke Barousse: After unpacking this bad boy, I realized it had no instructions.
[00:10:05] Luke Barousse: So I resorted to using ChatGPT to tell me how to install this, and it gave okay advice.
[00:10:12] Luke Barousse: Anyway, after a few steps and nearly getting shocked and losing my life, I was able to get it installed.
[00:10:18] Luke Barousse: All that's left now is to test that bad boy.
[00:10:20] Luke Barousse: Oh, if you're curious about that video that used the Notable plugin, it's right here. With that, Alexa, play Spotify.
[00:10:26] Luke Barousse: Here's Spotify.
[00:10:29] Luke Barousse: As always, if you got value out of this video, smash that like button.
[00:10:35] Luke Barousse: With that, see you in the next one.