Use AI assistants to write code for Data Analysis
(or any other coding task \(\to\) broader audience).
Use AI to actually do Data Analysis.
We will focus on the first, but we will also give some hints about the second.
Focus on two assistants:
Google’s Gemini via Colab. To use Colab and Gemini all you need is a Google account.
GitHub’s Copilot. We will provide access instructions below.
Our goal in this example is to illustrate the use of AI code assistants such as Google’s Gemini or GitHub Copilot in any task involving code writing. We will use a data analysis problem as an example, but the same general ideas apply to many other situations.
The following link (Google account required) opens up a Colab notebook for this example. When the link opens, click on the button on top.
There is also an static version of a previous run of the notebook that you can find here
The same example using GitHub Copilot in VS Code can be found in the GitHub repo for this session.
We have also created a (static snapshot) html version of the notebook that you can inspect in a web browser.
To try it you need Python and a virtual environment, highly recommended. The CONDA_SETUP.html file provides some extra information. Talk to someone experienced if you are not!
For new Copilot users: Video recommendation
LLMs are particularly well-suited for tasks involving NLP: sentiment analysis, text classification, summarization, and question-answering.
OpenAI API offers a Python library that can be used for that. Not free! You pay per tokens read and generated (0.02€ for the example below).
To do this we create a prompt template and programatically use it with our data. For sentiment analysis (positive/negative) we create a prompt for each text in the dataset. See an example notebook here (for VS code): OpenAI_API_example.html.
Hugging Face is a company and open-source community best known for the Transformers Python Library, with > 1.8M pre-trained models. See the References section.
Hugging Face Spaces is a platform that allows hosting and sharing model demos and applications.
This Colab notebook contains a very basic example using some models via Hugging Face.
Trummer’s book introduces LangChain and Llama-Index frameworks to build complex multi-step data analysis pipelines.