https://bit.ly/IA_AD

Useful tools for research with data analysis

Two applications of AI in Data Analysis

  • Use AI assistants to write code for Data Analysis
    (or any other coding task, so this part is relevant to a broader audience).

  • Use AI to actually do Data Analysis.

We will focus on the first, but we will also give some hints about the second.

AI assistants to write code for Data Analysis

Focus on two assistants:

  • Google’s Gemini via Colab. To use Colab and Gemini all you need is a Google account.

  • GitHub’s Copilot. We will provide access instructions below.

Example with Google Gemini

  • Our goal in this example is to illustrate the use of AI code assistants such as Google’s Gemini or GitHub Copilot in any task involving code writing. We will use a data analysis problem as an example, but the same general ideas apply to many other situations.

  • The following link (Google account required) opens up a Colab notebook for this example. When the link opens, click on the button on top.

  • There is also a static version of a previous run of the notebook that you can find here.

Rerunning the example with GitHub Copilot and VS Code

  • The same example using GitHub Copilot in VS Code can be found in the GitHub repo for this session.

  • We have also created a static HTML snapshot of the notebook that you can inspect in a web browser.

  • To try it you need Python; working inside a virtual environment is highly recommended. The CONDA_SETUP.html file provides some extra information. If you are not experienced with this kind of setup, talk to someone who is!

  • For new Copilot users: Video recommendation
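
Before running the notebook locally, it can help to confirm that Python is actually running inside an isolated environment. The following is a minimal sketch, not part of the session materials: the helper name `in_virtual_env` is hypothetical, and the check assumes either a venv-style environment or an activated conda environment other than `base`.

```python
import os
import sys


def in_virtual_env() -> bool:
    """Return True if Python seems to run inside an isolated environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix.
    in_venv = sys.prefix != sys.base_prefix
    # conda sets CONDA_DEFAULT_ENV on activation; "base" is not treated as isolated here.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV", "")
    in_conda = conda_env not in ("", "base")
    return in_venv or in_conda


print("Python executable:", sys.executable)
print("Isolated environment:", in_virtual_env())
```

If the second line prints `False`, the packages you install will go into a shared Python installation, which is what the recommendation above tries to avoid.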

How to get access to Copilot through GitHub Education

  • First you need a (free) GitHub account created with your university email.
  • For academic users (students & teachers) this link shows how to get a (free) GitHub Education account, which includes Copilot. Also look here.
  • The recommended way to use it is with VS Code. Once you have VS Code installed you can follow these instructions.

Uses of AI to actually do Data Analysis

  • LLMs are particularly well-suited for tasks involving NLP: sentiment analysis, text classification, summarization, and question-answering.

  • OpenAI offers a Python library for its API that can be used for that. Not free! You pay per token read and generated (0.02€ for the example below).

  • To do this we create a prompt template and programmatically use it with our data. For sentiment analysis (positive/negative) we create a prompt for each text in the dataset. See an example notebook here (for VS Code): OpenAI_API_example.html.
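
The prompt-template idea above can be sketched as follows. This is an illustration under stated assumptions, not the code of the example notebook: the template wording, the function names and the `fake_model` stand-in are all hypothetical. The model call is passed in as a plain function so the sketch runs offline and costs nothing.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the wording is an illustrative assumption.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following text as 'positive' or 'negative'.\n"
    "Answer with a single word.\n\n"
    "Text: {text}\n"
    "Sentiment:"
)


def build_prompt(text: str) -> str:
    """Fill the template with one text from the dataset."""
    return PROMPT_TEMPLATE.format(text=text.strip())


def normalize_answer(answer: str) -> str:
    """Map a free-form model reply to 'positive', 'negative' or 'unknown'."""
    cleaned = answer.strip().lower()
    if cleaned.startswith("positive"):
        return "positive"
    if cleaned.startswith("negative"):
        return "negative"
    return "unknown"


def classify_texts(
    texts: List[str], ask_model: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Send one prompt per text and collect the normalized labels."""
    results = []
    for text in texts:
        reply = ask_model(build_prompt(text))
        results.append({"text": text, "sentiment": normalize_answer(reply)})
    return results


def fake_model(prompt: str) -> str:
    """Offline stand-in for a real LLM call (keyword rule, demonstration only)."""
    text_part = prompt.split("Text: ", 1)[1]
    return "Positive." if "great" in text_part.lower() else "Negative."


reviews = ["The course was great!", "I did not enjoy the session."]
labels = classify_texts(reviews, fake_model)
for row in labels:
    print(row["sentiment"], "<-", row["text"])
```

To use a real model, `fake_model` would be replaced by a function that sends the prompt through the OpenAI Python library and returns the reply text. Since one prompt is sent per text, the token cost grows with the size of the dataset.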

Hugging Face

  • Hugging Face is a company and open-source community best known for the Transformers Python library and for its model hub, with > 1.8M pre-trained models. See the References section.

  • Hugging Face Spaces is a platform that allows hosting and sharing model demos and applications.

  • This Colab notebook contains a very basic example using some models via Hugging Face.

  • Trummer’s book (see References) introduces the LangChain and LlamaIndex frameworks for building complex multi-step data analysis pipelines.
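
As a rough idea of what using a pre-trained model through the Transformers library can look like, here is a minimal sketch. It is not the code of the Colab notebook mentioned above: the helper names `count_labels` and `run_demo` are hypothetical, and the sample predictions are written by hand to show the output shape rather than taken from a real run.

```python
from collections import Counter
from typing import Dict, List


def count_labels(predictions: List[Dict]) -> Dict[str, int]:
    """Count how many texts received each label."""
    return dict(Counter(p["label"] for p in predictions))


def run_demo(texts: List[str]) -> Dict[str, int]:
    """Classify texts with a pre-trained model (downloads it on first use)."""
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline

    # With no model name given, the pipeline falls back to a default
    # English sentiment model.
    classifier = pipeline("sentiment-analysis")
    predictions = classifier(texts)  # list of {"label": ..., "score": ...} dicts
    return count_labels(predictions)


# Hand-written example of the output shape (not real model output).
sample_predictions = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "POSITIVE", "score": 0.88},
]
print(count_labels(sample_predictions))
```

Calling `run_demo(["I love this course", "This was confusing"])` in an environment with `transformers` and a backend such as PyTorch installed would run the actual model. Unlike the OpenAI API, the model runs locally, so there is no per-token charge.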

References

Guja, Artur, Marlena Siwiak, and Marian Siwiak. 2024. Starting Data Analytics with Generative AI and Python. Shelter Island, NY: Manning Publications.
Lee, Wei-Meng. 2024. Hugging Face in Action. Shelter Island, NY: Manning Publications. https://www.manning.com/books/hugging-face-in-action.
Porter, Leo, and Daniel Zingaro. 2024. Learn AI-Assisted Python Programming: With GitHub Copilot and ChatGPT. Shelter Island, NY: Manning Publications.
Trummer, Immanuel. 2025. Data Analysis with LLMs: Text, Tables, Images and Sound. Shelter Island, NY: Manning Publications.