Data Analysis Using LLM (GPT-3.5) | Retail Data Analytics Using LangChain
The project “Data Analysis Using LLM | ChatGPT Data Analysis | Retail Data Analytics Using LangChain” aims to leverage natural language processing (NLP) techniques and OpenAI’s GPT-3.5 model, orchestrated through LangChain, to perform data analysis on a retail dataset. The approach integrates the model with a SQL database so the retail data can be queried and analyzed in natural language.
1. Objectives:
- Explore the potential of LLMs like ChatGPT for data analysis tasks.
- Develop a conversational interface for querying and analyzing retail data.
- Demonstrate the effectiveness of LLM-driven analytics in extracting actionable insights from retail datasets.
- Evaluate the performance and usability of the developed system in real-world scenarios.
2. Approach
- Data Collection: Gather retail datasets containing relevant information such as sales transactions, date, product category, price, and quantity.
- Kaggle public data source: https://www.kaggle.com/datasets/mohammadtalib786/retail-sales-dataset/code
- Preprocessing: Clean the data and ensure it is well organized (consistent column types, no duplicate or missing rows).
- Importing the data and loading it into SQLite3
# Import the necessary libraries
from langchain.chat_models import ChatOpenAI
import sqlite3
import pandas as pd
import mykey
# Set your OpenAI API key (stored in a local mykey module)
OPENAI_API_KEY = mykey.OPENAI_API_KEY
# Initialize the LangChain OpenAI chat model
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model_name="gpt-3.5-turbo-1106")
# Read the input data
retail_df = pd.read_csv("retail_dataset.csv")
# Connect to the SQLite database
sqlite_conn = sqlite3.connect("customer.db")
# Convert the DataFrame to a SQLite table named "Sales"
retail_df.to_sql("Sales", sqlite_conn, if_exists="replace", index=False)
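Once the table is written, a quick sanity check confirms the load succeeded. The sketch below builds a tiny in-memory stand-in table (the column names are illustrative, borrowed from the Kaggle dataset) so it runs on its own; in the project the same queries would hit the "Sales" table created above:

```python
import sqlite3
import pandas as pd

# Tiny stand-in for the Sales table so this check is self-contained
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"Product Category": ["Clothing", "Beauty"], "Total Amount": [150, 50]}
).to_sql("Sales", conn, if_exists="replace", index=False)

# Sanity-check the load: row count and a small preview
n_rows = pd.read_sql("SELECT COUNT(*) AS n FROM Sales", conn)["n"][0]
preview = pd.read_sql("SELECT * FROM Sales LIMIT 5", conn)
print(n_rows)  # number of rows loaded
```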
- Model Integration: Integrate GPT-3.5 into the data analytics pipeline to enable natural language interaction with the dataset.
- Query Processing: Develop algorithms to parse and interpret user queries, translating them into actionable data analysis tasks.
from langchain.agents import create_sql_agent
from langchain.sql_database import SQLDatabase
my_db= SQLDatabase.from_uri('sqlite:///customer.db')
# Create a SQL agent executor with the specified parameters
agent_executor = create_sql_agent(llm, db=my_db, agent_type="openai-tools", verbose=True)
# User query
user_query = "What is the average sales across different product categories?"
# Run the agent executor
agent_executor.run(user_query)
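Under the hood, the agent inspects the schema, writes ordinary SQL, and executes it against the database. For the question above it would generate something along these lines; the column names "Product Category" and "Total Amount" follow the Kaggle dataset and are assumptions here, and the data is illustrative:

```python
import sqlite3

# In-memory stand-in for the Sales table with illustrative rows
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Sales ("Product Category" TEXT, "Total Amount" REAL)')
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("Clothing", 100.0), ("Clothing", 300.0), ("Beauty", 50.0)],
)

# The kind of query the agent generates for "average sales per category"
rows = conn.execute(
    'SELECT "Product Category", AVG("Total Amount") AS avg_sales '
    'FROM Sales GROUP BY "Product Category" ORDER BY avg_sales DESC'
).fetchall()
print(rows)  # → [('Clothing', 200.0), ('Beauty', 50.0)]
```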
- Now we can create our FewShotPromptTemplate, which takes our example selector, an example prompt for formatting each example, and a string prefix and suffix to put before and after our formatted examples:
from langchain_core.prompts import (
ChatPromptTemplate,
FewShotPromptTemplate,
MessagesPlaceholder,
PromptTemplate,
SystemMessagePromptTemplate,
)
system_prefix = """You are an agent designed to interact with a SQL database.
Your role is to analyse the transaction patterns of customers. Feature engineering on the 'Sales' table is crucial for statistical exploration. For example:
- the 'Age' column can be grouped into age ranges such as 21-25, 26-30, and so on.
Understanding the data in these columns helps us gain insights about our customers, enabling us to offer personalized services and develop effective marketing strategies.
"""
# Note: example_selector is assumed to be defined beforehand (e.g., a selector
# built from a list of example user-input/SQL-query pairs, as in the LangChain docs)
few_shot_prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=PromptTemplate.from_template(
"User input: {input}\nSQL query: {query}"
),
input_variables=["input", "dialect", "top_k"],
prefix=system_prefix,
suffix="",
)
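To see what the few-shot template produces, here is a plain-Python sketch of the rendering: each selected example is formatted through the example prompt and the results are concatenated after the prefix. The example input/query pairs below are illustrative, not from the project, and the selection step (picking examples most similar to the user input) is elided:

```python
# Hypothetical few-shot examples; in the real pipeline an example_selector
# would pick the ones most relevant to the user's question.
examples = [
    {"input": "List all product categories.",
     "query": 'SELECT DISTINCT "Product Category" FROM Sales;'},
    {"input": "Total revenue per category.",
     "query": 'SELECT "Product Category", SUM("Total Amount") FROM Sales GROUP BY 1;'},
]

example_template = "User input: {input}\nSQL query: {query}"
prefix = "You are an agent designed to interact with a SQL database."

# Mirrors what FewShotPromptTemplate does: prefix, then each formatted example
rendered = "\n\n".join([prefix] + [example_template.format(**ex) for ex in examples])
print(rendered)
```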
Since our underlying agent is an OpenAI tools agent, which uses OpenAI function calling, our full prompt should be a chat prompt with a human message template and an agent_scratchpad MessagesPlaceholder. The few-shot prompt will be used for our system message:
full_prompt = ChatPromptTemplate.from_messages(
[
SystemMessagePromptTemplate(prompt=few_shot_prompt),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
- And now we can create our agent with our custom prompt:
agent = create_sql_agent(
llm=llm,
db=my_db,
prompt=full_prompt,
verbose=True,
agent_type="openai-tools",
)
# run the agent with the formatted template
agent.invoke({"input": "What age group is buying most?"})
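For the age-group question, the system prefix steers the agent toward bucketing ages into 5-year ranges before aggregating. The SQL it is nudged to write would look roughly like the query below; the `Age` and `Quantity` columns are assumed from the Kaggle dataset, and the rows are illustrative:

```python
import sqlite3

# In-memory stand-in for the Sales table with illustrative rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Age INTEGER, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(22, 3), (24, 4), (27, 1), (29, 2), (31, 2)],
)

# Bucket ages into 5-year ranges (21-25, 26-30, ...) using SQLite's
# integer division, then rank the buckets by total units bought
rows = conn.execute(
    """
    SELECT ((Age - 21) / 5) * 5 + 21 AS bucket_start,
           SUM(Quantity) AS units
    FROM Sales
    GROUP BY bucket_start
    ORDER BY units DESC
    """
).fetchall()
top_start = rows[0][0]
print(f"Top age group: {top_start}-{top_start + 4}")  # prints "Top age group: 21-25"
```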
3. Expected Outcomes:
- A functional prototype demonstrating the integration of LLMs into retail data analytics.
- Improved accessibility and usability of data analysis tools through natural language interfaces.
- Enhanced decision-making capabilities for retail stakeholders through intuitive data exploration and insight generation.
- Insights gained from the project can inform future research and development efforts in the intersection of NLP and data analytics.
By combining the power of LLMs with traditional data analytics methodologies, this project seeks to advance the state-of-the-art in retail data analysis and pave the way for more intuitive and user-friendly analytical tools.
References:
- https://python.langchain.com/docs/use_cases/sql/agents