ProjectSearch
This is a pseudo-search engine which goes through notes and gives answers to questions.
This project is a work in progress (the code works).
We expect to make a Python program which takes notes in .txt format, asks for questions, and answers them.
Instructions to run the code
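The commands below assume you are in the repository root (the directory that contains src/).
- Make a Python virtual environment by running this command in the terminal.
python3 -m venv src/venv
- Activate the virtual environment by running this command (the Scripts/ path is the Windows layout; on Linux and macOS the script is at src/venv/bin/activate).
source src/venv/Scripts/activate
- Install all the required packages by running this command.
pip3 install -r src/requirements.txt
- Run the code by running this command.
python3 src/main.py corpus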
Directory Structure
📦src
┣ 📂data
┃ ┣ 📂corpus
┃ ┃ ┣ 📜artificial_intelligence.txt
┃ ┃ ┣ 📜machine_learning.txt
┃ ┃ ┣ 📜natural_language_processing.txt
┃ ┃ ┣ 📜neural_network.txt
┃ ┃ ┣ 📜probability.txt
┃ ┃ ┗ 📜python.txt
┃ ┗ 📜loadData.py
┣ 📂process
┃ ┣ 📜qprocess.py
┃ ┗ 📜tfidf.py
┣ 📂tests
┃ ┗ 📜tests.py
┣ 📜main.py
┗ 📜requirements.txt
src/ directory
src/ directory has main.py and two other directories, data/ and process/
- data/ directory
  - It has data.py and the corpus/ directory
- process/ directory
  - It has two files, tfidf.py and qprocess.py
Files
We have 4 scripts as of now
- main.py
- data.py
- tfidf.py
- qprocess.py
main.py
- It is the main script which links all other files
data.py
- It has functions to curate the data that the search runs over.
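As a rough illustration of what curating the data could look like, the sketch below reads every .txt file in a corpus directory and splits it into sentences. The function name load_corpus and the naive punctuation-based sentence split are assumptions made for this example, not the actual contents of data.py.

```python
import re
from pathlib import Path


def load_corpus(corpus_dir):
    """Return {file name: list of sentences} for every .txt file in corpus_dir.

    Hypothetical helper for illustration; the real data.py may differ.
    """
    corpus = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        corpus[path.name] = sentences
    return corpus
```

With the directory layout shown above, a call such as load_corpus("src/data/corpus") would produce one entry per notes file.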
tfidf.py
- It has the functions to find the results.
qprocess.py
- It has the functions to filter the questions to provide better answers.
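One plausible way to filter a question is to lowercase it, tokenize it, and drop common stop words so that only the content words are used for the search. The sketch below assumes that approach; the name filter_question and the small stop-word list are hypothetical and not taken from qprocess.py.

```python
import re

# Small illustrative stop-word list (an assumption, not the project's own list).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "do", "does",
    "what", "which", "who", "how", "why", "when", "where",
    "of", "in", "on", "to", "and", "or", "for",
}


def filter_question(question):
    """Return the lowercase content words of a question, in their original order."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, filter_question("What is a neural network?") keeps only "neural" and "network".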
Current Deliverables
- take input from .txt files
- take questions as input, along with the marking for each question
- make a tf-idf algorithm to rank the sentences
- return those sentences
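The ranking step in the list above can be sketched as follows. This is a minimal tf-idf scorer that treats each sentence as a document, using tf = term count / sentence length and idf = log(N / df). The names tokenize and rank_sentences, and the top_n parameter, are assumptions for this example. How the marking of a question should influence the number of sentences returned is not specified here, so top_n is simply passed in by the caller.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(sentences, question, top_n=3):
    """Return up to top_n sentences with the highest tf-idf score for the question."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(question))

    scored = []
    for sentence, tokens in zip(sentences, tokenized):
        if not tokens:
            continue
        tf = Counter(tokens)
        # Sum tf * idf over the question terms that occur in this sentence.
        score = sum(
            (tf[t] / len(tokens)) * math.log(n / df[t])
            for t in query_terms
            if t in tf
        )
        scored.append((score, sentence))

    # Highest score first; sentences that share no term with the question are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored[:top_n] if score > 0]
```

Note that a term appearing in every sentence gets an idf of zero under this formula, so it contributes nothing to the ranking. The matching is also exact, so "network" will not match "networks" unless stemming is added.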
Deadline
We expect to complete this project before March 2021