Merge 36f1c07317 into 2f7977a95a

Chore(AI): Fix sp500 subject and audit
Chore(DPxAI): Fix format
2024-09-18 16:41:41 +03:00 · 2024-09-09 09:58:11 +01:00 · 2024-09-06 11:18:31 +01:00 · 2024-09-06 11:18:31 +01:00 · 2024-09-06 11:18:31 +01:00 · 2024-09-06 11:18:31 +01:00
4 changed files with 53 additions and 36 deletions
--- a/js/tests/test.mjs
+++ b/js/tests/test.mjs
@@ -149,17 +149,27 @@ ${tests.trim()};`.trim()
 }

 const loadAndSanitizeSolution = async () => {
-  const path = `${solutionPath}/${name}.js`
-  const rawCode = await read(path, "student solution")
+    try {
+        const path = `${solutionPath}/${name}.js`
+        const rawCode = await read(path, "student solution")
+        const sanitizedCode = removeComments(rawCode)

-  // this is a very crude and basic removal of comments
-  // since checking code is only use to prevent cheating
-  // it's not that important if it doesn't work 100% of the time.
-  const code = rawCode.replace(/\/\*[\s\S]*?\*\/|\/\/.*/g, "").trim()
-  if (code.includes("import")) fatal("import keyword not allowed")
-  return { code, rawCode, path }
+        if (sanitizedCode.includes("import ")) { // space is important as it prevents "imported" or "importance" or other words containing "import"
+            throw new Error("The use of the 'import' keyword is not allowed.")
+        }
+        return { code: sanitizedCode, rawCode, path }
+    } catch (error) {
+        console.error(error)
+    }
 }

+const removeComments = (code) => {
+    // removes JS single line and multi-line comments only. Not for bash files etc.
+    // for use with multiple file-types, I suggest writing a removeComments function with language-type as input and then handling accordingly
+    return code.replace(/\/\*[\s\S]*?\*\/|\/\/.*/g, "").trim()
+}
+
+
 const runTests = async ({ url, path, code }) => {
  const { setup, tests } = await import(url).catch(err =>
    fatal(`Unable to execute ${name}, error:\n${stackFmt(err, url)}`),
--- a/subjects/ai/emotions-detector/README.md
+++ b/subjects/ai/emotions-detector/README.md
@@ -1,18 +1,22 @@
 ## Emotions detection with Deep Learning

 Cameras are everywhere. Videos and images have become one of the most interesting data sets for artificial intelligence.
-Image processing is a quite broad research area, not just filtering, compression, and enhancement. Besides, we are even interested in the question, “what is in images?”, i.e., content analysis of visual inputs, which is part of the main task of computer vision.
-The study of computer vision could make possible such tasks as 3D reconstruction of scenes, motion capturing, and object recognition, which are crucial for even higher-level intelligence such as image and video understanding, and motion understanding.
-For this 2 months project we will
-focus on two tasks:
+Image processing is a quite broad research area, not just filtering, compression, and enhancement.

- emotion classification
- face tracking
+Besides, we are even interested in the question, “what is in images?”, i.e., content analysis of visual inputs, which is part of the main task of computer vision.
+
+The study of computer vision could make possible such tasks as 3D reconstruction of scenes, motion capturing, and object recognition, which are crucial for even higher-level intelligence such as image and video understanding, and motion understanding.
+
+For this project we will focus on two tasks:
+
+- Emotion classification
+- Face tracking

 With the computing power exponentially increasing the computer vision field has been developing exponentially. This is a key element because the computer power allows using more easily a type of neural networks very powerful on images:
-CNN's (Convolutional Neural Networks). Before the CNNs were democratized, the algorithms used relied a lot on human analysis to extract features which obviously time-consuming and not reliable. If you're interested in the "old
-school methodology" [this article](https://towardsdatascience.com/classifying-facial-emotions-via-machine-learning-5aac111932d3) explains it.
-The history behind this field is fascinating! [Here](https://kapernikov.com/basic-introduction-to-computer-vision/) is a short summary of its history.
+
+- CNN's (Convolutional Neural Networks). Before the CNNs were democratized, the algorithms used relied a lot on human analysis to extract features which obviously time-consuming and not reliable. If you're interested in the "old school methodology" [this article](https://towardsdatascience.com/classifying-facial-emotions-via-machine-learning-5aac111932d3) explains it.
+
+- The history behind this field is fascinating! [Here](https://kapernikov.com/basic-introduction-to-computer-vision/) is a short summary of its history.

 ### Project goal and suggested timeline

@@ -31,15 +35,18 @@ The two steps are detailed below.
 ### Preliminary:

 - Take [this course](https://www.coursera.org/learn/convolutional-neural-networks). This course is a reference for many reasons and one of them is the creator: **Andrew Ng**. He explains the basics of CNNs but also some more advanced topics as transfer learning, siamese networks etc ...
-  I suggest to focus on Week 1 and 2 and to spend less time on Week 3 and 4. Don't worry the time scoping of such MOOCs are conservative. You can attend the lessons for free!
+- I suggest to focus on Week 1 and 2 and to spend less time on Week 3 and 4. Don't worry the time scoping of such MOOCs are conservative. You can attend the lessons for free!

 - Participate in [this challenge](https://www.kaggle.com/c/digit-recognizer/code). The MNIST dataset is a reference in computer vision. Researchers use it as a benchmark to compare their models.
-  Start first with a logistic regression to understand how to handle images in Python. And then train your first CNN on this data set.
+
+- Start first with a logistic regression to understand how to handle images in Python. And then train your first CNN on this data set.

 ### Face emotions classification

 Emotion detection is one of the most researched topics in the modern-day machine learning arena. The ability to accurately detect and identify an emotion opens up numerous doors for Advanced Human Computer Interaction.
-The aim of this project is to detect up to seven distinct facial emotions in real time. This project runs on top of a Convolutional Neural Network (CNN) that is built with the help of Keras whose backend is TensorFlow in Python.
+The aim of this project is to detect up to seven distinct facial emotions in real time.
+
+This project runs on top of a Convolutional Neural Network (CNN) that is built with the help of Keras whose backend is TensorFlow in Python.
 The facial emotions that can be detected and classified by this system are Happy, Sad, Angry, Surprise, Fear, Disgust and Neutral.

 Your goal is to implement a program that takes as input a video stream that contains a person's face and that predicts the emotion of the person.
@@ -49,10 +56,10 @@ Your goal is to implement a program that takes as input a video stream that cont
 - Download and unzip the [data here](https://assets.01-edu.org/ai-branch/project3/emotions-detector.zip).
  This dataset was provided for this past [Kaggle challenge](https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/overview).
  It is possible to find more information about on the challenge page. Train a CNN on the dataset `train.csv`. Here is an [example of architecture](https://www.quora.com/What-is-the-VGG-neural-network) you can implement.
-  **The CNN has to perform more than 70% on the test set**. You can use the `test_with_emotions.csv` file for this. You will see that the CNNs take a lot of time to train.
+  **The CNN has to perform more than 60% on the test set**. You can use the `test_with_emotions.csv` file for this. You will see that the CNNs take a lot of time to train.
  You don't want to overfit the neural network. I strongly suggest to use early stopping, callbacks and to monitor the training using the `TensorBoard`.

-You have to save the trained model in `my_own_model.pkl` and to explain the chosen architecture in `my_own_model_architecture.txt`. Use `model.summary())` to print the architecture.
+You have to save the trained model in `final_emotion_model.keras` and to explain the chosen architecture in `final_emotion_model_arch.txt`. Use `model.summary())` to print the architecture.
 It is also expected that you explain the iterations and how you end up choosing your final architecture. Save a screenshot of the `TensorBoard` while the model's training in `tensorboard.png` and save a plot with the learning curves showing the model training and stopping BEFORE the model starts overfitting in `learning_curves.png`.

 - Optional: Use a pre-trained CNN to improve the accuracy. You will find some huge CNN's architecture that perform well. The issue is that it is expensive to train them from scratch.
@@ -86,13 +93,10 @@ project
 ├── environment.yml
 ├── README.md
 ├── results
-│   ├── hack_cnn
-│   │   ├── hacked_image.png
-│   │   └── input_image.png
 │   ├── model
 │   │   ├── learning_curves.png
-│   │   ├── my_own_model_architecture.txt
-│   │   ├── my_own_model.pkl
+│   │   ├── final_emotion_model_arch.txt
+│   │   ├── final_emotion_model.keras
 │   │   ├── pre_trained_model_architecture.txt
 │   │   └── pre_trained_model.pkl
 │   └── preprocessing_test
@@ -101,7 +105,7 @@ project
 │       ├── image_n.png
 │       └── input_video.mp4
 └── scripts
-    ├── hack_the_cnn.py
+    |__ validation_loss_accuracy.py
    ├── predict_live_stream.py
    ├── predict.py
    ├── preprocess.py
@@ -114,7 +118,7 @@ project
 ```prompt
 python ./scripts/predict.py

-Accuracy on test set: 72%
+Accuracy on test set: 62%

 ```

--- a/subjects/ai/emotions-detector/audit/README.md
+++ b/subjects/ai/emotions-detector/audit/README.md
@@ -24,12 +24,12 @@

 ###### Does the text document explain why the architecture was chosen, and what were the previous iterations?

-###### Does the following command `python ./scripts/predict.py` run without any error and returns an accuracy greater than 70%?
+###### Does the following command `python ./scripts/predict.py` run without any error and returns an accuracy greater than 60%?

 ```prompt
    python ./scripts/predict.py

-    Accuracy on test set: 72%
+    Accuracy on test set: 62%

 ```

--- a/subjects/ai/sp500-strategies/README.md
+++ b/subjects/ai/sp500-strategies/README.md
@@ -1,10 +1,12 @@
 ## Financial strategies on the SP500

-In this project we will apply machine to finance. You are a Quant/Data Scientist and your goal is to create a financial strategy based on a signal outputted by a machine learning model that over-performs the [SP500](https://en.wikipedia.org/wiki/S%26P_500).
+In this project, you'll apply machine learning to finance. Your goal as a Quant/Data Scientist is to create a financial strategy that uses a signal generated by a machine learning model to outperform the [SP500](https://en.wikipedia.org/wiki/S%26P_500).

-The Standard & Poor's 500 Index is a collection of stocks intended to reflect the overall return characteristics of the stock market as a whole. The stocks that make up the S&P 500 are selected by market capitalization, liquidity, and industry. Companies to be included in the S&P are selected by the S&P 500 Index Committee, which consists of a group of analysts employed by Standard & Poor's.
-The S&P 500 Index originally began in 1926 as the "composite index" comprised of only 90 stocks. According to historical records, the average annual return since its inception in 1926 through 2018 is approximately 10%–11%. The average annual return since adopting 500 stocks into the index in 1957 through 2018 is roughly 8%.
-As a Quant Researcher, you may beat the SP500 one year or few years. The real challenge though is to beat the SP500 consistently over decades. That's what most hedge funds in the world are trying to do.
+The S&P 500 Index is a collection of 500 stocks that represent the overall performance of the U.S. stock market. The stocks in the S&P 500 are chosen based on factors like market value, liquidity, and industry. These selections are made by the S&P 500 Index Committee, which is a group of analysts from Standard & Poor's.
+
+The S&P 500 started in 1926 with only 90 stocks and has grown to include 500 stocks since 1957. Historically, the average annual return of the S&P 500 has been about 10-11% since 1926, and around 8% since 1957.
+
+As a Quantitative Researcher, your challenge is to develop a strategy that can consistently outperform the S&P 500, not just in one year, but over many years. This is a difficult task and is the primary goal of many hedge funds around the world.

 The project is divided in parts:

@@ -199,4 +201,5 @@ Note: `features_engineering.py` can be used in `gridsearch.py`

 ### Files for this project

-You can find the data required for this project in this [link](https://assets.01-edu.org/ai-branch/project4/project04-20221031T173034Z-001.zip)
+You can find the data required for this project in this :
+[link](https://assets.01-edu.org/ai-branch/project4/project04-20221031T173034Z-001.zip)
Author	SHA1	Message	Date
sagarishere	0338dfad69	Merge `36f1c07317` into `2f7977a95a`	2024-09-18 16:41:41 +03:00
Oumaima Fisaoui	2f7977a95a	Chore(AI): Fix sp500 subject and audit	2024-09-09 09:58:11 +01:00
Oumaima Fisaoui	1d34ea0a71	Chore(DPxAI): Fix format	2024-09-06 11:18:31 +01:00
Oumaima Fisaoui	75472c0ed6	Chore(DPxAI): Fix format	2024-09-06 11:18:31 +01:00
Oumaima Fisaoui	cccab05477	Chore(DPxAI): Fix format	2024-09-06 11:18:31 +01:00
Oumaima Fisaoui	aa54ab1e66	Chore(DPxAI): Fix format	2024-09-06 11:18:31 +01:00
Oumaima Fisaoui	62486ed720	Chore(DPxAI): Fix the accuracy on test set	2024-09-06 11:18:31 +01:00
Oumaima Fisaoui	659074232f	Chore(AI): Fix emotions detector	2024-09-06 11:18:31 +01:00
sagarishere	36f1c07317	Improved checking for import, moved removeComments. Improved checking of imports and moved removeComments to a separate function, clearly explaining that it's limited to JavaScript-style comments.	2023-01-24 22:11:51 +02:00
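
The import check and comment stripping described in commit `36f1c07317` can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `COMMENT_PATTERNS`, `usesImport`, `sanitizeSolution`, the `language` parameter, and the shell rule are hypothetical names and choices introduced here. It uses a word-boundary regex instead of the `"import "` substring test, so `import{x}` and `import(...)` are also caught while `imported` and `importance` are not, and it lets the error propagate to the caller rather than logging it.

```javascript
// Illustrative sketch only: helper names and the language table are
// assumptions, not code from this PR.

// Comment patterns keyed by a hypothetical language tag.
const COMMENT_PATTERNS = {
  // Same crude JS pattern as in the diff: block comments and line comments.
  js: /\/\*[\s\S]*?\*\/|\/\/.*/g,
  // Assumed shell rule: whole-line "#" comments, keeping a "#!" shebang.
  sh: /^[ \t]*#(?!!).*$/gm,
}

// Strip comments for the given language; unknown languages are rejected.
const removeComments = (code, language = "js") => {
  const pattern = COMMENT_PATTERNS[language]
  if (!pattern) throw new Error(`No comment pattern for language: ${language}`)
  return code.replace(pattern, "").trim()
}

// \b matches "import" only as a whole word, so "imported" and "importance"
// pass, while `import{x}`, `import "x"` and `import("x")` are flagged.
const usesImport = (code) => /\bimport\b/.test(code)

// Throws instead of logging, so the caller decides how to fail.
const sanitizeSolution = (rawCode, language = "js") => {
  const code = removeComments(rawCode, language)
  if (usesImport(code)) {
    throw new Error("The use of the 'import' keyword is not allowed.")
  }
  return { code, rawCode }
}
```

Like the pattern in the diff, this stays deliberately crude: it still flags the word `import` inside a string literal, and the JS comment pattern also strips `//` sequences that appear inside strings. Throwing from `sanitizeSolution` keeps the failure visible to the caller, which could pass it to a handler such as the `fatal` helper used in the removed lines.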