mirror of https://github.com/01-edu/public.git
fix(ai/numpy): fix readme and audit
parent ac0e7766ed
commit bf5260262f
@@ -290,7 +290,7 @@ Expected output:

The goal of this exercise is to perform fundamental data analysis on real data using NumPy.

-The dataset chosen for this task is the [red wine dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality)
+The dataset chosen for this task was the [red wine dataset](./data/winequality-red.csv). You can find more info [HERE](./data/)

1. Load the data using `genfromtxt`, specifying the delimiter as ';', and optimize the numpy array size by reducing the data types. Use `np.float32` and verify that the resulting numpy array weighs **76800 bytes**.

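The loading step in the hunk above can be sketched as follows. This is a minimal illustration, not the exercise's reference solution: it assumes an in-memory stand-in for `./data/winequality-red.csv` (the names `csv_text` and `expected_real_nbytes` are hypothetical), and it relies on `genfromtxt` turning the non-numeric header line into a row of `np.nan`.

```python
import io

import numpy as np

# Hypothetical stand-in for ./data/winequality-red.csv:
# one quoted header line followed by 3 data rows of 12 ';'-separated values.
header = ";".join(f'"col{i}"' for i in range(12))
rows = [";".join(str(float(r * 12 + c)) for c in range(12)) for r in range(3)]
csv_text = "\n".join([header, *rows]) + "\n"

# dtype=np.float32 stores each value in 4 bytes instead of the default 8.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=";", dtype=np.float32)

print(data.shape, data.dtype, data.nbytes)

# Size arithmetic for the real file, assuming the header is kept as a NaN row:
# (1599 instances + 1 header row) * 12 columns * 4 bytes per float32.
expected_real_nbytes = (1599 + 1) * 12 * np.dtype(np.float32).itemsize
print(expected_real_nbytes)
```

The 76800-byte figure is consistent with 1600 rows, that is, the 1599 red-wine instances plus the header line read as a row of `np.nan`. Passing `skip_header=1` would presumably give 1599 rows and a smaller array.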
@@ -314,16 +314,7 @@ else:

"Display the 2nd, 7th, and 12th rows as a two-dimensional array. Exclude `np.nan` values if present."

-###### Is the output the following?
-
-```console
-[[ 7.8 0.76 0.04 2.3 0.092 15. 54. 0.997 3.26
-0.65 9.8 5. ]
-[ 7.3 0.65 0. 1.2 0.065 15. 21. 0.9946 3.39
-0.47 10. 7. ]
-[ 5.6 0.615 0. 1.6 0.089 16. 59. 0.9943 3.58
-0.52 9.9 5. ]]
-```
+###### Is the output in line with the data present in the provided dataset in the subject?

##### For question 3:

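One possible way to check the row-selection prompt in the hunk above is sketched here. It is an assumption, not the audit's official method: it drops every row containing `np.nan` first and then reads "2nd, 7th, and 12th" as zero-based indices 1, 6 and 11 of the remaining rows. The array `data` is hypothetical toy data, not the wine dataset.

```python
import numpy as np

# Hypothetical data: a NaN "header" row followed by 12 rows whose
# values equal their 1-based position among the data rows.
values = np.arange(1, 13, dtype=np.float32).reshape(-1, 1) * np.ones((1, 3), dtype=np.float32)
data = np.vstack([np.full((1, 3), np.nan, dtype=np.float32), values])

# Keep only rows without any NaN, then pick rows by position.
clean = data[~np.isnan(data).any(axis=1)]
selected = clean[[1, 6, 11]]  # fancy indexing returns a 2-D array

print(selected)
```

Indexing with a list keeps the result two-dimensional, which matches the "as a two-dimensional array" wording of the prompt.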
@@ -1,8 +1,8 @@
Citation Request:
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

@@ -10,43 +10,43 @@ Citation Request:
[Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
[bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

1. Title: Wine Quality

2. Sources
Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

3. Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples.
The inputs include objective tests (e.g. pH values) and the output is based on sensory data
(median of at least 3 evaluations made by wine experts). Each expert graded the wine quality
between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model
these datasets under a regression approach. The support vector machine model achieved the
best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T),
etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity
analysis procedure).

4. Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables
are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks.
The classes are ordered and not balanced (e.g. there are much more normal wines than
excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent
or poor wines. Also, we are not sure if all input variables are relevant. So
it could be interesting to test feature selection methods.

5. Number of Instances: red wine - 1599; white wine - 4898.

6. Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, thus it makes sense to apply some sort of
feature selection.

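The note above about correlated attributes can be explored with a correlation matrix before any feature selection. The sketch below is only an illustration on synthetic data: the columns `x`, `y` and `z` are hypothetical and do not come from the wine dataset.

```python
import numpy as np

# Synthetic stand-in for three attributes: y depends strongly on x,
# while z is generated independently.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 0.1 * rng.normal(size=200)
z = rng.normal(size=200)
features = np.column_stack([x, y, z])

# rowvar=False treats each column as one variable (attribute).
corr = np.corrcoef(features, rowvar=False)

print(np.round(corr, 2))
```

Pairs of attributes with an absolute correlation close to 1 carry largely redundant information, so one of them is a natural candidate to drop.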
@@ -66,7 +66,7 @@ Citation Request:
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

8. Missing Attribute Values: None
