- Use the following learning schemes to analyze the zoo data (in zoo.arff):
OneR
– weka.classifiers.OneR
Decision table
– weka.classifiers.DecisionTable -R
C4.5
– weka.classifiers.j48.J48
K-means
– weka.clusterers.SimpleKMeans
Try using reduced error pruning for C4.5. Did it change the produced model? Why?
For K-means, set k=10 for the first run and adjust as needed. What was your final value of k? Why?
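As a reminder of what OneR does, the idea can be sketched in a few lines: for each attribute, map every attribute value to its majority class, count the training errors of that rule, and keep the attribute with the fewest errors. This is only an illustrative sketch for nominal attributes, not Weka's implementation (which also handles numeric attributes and missing values). The function name `one_r` and the toy rows are made up for this example and are not taken from zoo.arff.

```python
from collections import Counter, defaultdict

def one_r(rows, class_index=-1):
    """Return (best_attribute_index, rule, training_errors) for nominal data."""
    n_attrs = len(rows[0])
    class_index = class_index % n_attrs
    best = None
    for a in range(n_attrs):
        if a == class_index:
            continue
        # Count class labels separately for each value of attribute a.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[a]][row[class_index]] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if rule[row[a]] != row[class_index])
        # Keep the attribute whose rule makes the fewest training errors.
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical toy rows: (hair, eggs, class). Not real zoo.arff instances.
toy = [
    ("yes", "no", "mammal"),
    ("yes", "no", "mammal"),
    ("no", "yes", "bird"),
    ("no", "yes", "bird"),
    ("yes", "yes", "mammal"),
    ("no", "yes", "fish"),
]
attr, rule, errors = one_r(toy)
print("best attribute index:", attr)
print("rule:", rule)
print("training errors:", errors)
```

On the toy rows the "hair" attribute wins because its rule misclassifies only one instance. Compare this with the single attribute Weka's OneR selects on the zoo data.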
- Use the following learning schemes to analyze the breast tumor data.
Linear regression
– weka.classifiers.LinearRegression
M5′
– weka.classifiers.M5′
Regression Tree
– weka.classifiers.M5′
K-means clustering
– weka.clusterers.SimpleKMeans
A) How many leaves did the model tree produce? The regression tree? What happens if you change the pruning factor?
How many clusters did you choose for the K-means method? Was that a good choice? Did you try a different value for k?
B) Now perform the same analysis on the bodyfat.arff data set.
- Use a k-means clustering technique to analyze the iris data set. What value of k did you use? Try several different values. What was the random seed value? Experiment with different random seed values. How did changing these values influence the produced models?
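To see why both k and the random seed matter, the following minimal sketch runs a plain k-means (Lloyd's algorithm) on a few made-up 2-D points and reports the within-cluster sum of squared errors (SSE). It is not Weka's SimpleKMeans. The helper names, the initialization (sampling k data points as starting centroids), and the data are assumptions made for illustration only.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, seed, max_iter=100):
    """Return (centroids, sse) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    # The seed only affects which data points become the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Hypothetical data: three loose groups of three points each.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
          (5.0, 5.0), (5.4, 4.8), (4.9, 5.5),
          (10.0, 0.0), (9.6, 0.4), (10.3, 0.5)]

for k in (1, 2, 3, 4):
    sses = [kmeans(points, k, seed)[1] for seed in range(5)]
    print(f"k={k}: SSE per seed = {[round(s, 3) for s in sses]}")
```

SSE can only be compared meaningfully between runs with the same k, since adding clusters tends to lower it. Differences between seeds at a fixed k indicate that some runs converged to a poorer local optimum.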
- Produce a hierarchical clustering (COBWEB) model for the iris data. How many clusters did it produce? Why? Does the result make sense? What did you expect?
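COBWEB decides how to grow its hierarchy using category utility, so a small sketch of that measure may help when explaining the number of clusters. The version below covers nominal attributes only. The iris attributes are numeric, and COBWEB treats numeric attributes differently, so this shows the principle rather than what Weka computes on iris. The function name and the toy rows are hypothetical.

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a partition; each cluster is a non-empty list of
    tuples of nominal attribute values."""
    all_rows = [row for c in clusters for row in c]
    n = len(all_rows)
    n_attrs = len(all_rows[0])

    def sum_sq_probs(rows):
        # Sum over attributes and values of P(attribute = value)^2 within rows.
        total = 0.0
        for a in range(n_attrs):
            counts = Counter(row[a] for row in rows)
            total += sum((cnt / len(rows)) ** 2 for cnt in counts.values())
        return total

    baseline = sum_sq_probs(all_rows)
    # Weighted improvement in predictability from knowing the cluster.
    gain = sum(len(c) / n * (sum_sq_probs(c) - baseline) for c in clusters)
    # Dividing by the number of clusters penalizes over-splitting.
    return gain / len(clusters)

# Hypothetical toy rows: (size, color).
a = ("small", "red")
b = ("large", "blue")

print("two pure clusters:", category_utility([[a, a], [b, b]]))
print("two mixed clusters:", category_utility([[a, b], [a, b]]))
print("four singletons:", category_utility([[a], [a], [b], [b]]))
```

The two pure clusters score highest. Splitting further into singletons makes each cluster no more predictable, yet the division by the number of clusters lowers the score, which is how the measure discourages over-splitting.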