
Please cite the reference wherever source material is used.
1. When does a decision tree overfit the training data? Explain the pre-pruning and post-pruning approaches to avoiding overfitting in decision trees.

2. What is the use of kernel functions in SVM?
3. (a) K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds, and which fitness function does it implicitly minimize?
   (b) Assume the following dataset is given: (2,2), (4,4), (5,5), (6,6), (8,8), (9,9), (0,4), (4,0). K-Means is used with k=4 to cluster the dataset, and Manhattan distance (formula below) is used to compute distances between centroids and objects in the dataset. K-Means's initial clusters C1, C2, C3, and C4 are as follows:
   C1: {(2,2), (4,4), (6,6)}
   C2: {(0,4), (4,0)}
   C3: {(5,5), (9,9)}
   C4: {(8,8)}
   Manhattan distance: d((x1,x2), (x1',x2')) = |x1 - x1'| + |x2 - x2'|
   Now K-Means is run for a single iteration; what are the new clusters and what are their centroids? (A computational sketch appears after this list.)
4. The class-imbalance problem arises when there is a large difference between the numbers of positive and negative samples, such as in medical diagnosis or fraudulent-transaction detection. Briefly explain one method for overcoming the imbalanced-data issue in classification. (See the oversampling sketch after this list.)
5. Compared to the error on the training dataset, the classification error on the test set can be larger, smaller, or equal. When is it usually larger, and when is it smaller?
6. Given a list of 12 measured temperatures, 19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95, partition them into 4 bins using these methods: (1) equal-frequency binning, (2) equal-width binning, and (3) a better method such as clustering. (See the binning sketch after this list.)
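
For question 3(b), a minimal Python sketch of a single K-Means iteration is given below, using the dataset, initial clusters, and Manhattan distance from the question. It assumes the standard centroid update (the component-wise mean of each cluster); ties in the nearest-centroid assignment are broken by cluster order.

```python
# One K-Means iteration for question 3(b).
# Dataset, initial clusters, and Manhattan distance come from the question;
# the centroid update uses the component-wise mean (standard K-Means).

points = [(2, 2), (4, 4), (5, 5), (6, 6), (8, 8), (9, 9), (0, 4), (4, 0)]

initial_clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
    "C4": [(8, 8)],
}

def manhattan(p, q):
    """d((x1, x2), (x1', x2')) = |x1 - x1'| + |x2 - x2'|"""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

# Step 1: compute the centroid of each initial cluster.
centroids = {name: centroid(members) for name, members in initial_clusters.items()}

# Step 2: reassign every point to its nearest centroid (Manhattan distance).
# Ties are broken in favour of the earlier cluster (C1 before C2, and so on).
new_clusters = {name: [] for name in centroids}
for p in points:
    nearest = min(centroids, key=lambda name: manhattan(p, centroids[name]))
    new_clusters[nearest].append(p)

# Step 3: recompute the centroids of the (non-empty) new clusters.
new_centroids = {name: centroid(members) for name, members in new_clusters.items() if members}

print("new clusters: ", new_clusters)
print("new centroids:", new_centroids)
```

With Manhattan distance, some texts update centroids with the component-wise median (K-medians) instead of the mean; the assignment step is the same either way.

For question 4, one common method is to rebalance the training data by oversampling the minority class. The sketch below assumes a toy binary-label setup; the random_oversample helper and the example labels are illustrative, not from the question.

```python
# One way to handle class imbalance for question 4: random oversampling of the
# minority class until both classes have equal counts.
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until the classes are balanced."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0
    majority_label = 1 - minority_label
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced_samples = majority + minority + extra
    balanced_labels = ([majority_label] * len(majority)
                       + [minority_label] * (len(minority) + len(extra)))
    return balanced_samples, balanced_labels

# Toy example: 8 negatives and 2 positives become 8 of each after oversampling.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
Xb, yb = random_oversample(X, y)
print(sum(yb), len(yb) - sum(yb))  # equal class counts
```

For question 6, the sketch below computes the equal-frequency and equal-width partitions for the 12 temperatures from the question with k = 4 bins; the third, clustering-based partition could be produced with a one-dimensional K-Means in the spirit of the sketch for question 3(b).

```python
# Equal-frequency and equal-width binning for question 6.
# The temperatures and the bin count k = 4 come from the question.

temps = [19, 71, 48, 63, 35, 85, 69, 81, 72, 88, 99, 95]
k = 4

# (1) Equal-frequency (equal-depth) binning: sort, then split into k bins with
#     the same number of values in each (assumes len(temps) is divisible by k,
#     as it is here).
values = sorted(temps)
size = len(values) // k
equal_frequency = [values[i * size:(i + 1) * size] for i in range(k)]

# (2) Equal-width binning: split the range [min, max] into k intervals of equal
#     width and place each value into its interval.
low, high = min(values), max(values)
width = (high - low) / k
equal_width = [[] for _ in range(k)]
for v in values:
    idx = min(int((v - low) / width), k - 1)  # clamp the maximum into the last bin
    equal_width[idx].append(v)

print("equal-frequency bins:", equal_frequency)
print("equal-width bins:    ", equal_width)
```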
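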
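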

Sample Solution
