Data mining Assignment Paper

This week we also discuss the concepts in chapter seven, which deals with the basic concepts and algorithms of cluster analysis. After reading chapter seven, answer the following questions:

  1. What is K-means from a basic standpoint?
  2. What are the various types of clusters and why is the distinction important?
  3. What are the strengths and weaknesses of K-means?
  4. What is a cluster evaluation?
  5. Select at least two types of cluster evaluations and discuss the concepts of each method.

With the increased and widespread use of technologies, interest in data mining has increased rapidly. Companies now use data mining techniques to examine their databases, looking for trends, relationships, and outcomes that can enhance their overall operations and reveal new patterns that may allow them to better serve their customers. Data mining provides numerous benefits to businesses, government, and society, as well as to individuals. However, like many technologies, data mining also has negative consequences, such as invasion of privacy rights. This paper explores the advantages as well as the disadvantages of data mining. In addition, it considers the ethical and global issues regarding the use of data mining.
Before a data set can be mined, it first has to be “cleaned”. This cleaning process removes errors, ensures consistency, and takes missing values into account. Next, computer algorithms are used to “mine” the clean data, looking for unusual patterns. Finally, the patterns are interpreted to produce new knowledge.
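
The three stages described above (clean, mine, interpret) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (name, age, balance), the choice to drop rows with missing values, and the use of a standard-deviation cut-off to flag "unusual" records are all hypothetical and are not taken from the text.

```python
from statistics import mean, stdev


def clean(records):
    """Cleaning step: drop rows with missing values and make fields consistent.

    Dropping incomplete rows is only one way to take missing values into
    account; it is an assumption made for this sketch.
    """
    cleaned = []
    for row in records:
        if row.get("age") is None or row.get("balance") is None:
            continue  # skip rows with missing values
        cleaned.append({
            "name": row["name"].strip().title(),  # consistent spacing and capitalisation
            "age": int(row["age"]),
            "balance": float(row["balance"]),
        })
    return cleaned


def mine(records, threshold=1.5):
    """Mining step: flag records whose balance is unusually far from the mean.

    `threshold` is a hypothetical cut-off measured in sample standard deviations.
    """
    if len(records) < 2:
        return []
    balances = [r["balance"] for r in records]
    mu, sigma = mean(balances), stdev(balances)
    if sigma == 0:
        return []
    return [r for r in records if abs(r["balance"] - mu) / sigma > threshold]


def interpret(patterns):
    """Interpretation step: turn flagged records into readable statements."""
    return [f"{r['name']} has an unusual balance of {r['balance']:.2f}" for r in patterns]
```

A caller would chain the steps as `interpret(mine(clean(raw_records)))`. In practice each stage is far more involved, and the final interpretation is usually done by people rather than by code.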


The following example illustrates how data mining can help bankers enhance their businesses. Records of the bank’s customers collected over the years, including information such as age, sex, marital status, occupation, and number of children, are used in the mining process. First, an algorithm is used to identify characteristics that distinguish customers who took out a particular kind of loan from those who did not. Eventually, it develops “rules” by which it can identify customers who are likely to be good candidates for such a loan. These rules are then used to identify such customers in the remainder of the database. Next, another algorithm is used to sort the database into clusters, or groups of people with many similar attributes, in the hope that these might reveal interesting and unusual patterns. Finally, the patterns revealed by these clusters are interpreted by the data miners, in collaboration with bank personnel.
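
The two algorithmic steps in the bank example can be sketched as follows. The text does not name the algorithms used, so both choices here are assumptions: a one-feature threshold rule stands in for the rule-learning step, and a bare-bones K-means (the algorithm the assignment questions ask about) stands in for the clustering step. The field names `age` and `took_loan`, and the naive initialisation from the first k points, are likewise hypothetical.

```python
def learn_threshold_rule(customers, feature):
    """Find the cut-off on `feature` that best separates loan takers from the rest.

    The learned rule reads: "predict a loan if feature >= cut-off".
    Only this one direction is tried, which keeps the sketch short.
    """
    best_cut, best_correct = None, -1
    for cut in sorted({c[feature] for c in customers}):
        correct = sum((c[feature] >= cut) == c["took_loan"] for c in customers)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def kmeans(points, k, iterations=20):
    """Group numeric points into k clusters with basic K-means.

    Centroids start at the first k points (a naive choice; real
    implementations use random or smarter initialisation).
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters
```

For instance, `learn_threshold_rule(customers, "age")` returns an age cut-off, and `kmeans([(age, children), ...], k=2)` returns the cluster centres together with the grouped points. The sketch does not rescale attributes, so an attribute with a larger numeric range (such as age) dominates the distance calculation.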

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both.
Increasingly, organizations are generating vast amounts of data as a result of running a variety of information systems. This data is normally used to record transactions and for status reporting purposes. Data mining uses elements of statistics, artificial intelligence, machine learning, and advanced modeling techniques to predict future business trends and customer behavior patterns from large data warehouses and other forms of data resources. This is accomplished by running commercial off-the-shelf applications to convert vast amounts of data into actionable, proactive, and knowledge-driven decisions.
The two critical success factors for data mining are:
• a large well-integrated data warehouse
• clear understanding of the business process for the application of data mining
Data mining is primarily used today by companies with a strong consumer focus, such as retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among “internal” factors such as price, product positioning, or staff skills, and “external” factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to “drill down” into summary information to view detailed transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
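
As a rough illustration of the point-of-sale idea, the sketch below finds each customer's most frequently purchased product category, which a retailer could then use to choose a promotion. The input format (pairs of customer ID and category) and the function name are assumptions made for this example; real purchase-history mining would typically consider many more signals.

```python
from collections import Counter, defaultdict


def top_category_per_customer(sales):
    """Return each customer's most frequently purchased category.

    `sales` is a list of (customer_id, category) pairs, a hypothetical
    simplification of point-of-sale records.
    """
    history = defaultdict(Counter)
    for customer_id, category in sales:
        history[customer_id][category] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in history.items()}
```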

In recent times, the relatively new discipline of data mining has been a subject of widely published debate in mainstream forums and academic discourses, not only due to the fact that it forms a critical constituent in the more general process of Knowledge Discovery in Databases (KDD), but also due to the increased realization that this discipline can be applied in a number of areas to enhance decision making processes, efficiency, and competitiveness in contemporary organizations (Kusiak, 2006).

The basic concept behind the emergence of data mining, and one which has contributed immensely to its acceptance as an increasingly used strategy in business establishments as well as in scientific and research undertakings, is that by automatically sifting through large volumes of information which may initially appear irrelevant, it should be possible for interested parties to extract nuggets of useful knowledge which can then be used to drive their agenda forward (Adams, 2010).

Goth (2010) observes that the emergence of data mining has been primarily informed by the rapid growth in data warehouses as well as the recognition that this heap of operational data can be potentially exploited as an extension of both business and scientific intelligence.

The present paper seeks to critically discuss the discipline of data mining with a view to illuminate knowledge about its origins, concepts, applications, and the legal and ethical issues involved in this particular field.

Definition & History of Data Mining

Although data mining as a concept has been defined differently in various sources, this report adopts the simple definition given by Payne & Trumbach (2009), that “…data mining is the set of activities used to find new, hidden or unexpected patterns in data” (pp. 241-242).

The purpose of data mining, as observed by these authors, is to extract information that would not be readily established by searching databases of raw data alone. Through data mining, organizations are now able to combine data from incongruent sources, both internal and external, from across a multiplicity of platforms with a view to assist in a variety of business applications.

At its most elemental, data mining uses proven procedures, including modeling techniques, statistical investigation, machine learning, and database technology, among others, to seek patterns and subtle relationships in the sifted data, with the main objective of deducing rules and intricate relationships that permit the extrapolation of future outcomes (Payne & Trumbach, 2009; Adams, 2010).

Researchers and practitioners are in agreement that the capability of both generating and collecting data from a wide variety of sources has greatly impacted the growth trajectories of data mining as a discipline.

This capability, according to Adams (2010) and Chen (2006), was precipitated by a number of variables, which can be categorized into the following:

  1. increased computerization of business, scientific, and government transactions with the view to increase efficiency and productivity,
  2. extensive usage of electronic cameras, scanners, publication devices, and internationally recognized bar codes for most business-related products,
  3. advances in data gathering instruments ranging from scanned documents and image platforms to global positioning and remote sensing systems,
  4. the development and popularization of the World Wide Web and the internet as widely accepted global information systems.

This explosive growth in stored or ephemeral data brought us to the information age, which was, and continues to be, characterized by an imperative need to develop new techniques, procedures and automated tools that can astutely assist us in transforming and making sense of the huge quantities of data collected via the above stated protocols (Goth, 2010).


To dig a bit deeper into the history of data mining, research has been able to establish that the term ‘data mining’, which was introduced in the 1990s, has its origins in three interrelated family lines. It is important to note that the convergence of these family lines to develop a unique discipline in the context of data mining certainly gives it its scientific foundation (Adams, 2010).

This notwithstanding, extant research (Adams, 2010; Chen, 2006) demonstrates that the longest of these family lines to be credited with the gradual development of data mining as a fully-fledged discipline is known as classical statistics.

Researchers are in agreement that it would not have been possible to develop the field of data mining in the absence of statistics as the latter provides the foundation of most technologies on which the former is built, such as “regression analysis, standard distribution, standard deviation, standard variance, discriminant analysis, and confidence intervals” (Goth, 2010, p. 14).

All these concepts, according to this author, are used to study data and data relationships – central aspects in any data mining exercise.

The second longest family line that has contributed immensely to the emergence of data mining as a fully-fledged field is known as artificial intelligence, or simply AI. Extant research demonstrates that the AI discipline, which is built upon heuristics as opposed to statistics, endeavors to apply human-thought-like processing to statistical challenges while using computer processing power as the appropriate medium (Talia & Trunfio, 2010).

It is important to mention that since this approach was tied to the availability of computers and supercomputers to undertake the heuristics, it was not practical until the early 1980s, when computers started trickling into the market at reasonable prices (Goth, 2010).

The third family line to have influenced the field of data mining is what is generally known as machine learning or, better still, the amalgamation of statistics and AI (Adams, 2010). Here, it is of importance to note that while AI could not have been viewed as a commercial success during the formative years, its techniques and strategies were largely co-opted by machine learning.

It is also important to note that machine learning, while able to take full advantage of the ever-improving price/performance ratios provided by computers in the 1980s and 1990s, found usage in more applications because its entry price was lower than that of AI, not to mention that it was largely considered an evolved facet of AI, as it was effectively able to blend AI heuristics with complex statistical analysis (Chen, 2006).