Business Intelligence and Big Data
Problem-1: LO-2
The main data mining techniques include classification, clustering, regression, association rules, outlier detection and sequential patterns. Classification retrieves important and relevant information about data and uses it to assign data items to predefined classes. Clustering identifies groups of similar data items by analyzing the similarities and differences between them. Regression identifies and analyzes the relationship between variables by estimating the value of one (dependent) variable from the values of other (independent) variables. Association rule mining focuses on finding attributes that occur together and the connections between data items; it strives to identify hidden patterns in the data set. Outlier detection examines the data items in a data set to identify those that do not match the expected pattern or behavior of the rest of the data. Sequential pattern mining identifies trends and recurring patterns in transactional data.
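To make outlier detection concrete, the following is a minimal sketch of one simple method, the z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The function name `zscore_outliers`, the threshold of 2.0 and the `daily_sales` figures are illustrative assumptions, not taken from any particular tool; many real systems use more robust methods.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales figures with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]
outliers = zscore_outliers(daily_sales)
print(outliers)  # [250]
```

With very small samples the largest achievable z-score is bounded, which is why a threshold of 2 rather than the often-quoted 3 is used here.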
The fundamental differences between the data mining techniques lie in how they treat the data and the strategy they use to analyze it. Data mining techniques can be divided into supervised and unsupervised techniques. Supervised techniques have a predefined target variable. Regression, for example, predicts a target variable, which may be continuous or, in logistic regression, dichotomous or multinomial. Unsupervised techniques, such as clustering and association, work on unlabeled data to discover groups and relationships. The association technique identifies relationships between different items in a set of data. Classification learns from data whose class labels are known, while clustering works on data that has no class labels.
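The supervised/unsupervised distinction can be illustrated on the same small data set. In the sketch below, a nearest-centroid classifier (supervised) is given class labels and learns one centroid per label, while a one-dimensional k-means routine (unsupervised) receives only the raw values and has to discover the groups itself. The function names, the `spend` values and the "low"/"high" labels are hypothetical choices for illustration.

```python
from statistics import mean


def nearest_centroid_fit(points, labels):
    """Supervised: learn one centroid (mean) per known class label."""
    classes = {}
    for x, y in zip(points, labels):
        classes.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in classes.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def kmeans_1d(points, k=2, iterations=10):
    """Unsupervised: group 1-D points into k clusters without any labels."""
    # Initialise centres with evenly spaced values from the sorted data.
    centers = sorted(points)[:: max(1, len(points) // k)][:k]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


spend = [1, 2, 3, 10, 11, 12]
labels = ["low", "low", "low", "high", "high", "high"]  # only the supervised model sees these

centroids = nearest_centroid_fit(spend, labels)
print(nearest_centroid_predict(centroids, 9))  # high
print(kmeans_1d(spend, k=2))                   # [[1, 2, 3], [10, 11, 12]]
```

Both methods end up separating the same two groups, but only the classifier can name them, because only it was trained on labels.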
Problem-2: LO-2 and LO-3
An RFID system uses radio waves to read the data stored on RFID tags, and the reader translates these signals into electronic data. RFID applications therefore make the collection of information about assets automatic, which increases the speed and accuracy of the process and reduces its costs. Collecting data in this way also eliminates the need to fill in forms manually. RFID identifies products faster than barcode scanning because it does not require a line of sight and a single reader can read many tags at once.
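As an illustration of how automatic RFID collection can replace manual forms, the sketch below turns raw tag reads into an asset inventory. It assumes a hypothetical reader that reports each read as a tag ID, a location and a timestamp; real RFID middleware formats differ. Because readers typically report the same tag repeatedly while it is in range, the sketch keeps at most one read per tag and location within a time window before recording each asset's latest location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRead:
    tag_id: str       # hypothetical identifier reported for the tag
    location: str     # hypothetical name of the reader/portal that saw the tag
    timestamp: float  # seconds since an arbitrary start time


def build_inventory(reads, window=2.0):
    """Drop repeated reads of the same tag at the same location that arrive
    within `window` seconds of the last kept read, then record each
    tag's most recent location."""
    last_kept = {}  # (tag_id, location) -> timestamp of the last kept read
    events = []
    for r in sorted(reads, key=lambda r: r.timestamp):
        key = (r.tag_id, r.location)
        if key in last_kept and r.timestamp - last_kept[key] < window:
            continue  # duplicate read inside the window, skip it
        last_kept[key] = r.timestamp
        events.append(r)
    inventory = {r.tag_id: r.location for r in events}  # later events overwrite earlier ones
    return events, inventory


reads = [
    TagRead("A1", "dock", 0.0),
    TagRead("A1", "dock", 0.5),     # repeat read of the same tag
    TagRead("B7", "dock", 1.0),
    TagRead("A1", "shelf-3", 5.0),  # asset A1 has moved
]
events, inventory = build_inventory(reads)
print(len(events), inventory)  # 3 {'A1': 'shelf-3', 'B7': 'dock'}
```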
New data mining applications include lie detection, intrusion and fraud detection, future healthcare and market basket analysis. Fraud detection is a data mining application that is enhanced by RFID technology. RFID tags collect the required information, from which data mining extracts useful patterns while protecting the information from unauthorized users. RFID technology is key to collecting sample records, after which a classification technique is used to assign each record to either the fraudulent or the non-fraudulent class. An intrusion is an action that compromises the integrity and confidentiality of resources within an organization. Intrusion detection is another data mining application that integrates RFID technology. In this application, data mining analyzes activity data to distinguish unusual activity from common, expected activity, and extracts the data that is relevant to the issue.
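The classification step in fraud detection can be sketched with a k-nearest-neighbours classifier, which labels a new record with the majority class among the most similar labelled records. The text does not name a specific classifier, so k-nearest neighbours is swapped in here as one simple example. The two features (transaction amount in thousands and hour of day), the training records and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """training: list of ((amount_thousands, hour_of_day), label) pairs.
    Return the majority label among the k training records closest
    to `query` by Euclidean distance."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled sample records.
training = [
    ((0.2, 14), "non-fraudulent"),
    ((0.5, 10), "non-fraudulent"),
    ((0.3, 16), "non-fraudulent"),
    ((9.0, 3), "fraudulent"),
    ((8.5, 2), "fraudulent"),
    ((7.8, 4), "fraudulent"),
]

print(knn_classify(training, (8.0, 3)))   # fraudulent
print(knn_classify(training, (0.4, 13)))  # non-fraudulent
```

In practice the features would be scaled to comparable ranges before computing distances; they are left unscaled here to keep the sketch short.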
Market basket analysis applies data mining to identify what a customer is likely to buy, based on an analysis of the group of items they have already bought. It highlights the purchase behavior of a buyer. In future healthcare, combining data mining and RFID technology will improve care and reduce costs by using data and analytics to predict the volume of patients in different categories. It is also helping health insurers detect fraud and abuse. It has been suggested that RFID tags be implanted in humans to monitor their health and other factors effectively. Issues with implanting RFID tags in humans include the relevance and social acceptability of RFID chipping, as well as the lack of enhancements and innovation in the technology. Many people are also concerned about their privacy, fearing that the chips could be used to track their location and to facilitate crimes such as abduction and child trafficking.
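Market basket analysis is usually expressed through association rules of the form "customers who buy A also buy B", scored by support (the share of baskets containing both items) and confidence (the share of baskets with A that also contain B). The sketch below computes these two measures for item pairs only; it is a simplified illustration, not a full Apriori implementation. The baskets and the thresholds `min_support=0.4` and `min_confidence=0.6` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
# butter -> bread: support=0.60, confidence=1.00
# bread -> butter: support=0.60, confidence=0.75
# milk -> bread: support=0.40, confidence=0.67
```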
Problem-3: LO-1 and LO-3
Transamerica's and Dell's implementations of Hadoop are among the success stories of Hadoop adoption. In both case studies, the organizations gained a 360-degree view of their customers, which improved their planning, analytics and delivery to customers. The difference between the case studies is that Dell is a provider of IT infrastructure and equipment serving consumers and enterprises, whereas Transamerica is a financial services company that provides insurance, investment and savings solutions to its customers. Transamerica's challenge was the presence of multiple business lines in different locations, while Dell needed big data storage and real-time analytics because it lacked the capacity to store and process its multi-structured data.
Problem-4: LO-1 and LO-3
Stream analytics places high demands on performance and memory. The performance of stream analytics can be improved by combining sampling-based approximate computation with randomized response, which is relevant for privacy-preserving analytics. In a data stream analytics system, input data arrives continuously, and each arrival triggers processing and an update of the analytic results. Real-time constraints limit how completely each data unit can be processed within the given time. Cloud infrastructure enhances the ability of a stream analytics system to handle large volumes of fast-arriving data.
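The two ideas above can be sketched together. Reservoir sampling (Algorithm R) keeps a fixed-size uniform sample of a stream of unknown length, so memory stays bounded no matter how much data arrives, and estimates computed on the sample approximate the exact answer. Randomized response lets each participant answer truthfully with probability p and otherwise report a fair coin flip, so no single answer reveals the truth, yet the true rate can still be estimated by inverting the expected rate. This simple combination is an illustrative assumption and is not necessarily the scheme used by any particular stream analytics system; the stream contents, sample size and p = 0.75 are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of k items using O(k) memory."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds: item i is kept with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample


def randomized_response(truth, p, rng):
    """Answer truthfully with probability p, otherwise report a fair coin flip."""
    if rng.random() < p:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reported, p):
    """Invert E[share of 'yes'] = p * rate + (1 - p) * 0.5 to estimate the true rate."""
    observed = sum(reported) / len(reported)
    return (observed - (1 - p) * 0.5) / p


rng = random.Random(42)
# Hypothetical stream of 100,000 events in which 30% are "yes" (1).
stream = (1 if i % 10 < 3 else 0 for i in range(100_000))
sample = reservoir_sample(stream, 5_000, rng)
reported = [randomized_response(bool(x), 0.75, rng) for x in sample]
estimate = estimate_true_rate(reported, 0.75)
print(round(estimate, 3))  # an approximation of the true rate 0.3
```

A smaller sample or a lower p saves more memory or gives stronger privacy, at the cost of a noisier estimate.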
References
Beck, M., Bhatotia, P., Chen, R., Fetzer, C., & Strufe, T. (2017). PrivApprox: privacy-preserving stream analytics. In 2017 USENIX Annual Technical Conference (USENIX ATC 17) (pp. 659-672).
Dell | Customer success | Cloudera. (2019, November 27). Cloudera. Retrieved from https://www.cloudera.com/about/customers/dell.html
Fu, T. Z., Ding, J., Ma, R. T., Winslett, M., Yang, Y., & Zhang, Z. (2017). DRS: Auto-scaling for real-time stream analytics. IEEE/ACM Transactions on Networking, 25(6), 3338-3352.
Jain, A., Hautier, G., Ong, S. P., & Persson, K. (2016). New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. Journal of Materials Research, 31(8), 977-994.
Quoc, D. L., Chen, R., Bhatotia, P., Fetzer, C., Hilt, V., & Strufe, T. (2017, December). StreamApprox: approximate computing for stream analytics. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (pp. 185-197).
Transamerica | Customer success | Cloudera. (2018, November 28). Cloudera. Retrieved from https://www.cloudera.com/about/customers/transamerica.html
Ye, Y., Li, T., Adjeroh, D., & Iyengar, S. S. (2017). A survey on malware detection using data mining techniques. ACM Computing Surveys (CSUR), 50(3), 1-40.