Tuesday, August 20, 2019

The Heart Disease Prediction System Computer Science Essay

An enormous amount of data is available from the medical industry, and it can be useful to medical practitioners when existing data mining techniques are applied to discover hidden patterns. Even the basic medical records in a patient's profile can help identify such patterns. In this paper, the Naïve Bayes algorithm is implemented to predict heart disease from basic patient records such as age, sex, heart rate, and blood pressure, taken from a sample dataset. The benefits, limitations, and technical details of this implementation are also discussed.

1 Introduction

Over the years, many types of medical problems have been identified, and a large amount of data is available on each particular problem. Not all medical data are alike, but many patterns are hidden inside the data and need to be identified. Data mining techniques can help identify these hidden patterns through knowledge discovery. In the medical field, patients' health issues are often predicted from a doctor's intuition or experience [2], while the knowledge-rich data goes unused, which results in high medical expenses and unnecessary medical tests. In recent years, much research has been conducted to find hidden patterns in basic medical data [1]. Identifying these patterns would make it possible to develop an efficient decision-making system for the medical industry, one that serves as a tool to support doctors' decisions or at least as a prediction system for medical issues.

In this paper, we consider heart disease and predict it from existing data with the help of a data mining technique. The algorithm we have chosen is the Naïve Bayes algorithm, which is well suited to large databases that may contain hundreds or thousands of rows and columns.
The Naïve Bayes algorithm provides the intended output faster and more accurately as the amount of data in the database increases.

1.1 Problem Scenario

Only a few decision support systems are available in the medical industry, and their functionality is very limited. As mentioned earlier, medical decisions are made using doctors' intuition rather than the rich data in medical databases. Wrong treatment due to misdiagnosis poses a serious threat in the medical field. To address these issues, a data mining solution based on medical databases was introduced.

1.2 Related Work

Many techniques are available for discovering knowledge from medical databases [1]. Researchers at Southern California used a data mining technique to study the success and failure of back surgery in order to improve medical treatment [3]. Shouman et al. [4] implemented predictive data mining to diagnose heart disease in patients. Palaniappan et al. [2] developed a prototype Intelligent Heart Disease Prediction System (IHDPS) using data mining techniques.

1.3 Objective

In this paper, the Naïve Bayes algorithm is implemented to predict heart disease from basic patient records such as age, sex, heart rate, and blood pressure, taken from a sample dataset. Based on the literature survey, the Naïve Bayes algorithm was found to be an effective technique. Its probabilistic method helps in finding the converse probability of a conditional relationship. A dependence relation that may exist between two attributes of the dataset can be determined with this algorithm.

2 Data Preparation

To implement the algorithm, medical data was required. The sample dataset used for the implementation was obtained from the Cleveland Clinic Foundation. A sample of the dataset is shown in Figure 1.

Figure 1. Sample dataset

2.1 Dataset Source

The Cleveland medical data was downloaded from the website of the University of California, Irvine.

2.2 Dataset Attributes

The dataset consists of 16 attributes. The last attribute takes the value 0 or 1: 0 indicates that the patient does not have heart disease, whereas 1 indicates that the patient has heart disease. The algorithm's prediction can be checked against this value when evaluating the algorithm. The first 15 attributes are shown in Figure 2.

Figure 2. Dataset attributes

3 Program Architecture

The program was implemented in Java. An Apache Tomcat server and a MySQL database are also used. The Naïve Bayes implementation has three class files: Calculation.java, Prediction.java, and Detection.java. Detection.java reads the data file from the source path and stores the attributes in temporary array lists. The mean and standard deviation calculations and the probability calculation are performed in Prediction.java. All the dataset attributes are defined in Calculation.java, where the mean and standard deviation of the attributes are calculated. Calculation.java calls the other two classes when the program executes. Figure 3 represents the program architecture.

Figure 3. Architecture

3.1 Building and Running a Demo

The Tomcat server is used to present the output in web-based form, and the output runs on localhost. The MySQL database is used to identify the patient records. At execution, localhost is accessed and 15 questions are displayed; the answers are obtained from the user, and the algorithm is called to calculate and predict the possibility of disease for that person. At the end of the demo, a report is generated that states whether or not the person is predicted to have heart disease. In general, the demo:

1. Obtains the values from the user.
2. Reads the data file.
3. Calls the algorithm and calculates the mean, standard deviation, and probability of the attributes.
4. Generates a report displaying the given values with the disease prediction.

4 Implementation

All attributes of the dataset are numerical values that carry some meaning. The meanings of the dataset attributes are shown in Figure 2. For example, the attribute sex takes the values 1 and 0, where 1 denotes male and 0 denotes female. Fasting blood sugar is also denoted using 1 and 0, where 1 denotes a fasting blood sugar level above 120 mg/dl and 0 denotes a level that is not above it. These values from the data file are accessed by the Naïve Bayes algorithm. The values are extracted from the data file and stored in an array list for each attribute (for example, an age array list, a sex array list, and a chest pain type array list) in order to perform the calculation. What each value stands for is defined before it is stored in the array list. A sample of the interface (for obtaining the slope value) is shown in Figure 4. Here up-sloping, flat, and down-sloping represent the values 1, 2, and 3 respectively.

Figure 4. Interface sample

Figure 5. Sample of report format

5 Modules Description

5.1 Analyzing the Data Set

The attribute Diagnosis was identified as the predictable attribute, with value 1 for patients with heart disease and value 0 for patients with no heart disease. The attribute PatientID was used as the key; the rest are input attributes. It is assumed that problems such as missing data, inconsistent data, and duplicate data have all been resolved.

5.2 Naïve Bayes Implementation in Mining

Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred. If B represents the dependent event and A represents the prior event, Bayes' theorem can be stated as follows.
5.2.1 Bayes Theorem

Prob(B given A) = Prob(A and B) / Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where A and B occur together and divides it by the number of cases where A occurs. Applying Naïve Bayes to data with numerical attributes, the class is predicted using Naïve Bayes classification:

Figure 6. (a) Top: mean; (b) bottom: standard deviation

Figure 6 (c). Laplace Transform

6 Evaluation

The user enters values for the questionnaire to find out whether or not the patient has heart disease. By feeding sample data from the dataset and performing the mining operations with the Naïve Bayes algorithm, it was found that the algorithm gives a 95% probability of correctly predicting whether a patient has heart disease. An accuracy of 95% is good enough for use as a decision support system. Figure 7 shows the accuracy of the Naïve Bayes algorithm, with the highest probability of correct predictions and the lowest probability of incorrect predictions.

Figure 7. Model results of three algorithms [2]

7 Limitations

Apart from benefits such as its probabilistic approach and fast, reliable operation, a serious shortcoming of the Naïve Bayes algorithm is its limited ability to handle small datasets. A Naïve Bayes classifier requires a relatively large dataset to obtain the best results. Even so, studies have shown that the Naïve Bayes algorithm outperforms other algorithms in accuracy and efficiency. A notable limitation of this paper is its use of a small dataset. This dataset can be used for training or testing purposes only. The dataset could also include more attributes for more effective prediction in support of clinical decisions.

8 Future Work

The algorithm works well with this sample dataset. Implementing the algorithm with a large dataset could give better results, which could serve as a supporting tool in making medical decisions.
In the future, other algorithms could be implemented so that the efficiency of all the algorithms can be compared to decide on the most suitable technique in terms of speed, reliability, and accuracy.

9 Conclusion

In this paper, the Naïve Bayes algorithm is the only algorithm used for the calculation of attributes and for prediction. The efficiency and accuracy of the algorithm's predictions were discussed. The design of effective models is constrained by the size of the datasets and by noisy, incorrect, or missing data values. The prototype developed so far has been tested mainly by computer experts and not by doctors. For an effective understanding of the health issues, medical experts have to work collaboratively and test the prototypes before the system can be implemented in real life to support medical experts in making clinical decisions.
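
As an illustration of the mean, standard deviation, and probability calculations described in the Program Architecture and Modules Description sections, the following is a minimal sketch in Java. It is not the paper's Calculation.java, Prediction.java, or Detection.java: the class name GaussianNaiveBayesSketch, its methods, and the toy data are hypothetical, and the use of a Gaussian (normal) likelihood for numerical attributes is an assumption, since the paper only states that a mean and a standard deviation are computed for each attribute.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. All names here are hypothetical, and the Gaussian
// likelihood is an assumption that the paper does not state explicitly.
class GaussianNaiveBayesSketch {
    private final double[][] mean = new double[2][]; // mean[class][attribute]
    private final double[][] std = new double[2][];  // std[class][attribute]
    private final double[] prior = new double[2];    // prior[class]

    // rows: numeric attribute values per patient; labels: 0 = no disease, 1 = disease.
    GaussianNaiveBayesSketch(double[][] rows, int[] labels) {
        int nAttr = rows[0].length;
        for (int c = 0; c < 2; c++) {
            List<double[]> members = new ArrayList<>();
            for (int i = 0; i < rows.length; i++) {
                if (labels[i] == c) members.add(rows[i]);
            }
            if (members.isEmpty()) {
                throw new IllegalArgumentException("No training rows for class " + c);
            }
            prior[c] = (double) members.size() / rows.length;
            mean[c] = new double[nAttr];
            std[c] = new double[nAttr];
            for (int j = 0; j < nAttr; j++) {
                double sum = 0.0;
                for (double[] r : members) sum += r[j];
                mean[c][j] = sum / members.size();
                double sq = 0.0;
                for (double[] r : members) {
                    sq += (r[j] - mean[c][j]) * (r[j] - mean[c][j]);
                }
                // Sample standard deviation; the floor avoids division by zero later.
                int divisor = Math.max(1, members.size() - 1);
                std[c][j] = Math.max(Math.sqrt(sq / divisor), 1e-6);
            }
        }
    }

    // Natural log of the normal density with mean mu and standard deviation sigma.
    private static double logGaussian(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
    }

    // Unnormalized log posterior: log P(class) plus the sum of log P(x_j | class).
    double logScore(double[] x, int c) {
        double score = Math.log(prior[c]);
        for (int j = 0; j < x.length; j++) {
            score += logGaussian(x[j], mean[c][j], std[c][j]);
        }
        return score;
    }

    // Returns 1 (heart disease predicted) if class 1 has the higher score, else 0.
    int predict(double[] x) {
        return logScore(x, 1) > logScore(x, 0) ? 1 : 0;
    }

    public static void main(String[] args) {
        // Made-up toy values (age, blood pressure); not taken from the Cleveland dataset.
        double[][] rows = { {40, 120}, {45, 125}, {50, 118}, {62, 150}, {66, 160}, {70, 155} };
        int[] labels = { 0, 0, 0, 1, 1, 1 };
        GaussianNaiveBayesSketch model = new GaussianNaiveBayesSketch(rows, labels);
        System.out.println("Prediction: " + model.predict(new double[] {65, 158}));
    }
}
```

The sketch sums log probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow when many attributes are combined. Attributes coded as 0 and 1, such as sex, would more naturally be handled by counting frequencies than by a normal density; treating every attribute as numerical here is a simplification.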
