Assigned Student: Kiran Kumar Bonu

On February 1, 2016, the World Health Organization (WHO) declared the Zika virus a Public Health Emergency of International Concern. The virus is currently causing an outbreak in Brazil, where millions of people have been infected, and it is also causing microcephaly in newborn babies. Because very little is known about this disease, this thesis created two innovative artifacts, an information infrastructure and a predictive model, to assess the seriousness of the Zika virus in the U.S.A.
To build the information infrastructure, this thesis studied previous arbovirus data, especially West Nile virus data available from different data sources. Leveraging this information infrastructure, several predictive models were then built to predict the seriousness of the Zika virus in the U.S.A. These models are innovative because they use novel parameters: network variables describing human travel patterns in the United States, namely betweenness centrality, closeness centrality, and eigenvector centrality. The network variables were computed from air travel passenger data sets using Gephi, an open-source network analysis and visualization tool.
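The three centrality measures named above can be sketched in plain Python. This is a minimal illustration, not the thesis's Gephi workflow: it assumes the passenger data has already been reduced to a hypothetical list of (origin, destination) airport pairs, treats the route network as undirected and unweighted, and uses its own normalisations (Brandes' algorithm for betweenness, closeness over reachable nodes, and power iteration for eigenvector centrality scaled so the largest score is 1). The values Gephi reports may be normalised differently.

```python
from collections import deque


def build_graph(routes):
    """Undirected, unweighted adjacency sets from (origin, destination) pairs."""
    adj = {}
    for a, b in routes:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs(adj, source):
    """Distances, shortest-path counts, predecessors and visit order from source."""
    dist = {source: 0}
    sigma = dict.fromkeys(adj, 0)
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def betweenness_centrality(adj):
    """Brandes' algorithm; halved because each undirected pair is seen twice."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of shortest-path distances to them."""
    scores = {}
    for v in adj:
        dist = bfs(adj, v)[0]
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I) to avoid oscillation; largest score scaled to 1."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        x_new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        peak = max(x_new.values())
        x = {v: s / peak for v, s in x_new.items()}
    return x


# Hypothetical airport-to-airport routes standing in for passenger data.
routes = [("ATL", "ORD"), ("ATL", "MIA"), ("ATL", "LAX"),
          ("ORD", "LAX"), ("LAX", "SEA")]
graph = build_graph(routes)
print(betweenness_centrality(graph))
print(closeness_centrality(graph))
print(eigenvector_centrality(graph))
```

Iterating on A + I instead of A leaves the eigenvectors unchanged but prevents the power method from oscillating on bipartite route structures such as a pure hub-and-spoke network.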
Depending on the type of the target variable in the final data set, two problem types were formulated: a classification problem and a continuous problem. The classification problem used decision trees and the Naïve Bayes algorithm to predict Zika disease class labels, with prediction accuracy, precision, and recall as the criteria for selecting the final model. The continuous problem used Artificial Neural Networks to predict exact Zika disease case counts, with mean absolute percent error (MAPE) and mean absolute deviation (MAD) as the selection criteria. This thesis selected the predictive model created with Artificial Neural Networks as the final model because it predicted the exact disease case counts with the lowest prediction error.
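The model-selection criteria above can be sketched as simple functions. These helpers are hypothetical illustrations: they assume a binary high/low risk label for the classification problem and interpret mean absolute deviation as the mean absolute forecast error. The labels, case counts, and model names are made-up values, not thesis results.

```python
def classification_metrics(actual, predicted, positive):
    """Accuracy, precision and recall for one positive class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def mape(actual, predicted):
    """Mean absolute percent error; entries with zero actual cases are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def mad(actual, predicted):
    """Mean absolute deviation of the forecast errors (same units as case counts)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical risk labels from a classifier.
labels = ["high", "low", "high", "low", "high"]
guesses = ["high", "low", "low", "low", "high"]
print(classification_metrics(labels, guesses, positive="high"))

# Hypothetical case counts and candidate regression models.
cases = [10, 20, 40]
candidates = {"ann": [12, 18, 40], "baseline": [15, 15, 15]}
best = min(candidates, key=lambda name: mape(cases, candidates[name]))
print(best, mape(cases, candidates[best]), mad(cases, candidates[best]))
```

MAPE is undefined when the actual count is zero, which is common for low-risk states, so this sketch skips those entries; MAD has no such problem and reports the error directly in cases.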
Subsequently, big data technologies were used to extract tweets related to the Zika virus from the social media website Twitter. This thesis concluded that the predictive model output and the Twitter output are partially correlated: more tweets came from the states that the model predicted to be at the highest risk of registering larger numbers of Zika cases in the U.S.A. These findings can help governments and health care organizations concentrate resource allocation in the high-risk states rather than across the entire country, which could save time and billions of dollars, and can also support warnings to passengers travelling between high-risk states within the U.S.A.
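One way to quantify how closely the two outputs agree is a rank correlation between per-state predicted case counts and per-state tweet counts. The thesis does not state which measure it used, so the Spearman coefficient below is an assumed choice, and the state values are hypothetical placeholders rather than thesis data.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-state values: model-predicted Zika cases and tweet counts.
predicted_cases = {"FL": 120, "TX": 90, "NY": 60, "CA": 80, "OH": 10}
tweet_counts = {"FL": 5400, "TX": 3100, "NY": 3900, "CA": 2500, "OH": 300}
states = sorted(predicted_cases)
rho = spearman([predicted_cases[s] for s in states],
               [tweet_counts[s] for s in states])
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here because tweet volumes and case counts are on very different scales, and only their ordering across states is being compared.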