Deep learning to predict image QC label

Finally, a deep learning model was trained on the brain slices to predict the XGBoost probability score. All brain slices were resized to 256 by 256 pixels and converted to three color channels (RGB) for compatibility with the VGG16 input layer. The data were split into 80%-10%-10% training-validation-test sets, with all slices belonging to the same subject grouped together so that any individual subject appeared in only one of the training, validation, or test sets. We loaded the VGG16 network pretrained with ImageNet weights \cite{simonyan2014very}, as implemented in Keras \cite{chollet2015keras}, removed the top layer, and ran inference on all the data. The output of the VGG16 inference was then used to train a small sequential neural network consisting of a dense layer with 256 nodes and a rectified linear unit (ReLU) activation function, followed by a dropout layer set to drop 50% of the weights to prevent overfitting, and finally a single-node output layer with sigmoid activation. This small network was trained for 50 epochs, and the best model on the validation set across the 50 epochs was saved. We ran this model 10 separate times, each time with a different random initialization seed, in order to measure the variability of our ROC AUC on the test set.
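The sketch below illustrates this transfer-learning setup, assuming a Keras-style workflow. The variable names (slices_train, scores_train, etc.), the global-average pooling of the VGG16 features, the mean squared error loss, and the optimizer are illustrative assumptions rather than the exact implementation used here; the placeholder arrays stand in for the resized brain slices and their XGBoost probability scores.

```python
# Minimal sketch of the VGG16 feature extraction + small trainable head.
# Placeholder data, pooling choice, loss, and optimizer are assumptions.
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout
from keras.callbacks import ModelCheckpoint

# Placeholder data for illustration; in practice these are the resized
# 256x256 RGB brain slices and their continuous XGBoost probability scores.
slices_train = np.random.rand(32, 256, 256, 3).astype("float32") * 255
scores_train = np.random.rand(32)
slices_val = np.random.rand(8, 256, 256, 3).astype("float32") * 255
scores_val = np.random.rand(8)

# Pretrained VGG16 without its classification head; global average pooling
# (an assumption) collapses the convolutional maps to a 512-d feature vector.
base = VGG16(weights="imagenet", include_top=False,
             input_shape=(256, 256, 3), pooling="avg")
features_train = base.predict(preprocess_input(slices_train))
features_val = base.predict(preprocess_input(slices_val))

# Small sequential head: Dense(256, ReLU) -> 50% dropout -> sigmoid output.
head = Sequential([
    Input(shape=(features_train.shape[1],)),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])
# Loss is not specified in the text; mean squared error is assumed here
# because the target is a continuous probability in [0, 1].
head.compile(optimizer="adam", loss="mean_squared_error")

# Keep the weights that perform best on the validation set across 50 epochs.
checkpoint = ModelCheckpoint("best_head.weights.h5", monitor="val_loss",
                             save_best_only=True, save_weights_only=True)
head.fit(features_train, scores_train, epochs=50,
         validation_data=(features_val, scores_val), callbacks=[checkpoint])
```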

Training the MRIQC model

MRIQC was run on all images in the HBN dataset. Rather than using the previously trained MRIQC classifier from Esteban and colleagues \cite{Esteban2017}, the extracted QC features were used to train another XGBoost classifier to predict the gold-standard labels. Two-thirds of the data were used to train the model, with 2-fold cross-validation used to optimize the hyperparameters: learning rate (0.001, 0.01, 0.1), number of estimators (200, 600), and maximum depth (2, 6, 8). An ROC analysis was run, and the computed area under the curve was 0.99.
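A hedged sketch of this training procedure, using XGBoost's scikit-learn interface and a grid search over the hyperparameter values listed above, is shown below. The feature matrix X and labels y are placeholders standing in for the MRIQC quality metrics and the gold-standard labels; the random split and grid-search wrapper are assumptions about the implementation.

```python
# Sketch: XGBoost classifier on MRIQC features with 2-fold grid search.
# X and y are placeholder arrays, not the actual HBN data.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import roc_auc_score

X = np.random.rand(700, 64)          # placeholder MRIQC image-quality metrics
y = np.random.randint(0, 2, 700)     # placeholder gold-standard pass/fail labels

# Two-thirds of the data used for training; the remainder held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=2 / 3, random_state=0)

param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "n_estimators": [200, 600],
    "max_depth": [2, 6, 8],
}
search = GridSearchCV(XGBClassifier(), param_grid, cv=2, scoring="roc_auc")
search.fit(X_train, y_train)

# ROC analysis on the held-out data.
auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"held-out ROC AUC: {auc:.3f}")
```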

Gray matter volume vs age during development

Finally, to explore the relationship between gray matter volume and age over development as a function of QC threshold, gray matter volume was computed by running the Mindboggle software \cite{klein2017mindboggling} on the entire dataset. Mindboggle combines the image segmentation output from FreeSurfer \cite{fischl2012freesurfer} with that of ANTs \cite{Avants_2011} to improve the accuracy of segmentation, labeling, and volume shape features. Extremely low quality scans did not make it through the entire Mindboggle pipeline, so the dataset size was reduced to 629 for this part of the analysis. The final QC score for each brain volume was computed by averaging the deep learning model's predicted braindr ratings across all five slices. We ran an ordinary least squares (OLS) model of gray matter volume versus age on the data with and without QC thresholding, where the QC threshold was set at 0.7. Figure \ref{182176} shows the result of this analysis: with QC thresholding, the effect size nearly doubled and replicated previous findings.
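A minimal sketch of this comparison is given below, assuming a statsmodels OLS fit on a per-subject table. The column names (gm_volume, age, braindr_score) and the synthetic data are illustrative assumptions; braindr_score represents the mean of the five per-slice deep-learning predictions for each subject.

```python
# Sketch: OLS of gray matter volume on age, with and without QC thresholding.
# The DataFrame contents are placeholders, not the actual HBN measurements.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.uniform(5, 21, 629),            # placeholder ages (years)
    "gm_volume": rng.normal(700, 50, 629),     # placeholder gray matter volumes
    "braindr_score": rng.uniform(0, 1, 629),   # mean braindr score across 5 slices
})

# OLS on all subjects that completed the Mindboggle pipeline.
fit_all = smf.ols("gm_volume ~ age", data=df).fit()

# OLS restricted to subjects whose mean braindr score exceeds the 0.7 threshold.
fit_qc = smf.ols("gm_volume ~ age", data=df[df["braindr_score"] > 0.7]).fit()

# Compare the estimated age effect with and without QC thresholding.
print(fit_all.params["age"], fit_qc.params["age"])
```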

Acknowledgements

This research was supported through a grant from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation to the University of Washington eScience Institute. A.K. is also supported through a fellowship from the eScience Institute and the University of Washington Institute for Neuroengineering. We thank the NVIDIA Corporation for supporting our research through their GPU seed grant. We would like to acknowledge the following people for fruitful discussions and contributions to the project: Dylan Nielson, Satra Ghosh, and Dave Kennedy for the inspiration for braindr; Greg Kiar for contributing badges to the braindr application, and for naming it; Chris Markiewicz for discussions on application performance, and for application testing in the early stages; Katie Bottenhorn, Dave Kennedy, and Amanda Easson for quality controlling the gold-standard dataset; Jamie Hanson for sharing the MRIQC metrics; Chris Madan for application testing and for discussions regarding QC standards; Arno Klein and Lei Ai for providing the segmented images from the HBN dataset; and Tal Yarkoni and Alejandro de la Vega for organizing a “code rodeo” for neuroimagers in Austin, TX, where the idea for braindr was born. Finally, we would like to thank all the citizen scientists who swiped on braindr: we are very grateful for your contributions!

Code and Data Availability

The code for the braindr application can be found at https://doi.org/10.5281/zenodo.1208140. The brain slice data and model weights are hosted at https://osf.io/j5d4y/. The code for the analysis for this project, including all figures and the source code for the interactive version of this manuscript, can be found at https://github.com/akeshavan/braindr-results (including the Jupyter notebook for the full analysis at https://github.com/akeshavan/braindr-results/blob/master/notebooks/braindr-full-v0.3.ipynb) and https://github.com/akeshavan/braindr-analysis (which also contains the original Mindcontrol quality ratings at https://raw.githubusercontent.com/akeshavan/braindr-analysis/master/braindrAnalysis/data/mindcontrol-feb-21-18_anon.json).