Researchers Attempt to Use Facebook Data to Predict the Diagnosis of Schizophrenia and Mood Disorders

Using Facebook data, machine learning algorithms attempt to predict whether people will be diagnosed with schizophrenia and mood disorders.

Emaline Friedman, PhD

New research published in Nature Partner Journals, in partnership with the Schizophrenia International Research Society, reports on a machine learning classifier system applied to patient Facebook data that was able to differentiate Schizophrenia Spectrum Disorders (SSD) from Mood Disorders (MD) up to 18 months before patients’ first hospitalizations.

The researchers, led by Michael L. Birnbaum and Raquel Norel, suggest that Facebook data can be integrated with clinical information to inform clinical decision-making. However, their results show that the AI predictions may not significantly improve upon existing screening methods and critics have raised concerns over privacy and overdiagnosis.

Digital technologies are being developed and rolled out in the mental health field. In addition to supporting delivery of treatment, as in teletherapy, mental health apps, and medications with embedded digital sensors, such technologies are developed to aid clinical judgment. Among these are machine learning algorithms that use vocal and other biomarkers to diagnose ‘mental illness.’

Despite industry enthusiasm, critics have pointed out that algorithms often rely on incomplete data and can replicate and even exacerbate existing biases in healthcare. Further ethical concerns have been raised over the lack of transparency concerning how these algorithms actually make decisions, the difficulty of communicating such decisions with confidence to patients, and the tendency to “pass the buck” to technologies to avoid liability once they are put into use.

Meanwhile, mental health researchers are debating the ill effects of social media and questioning aspects of these technologies and the types of environment they create for users. What is clear, however, is that social media platforms generate a glut of highly personal data, which is exactly what machine learning algorithms need to improve the accuracy of their predictions. This data is generated across a wide swath of society, not just by those who are already displaying symptoms of mental distress. This means that Facebook and other corporate social media platforms are well-positioned to provide such data in service of prediction and early detection of ‘mental illness.’ For example, data from Facebook has been used to attempt the prediction of suicides since 2017, and monitoring like this has led to invasive incidents of police and crisis teams intervening on people without their informed consent.

This latest study assesses whether patient data can confirm prior identification of associations between social media activity, including private “messenger” communications, and psychiatric diagnoses, by differentiating individuals diagnosed with Schizophrenia Spectrum Disorders (SSD), Mood Disorders, and healthy volunteers (HV).

The researchers collected a total of 3,404,959 Facebook messages and 142,390 Facebook images across 223 participants with a mean age of 23.7 years and a near-even split between genders and diagnoses (SSD (n = 79), MD (n = 74), and HV (n = 70)).

Their first objective was to evaluate whether it was possible to distinguish between SSD, MD, and HV based on Facebook data alone. A pairwise classification used 18 months of aggregated data in a standard cross-validation scheme, representing each participant by a single feature vector: an overall average of their data across the six trimesters (three-month periods).
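The single-vector representation described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function name `average_trimesters` and the toy feature values are hypothetical, and the sketch assumes each trimester has already been summarized as a list of numeric features of equal length.

```python
from statistics import fmean


def average_trimesters(trimester_vectors):
    """Collapse per-trimester feature vectors into one vector per participant.

    Assumption (hypothetical layout): each inner list holds the same features
    in the same order, one list per trimester.
    """
    if not trimester_vectors:
        raise ValueError("need at least one trimester of data")
    length = len(trimester_vectors[0])
    if any(len(vector) != length for vector in trimester_vectors):
        raise ValueError("all trimester vectors must have the same length")
    # zip(*...) groups the i-th feature of every trimester together.
    return [fmean(column) for column in zip(*trimester_vectors)]


# Toy data: six trimesters (18 months), two made-up features per trimester.
toy_participant = [
    [0.10, 2.0],
    [0.20, 2.0],
    [0.30, 4.0],
    [0.10, 4.0],
    [0.20, 6.0],
    [0.30, 6.0],
]
participant_vector = average_trimesters(toy_participant)
```

Averaging this way discards any change over time, which is why the later trimester-by-trimester analysis is needed to see how signals shift as hospitalization approaches.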

The algorithms distinguished participants with SSD from those with MD or HV with an accuracy of 52% (chance = 33%). Participants with MD were correctly classified with an accuracy of 57% (chance = 37%), and HVs were correctly classified with an accuracy of 56% (chance = 29%).

The researchers suggest that such machine-learning algorithms can identify those with SSD and MD using Facebook activity alone over a year in advance of the first psychiatric hospitalization.

Compared to HV, participants with SSD and MD demonstrated significant differences in the use of words related to “anger,” “swearing,” “negative emotions,” “sex,” and “perception.” Many linguistic differences existed before the individual’s first hospitalization, suggesting that certain linguistic features may represent a trait rather than a state marker of impending symptoms or that clinically meaningful changes manifest online before hospitalization.

Analyzing word choice on Facebook could potentially help clinicians identify people at high risk of SSD or MD before the emergence of clinically significant symptoms.

While age, sex, and race were not associated with linguistic differences in SSD or HV participants, men and women with MD were significantly more likely (P < 0.01) to vary in their use of numerals. Compared to HV, photos posted by participants with SSD or MD were significantly smaller, and participants with MD posted photos with more blue and less yellow colors.

The researchers also assessed if signals identified in the first trimester (when psychiatric symptoms are the most prominent, resulting in hospitalization) are also present in the trimesters farther away from hospitalization.

Classifications were performed on models trained with data from the first trimester only. Testing these models on data from the other trimesters showed increasing differentiation of several word categories closer to the date of hospitalization. The authors attribute this to changes in “anxiety, mood, preoccupations, perceptions, social functioning, and other domains known to accompany illness emergence.”

The difference between HV and MD participants in the use of biological process words (blood, pain) and words related to negative emotions grew closer to hospitalization. Both SSD and MD patients used more negations, anger-oriented language, and swear words compared to healthy volunteers closer to their hospitalization dates.

The authors argue that while Facebook data alone cannot yet be used to make diagnoses, the integration of social media communication data could help improve diagnostic accuracy, serve as a “low burden screening tool” for at-risk youth, and provide collateral information. However, the predictions are correct only slightly more than half of the time. While the results here are in the 50% accuracy range, 70-80% has been proposed as an acceptable threshold for prediction rates.
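To make that comparison concrete, the reported accuracies can be set against their chance levels and against the lower end of the proposed 70-80% range. The sketch below uses only the figures quoted in this article; the dictionary layout and the helper name `summarize` are illustrative assumptions.

```python
# Accuracy and chance level (percent) as quoted in this article.
REPORTED = {
    "SSD": {"accuracy": 52, "chance": 33},
    "MD": {"accuracy": 57, "chance": 37},
    "HV": {"accuracy": 56, "chance": 29},
}

# Lower end of the 70-80% range cited as an acceptable prediction rate.
THRESHOLD = 70


def summarize(reported, threshold):
    """Return, per group, points above chance and points short of threshold."""
    summary = {}
    for group, stats in reported.items():
        summary[group] = {
            "above_chance": stats["accuracy"] - stats["chance"],
            "short_of_threshold": threshold - stats["accuracy"],
        }
    return summary


for group, row in summarize(REPORTED, THRESHOLD).items():
    print(
        f"{group}: {row['above_chance']} points above chance, "
        f"{row['short_of_threshold']} points below {THRESHOLD}%"
    )
```

On these figures every group clears its chance level, but each also falls at least 13 percentage points short of the 70% mark.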

Further caution about the implications of this study is warranted on the grounds that diagnostic screening tools have been found to significantly overestimate mood disorders compared to clinical interviews. Adding a secondary screening tool will not necessarily improve detection in practice: differing diagnostic tools often assess different underlying constructs, leading to misdiagnosis and overdiagnosis. There is also considerable debate concerning risks and ethical issues inherent in additional monitoring of patients and at-risk individuals.




Birnbaum, M. L., Norel, R., Van Meter, A., Ali, A. F., Arenare, E., Eyigoz, E., Agurto, C., Germano, N., Kane, J. M., & Cecchi, G. A. (2020). Identifying signals associated with psychiatric illness utilizing language and images posted to Facebook. Npj Schizophrenia, 6(1), 38.


  1. What is the procedure for the “youth at risk”?
    10 percent of brain vanished and 25 years of life removed?

    This expression makes me laugh so hard.

    I was at risk when psychiatry destroyed my mind as early as 14.

    People with problems are more likely to curse? Really?
    Is there only one person in the world who can take this seriously?

  2. Putting aside for a moment that such an endeavour is grotesquely Huxleyan.

    I’ve written some Python code to download psychiatric research from and train a word2vec model on it. Even in psychiatric literature schizophrenia and mood disorders are not distinguishable entities. This is especially true for bipolar, which has a cosine similarity of ~0.6 to schizophrenia. That’s one of the highest similarities found out of all diagnoses. Off the top of my head only anxiety and depression (~0.7) are more closely related.

    Even assuming psychiatric diagnoses were valid (as in corresponding to a physical thing) the current terms used would be highly correlated and fail to describe different things. Birnbaum et al. are trying to run before they can walk.

    Edit: test accuracy should always be given as two numbers: sensitivity and specificity. A specificity of 80% is in no way even remotely acceptable. It would mean that 20% of test subjects would be called schizophrenic, even tho they aren’t. For comparison: the current covid tests have a specificity (and sensitivity) of >99% and even they produce a lot of false positives when covid is relatively rare in the base population.

    • And we can see how scientific the research is. It’s as scientific as the ruling of the church was. Psych is the biggest adherent to belief in a god that punishes evil doers, but they extended the punishments even against emotion. The punishments are words and chemicals. The “REFORM” happened already. “reform” was and is psych. At the moment.
      Something, someone has to rule no matter how wretched it is. And it is ALWAYS about personal gain, one form or another. Most shrinks never even think about “helping people” as a drive to pursue. It is about “hmm, I’m 18 and 22 and I have to get an education. What am I interested in”. I do think there are a few out there that are actually changing towards caring. They have enough character to not be pressured or bullied and enough evidence right in front of their eyes to understand the system itself is one “mentally ill”. Look at a group of shrinks and honestly, what the hell is that written on their faces, or their lips. That is some fundamentalist thought process. Look at what their “production” is. What are their results?

  3. Does this apply to politicians? Cops? Shrinks?
    Attention Mark Zuckerberg. You have an obligation
    to warn users of this. It is an invasion of privacy where
    the shrink deputies are acting as FBI agents.
    I hope Mark is aware of the potential abuse to unwitting people.
    OOPS, I hope no researchers are reading my comment.

  4. And the most important thing going on is the use of the word “Schizophrenia”
    and “disorders”
    Those words are for the public and business partners to make them fearful, biased
    and convince everyone that invasion of privacy is a good thing.
    It is invasion if stuff is collected to use against people to dehumanize them even more.

    I will email this to every young person I know, although a few of them are using facebook less and less
    anyway. It’s getting old.

    And as far as suicide, psych is the very last system that could ever prevent it. In fact they cause huge distress and disability for people. Hopefully the self support groups pop up more and more and perhaps resort to other forms of communication.

  5. And so what? If they actually had something worthwhile to offer, it might be worth some risk of offending people or risking some false positives to get them some information. But they offer hopelessness and dependence on life-threatening, soul-flattening drugs that may or may not even work to “reduce the symptoms” of a “disorder” (and may in fact bring about the very “disorder” they’re supposed to address) that there is a 50-50 chance they won’t even develop??? Are these people serious????
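
The base-rate point raised in the second comment (even a highly specific test yields many false positives when a condition is rare) can be checked with a short calculation of positive predictive value. This is a sketch under stated assumptions: the 1% prevalence and the 80% sensitivity are illustrative values chosen here, not figures from the study or from the comment, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical input: the prevalence is an assumption for illustration only.
ASSUMED_PREVALENCE = 0.01

# Sensitivity is assumed equal to specificity in both scenarios.
ppv_80 = positive_predictive_value(0.80, 0.80, ASSUMED_PREVALENCE)
ppv_99 = positive_predictive_value(0.99, 0.99, ASSUMED_PREVALENCE)

print(f"80% sensitivity/specificity: PPV = {ppv_80:.1%}")
print(f"99% sensitivity/specificity: PPV = {ppv_99:.1%}")
```

Under these assumed inputs, about half of the positives from the 99% test are false, and fewer than one in twenty positives from the 80% test is a true case.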