Linguistic analysis of social media posts has been used to detect signs of depression in individuals. However, a recent study found that the linguistic features linked to depression are absent in social media posts made by Black individuals. This raises important questions about how demographic factors affect the measurement of mental health using social media data. Computer scientists like Munmun De Choudhury are concerned about the public health implications of these findings, as machine learning programs that predict depression from language markers may not work across a diverse population.
In a study by researchers at the University of Pennsylvania, 868 participants in the United States were recruited, with half identified as Black and half as white, and matched by age and gender. Participants completed a depression survey and allowed access to their Facebook posts, which were analyzed using a text analysis program. The results showed that the use of first-person singular pronouns increased with depression scores, while the use of first-person plural pronouns was linked to lower depression scores. Words reflecting negative emotions were also linked to higher depression scores, consistent with previous research.
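To make the pronoun measure concrete, here is a minimal Python sketch of how the share of first-person singular or plural pronouns in a post could be computed. The word lists and the function names (`tokenize`, `category_rate`) are illustrative assumptions; the article does not say which dictionary or program the researchers used, and this is not their method.

```python
import re

# Hypothetical word lists for illustration only; the study's actual
# dictionary is not described in the text.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours", "ourselves"}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rate(text, category):
    """Return the fraction of tokens in `text` that belong to `category`.

    Contractions such as "I'm" stay as one token and are not counted,
    which is a known limitation of this simple sketch.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in category) / len(tokens)
```

Rates like these, computed per participant, are the kind of feature that could then be compared against depression survey scores.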
However, when the researchers analyzed the data by race, they found that the text analysis program performed well in predicting depression in white participants but failed to predict depression in Black participants. Even when the program was trained only on Black participants’ social media posts, it still could not identify any linguistic patterns associated with depression. This discrepancy suggests that depression in Black individuals may not show up in language use in the same way as it does in white individuals.
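The by-race comparison described above amounts to checking a language feature against depression scores separately within each group, rather than only in the pooled sample. The following is a rough Python sketch of that idea using a plain Pearson correlation. The record layout `(group, feature_value, score)` and the function names are assumptions made for illustration, and the study may well have used a different statistic or model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence has no variance, since the
    correlation is undefined in that case.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_by_group(records):
    """Compute the feature/score correlation separately for each group.

    `records` is an iterable of (group, feature_value, score) tuples;
    this layout is a hypothetical one chosen for the sketch.
    """
    grouped = {}
    for group, feature, score in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(feature)
        ys.append(score)
    return {group: pearson(xs, ys) for group, (xs, ys) in grouped.items()}
```

A feature that correlates with scores in one group but shows a correlation near zero in another would mirror the pattern the researchers reported, which is why a pooled result alone can be misleading.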
The reasons the program struggled to predict depression in Black individuals are not clear. It is possible that the signs of depression in Black individuals are not communicated through written language, or that nonverbal cues play a more significant role. Alternatively, the public nature of social media may discourage Black individuals from expressing their feelings openly. It is also possible that depression does not have universal linguistic features, which would call into question the accuracy of machine learning programs that rely on standardized measures of depression.
Further research is needed to determine if social media is an appropriate platform for studying depression in diverse populations or if other factors need to be considered. Understanding how depression manifests in different populations is crucial to developing effective tools for detecting and managing mental health issues. The findings of this study highlight the importance of considering demographic factors when using language analysis to predict mental health outcomes.