Can Small Language Models Generate Therapist-Like Responses? A Lightweight Study of Therapist Imitation in Mental Health Support
Abstract
Therapist-like response generation is increasingly discussed in digital mental health, yet most studies either focus on large pretrained systems or present illustrative outputs without a full lightweight benchmark. This paper asks whether small, non-pretrained language models can imitate therapist-style discourse in a reproducible setting. We used EmpatheticDialogues, an empathy-oriented dialogue corpus of roughly 25,000 conversations (Rashkin et al., 2019). Its widely used utterance-level split contains 76,673 training, 12,030 validation, and 10,943 test records; the parsed conversation release used here contained 19,532/2,769/2,546 train/validation/test dialogues and yielded 40,252/5,736/5,257 listener-turn targets after we restricted supervision to supportive listener responses. We evaluated six lightweight systems on the full validation and test sets: an emotion template, TF-IDF retrieval, retrieval with micro-skill bias, an emotion-conditioned bigram language model, an emotion-conditioned trigram language model, and a trigram model with therapist-style biasing. All reported numbers are empirically measured. The best overall system, Emotion-TrigramLM+Bias, achieved a BLEU-4 of 0.0191 on validation and 0.0183 on test, a ROUGE-L of 0.1652/0.1633 (validation/test), and a therapist imitation score (TIS) of 0.6500/0.6487. Retrieval remained the most diverse model, reaching a test Distinct-2 of 0.2551, but its therapist-style density was low (TIS = 0.2005). On the test set, adding therapist micro-skill bias improved retrieval by 0.0042 BLEU-4 and 0.3603 TIS, and improved the trigram model by 0.0059 BLEU-4 and 0.3056 TIS. Performance was strongest on negative-emotion turns, where acknowledgments and follow-up questions aligned closely with the references. The findings show that very small models can imitate the surface form of therapeutic language surprisingly well, but they do so mainly by compressing support into generic scripts.
Lightweight therapist imitation is therefore feasible for low-risk acknowledgment support, but it is not a replacement for licensed mental health care.
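The diversity figure quoted above, Distinct-2, is commonly computed as the ratio of unique bigrams to total bigrams over a set of generated responses (Li et al., 2016). The following is a minimal sketch of that ratio, not the study's own code: the function names `ngrams` and `distinct_n`, the lowercased whitespace tokenization, the corpus-level pooling, and the sample responses are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses: Iterable[str], n: int = 2) -> float:
    """Corpus-level Distinct-n: unique n-grams divided by total n-grams.

    Assumption: lowercased whitespace tokenization and pooling over all
    responses; the abstract does not state the tokenizer or pooling used.
    """
    total = 0
    unique = set()
    for response in responses:
        grams = ngrams(response.lower().split(), n)
        total += len(grams)
        unique.update(grams)
    # An empty input has no n-grams; report 0.0 instead of dividing by zero.
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical system outputs, for illustration only.
    generic = ["i am sorry to hear that", "i am sorry to hear that"]
    varied = ["i am sorry to hear that", "what happened after the interview"]
    print(distinct_n(generic), distinct_n(varied))
```

Under this definition, a system that repeats the same acknowledgment script scores lower than one that varies its wording, which is consistent with the contrast the abstract draws between the retrieval model's diversity and the scripted style of the biased trigram model.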
References
Davis, M. H. (1983). Measuring individual differences in empathy: Evidence for a multidimensional approach. Journal of Personality and Social Psychology, 44(1), 113-126. https://doi.org/10.1037/0022-3514.44.1.113
Decety, J., & Jackson, P. L. (2004). The functional architecture of human empathy. Behavioral and Cognitive Neuroscience Reviews, 3(2), 71-100. https://doi.org/10.1177/1534582304267187
Dinan, E., Roller, S., Shuster, K., Fan, A., Auli, M., & Weston, J. (2019). Wizard of Wikipedia: Knowledge-powered conversational agents. In Proceedings of the 7th International Conference on Learning Representations.
Elliott, R., Bohart, A. C., Watson, J. C., & Greenberg, L. S. (2011). Empathy. Psychotherapy, 48(1), 43-49. https://doi.org/10.1037/a0022187
Fitzpatrick, K. K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4(2), e19. https://doi.org/10.2196/mental.7785
Fulmer, R., Joerin, A., Gentile, B., Lakerink, L., & Rauws, M. (2018). Using psychological artificial intelligence (Tess) to relieve symptoms of depression and anxiety: Randomized controlled trial. JMIR Mental Health, 5(4), e64. https://doi.org/10.2196/mental.9782
Hill, C. E. (2014). Helping skills: Facilitating exploration, insight, and action (4th ed.). American Psychological Association. https://doi.org/10.1037/14345-000
Hojat, M., Mangione, S., Nasca, T. J., Cohen, M. J., Gonnella, J. S., Erdmann, J. B., Veloski, J., & Magee, M. (2001). The Jefferson Scale of Physician Empathy: Development and preliminary psychometric data. Educational and Psychological Measurement, 61(2), 349-365. https://doi.org/10.1177/00131640121971158
Inkster, B., Sarda, S., & Subramanian, V. (2018). An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation. JMIR mHealth and uHealth, 6(11), e12106. https://doi.org/10.2196/12106
Li, J., Monroe, W., Ritter, A., Galley, M., Gao, J., & Jurafsky, D. (2016). A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 110-119). https://doi.org/10.18653/v1/N16-1014
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop (pp. 74-81).
Majumder, N., Hong, P., Peng, S., Lu, J., Ghosal, D., Gelbukh, A., Mihalcea, R., & Poria, S. (2020). MIME: MIMicking emotions for empathetic response generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.18653/v1/2020.emnlp-main.721
Miller, A. H., Feng, W., Fisch, A., Lu, J., Batra, D., Bordes, A., Parikh, D., & Weston, J. (2017). ParlAI: A dialog research software platform. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. https://doi.org/10.18653/v1/D17-2014
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318). https://doi.org/10.3115/1073083.1073135
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (pp. 1532-1543). https://doi.org/10.3115/v1/D14-1162
Rashkin, H., Smith, E. M., Li, M., & Boureau, Y.-L. (2019). Towards empathetic open-domain conversation models: A new benchmark and dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 5370-5381). https://doi.org/10.18653/v1/P19-1534
Rogers, C. R. (1957). The necessary and sufficient conditions of therapeutic personality change. Journal of Consulting Psychology, 21(2), 95-103. https://doi.org/10.1037/h0045357
Shum, H.-Y., He, X.-D., & Li, D. (2018). From Eliza to XiaoIce: Challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering, 19(1), 10-26. https://doi.org/10.1631/FITEE.1700826
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (pp. 3104-3112).
Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Kashavan, M. S., & Torous, J. B. (2019). Chatbots and conversational agents in mental health: A review of the psychiatric landscape. Canadian Journal of Psychiatry, 64(7), 456-464. https://doi.org/10.1177/0706743719828977
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems 30 (pp. 5998-6008).
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., et al. (2020). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38-45). https://doi.org/10.18653/v1/2020.emnlp-demos.6
Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018). Personalizing dialogue agents: I have a dog; do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (pp. 2204-2213). https://doi.org/10.18653/v1/P18-1205
Copyright (c) 2026 Yifan Zhang, Zhongwen Zhou

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution License, which allows others to share the article with acknowledgement of its authorship and of its initial publication in this journal.
2) Authors are authorized to enter into separate, additional contracts for distribution of the version of the work published in this journal (for example, depositing it in an institutional repository or publishing it as a book chapter), as long as authorship and initial publication in this journal are acknowledged.
3) Authors are authorized and encouraged to publish and distribute their work online (for example, in institutional repositories or on their personal pages) at any time before or during the editorial process, as it increases the impact and reference of the published work.





