Coding in content analysis is costly and fragile. Scholars use quality-control indexes, crowd-sourcing, and special-purpose algorithms to trade off cost against quality. The fourth version of the generative pre-trained transformer (GPT-4), by contrast, outperforms its predecessors across a variety of tasks (OpenAI, 2023a) at a reasonable cost. We investigate whether such large language models (LLMs) can replace these prior methods. Results show that GPT-4 with prompt-training (PT) outperforms both human coders and a dedicated model in binary classification of emotion in Chinese text. It yields higher accuracy and F1-score than manual coding and than SKEP (Tian et al., 2020), a pre-trained model used without downstream fine-tuning. Additionally, GPT-series models maintain competitive inter-coder reliability, and GPT-4 exhibits strong inter-hyper-parameter reliability. They are thus accurate, reliable, and economical.