Chinese texts are character-based, not word-based, and Chinese sentences contain no boundary marks between words. Each Chinese character stands for one phonological syllable and, in most cases, represents a morpheme. This poses a problem because fewer than 10% of Chinese word types (and fewer than 50% of the tokens in a text) consist of a single character, so most words span several characters and must be recovered from an unbroken character stream. In most Chinese information retrieval (IR) tasks, identifying keywords is difficult because of segmentation ambiguities and the occurrence of unknown words. As a result, a great deal of research has focused on extracting words from raw Chinese texts (i.e., unsegmented sentences).

In this dissertation, we propose two approaches to these Chinese natural language processing problems:

(1) Term Contributed Frequency for Chinese Word Extraction

We introduce a statistical, suffix array-based Chinese term extraction approach that calculates the term contributed frequency (TCF) without a dictionary. We use an external data structure called the TCF-Node to store two kinds of term frequency, which can be used to solve the N-gram frequency distortion problem. The proposed TCF-based approach is a novel attempt to extract Chinese terms automatically and effectively. In addition to handling text corpora dynamically, our approach imposes no strict requirements on the size or quality of the training corpora.

(2) Alignment-Based Surface Patterns for Chinese Factoid Question Answering Systems

Traditional IR uses keywords or implicit rules, such as latent semantic indexing, to index a text. Humans, however, understand a text through its semantic information. We therefore propose an alignment-based surface pattern approach, called ABSP, which integrates semantic information into syntactic patterns. ABSP employs a new strategy to extract surface patterns from non-segmented passages. It uses the surface patterns to extract important terms from questions, and then constructs the terms' relations from sentences in the corpus. Finally, the relations are used to rank answer candidates. We incorporate the approach into Chinese question answering (QA) to verify the feasibility of ABSP for Chinese. Our experiments show that ABSP improves answer accuracy in an existing cross-lingual QA system that has high coverage. We believe the approach is robust and portable to other domains.
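
To make the idea of deriving a surface pattern from non-segmented passages concrete, the following is a minimal sketch and not the ABSP implementation. It assumes a plain character-level alignment (Python's difflib.SequenceMatcher standing in for whatever alignment algorithm the dissertation uses) and omits the semantic information that ABSP adds to its patterns. The names SLOT, align_to_pattern, and apply_pattern, as well as the example sentences, are hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher

SLOT = "<slot>"  # hypothetical placeholder marker, not taken from the dissertation


def align_to_pattern(a, b):
    """Align two unsegmented strings character by character.

    Runs shared by both strings are kept as literals; every region where
    the strings differ is collapsed into a single slot.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pattern = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            pattern.append(a[i1:i2])
        elif not pattern or pattern[-1] != SLOT:
            # Avoid emitting two adjacent slots for consecutive differences.
            pattern.append(SLOT)
    return pattern


def apply_pattern(pattern, sentence):
    """Return the text captured by each slot, or None if the pattern does not fit."""
    regex = "".join("(.+)" if part == SLOT else re.escape(part) for part in pattern)
    match = re.fullmatch(regex, sentence)
    return list(match.groups()) if match else None


# Two illustrative passages that share the literal run "導演了" ("directed").
pattern = align_to_pattern("李安導演了臥虎藏龍", "張藝謀導演了英雄")
# Apply the learned pattern to a new, unsegmented sentence.
terms = apply_pattern(pattern, "魏德聖導演了海角七號")
print(pattern)
print(terms)
```

In this sketch the alignment needs no word segmenter: the shared run becomes the literal part of the pattern, and the slots pick out candidate terms (here a person and a film title) whose relation could then be used when ranking answer candidates.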