Accession Number : AD0769560

Title : RAND Corporation Data in Systran. Volume 2.

Descriptive Note : Final rept. 1 Feb 72-1 May 73

Corporate Author : LATSEC INC LA JOLLA CALIF

Personal Author(s) : Toma,Peter P. ; Kozlik,Ludek A.

Report Date : AUG 1973

Pagination or Media Count : 400

Abstract : The report presents some empirical linguistic findings based on a million-word Russian corpus with syntactic annotations. The corpus, consisting of Russian texts in mathematics, physics, cybernetics, astrobotany and physiology, was produced by the Rand Corp., Santa Monica, California, and converted for use by SYSTRAN language-analysis processing procedures. Since all syntagmas are explicitly marked in the Rand data base, little or no contextual reference is necessary in order to establish semosyntactic relationships that may be utilized as the most essential components of an automatic parser for scientific and technical (S&T) text. Volume II deals with text statistics, the bulk of which consists of high-frequency wordlists in descending frequency order as well as alphabetical order, for both individual and combined subject matters. (Modified author abstract)

Descriptors : (*Machine translation, *Russian language), Computational linguistics, English language, Syntax, Computer programming

Subject Categories : Linguistics

Distribution Statement : APPROVED FOR PUBLIC RELEASE