Language

Language is the interface in Sora Editor to provide language-specific functionality, including syntax analysis, auto-completion and auto-indent.

A Language instance should serve only one editor. It is destroyed automatically when the editor is released or when a new Language instance is set.

You can use CodeEditor#setEditorLanguage to apply a new Language to the editor. By default, the editor uses the built-in EmptyLanguage, which performs no analysis, so syntax-highlight and other language features are unavailable.

We provide several general-purpose language implementations that set up analysis and syntax-highlight for a programming language. Note that the language-java module only offers simple token-based Java syntax-highlight.

Use Language Modules

Before using the language module, make sure you have imported it into your project.

language-textmate

This module uses TextMate grammars to tokenize and highlight text for various programming languages. TextMate grammars are also used in Visual Studio Code and Eclipse for syntax-highlight. Most library integrators will prefer this module over writing a Language implementation themselves.

Follow the steps below to use TextMate for your editor.

Find Language Syntax and Config

TextMate supports various languages, and syntax-highlight rules are defined by *.tmLanguage PLIST files or *.tmLanguage.json JSON files. You need these TextMate rule files (aka syntaxes) and optionally language configuration files (*.language-configuration.json) for your target language.
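
When collecting syntax files, it can help to check which of the two formats a file uses before adding it to your assets. The following is a minimal sketch that relies only on the file name suffixes mentioned above. `GrammarFormat` is a hypothetical helper, not a Sora Editor or TextMate class.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Sora Editor): classifies a syntax file by its name. */
enum GrammarFormat {
    JSON, PLIST, UNKNOWN;

    /** Returns JSON for *.tmLanguage.json, PLIST for *.tmLanguage, otherwise UNKNOWN. */
    static GrammarFormat of(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        // Check the longer suffix first, because both names contain ".tmlanguage"
        if (lower.endsWith(".tmlanguage.json")) {
            return JSON;
        }
        if (lower.endsWith(".tmlanguage")) {
            return PLIST;
        }
        return UNKNOWN;
    }
}
```

A language configuration file such as language-configuration.json is reported as UNKNOWN, because it is not a syntax file.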

You can find those files in:

Find Themes

TextMate must be used together with TextMate themes, so you also need theme JSON files. You can find them in VSCode Extensions: the folders named in the theme-* pattern contain the VSCode built-in TextMate themes.

Prepare Language Registry

Multiple languages can be loaded by TextMate. We should prepare languages.json for later loading. For example, your assets directory:

Text
.
└─ textmate
   ├─ java
   │  ├─ syntaxes
   │  │  └─ java.tmLanguage.json
   │  └─ language-configuration.json
   ├─ kotlin
   │  ├─ syntaxes
   │  │  └─ Kotlin.tmLanguage
   │  └─ language-configuration.json
   └─ languages.json

Your languages.json:

JSON
{
  "languages": [
    {
      "grammar": "textmate/java/syntaxes/java.tmLanguage.json",
      "name": "java",
      "scopeName": "source.java",
      "languageConfiguration": "textmate/java/language-configuration.json"
    },
    {
      "grammar": "textmate/kotlin/syntaxes/Kotlin.tmLanguage",
      "name": "kotlin",
      "scopeName": "source.kotlin",
      "languageConfiguration": "textmate/kotlin/language-configuration.json"
    }
  ]
}

name is a custom identifier of your choice, and scopeName is the root scope of the syntax file.

For languages with embedded languages (like HTML and Markdown), refer to the HTML sample in the Demo App.

Load Syntaxes and Themes

Before using TextMate languages in the editor, we should load the syntax and theme files into the registries. These steps are performed only once, no matter how many editors use TextMate.

Suppose we load TextMate files from our APK assets. First, we need to add a FileResolver so that TextMate can access its files.

Kotlin
FileProviderRegistry.getInstance().addFileProvider(
    AssetsFileResolver(
        applicationContext.assets // use application context
    )
)
Java
FileProviderRegistry.getInstance().addFileProvider(
    new AssetsFileResolver(
        getApplicationContext().getAssets() // use application context
    )
);

Then, the themes should be loaded. The code below shows how to load a single theme into the theme registry.

Kotlin
val themeRegistry = ThemeRegistry.getInstance()
val name = "quietlight" // name of theme
val themeAssetsPath = "textmate/$name.json"
themeRegistry.loadTheme(
    ThemeModel(
        IThemeSource.fromInputStream(
            FileProviderRegistry.getInstance().tryGetInputStream(themeAssetsPath), themeAssetsPath, null
        ), 
        name
    ).apply {
        // If the theme is dark
        // isDark = true
    }
)
Java
var themeRegistry = ThemeRegistry.getInstance();
var name = "quietlight"; // name of theme
var themeAssetsPath = "textmate/" + name + ".json";
var model = new ThemeModel(
    IThemeSource.fromInputStream(
        FileProviderRegistry.getInstance().tryGetInputStream(themeAssetsPath), themeAssetsPath, null
    ),
    name
);
// If the theme is dark
// model.setDark(true);
themeRegistry.loadTheme(model);
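
To load several themes, the same call can be repeated for each theme name. The sketch below only derives the asset path for each name, following the textmate/<name>.json layout used above, and hands both values to a callback. `ThemeAssets` is a hypothetical helper rather than a Sora Editor class, and the callback body is where the ThemeRegistry.loadTheme call shown above would go.

```java
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical helper (not part of Sora Editor) for themes stored as textmate/<name>.json. */
final class ThemeAssets {
    private ThemeAssets() {}

    /** Builds the asset path for a theme name, mirroring the path expression used above. */
    static String assetPath(String themeName) {
        return "textmate/" + themeName + ".json";
    }

    /** Calls loader once per theme with (name, assetPath), in list order. */
    static void forEach(List<String> themeNames, BiConsumer<String, String> loader) {
        for (String name : themeNames) {
            loader.accept(name, assetPath(name));
        }
    }
}
```

Any theme names other than "quietlight" are placeholders for whichever theme files you have actually added to your assets.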

Next, select an active theme for TextMate. TextMate uses its registry to manage the global color scheme.

Kotlin
ThemeRegistry.getInstance().setTheme("your-theme-name")
Java
ThemeRegistry.getInstance().setTheme("your-theme-name");

Finally, we load the language syntaxes and configurations.

Kotlin
GrammarRegistry.getInstance().loadGrammars("textmate/languages.json")
Java
GrammarRegistry.getInstance().loadGrammars("textmate/languages.json");
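
Because the steps above only need to run once, it can help to put them behind a guard when several screens may create editors. The following is a minimal sketch, not part of Sora Editor: `OneTimeSetup` is a hypothetical helper, and the `Runnable` passed to it stands in for the file provider, theme and grammar loading calls shown above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper (not a Sora Editor class): runs a setup action at most once. */
final class OneTimeSetup {
    private final AtomicBoolean started = new AtomicBoolean(false);

    /**
     * Runs the action on the first call only.
     *
     * @return true if the action ran during this call, false if an earlier call already ran it
     */
    boolean runOnce(Runnable action) {
        // compareAndSet flips the flag atomically, so only one caller can win
        if (started.compareAndSet(false, true)) {
            action.run();
            return true;
        }
        return false;
    }
}
```

Note that in this sketch the flag is set before the action runs, so a setup that throws is not retried. Whether that is acceptable depends on your application.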

Load by Kotlin DSL

You can also load languages into the grammar registry without languages.json by using the Kotlin DSL.

For example:

Kotlin
GrammarRegistry.getInstance().loadGrammars(
    languages {
        language("java") {
            grammar = "textmate/java/syntaxes/java.tmLanguage.json"
            defaultScopeName()
            languageConfiguration = "textmate/java/language-configuration.json"
        }
        language("kotlin") {
            grammar = "textmate/kotlin/syntaxes/Kotlin.tmLanguage"
            defaultScopeName()
            languageConfiguration = "textmate/kotlin/language-configuration.json"
        }
        language("python") {
            grammar = "textmate/python/syntaxes/python.tmLanguage.json"
            defaultScopeName()
            languageConfiguration = "textmate/python/language-configuration.json"
        }
    }
)

defaultScopeName() sets scopeName to source.${languageName}.

Setup Editor

Set the color scheme for the editor. If TextMateColorScheme is not applied to the editor, the colors of TextMate's syntax-highlight result will be transparent.

Kotlin
editor.colorScheme = TextMateColorScheme.create(ThemeRegistry.getInstance())
Java
editor.setColorScheme(TextMateColorScheme.create(ThemeRegistry.getInstance()));

Set editor language.

Kotlin
val languageScopeName = "source.java" // The scope name of target language
val language = TextMateLanguage.create(
    languageScopeName, true /* true for enabling auto-completion */
)
editor.setEditorLanguage(language)
Java
var languageScopeName = "source.java"; // The scope name of target language
var language = TextMateLanguage.create(
    languageScopeName, true /* true for enabling auto-completion */
);
editor.setEditorLanguage(language);
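
If the editor opens files of different types, the scope name passed to TextMateLanguage.create has to be chosen per file. The following is a minimal sketch, assuming the languages were registered with scope names of the form source.<name>, as in the examples above. `ScopeNames` and its extension table are hypothetical and not part of Sora Editor.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of Sora Editor): picks a TextMate scope name for a file. */
final class ScopeNames {
    // File extension -> language name as registered in languages.json (illustrative table)
    private static final Map<String, String> LANGUAGE_BY_EXTENSION = new HashMap<>();

    static {
        LANGUAGE_BY_EXTENSION.put("java", "java");
        LANGUAGE_BY_EXTENSION.put("kt", "kotlin");
        LANGUAGE_BY_EXTENSION.put("kts", "kotlin");
    }

    private ScopeNames() {}

    /** Returns "source.<language>" for a known extension, or empty if the file type is unknown. */
    static Optional<String> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to look up
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String language = LANGUAGE_BY_EXTENSION.get(extension);
        return language == null ? Optional.empty() : Optional.of("source." + language);
    }
}
```

The returned scope name only works if a grammar with that scope name has been loaded into the grammar registry. When the result is empty, the editor can simply keep its default EmptyLanguage.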

Congratulations! You've done all the setup. Enjoy!

language-java

The Java language support provides token-based highlight, identifier auto-completion and code block markers. It also has some experimental features for testing the editor.

Though its functionality is simple, it is considerably faster than more complex language analysis.

To create and apply the language, see code below:

Kotlin
editor.editorLanguage = JavaLanguage()
Java
editor.setEditorLanguage(new JavaLanguage());

language-treesitter

TreeSitter is a parser generator tool and an incremental parsing library. It is developed by the creators of Atom and Zed, and is used in both code editors.

With TreeSitter, we can build a concrete syntax tree for a source file, efficiently update it as the source file is edited, and use it for accurate syntax-highlight.

We use the Java binding android-tree-sitter to invoke tree-sitter APIs.

Before reading on, we strongly recommend that you check out TextStyle in the editor framework first.

Prepare Language

You can find existing language implementations in android-tree-sitter. If the language you want is missing, you have to build the language for Android on your own.

In addition, four scm files are used to query the syntax tree. Only the highlight queries are required; the other three are optional.

    1. For highlight: highlights.scm for most languages can be found in TreeSitter language repositories. For example, the one for Java is here
    2. For code blocks (optional): these are sora-editor specific queries. Refer to here for instructions and a sample.
    3. For brackets (optional): these are sora-editor specific queries. Refer to here for instructions and a sample.
    4. For local variables (optional): locals.scm for most languages can be found in the nvim-treesitter repository.

Useful Links:

Create Language Spec

First, a TsLanguageSpec should be created with a tree-sitter language instance and the scm source texts. You may also need to add a custom LocalsCaptureSpec for your locals.scm.

Kotlin
val spec = TsLanguageSpec(
    // Your tree-sitter language instance
    language = TSLanguageJava.getInstance(),
    // scm source texts
    highlightScmSource = assets.open("tree-sitter-queries/java/highlights.scm")
        .reader().readText(),
    codeBlocksScmSource = assets.open("tree-sitter-queries/java/blocks.scm")
        .reader().readText(),
    bracketsScmSource = assets.open("tree-sitter-queries/java/brackets.scm")
        .reader().readText(),
    localsScmSource = assets.open("tree-sitter-queries/java/locals.scm")
        .reader().readText(),
    localsCaptureSpec = object : LocalsCaptureSpec() {
        // Override any method to change the specification
    }
)

Sometimes, your scm file uses external predicate methods (client predicates) to query the syntax tree more precisely. In this case, add your predicate implementations to the predicates argument.

Make Language and Theme

Create a TsLanguage with your TsLanguageSpec and theme builder DSL.

Kotlin
// Extension function for easily creating text styles in Kotlin
import io.github.rosemoe.sora.lang.styling.textStyle

// ...
val language = TsLanguage(spec, false /* useTab */) { // spec is the TsLanguageSpec created above
    // Theme Builder DSL
    // Apply text style to captured syntax nodes

    // Apply style to single type of node
    textStyle(KEYWORD, bold = true) applyTo "keyword"
    // Apply to multiple
    textStyle(LITERAL) applyTo arrayOf("constant.builtin", "string", "number")
}

Apply Language

Now the language instance can be applied to the editor.

Kotlin
editor.setEditorLanguage(language)

Note that a TsLanguageSpec object cannot be reused, because it is closed when the TsLanguage is destroyed.

Released under the LGPL-2.1 License.