Analogical Modeling Weka Plugin

State-of-the-art analogical modeling plugin for Weka.

Installation and Use in Weka

Download Weka. You need at least 3.8.5 to use this package. You can download Weka here: http://www.cs.waikato.ac.nz/ml/weka/
Start Weka, and in the initial screen ("GUI Chooser") open the "Tools" menu and select "Package Manager". You'll see the screen below. Select "AnalogicalModeling" and click "Install".

Close the package manager and click the "Explorer" button in the GUI Chooser window. In the "Preprocess" tab, open your ARFF file. If you need an example file, try data/ch3example.arff from this repository. (It contains a toy example from chapter 3 of Royall Skousen's Analogical Modeling of Language.)
Analogical modeling can only work with nominal data, so if your dataset contains other types of data (e.g. numeric), you'll need to pre-process it. For example, to discretize a continuous numeric attribute into bucketed nominal attributes, in the "Preprocess" tab you can add the following filter: filters.unsupervised.attribute.Discretize. More information on this filter is available via the Weka MOOC. Screenshot below:
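For intuition about what Discretize does by default (equal-width binning into 10 bins), here is a plain-Java sketch of equal-width binning that does not use Weka at all. EqualWidthBinning and its "binN" labels are hypothetical; Weka's filter uses its own interval-style labels and boundary conventions, so a value sitting exactly on a cut point may land in a different bin than it does here.

```java
import java.util.Arrays;

/** Hypothetical illustration of equal-width binning; not Weka's Discretize implementation. */
final class EqualWidthBinning {

    /** Maps each numeric value to a nominal label "bin0" .. "bin{bins-1}" using equal-width bins. */
    static String[] discretize(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / bins;
        String[] labels = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // If all values are equal, everything goes into the first bin.
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would land one past the last bin, so clamp it.
            labels[i] = "bin" + Math.min(bin, bins - 1);
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] heights = {150.0, 162.5, 171.0, 180.0, 195.0};
        // Range 150..195 split into 3 bins of width 15.
        System.out.println(Arrays.toString(discretize(heights, 3)));
        // prints [bin0, bin0, bin1, bin2, bin2]
    }
}
```

The resulting strings are exactly the kind of nominal values AM can compare: two instances either share a bin or they don't.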

In the "Classify" tab, click "Choose" and select the AnalogicalModeling classifier from the "lazy" package. Screenshot below:

Under "Test options", select "Supplied test set" and open the ARFF file containing your test set. If you used data/ch3example.arff earlier, you can use data/ch3exampleTest.arff here.
Click the "More options..." button, then the "Choose" button labeled "Output predictions". From there, select AnalogicalModelingOutput. Please note that this output option can ONLY be used with the Analogical Modeling classifier; if you switch to another classifier, you will also need to change this field. Screenshot below:

Click on the AnalogicalModelingOutput text that appeared in the field next to the "Choose" button. From here, you can configure what information you want printed, including analogical sets and gang effects, as well as the desired output format. You can also choose to suppress the output in the window and write it to a file instead. Screenshot below:

Back on the "Classify" tab again, click "Start". If you used the chapter 3 data and enabled output for analogical sets and gang effects, the results should appear as in the below screenshot:

About Analogical Modeling

Analogical Modeling (or AM) was developed as an exemplar-based approach to modeling language usage, and has also been found useful in modeling other "sticky" phenomena. AM is especially suited to this because it predicts probabilistic occurrences instead of assigning static labels for instances.

AM was not designed to be a classifier, but as a cognitive theory explaining variation in human behavior. As such, though in practice it is often used like any other machine learning classifier, there are fine theoretical points on which it differs. As a theory of human behavior, much of the value in its predictions lies in matching observed human behavior, including non-determinism and degradations in accuracy caused by paucity of data.

The AM algorithm could be called a probabilistic, instance-based classifier. However, the probabilities given for each classification are not degrees of certainty, but actual probabilities of occurring in real usage. AM models "sticky" phenomena as being intrinsically sticky, not as deterministic phenomena that just require more data to be predicted perfectly.

Though it is possible to choose an outcome probabilistically, in practice users are generally interested in either the full predicted probability distribution or the outcome with the highest probability.
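The two ways of reducing a predicted distribution to a single outcome can be sketched as follows. This assumes the distribution is a map from outcome names to probabilities that sum to 1; OutcomeChoice and its methods are hypothetical names, not part of the plugin's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical helpers for turning an AM outcome distribution into a single prediction. */
final class OutcomeChoice {

    /** Returns the outcome with the highest probability (the first one wins on ties). */
    static String mostProbable(Map<String, Double> distribution) {
        String best = null;
        double bestP = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : distribution.entrySet()) {
            if (e.getValue() > bestP) {
                best = e.getKey();
                bestP = e.getValue();
            }
        }
        return best;
    }

    /** Samples an outcome in proportion to its probability, mimicking variable real-world usage. */
    static String sample(Map<String, Double> distribution, Random random) {
        double r = random.nextDouble(); // uniform in [0, 1)
        double cumulative = 0;
        String last = null;
        for (Map.Entry<String, Double> e : distribution.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (r < cumulative) {
                return last;
            }
        }
        return last; // guards against rounding when the probabilities sum to slightly under 1
    }

    public static void main(String[] args) {
        Map<String, Double> dist = new LinkedHashMap<>();
        dist.put("x", 0.75);
        dist.put("y", 0.25);
        System.out.println(mostProbable(dist)); // prints x
        // Roughly three out of four samples will be "x".
        System.out.println(sample(dist, new Random(42)));
    }
}
```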

AM practitioners generally use terminology taken from statistics, most of which has an equivalent in the terminology used by computer scientists and by most machine learning frameworks. Examples are 'exemplar' (training instance), 'outcome' (class label), and 'variable' (feature). This software uses the CS terminology internally, but user-facing reports use the AM terminology.

The running time for analogical modeling is exponential in the number of features (variables); exact calculation becomes impractical after about 50 features. Therefore, this tool will automatically use an approximation algorithm when there are 50 or more features.
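To make the exponential cost concrete, here is a brute-force sketch of the exact computation as it is usually described in the AM literature: every subset of the features defines a supracontext; a supracontext counts only if it is homogeneous (all of its exemplars share one outcome, or all come from the same subcontext); and each homogeneous supracontext with k exemplars gives k pointers to each of them. The class and method names are hypothetical. The plugin itself uses the lattice algorithms described under Features rather than this loop, and details such as alternative counting options or missing-value handling are not modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical brute-force sketch of exact analogical modeling (not the plugin's implementation).
 * Assumes nominal features, no missing values, fewer than 31 features, and quadratic pointer counting.
 */
final class BruteForceAM {

    static final class Exemplar {
        final String outcome;
        final String[] features;

        Exemplar(String outcome, String... features) {
            this.outcome = outcome;
            this.features = features;
        }
    }

    /** Bit i is set when the exemplar agrees with the test item on feature i (its subcontext). */
    static int matchMask(Exemplar e, String[] test) {
        int mask = 0;
        for (int i = 0; i < test.length; i++) {
            if (e.features[i].equals(test[i])) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    static Map<String, Double> classify(List<Exemplar> data, String[] test) {
        int[] masks = new int[data.size()];
        for (int j = 0; j < masks.length; j++) {
            masks[j] = matchMask(data.get(j), test);
        }
        Map<String, Long> pointers = new TreeMap<>();
        long total = 0;
        // One pass per supracontext: 2^n of them, hence the exponential running time.
        for (int supra = 0; supra < (1 << test.length); supra++) {
            int count = 0;
            String firstOutcome = null;
            int firstMask = 0;
            boolean oneOutcome = true;
            boolean oneSubcontext = true;
            for (int j = 0; j < masks.length; j++) {
                if ((masks[j] & supra) != supra) {
                    continue; // exemplar differs from the test item on a feature this supracontext fixes
                }
                count++;
                if (firstOutcome == null) {
                    firstOutcome = data.get(j).outcome;
                    firstMask = masks[j];
                } else {
                    oneOutcome &= firstOutcome.equals(data.get(j).outcome);
                    oneSubcontext &= firstMask == masks[j];
                }
            }
            // Homogeneous: non-empty, and either a single outcome or a single subcontext.
            if (count == 0 || !(oneOutcome || oneSubcontext)) {
                continue;
            }
            for (int j = 0; j < masks.length; j++) {
                if ((masks[j] & supra) == supra) {
                    // Every member points to every member: count pointers each, count^2 in total.
                    pointers.merge(data.get(j).outcome, (long) count, Long::sum);
                    total += count;
                }
            }
        }
        Map<String, Double> distribution = new TreeMap<>();
        for (Map.Entry<String, Long> entry : pointers.entrySet()) {
            distribution.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return distribution;
    }

    public static void main(String[] args) {
        List<Exemplar> data = List.of(
            new Exemplar("e", "3", "1", "0"),
            new Exemplar("r", "0", "3", "2"),
            new Exemplar("r", "2", "1", "0"),
            new Exemplar("r", "2", "1", "2"),
            new Exemplar("r", "3", "1", "1"));
        // Expected: e = 4/13 (about 0.308), r = 9/13 (about 0.692).
        System.out.println(classify(data, new String[] {"3", "1", "2"}));
    }
}
```

This loop visits all 2^n supracontexts and rescans the data for each one, so even 30 features means roughly a billion supracontexts per test item.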

Features

Because this is an evolving project, the most important design principle has been modularity and ease of experimentation with the core algorithms. As a result, the system adapts to data of different cardinalities:

Context labels scale up from ints to longs and BigIntegers
Very small vectors are placed in a single lattice
Larger vectors are placed in a distributed lattice, with the number of lattices increasing with size
Very large vectors (50 or more features) are classified approximately using Monte Carlo simulation
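The list above amounts to choosing data structures from the number of features. The sketch below shows that dispatch under stated assumptions: a context label is a bit set with one bit per feature (so an int covers up to 31 features and a long up to 63 if the sign bit is left unused), the 50-feature Monte Carlo cut-off is the one given in this README, and the single-versus-distributed cut-off is a made-up parameter because the README does not state it. ScalingChoices and its members are hypothetical names.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of picking data structures by feature count. The int/long/BigInteger
 * cut-offs follow from one bit per feature; singleLatticeMax is an illustrative parameter,
 * not a value taken from the plugin.
 */
final class ScalingChoices {

    enum LabelType { INT, LONG, BIG_INTEGER }

    enum LatticeType { SINGLE, DISTRIBUTED, MONTE_CARLO }

    static LabelType labelType(int features) {
        if (features < Integer.SIZE) {        // up to 31 bits, leaving the sign bit unused
            return LabelType.INT;
        } else if (features < Long.SIZE) {    // up to 63 bits
            return LabelType.LONG;
        }
        return LabelType.BIG_INTEGER;
    }

    static LatticeType latticeType(int features, int singleLatticeMax) {
        if (features >= 50) {                 // the approximation threshold stated in this README
            return LatticeType.MONTE_CARLO;
        }
        return features <= singleLatticeMax ? LatticeType.SINGLE : LatticeType.DISTRIBUTED;
    }

    /** Arbitrary-width label: bit i is set when the exemplar matches the test item on feature i. */
    static BigInteger bigLabel(boolean[] matches) {
        BigInteger label = BigInteger.ZERO;
        for (int i = 0; i < matches.length; i++) {
            if (matches[i]) {
                label = label.setBit(i);
            }
        }
        return label;
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 40, 70}) {
            // singleLatticeMax = 10 is a made-up value for illustration.
            System.out.println(n + " features: " + labelType(n) + " labels, "
                + latticeType(n, 10) + " lattice");
        }
    }
}
```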

Some algorithmic improvements have been made to the distributed lattice and approximate lattice filling algorithms. Concurrency is also used extensively so that 8 CPU cores will fill lattices roughly 8 times faster, etc.
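Filling sublattices parallelizes well because each group of features can be processed independently. The sketch below illustrates only that pattern, under simplifying assumptions: labels are longs with one bit per feature, features are split into contiguous groups, and "filling" a sublattice is reduced to counting exemplars per sub-label. A real distributed lattice would also track outcomes, check homogeneity, and combine the sublattices, none of which is shown. ParallelSublattices and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/**
 * Hypothetical sketch of filling sublattices concurrently (not the plugin's code). Each group of
 * features gets its own sublattice; here "filling" is reduced to counting exemplars per sub-label.
 */
final class ParallelSublattices {

    /** Boundaries that split n feature indices into k contiguous, near-equal groups. */
    static int[] groupBounds(int n, int k) {
        int[] bounds = new int[k + 1];
        for (int g = 0; g <= k; g++) {
            bounds[g] = g * n / k;
        }
        return bounds;
    }

    /** Keeps only bits [from, to) of a full label, shifted down to bit 0 (assumes to - from < 64). */
    static long subLabel(long label, int from, int to) {
        return (label >>> from) & ((1L << (to - from)) - 1);
    }

    /** Fills one count map per group; groups are independent, so they run on the common pool. */
    static List<Map<Long, Integer>> fill(long[] labels, int n, int k) {
        int[] bounds = groupBounds(n, k);
        return IntStream.range(0, k)
            .parallel()
            .mapToObj(g -> {
                Map<Long, Integer> counts = new TreeMap<>();
                for (long label : labels) {
                    counts.merge(subLabel(label, bounds[g], bounds[g + 1]), 1, Integer::sum);
                }
                return counts;
            })
            .collect(Collectors.toList()); // encounter order is preserved, so index g is group g
    }

    public static void main(String[] args) {
        long[] labels = {0b10110, 0b00110, 0b11111};
        System.out.println(fill(labels, 5, 2));
        // prints [{2=2, 3=1}, {1=1, 5=1, 7=1}]
    }
}
```

How close this gets to linear speed-up depends on having at least as many independent groups (or other units of work) as cores.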

Development

The project JavaDoc is uploaded to GitHub pages automatically via a GitHub Action. Browse here.

An additional GitHub Action builds and tests the project for every branch and pull request, so contributors should get feedback quickly if a change breaks anything.

Prerequisites for First-Time Java Developers

Understanding Java Project Structure

For developers new to Java, here's what the key directories mean:

src/main/java/ - Your Java source code files (.java)
src/test/java/ - Unit test files
src/main/resources/ - Non-code resources (config files, etc.)
build/ - Generated files (compiled code, reports) - don't edit these
gradle/ - Gradle wrapper files
build.gradle.kts - Project configuration and dependencies (like package.json in Node.js)

Common Java Development Terms

JDK (Java Development Kit): Tools for developing Java applications (includes compiler)
JVM (Java Virtual Machine): Runs compiled Java code
Classpath: Where Java looks for compiled code and libraries
JAR file: Java Archive - packaged Java application (like a .zip with compiled code)
Gradle: Build tool that manages dependencies and compilation (like npm/yarn for JavaScript)
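To see the JVM and classpath terms above in practice, this small class prints the running Java version and each classpath entry. It uses only standard system properties; the class name is arbitrary.

```java
import java.io.File;

/** Small diagnostic: shows which JVM is running and where it looks for classes. */
final class ShowClasspath {

    /** Splits the java.class.path system property on the platform separator (":" or ";"). */
    static String[] classpathEntries() {
        return System.getProperty("java.class.path", "").split(File.pathSeparator);
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        for (String entry : classpathEntries()) {
            System.out.println("classpath entry: " + entry);
        }
    }
}
```

Running it with different -cp values is a quick way to see how the classpath changes what the JVM can find.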

Option 1: Developing with IntelliJ IDEA (Recommended)

Step 1: Install Java 11

Using SDKMan! (Recommended)

Install SDKMan!:

```
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
```

Install Java 11:

```
sdk install java 11.0.25-tem
sdk use java 11.0.25-tem
```

Direct Download Alternative

Download Eclipse Temurin 11 from Adoptium
Follow the installer for your OS

Step 2: Install and Configure IntelliJ IDEA

Download IntelliJ IDEA Community Edition (free) from JetBrains
Open the project:
- Launch IntelliJ IDEA
- Click "Open" on the welcome screen
- Navigate to this project folder and select build.gradle.kts
Configure the JDK:
- Go to File → Project Structure (Cmd+; on Mac, Ctrl+Alt+Shift+S on Windows/Linux)
- Under "Project", set SDK to Java 11
- If not listed, click "Add SDK" → "JDK" and browse to your Java 11 installation
Wait for Gradle sync:
- IntelliJ will automatically download dependencies (first time takes a few minutes)
- Look for the progress bar at the bottom of the window

Step 3: Working in IntelliJ

Running the build:

Open the Gradle panel (View → Tool Windows → Gradle)
Navigate to Tasks → build → build
Double-click to run

Running tests:

To run all tests: In Gradle panel, Tasks → verification → test
To run a single test: Open the test file, click the green arrow next to the test method
Test results appear in the Run panel at the bottom

Debugging:

Set breakpoints by clicking in the left margin of any Java file
Right-click a test and select "Debug"
Use the Debug panel to step through code and inspect variables

Common IntelliJ shortcuts:

Cmd+Shift+F (Mac) / Ctrl+Shift+F (Win/Linux): Search across all files
Cmd+Click (Mac) / Ctrl+Click (Win/Linux): Go to definition
Shift+F6: Rename variable/method everywhere
Alt+Enter: Show quick fixes for errors

Option 2: Developing from the Command Line

Step 1: Install Java 11

Using SDKMan!

```
# Install SDKMan
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install and use Java 11
sdk install java 11.0.25-tem
sdk use java 11.0.25-tem

# Verify installation
java -version  # Should show openjdk 11.0.25 or similar
```

Manual Installation

Download Eclipse Temurin 11 from Adoptium

Set environment variables:

```
# Add to ~/.bashrc or ~/.zshrc
export JAVA_HOME=/path/to/jdk-11
export PATH=$JAVA_HOME/bin:$PATH
```

Reload your shell configuration:
```
source ~/.bashrc  # or source ~/.zshrc
```

Step 2: Building and Testing

This project uses Gradle with an included wrapper, so you don't need to install Gradle separately.

Available commands:

```
# Build and test the project
./gradlew build

# Run only unit tests
./gradlew test

# Generate JavaDoc documentation
./gradlew javadoc

# Build the Weka plugin package
./gradlew weka_package

# Clean build artifacts
./gradlew clean

# Run with verbose output for debugging
./gradlew build --info
```

Note for Windows: Use gradlew.bat instead of ./gradlew

Step 3: Viewing Results

After building:

Compiled classes: build/classes/java/main/
JAR files: build/libs/
Test reports: build/reports/tests/test/index.html (open in a browser; optional, since ./gradlew test also prints the results)
JavaDoc: build/docs/javadoc/index.html (open in browser)

Troubleshooting Command Line Builds

If the build fails:

Check Java version:

```
java -version  # Should be version 11
./gradlew --version  # Should show JVM version 11
```

Clear Gradle cache and retry:
```
./gradlew clean build --no-build-cache
```
Run with more details:
```
./gradlew build --stacktrace --info
```
Common issues:
- Wrong Java version: Use sdk use java 11.0.25-tem or check JAVA_HOME
- Permission denied: Run chmod +x gradlew
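- Out of memory: raise the heap of the Gradle daemon that runs the build, e.g. add org.gradle.jvmargs=-Xmx2g to gradle.properties (export GRADLE_OPTS="-Xmx2g" only affects the Gradle client JVM)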
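If java -version and ./gradlew --version disagree, it can help to ask the JVM directly. A minimal sketch, assuming you run it with the same java binary Gradle picks up (for example the one under $JAVA_HOME/bin); the class name is arbitrary. Runtime.version().feature() requires Java 10 or later, so on an older JDK this will not even compile, which is itself a sign of the wrong version.

```java
/** Hypothetical helper: fails fast when the JVM is not the Java 11 this README asks for. */
final class RequireJava11 {

    static final int REQUIRED_FEATURE_VERSION = 11;

    static boolean isSupported(int featureVersion) {
        return featureVersion == REQUIRED_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns e.g. 11 for "11.0.25".
        int feature = Runtime.version().feature();
        System.out.println("java.home = " + System.getProperty("java.home"));
        if (!isSupported(feature)) {
            System.err.println("Expected Java " + REQUIRED_FEATURE_VERSION + " but found " + Runtime.version());
            System.exit(1);
        }
        System.out.println("OK: running on Java " + Runtime.version());
    }
}
```

With Java 11's single-file source launcher, this can be run directly as java RequireJava11.java.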

Releasing

To release a new version of the plugin:

Update and commit Description.props:
- the version number (it appears in several locations)
- the date
Create and push a new git tag with the next version number
Run ./gradlew weka_package and upload the resulting artifact (distributions/Weka_AnalogicalModeling-X.Y.Z.zip) to the GitHub release
Send the new Description.props file to Mark Hall
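Because the version number appears in several places in Description.props, a small check before tagging can catch a missed spot. A sketch under assumptions: the file is a standard Java properties file, the package version is stored under a key named Version, and any other X.Y.Z string that differs from it is suspicious unless its key is explicitly ignored (the Depends key used below is assumed to hold the required Weka version). CheckDescriptionProps is a hypothetical helper, not part of the build.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-release check: lists keys that mention a version other than Version. */
final class CheckDescriptionProps {

    private static final Pattern VERSION_NUMBER = Pattern.compile("\\d+\\.\\d+\\.\\d+");

    static List<String> staleKeys(Reader reader, Set<String> ignoredKeys) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        String version = props.getProperty("Version"); // assumed key name
        List<String> stale = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (key.equals("Version") || ignoredKeys.contains(key)) {
                continue;
            }
            Matcher m = VERSION_NUMBER.matcher(props.getProperty(key));
            while (m.find()) {
                if (!m.group().equals(version)) {
                    stale.add(key);
                    break;
                }
            }
        }
        Collections.sort(stale);
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a "Depends" key naming the required Weka version, which is not this package's version.
        try (Reader reader = Files.newBufferedReader(Path.of("Description.props"))) {
            List<String> stale = staleKeys(reader, Set.of("Depends"));
            System.out.println(stale.isEmpty() ? "Version numbers look consistent" : "Check these keys: " + stale);
        }
    }
}
```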

Running in the Terminal

Under construction; for now, try running AnalogicalModeling.java with the arguments -t data/ch3example.arff -x 5 (train on data/ch3example.arff and report 5-fold cross-validation results).
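In Weka's standard command-line options, -t names the training file and -x sets the number of cross-validation folds, so -x 5 asks for 5-fold cross-validation: the data is split into five parts, and each part is held out once as the test set while the classifier trains on the other four. The sketch below shows only that splitting idea, with hypothetical names. Weka itself also shuffles the data and, for a nominal class, stratifies the folds, which this round-robin version does not do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of what "-x 5" means: k-fold cross-validation splits. */
final class KFoldSplit {

    /** Returns, for each fold, the instance indices held out as that fold's test set. */
    static List<List<Integer>> folds(int numInstances, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % k).add(i); // round-robin assignment, no shuffling or stratification
        }
        return folds;
    }

    public static void main(String[] args) {
        int n = 12;
        int k = 5;
        List<List<Integer>> folds = folds(n, k);
        for (int f = 0; f < k; f++) {
            // Train on everything outside fold f, then evaluate on fold f.
            System.out.println("fold " + f + ": test=" + folds.get(f)
                + ", train size=" + (n - folds.get(f).size()));
        }
    }
}
```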

License

Released under the Apache 2.0 license (see the LICENSE file for details). Copyright Nathan Glenn, 2021.
