Tron

Tron is a centralized system for managing periodic batch processes across a cluster. If this is your first time using Tron, read Tutorial and Overview to get a better idea of what it is, how it works, and how to use it.

Note

Please report bugs in the documentation at our Github issue tracker.


What’s New

See the CHANGELOG.

Tutorial

To install Tron you will need:

  • A copy of the most recent Tron release from either github or pypi (see Installing Tron).
  • A server on which to run trond.
  • One or more batch boxes which will run the Jobs.
  • An SSH key and a user that will allow the tron daemon to log in to all of the batch machines without a password prompt.

Installing Tron

The easiest way to install Tron is from PyPI:

$ sudo pip install tron

You can also get a copy of the current development release from github. See setup.py in the source package for a full list of required packages.

If you are interested in working on Tron development, see Contributing to Tron for additional requirements and for setting up a dev environment.

Running Tron

Tron runs as a single daemon, trond.

On your management node, run:

$ sudo -u <tron user> trond

The chosen user will need SSH access to all your worker nodes, as well as permission to write to the working directory, log file, and lock file (see trond --help for defaults). You can change these paths using command line options. See Logging for how to change the default logging settings.

Once trond is running, you can view its status using tronview. By default tronview connects to localhost; use --server=<host>:<port> to specify a different server, and add -s to save that setting in ~/.tron:

$ tronview

Jobs:
No jobs

Configuring Tron

There are a few ways to configure Tron, but the most straightforward is through tronfig:

$ tronfig

This will open your configured $EDITOR with the current configuration file. Edit your file to be something like this:

ssh_options:
  agent: true

nodes:
  - name: local
    hostname: 'localhost'

jobs:
  - name: "getting_node_info"
    node: local
    schedule: "interval 10 mins"
    actions:
      - name: "uname"
        command: "uname -a"
      - name: "cpu_info"
        command: "cat /proc/cpuinfo"
        requires: [uname]

After you exit your editor, the configuration will be validated and uploaded to trond.

Now if you run tronview again, you’ll see getting_node_info as a configured job. Note that it is configured to run 10 minutes from now. This should give you time to examine the job to ensure you really want to run it.

Jobs:
Name              State      Scheduler            Last Success
getting_node_info ENABLED    INTERVAL:0:10:00     None

You can quickly disable a job by using tronctl:

$ tronctl disable getting_node_info
Job getting_node_info is disabled

This will cancel any scheduled runs and prevent any more from being scheduled. You are now in manual control. To manually execute a job immediately, do this:

$ tronctl start getting_node_info
New job getting_node_info.1 created

You can monitor this job run by using tronview:

$ tronview getting_node_info.1
Job Run: getting_node_info.1
State: SUCC
Node: localhost

Action ID & Command  State  Start Time           End Time             Duration
.uname               SUCC   2011-02-28 16:57:48  2011-02-28 16:57:48  0:00:00
.cpu_info            SUCC   2011-02-28 16:57:48  2011-02-28 16:57:48  0:00:00

$ tronview getting_node_info.1.uname
Action Run: getting_node_info.1.uname
State: SUCC
Node: localhost

uname -a

Requirements:

Stdout:
Linux dev05 2.6.24-24-server #1 SMP Wed Apr 15 15:41:09 UTC 2009 x86_64 GNU/Linux
Stderr:

Tron also provides a simple, optional web UI that can be used to get tronview data in a browser. See tronweb for setup instructions.

That’s it for the basics. You might want to look at Overview for a more comprehensive description of how Tron works.

Overview

Batch process scheduling on a single UNIX machine has historically been managed by cron and its derivatives. But if you have many batches, complex dependencies between batches, or many machines, maintaining the config files across them becomes difficult. Tron solves this problem by centralizing the configuration and scheduling of jobs in a single daemon.

The Tron system is split into four commands:

trond
Daemon responsible for scheduling, running, and saving state. Provides an HTTP interface to tools.
tronview
View job state and output.
tronctl
Start, stop, enable, disable, and otherwise control jobs.
tronfig
Change Tron’s configuration while the daemon is still running.

The config file uses YAML syntax, and is further described in Configuration.

Nodes, Jobs and Actions

Tron’s workload consists of jobs. Jobs contain actions, which may depend on other actions in the same job, and run on a schedule.

trond is given access (via public key SSH) to one or more nodes on which to run jobs. For example, this configuration has two nodes, each of which is responsible for a single job:

nodes:
  - name: node1
    hostname: 'batch1'
  - name: node2
    hostname: 'batch2'

jobs:
  - name: "job0"
    node: node1
    schedule: "interval 20s"
    actions:
      - name: "batch1action"
        command: "sleep 3; echo asdfasdf"
  - name: "job1"
    node: node2
    schedule: "interval 20s"
    actions:
      - name: "batch2action"
        command: "cat big.txt; sleep 10"

How the nodes are set up and assigned to jobs is entirely up to you. They may have different operating systems, access to different databases, different privileges for the Tron user, etc.

See also:

Node Pools

Nodes can be grouped into pools. To continue the previous example:

node_pools:
    - name: pool
      nodes: [node1, node2]

jobs:
    # ...
    - name: "job2"
      node: pool
      schedule: "interval 5s"
      actions:
        - name: "pool_action"
          command: "ls /; sleep 1"
      cleanup_action:
        command: "echo 'all done'"

job2’s action will be run on a random node from pool every 5 seconds. When pool_action is complete, cleanup_action will run on the same node.

For more information, see Jobs.

Caveats

While Tron solves many scheduling-related problems, there are a few things to watch out for.

Tron keeps an SSH connection open for the entire lifespan of a process. This means that to upgrade trond, you have to either wait until no jobs are running, or accept an inconsistent state. This limitation is being worked on, and should be improved in later releases.

Tron is under active development. This means that some things will change. Whenever possible these changes will be backwards compatible, but in some cases there may be non-backwards compatible changes.

Tron does not support Unicode, because it is built on Twisted, which does not support Unicode.

Configuration

Syntax

The Tron configuration file uses YAML syntax. The recommended configuration style requires only strings, decimal values, lists, and dictionaries: the subset of YAML that can be losslessly transformed into JSON. (In fact, your configuration can be entirely JSON, since YAML is very nearly a superset of JSON; see the sketch below.)

Past versions of Tron used additional YAML-specific features such as tags, anchors, and aliases. These features still work in version 0.3, but are now deprecated.
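
To illustrate, here is a minimal sketch of a configuration written as pure JSON (the node and job names are illustrative, not canonical examples):

{
    "ssh_options": {"agent": true},
    "nodes": [
        {"name": "local", "hostname": "localhost"}
    ],
    "jobs": [
        {
            "name": "json_demo",
            "node": "local",
            "schedule": "interval 1 hour",
            "actions": [
                {"name": "uname", "command": "uname -a"}
            ]
        }
    ]
}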

Basic Example

ssh_options:
  agent: true

nodes:
  - name: local
    hostname: 'localhost'

jobs:
  - name: "getting_node_info"
    node: local
    schedule: "interval 10 mins"
    actions:
      - name: "uname"
        command: "uname -a"
      - name: "cpu_info"
        command: "cat /proc/cpuinfo"
        requires: [uname]

Command Context Variables

command attribute values may contain command context variables that are inserted at runtime. The command context is populated both by Tron (see Built-In Command Context Variables) and by the config file (see Command Context). For example:

jobs:
    - name: "command_context_demo"
      node: local
      schedule: "1st monday in june"
      actions:
        - name: "print_run_id"
          # prints 'command_context_demo.1' on the first run,
          # 'command_context_demo.2' on the second, etc.
          command: "echo {runid}"

SSH

ssh_options (optional)

Options for SSH connections to Tron nodes. When Tron runs a job on a node, it can add some jitter (a random delay) to the run, configured with the jitter_* options below.

agent (optional, default False)
Set to True if trond should use an SSH agent. This requires that $SSH_AUTH_SOCK exists in the environment and points to the correct socket.
identities (optional, default [])
List of paths to SSH identity files
known_hosts_file (optional, default None)
The path to an ssh known hosts file
connect_timeout (optional, default 30)
Timeout in seconds when establishing an ssh connection
idle_connection_timeout (optional, default 3600)
Timeout in seconds an SSH connection may remain idle before it is closed
jitter_min_load (optional, default 4)
Minimum load on a node before any jitter is introduced. See jitter_load_factor for a description of how load is calculated
jitter_max_delay (optional, default 20)
Maximum number of seconds to add to a run
jitter_load_factor (optional, default 1)
Factor used to increment the count of running actions for determining the upper bound of jitter to add (ex. A factor of 2 would increase the upper bound by 2 seconds per running action)

Example:

ssh_options:
    agent:                    false
    known_hosts_file:         /etc/ssh/known_hosts
    identities:
        - /home/batch/.ssh/id_dsa-nopasswd

    connect_timeout:          30
    idle_connection_timeout:  3600

    jitter_min_load:          4
    jitter_max_delay:         20
    jitter_load_factor:       1

Time Zone

time_zone (optional)
Local time as observed by the system clock. If your system observes a time zone with daylight saving time, some of your jobs may run early or late on the days when the clocks change. See Notes on Daylight Saving Time for more information.

Example:

time_zone: US/Pacific

Command Context

command_context
Dictionary of custom command context variables. It is an arbitrary set of key-value pairs.

Example:

command_context:
    PYTHON: /usr/bin/python
    TMPDIR: /tmp

See a list of Built-In Command Context Variables.
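
For example, a job command can reference these keys in curly braces. A minimal sketch using the context defined above (the script path is hypothetical):

jobs:
    - name: "context_demo_job"
      node: local
      schedule: "interval 1 hour"
      actions:
        - name: "run_report"
          # {PYTHON} and {TMPDIR} are replaced at runtime with the
          # values defined in command_context above
          command: "{PYTHON} /opt/scripts/report.py --tmpdir {TMPDIR}"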

Output Stream Directory

output_stream_dir
A path to the directory used to store the stdout/stderr logs from jobs. It defaults to the --working-dir option passed to trond.

Example:

output_stream_dir: "/home/tronuser/output/"

State Persistence

state_persistence

Configure how trond should persist its state to disk. By default a shelve store is used and saved to ./tron_state in the working directory.

store_type
Valid options are:

shelve - uses the shelve module and saves to a local file

sql - uses sqlalchemy to save to a database (tested with version 0.7).

yaml - uses yaml and saves to a local file (this is not recommended and is provided only for backwards compatibility with previous versions of Tron).

You will need the appropriate python module for the option you choose.

name
The name of this store. This will be the filename for a shelve or yaml store. It is just a label when used with an sql store.
connection_details

Ignored by shelve and yaml stores.

For an sql store, a SQLAlchemy connection string (see the sqlalchemy engine configuration documentation), for example "sqlite:///dest_state.db".

buffer_size
The number of save calls to buffer before writing the state. Defaults to 1, which is no buffering.

Example:

state_persistence:
    store_type: sql
    name: local_sqlite
    connection_details: "sqlite:///dest_state.db"
    buffer_size: 1 # No buffer
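
For comparison, a minimal shelve configuration (the store name is arbitrary; shelve is the default store type):

state_persistence:
    store_type: shelve
    name: tron_state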

Action Runners

Note: this is an experimental feature

action_runner

Action runner configuration allows you to run job actions through a script which records its pid. This provides support for the max_runtime option on jobs, and allows you to stop or kill actions from tronctl.

runner_type
Valid options are:
none
Run actions without a wrapper. This is the default
subprocess
Run actions with a script which records the pid and runs the action command in a subprocess (on the remote node). This requires that bin/action_runner.py and bin/action_status.py are available on the remote host.
remote_status_path
Path used to store status files. Defaults to /tmp.
remote_exec_path
Directory path which contains action_runner.py and action_status.py scripts.

Example:

action_runner:
    runner_type:        "subprocess"
    remote_status_path: "/tmp/tron"
    remote_exec_path:   "/usr/local/bin"

Nodes

nodes

List of nodes. Each node has the following options:

hostname (required)
The hostname or IP address of the node
name (optional, defaults to hostname)
A name to refer to this node
username (optional, defaults to current user)
The name of the user to connect with
port (optional, defaults to 22)
The port number of the node

Example:

nodes:
    - name: node1
      hostname: 'batch1'
    - hostname: 'batch2'    # name is 'batch2'

Node Pools

node_pools
List of node pools, each with a name and nodes list. name defaults to the names of each node joined by underscores.

Example:

node_pools:
    - name: pool
      nodes: [node1, batch1]
    - nodes: [batch1, node1]    # name is 'batch1_node1'

Jobs and Actions

jobs
List of jobs for Tron to manage. See Jobs for the options available to jobs and their actions.

Logging

As of v0.3.3, logging is no longer configured in the Tron configuration file.

Tron uses Python’s standard logging and by default uses a rotating log file handler that rotates files each day. The default log file is /var/log/tron/tron.log.

To configure logging, pass -l <logging.conf> to trond. You can modify the default logging.conf by copying it from tron/logging.conf. See http://docs.python.org/howto/logging.html#configuring-logging
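
As a rough sketch, a logging.conf in Python's standard fileConfig format might look like the following (the handler choice and paths are illustrative, not Tron's shipped defaults):

[loggers]
keys=root

[handlers]
keys=file

[formatters]
keys=default

[logger_root]
level=INFO
handlers=file

[handler_file]
# Rotate the log file once a day at midnight
class=handlers.TimedRotatingFileHandler
level=INFO
formatter=default
args=('/var/log/tron/tron.log', 'midnight')

[formatter_default]
format=%(asctime)s %(name)s %(levelname)s %(message)s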

Interesting logs

Most Tron loggers are named after the Python module that creates them. There are a couple of special cases:

twisted
Twisted sends its logs to the twisted log
tron.api.www.access
API access logs are sent to this logger at the INFO level. They follow the standard Apache combined log format.

Jobs

A job consists of a name, a node/node pool, a list of actions, a schedule, and an optional cleanup action. They are periodic events that do not interact with other jobs while running.

If all actions exit with status 0, the job has succeeded. If any action exits with a nonzero status, the job has failed.

Required Fields

name
Name of the job. Used in tronview and tronctl.
node
Reference to the node or pool to run the job on. If a pool, the job is run on a random node in the pool.
schedule
When to run this job. Schedule fields can take multiple forms. See Scheduling.
actions
List of actions.

Optional Fields

monitoring (default {})

Dictionary of key-value pairs informing the monitoring framework how to alert teams about job failures.

team
Team responsible for the job. Must already be defined in the monitoring framework.
page (default False)
Boolean on whether or not an alert for this job is page-worthy.
runbook
Runbook associated with the job.
tip (default None)
A short 1-line version of the runbook.
notification_email
A comma-separated string of email destinations. Defaults to the “team” default.
slack_channels
A list of Slack channels to send alerts to. Defaults to the team setting. Set an empty list to specify no Slack notifications.
ticket (default False)
A Boolean value to enable ticket creation.
project (default None)
A string representing the JIRA project that the ticket should go under. Defaults to the team value.
tags (default None)
A list of arbitrary tags that can be used in handlers for different metadata needs.
component (default None)
A list of components affected by the event. A good example here would be to include the job that is being affected.
description (default None)
Human readable text giving more context on any monitoring events.
check_that_every_day_has_a_successful_run (default False)

If True, the latest job run each day will be checked to see if it was successful.

If False, only the latest overall run will be checked to see if it was successful.

queueing (default True)
If a job run is still running when the next run is scheduled to start, add the next run to a queue if this is True; otherwise, cancel it. Note that if the scheduler used for this job does not queue overlapping runs, this setting is ignored; the ConstantScheduler will not queue overlapping runs.
allow_overlap (default False)
If True, new job runs will start even if the previous run is still running. By default new job runs are either cancelled or queued (see queueing).
run_limit (default 50)
Number of runs to store. Once a job has more than run_limit runs, the output and state for the oldest run are removed. Failed runs will not be removed.
all_nodes (default False)

If True run this job on each node in the node pool list. If a node appears more than once in the list, the job will be run on that node once for each appearance.

If False run this job on a random node from the node pool list. If a node appears more than once in the list, the job will be more likely to run on that node, proportionate to the number of appearances.

If node is not a node pool, this option has no effect.

cleanup_action
Action to run when either all actions have succeeded or the job has failed. See Cleanup Actions.
enabled (default True)
If False the job will not be scheduled to run. This configuration option is only relevant when a Job is first added to the configuration, after which this value will be ignored.
max_runtime (default None)

A time interval (ex: “2 hours”) that limits the duration of each job run. If a job run is still running after this duration, all of its actions are sent SIGTERM.

Note: This requires an action runner to be configured (see Action Runners). If action_runner is none, max_runtime does nothing.

time_zone (default None)
Time zone used for calculating when a job should run. Defaults to None, which means it will use the default time_zone set in the master config.
expected_runtime (default 24h)
A time interval (ex: “2 hours”) that specifies the maximum expected duration of each job run. Monitoring will alert if a job run is still running after this duration. Use max_runtime instead if a hard limit is needed. A sketch combining several of these fields follows this list.
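
For illustration, a sketch of a job combining several of the optional fields above (the team, channel, and command values are placeholders, not recommendations):

jobs:
    - name: "nightly_rollup"
      node: local
      schedule: "daily 02:00:00"
      queueing: True
      run_limit: 20
      expected_runtime: "2 hours"
      max_runtime: "4 hours"        # only enforced when an action_runner is configured
      monitoring:
        team: example_team
        page: False
        slack_channels: ["#example-alerts"]
      actions:
        - name: "rollup"
          command: "/opt/app/rollup.sh"
      cleanup_action:
        command: "echo 'rollup finished'"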

Actions

Actions consist primarily of a name and command. An action’s command is executed as soon as its dependencies (specified by requires) are satisfied. So if your job has 10 actions, 1 of which depends on the other 9, then Tron will launch the first 9 actions in parallel and run the last one when all have completed successfully.

If any action exits with nonzero status, the job will continue to run any actions which do not depend on the failed action.

Required Fields
name
Name of the action. Used in tronview and tronctl.
command
Command to run on the specified node. A common mistake is to use shell expansions or expressions in the command: commands are run using exec, so bash (or other shell) expressions will not work and could cause the job to fail. See the sketch after this list.
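
For example (paths are illustrative), shell syntax will not behave as expected unless wrapped in an explicit shell invocation:

actions:
    - name: "broken"
      # The glob and the && are NOT interpreted; the command is exec'd directly
      command: "rm /tmp/app-*.log && touch /tmp/done"
    - name: "works"
      # Wrapping the expression in bash -c makes a shell perform the expansion
      command: "bash -c 'rm /tmp/app-*.log && touch /tmp/done'"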

Optional Fields
requires
List of action names that must complete successfully before this action is run. Actions can only require actions in the same job.
node
Node or node pool to run the action on if different from the rest of the job.
retries
An integer representing how many times Tron is allowed to automatically retry the command. Tron will immediately re-run the command if it fails, and the action will not enter the failed state until retries are exhausted. Defaults to None (0 retries allowed).
retries_delay (beta)
A timedelta to wait in between retries.
expected_runtime (default 24h)
A time interval (ex: “2 hours”) that specifies the maximum expected duration of each action run. Monitoring will alert if an action run is still running after this duration.

Example Actions

jobs:
    - name: convert_logs
      node: node1
      schedule:
        start_time: "04:00:00"    # quoted so YAML parses it as a string, not a number
      actions:
        - name: verify_logs_present
          command: "ls /var/log/app/log_{shortdate-1}.txt"
        - name: convert_logs
          command: "convert_logs /var/log/app/log_{shortdate-1}.txt /var/log/app_converted/log_{shortdate-1}.txt"
          requires: [verify_logs_present]

Scheduling

Tron supports four methods for configuring the schedule of a job. Schedulers support a jitter parameter that allows them to vary their runtime by a random time delta.

Interval

Run the job every X seconds, minutes, hours, or days. The time expression is <interval> days|hours|minutes|seconds, where the units can be abbreviated.

Short form:

schedule: "interval 20s"

Long form:

schedule:
    type:   "interval"
    value:  "5 mins"
    jitter: "10 seconds"        # Optional

With alias:

schedule:
    type:   "interval"
    value:  "hourly"

Daily

Run the job on specific days at a specific time. The time expression is HH:MM:SS[ MTWRFSU].

Short form:

schedule: "daily 04:00:00"

Short form with days:

schedule: "daily 04:00:00 MWF"

Long form:

schedule:
    type:   "daily"
    value:  "07:00:00 MWF"
    jitter: "10 min"            # Optional

Cron

Schedule a job using cron syntax. Tron supports predefined schedules, ranges, and lists for each field. L is supported in the day-of-month field only (it schedules the job on the last day of the month). Only one of the day fields (day of month and day of week) may have a value.

Short form:

schedule: "cron */5 * * 7,8 *"  # Every 5 minutes in July and August
schedule: "cron 0 3-6 * * *"    # Every hour between 3am and 6am

Long form:

schedule:                       # long form
    type: "cron"
    value: "30 4 L * *"         # The last day of the month at 4:30am

Complex

More powerful version of the daily scheduler based on the one used by Google App Engine’s cron library. To use this scheduler, use a string in this format as the schedule:

("every"|ordinal) (days) ["of|in" (monthspec)] (["at"] HH:MM)
ordinal
Comma-separated list of ordinals (1st, 2nd, and so forth). Use every if you don’t want to limit by day of the month.
days
Comma-separated list of days of the week (for example, mon, tuesday, with both short and long forms being accepted); every day is equivalent to every mon,tue,wed,thu,fri,sat,sun
monthspec
Comma-separated list of month names (for example, jan, march, sep). If omitted, implies every month. You can also say month to mean every month, as in 1,8th,15,22nd of month 09:00.
HH:MM
Time of day in 24 hour time.

Some examples:

2nd,third mon,wed,thu of march 17:00
every monday at 09:00
1st monday of sep,oct,nov at 17:00
every day of oct at 00:00

In the config:

schedule: "every monday at 09:00"
schedule:
    type: "groc daily"
    value: "every day 11:22"
    jitter: "5 min"

Notes on Daylight Saving Time

Some system clocks are configured to track local time and may observe daylight saving time. For example, on November 6, 2011, 1 AM occurred twice. Prior to version 0.2.9, this would cause Tron to schedule a daily midnight job to run an hour early on November 7, at 11 PM. For some jobs this doesn’t matter, but for jobs that depend on the availability of a day’s data, it can cause a failure.

Similarly, some jobs on March 14, 2011 were scheduled an hour late.

To avoid this problem, set the Time Zone config variable. For example:

time_zone: US/Pacific

If a job is scheduled at a time that occurs twice, such as 1 AM on “fall back”, it will be run on the first occurrence of that time.

If a job is scheduled at a time that does not exist, such as 2 AM on “spring forward”, it will be run an hour later in the “new” time, in this case 3 AM. In the “old” time this is 2 AM, so from the perspective of previous jobs, it runs at the correct time.

In general, Tron tries to schedule a job as soon as is correct, and no sooner. A job that is scheduled for 2:30 AM will not run at 3 AM on “spring forward” because that would be half an hour too soon from a pre-switch perspective (2 AM).

Note

If you experience unexpected scheduler behavior, file an issue on Tron’s Github page.

Cleanup Actions

Cleanup actions run after the job succeeds or fails. They are specified just like regular actions except that there is only one per job and it has no name or requirements list.

If your job creates shared resources that should be destroyed after a run regardless of success or failure, such as intermediate files or Amazon Elastic MapReduce job flows, you can use cleanup actions to tear them down.

The command context variable cleanup_job_status is provided to cleanup actions and has a value of SUCCESS or FAILURE depending on the job’s final state. For example:

- # ... (other job fields)
  cleanup_action:
    command: "python -m mrjob.tools.emr.job_flow_pool --terminate MY_POOL"

States

The following are the possible states for a Job and Job Run.

Job States
ENABLED
A run is scheduled and new runs will continue to be scheduled.
DISABLED
No new runs will be scheduled, and scheduled runs will be cancelled.
RUNNING
Job run currently in progress.

Job Run States
SCHE
The run is scheduled for a specific time
RUNN
The run is currently running
SUCC
The run completed successfully
FAIL
The run failed
QUE
The run is queued behind one or more other runs and will start when they finish
CANC
The run was scheduled, but later cancelled.
UNKWN
The run is in an unknown state. This state occurs when Tron restores a job that was running at the time of shutdown.

Action States

Job states are derived from the aggregate state of their actions. The following is a state diagram for an action.

[Action state diagram: images/action.png]

Built-In Command Context Variables

Tron includes some built in command context variables that can be used in command configuration.

shortdate
Run date in YYYY-MM-DD format. Supports simple arithmetic of the form {shortdate+6} which returns a date 6 days in the future, {shortdate-2} which returns a date 2 days before the run date.
year
Current year in YYYY format. Supports the same arithmetic operations as shortdate. For example, {year-1} would return the year previous to the run date.
month
Current month in MM format. Supports the same arithmetic operations as shortdate. For example, {month+2} would return 2 months in the future.
day
Current day in DD format. Supports the same arithmetic operations as shortdate. For example, {day+1} would return the day after the run date.
hour
Current hour in HH (0-23) format. Supports the same arithmetic operations as shortdate. For example, {hour+1} would return the hour after the run hour (mod 24).
unixtime
Current timestamp. Supports addition and subtraction of seconds. For example, {unixtime+20} would return the timestamp 20 seconds after the job’s run time.
daynumber
Current day number as an ordinal (datetime.toordinal()). Supports addition and subtraction of days. For example {daynumber-3} would be 3 days before the run date.
name
Name of the job
node
Hostname of the node the action is being run on

Context variables only available to Jobs

runid
Run ID of the job run (e.g. sample_job.23)
actionname
The name of the action
cleanup_job_status
SUCCESS if all actions have succeeded when the cleanup action runs, FAILURE otherwise. UNKNOWN if used in an action other than the cleanup action.
last_success
The last successful run date (defaults to current date if there was no previous successful run). Supports date arithmetic using the form {last_success#shortdate-1}.
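
For example, a job that processes the previous day’s data might combine several of these variables. A sketch (the export_data command is hypothetical):

jobs:
    - name: "daily_export"
      node: local
      schedule: "daily 01:00:00"
      actions:
        - name: "export"
          # {shortdate-1} is the day before the run date;
          # {runid} expands to e.g. daily_export.4
          command: "export_data --date {shortdate-1} --tag {runid}"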

tronweb

tronweb is the web-based UI for Tron.

Once trond is running, tronweb is available by default at http://localhost:8089/web/ (see the --host and --port options to trond).

Man Pages

tronctl

Synopsis

tronctl [--server <host:port>] [--verbose] <command> <job_name | job_run_id | action_run_id>

Description

tronctl is the control interface for Tron. tronctl allows you to enable, disable, start, stop, and cancel Tron jobs.

Options
--server=<host:port>
The host and port of the trond instance to connect to
--verbose
Displays status messages along the way
--run-date=<YYYY-MM-DD>
For starting a new job, specifies the run date that should be set. Defaults to today.
--start-date=<YYYY-MM-DD>
For backfills, specifies the starting date of the first job of the backfill. Note that many jobs operate on the previous day’s data.
--end-date=<YYYY-MM-DD>
For backfills, specifies the final date of the backfill. Defaults to today. Note that many jobs operate on the previous day’s data.

Job Commands
disable <job_name>
Disables the job. Cancels all scheduled and queued runs and doesn’t schedule any more.
enable <job_name>
Enables the job and schedules a new run.
start <job_name>
Creates a new run of the specified job and runs it immediately.
start <job_run_id>
Attempt to start the given job run. A job run only starts if no other instance is running. If the job run has already started, it will attempt to start any actions in the SCHE or QUE state.
start <action_run_id>
Attempt to start the action run.
restart <job_run_id>
Creates a new job run with the same run time as the given run.
retry <action_run_id>
Re-runs an action within an existing job run.
rerun <job_run_id>
Creates a new job run with the same run time as the given run (same as restart).
backfill <job_id>
Creates a series of job runs over a sequence of dates. --start-date must be provided for a backfill; see the sketch under Examples below.
cancel <job_run_id | action_run_id>
Cancels the specified job run or action run.
success <job_run_id | action_run_id>
Marks the specified job run or action run as succeeded. This behaves the same as the run actually completing. Dependent actions are run and queued runs start.
skip <action_run_id>
Marks the specified action run as skipped. This allows dependent actions to run.
fail <job_run_id | action_run_id>
Marks the specified job run or action run as failed. This behaves the same as the job actually failing.
stop <action_run_id>
Stop an action run
kill <action_run_id>
Force stop (SIGKILL) an action run

Examples
$ tronctl start job0
New Job Run job0.2 created

$ tronctl start job0.3
Job Run job0.3 now in state RUNN

$ tronctl cancel job0.4
Job Run job0.4 now in state CANC

$ tronctl fail job0.4
Job Run job0.4 now in state FAIL

$ tronctl restart job0.4
Job Run job0.4 now in state RUNN

$ tronctl success job0.5
Job Run job0.5 now in state SUCC

$ tronctl retry MASTER.job.5.action1
Retrying ActionRun: MASTER.job.5.action1
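
A backfill over a date range might look like the following sketch, using the options documented above (output omitted, as it varies):

$ tronctl --start-date=2011-02-01 --end-date=2011-02-03 backfill getting_node_info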

See Also

trond (8), tronfig (1), tronview (1),

trond

Synopsis

trond [--working-dir=<working dir>] [--verbose] [--debug]

Description

trond is the tron daemon that manages all jobs.

Options
--version
show program’s version number and exit
-h, --help
show this help message and exit
--working-dir=WORKING_DIR
Directory where tron’s state and output is stored (default /var/lib/tron/)
-l LOG_CONF, --log-conf=LOG_CONF
Logging configuration file to setup python logger
-c CONFIG_FILE, --config-file=CONFIG_FILE
Configuration file to load (default in working dir)
-v, --verbose
Verbose logging
--debug
Debug mode, extra error reporting, no daemonizing
--nodaemon
[DEPRECATED in 0.9.4] Indicates we should not fork and daemonize the process (default False)
--lock-file=LOCKFILE
Where to store the lock file of the executing process (default /var/run/tron.lock)
-P LISTEN_PORT, --port=LISTEN_PORT
What port to listen on (default 8089)
-H LISTEN_HOST, --host=LISTEN_HOST
What host to listen on (default localhost)
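
A typical invocation, as a sketch (the user and paths are illustrative):

$ sudo -u tron trond --working-dir=/var/lib/tron -c /etc/tron/tron.yaml --port=8089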

Files
Working directory
The directory where state and saved output of processes are stored.
Lock file
Ensures only one daemon runs at a time.
Log File
trond error log, configured from logging.conf

Signals
SIGINT
Graceful shutdown. Waits for running jobs to complete.
SIGTERM
Does some cleanup before shutting down.
SIGHUP
Reload the configuration file.
SIGUSR1
Will drop into an ipdb debugging prompt.

Logging

Tron uses Python’s standard logging and by default uses a rotating log file handler that rotates files each day. Logs go to /var/log/tron/tron.log.

To configure logging, pass -l <logging.conf> to trond. You can modify the default logging.conf by copying it from tron/logging.conf. See http://docs.python.org/howto/logging.html#configuring-logging

Bugs

trond has issues around daylight saving time and may run jobs an hour early at the DST boundary.

Post further bugs to http://www.github.com/yelp/tron/issues.

See Also

tronctl (1), tronfig (1), tronview (1),

tronfig

Synopsis

tronfig [--server server_name ] [--verbose | -v] [<namespace>] [-p] [-]

Description

tronfig allows live editing of the Tron configuration. It retrieves the configuration file for local editing, verifies the configuration, and sends it back to the tron server. The configuration is applied immediately.

Options
--server <server_name>
The server the tron instance is running on
--verbose
Displays status messages along the way
--version
Displays version string
-p
Print the configuration
namespace
The configuration namespace to edit. Defaults to MASTER
-
Read new config from stdin.

Configuration

By default tron will run with a blank configuration file. The config file is saved to <working_dir>/config/ by default. See the full documentation at http://tron.readthedocs.io/en/latest/config.html.
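
For example (the namespace and file names are placeholders):

$ tronfig                            # edit the MASTER namespace in $EDITOR
$ tronfig -p                         # print the current configuration
$ tronfig my_jobs - < my_jobs.yaml   # upload the my_jobs namespace from stdin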

See Also

trond (8), tronctl (1), tronview (1),

tronview

Synopsis

tronview [-n <numshown>] [--server <server_name>] [--verbose] [<job_name> | <job_run_id> | <action_run_id>]

Description

tronview displays the status of tron scheduled jobs.

tronview
Show all configured jobs
tronview <job_name>

Shows details for a job. Ex:

$ tronview my_job
tronview <job_run_id>

Show details for specific run or instance. Ex:

$ tronview my_job.0
tronview <action_run_id>

Show details for specific action run. Ex:

$ tronview my_job.0.my_action

Options
--version
show program’s version number and exit
-h, --help
show this help message and exit
-v, --verbose
Verbose logging
-n NUM_DISPLAYS, --numshown=NUM_DISPLAYS
The maximum number of job runs or lines of output to display (0 to show all). Does not affect the display of all jobs or the display of actions for a given job.
--server=SERVER
Server URL to connect to
-c, --color
Display in color
--nocolor
Display without color
-o, --stdout
Solely displays stdout
-e, --stderr
Solely displays stderr
-s, --save
Save server and color options to client config file (~/.tron)

States

For a complete list of states, with a diagram of valid transitions, see http://packages.python.org/tron/jobs.html#states

See Also

trond (8), tronctl (1), tronfig (1),

Contributing to Tron

Tron is an open source project and welcomes contributions from the community. The source and issue tracker are hosted on github at http://github.com/yelp/Tron.

Setting Up an Environment

Tron works well with virtualenv, which can be set up using virtualenvwrapper:

$ mkvirtualenv tron --distribute --no-site-packages
$ pip install -r dev/req_dev.txt

req_dev.txt contains a list of packages required for development, including Testify (to run the tests) and Sphinx (to build the documentation).

Coding Standards

All code should be PEP8 compliant, and should pass pyflakes without warnings. All new code should include full test coverage, and bug fixes should include a test which reproduces the reported issue.

This documentation must also be kept up to date with any changes in functionality.

Running Tron in a Sandbox

The source package includes a development logging.conf and a sample configuration file with a few test cases. To run a development instance of Tron, create a working directory and start trond using the following:

$ make dev

Running the Tests

Tron uses the Testify unit testing framework.

Run the tests using make tests or testify tests. If you’re using a virtualenv you may want to run python `which testify` test to have it use the correct environment.

This package also includes a .pyautotest file which can be used with https://github.com/dnephin/PyAutoTest to auto run tests when you save a file.

Contributing

A GitHub issue should be created prior to any pull request. Pull requests should be made to the Yelp:development branch, and should include additions to CHANGES.txt describing what has changed.
