Synopsis
/etc/guardcat.conf
Description
The default config file is /etc/guardcat.conf and is loaded if no other directives are given.
You can specify config directives on the command-line to the program. When command-line arguments are present, /etc/guardcat.conf is not automatically loaded. Thus, to specify an alternate configuration file, one might use something like this:
guardcat load /etc/guardcat/main.conf
Example configuration files are provided with the guardcat package. They can be found in the "cfexamples" directory (which may vary by distribution).
Using guardcat to control your fans may damage your hardware. There is NO WARRANTY or assumption of liability. Use it strictly AT YOUR OWN RISK.
Syntax
The format is a series of keywords and/or values, separated by whitespace. Keywords and values are cAsE-sensitive. Blank lines are ignored. Anything from a hash mark (pound sign) to end-of-line is ignored.
Multiple commands can be given on one line if separated by a semicolon (just like the shell). Because the shell treats the semicolon the same way, on the command line you will need to escape the semicolon to prevent the shell from interpreting it.
Both hash marks and semicolons must be separated from the rest of the command (if any) by at least one whitespace character. For example, this would be an error:
pollint 60# poll once a minute
You would need to do something like:
pollint 60 # poll once a minute
Likewise, this will not work:
alert 36000;reset 36003
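The separator rules above can be sketched in a few lines of Python. This is only an illustration of the documented syntax, not guardcat's actual parser. The function name split_commands is hypothetical, and the sketch assumes a comment begins at any whitespace-separated token that starts with a hash mark.

```python
def split_commands(line):
    """Split one config line into commands, each a list of tokens.

    Illustrative only. "#" and ";" are recognized only as the start of a
    whitespace-separated token (assumption for "#") or as a token on their
    own (";"), mirroring the rule that both must be set off by whitespace.
    """
    commands, current = [], []
    for token in line.split():
        if token.startswith("#"):   # comment: ignore the rest of the line
            break
        if token == ";":            # command separator
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands
```

With this sketch, "pollint 60#" yields the token "60#", which is not a valid number, so the command would be rejected rather than silently treated as "60" plus a comment.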
General Options
If a general option occurs more than once, later occurrences will override previous ones. Boolean options default to off, and are enabled by their presence. Repeated booleans are simply redundant. There is no way to turn something back off — you have to omit it in the first place.
debug
Boolean. Emit gobs of debugging info, to stderr and/or syslog LOG_DEBUG. You can also send SIGUSR2 to toggle debugging on and off once running. When debugging is enabled, the results of SIGUSR1 include a config dump.
quiet
Boolean. Suppress sending warning and debug messages to stderr (they still go to syslog, as LOG_WARNING or LOG_DEBUG, as appropriate). Intended for use with systemd, which captures stderr and sends it to the system log, resulting in duplicates (as guardcat also logs to syslog). Fatal errors always go to both, unless things are so broken that we cannot syslog (e.g., perl bombs out loading the script), in which case that systemd capture is really useful.
load
Include another config file, as if it were typed here. The following example supposes inputs were broken out into a separate file.
load /etc/guardcat/inputs.conf
The special value "-" loads the default config file. Especially useful on the command line when adding one more parameter for testing. For example, the following command, given to the Unix shell, will load the normal config file, but with debugging enabled (note the backslash that escapes the semicolon):
guardcat debug \; load -
If you make a recursive loop of config file loads, guardcat will crash trying to endlessly load the same file(s), so don’t do that. Typical diagnostics for that case will be things like "deep recursion", "too many open files", and/or "out of memory".
america
Boolean. Convert temperatures to Fahrenheit. Any input declared as "temp" will be converted when it is read. Parameter values specified in the config file (the lo and hi fields for inputs and alarms) must be given in Fahrenheit as well. Logs and alerts will use Fahrenheit when written.
pollint
Poll interval, in seconds. The default is 10. This is how often we read inputs/sensors and adjust outputs/fans. Example:
pollint 10
logint
Data log interval, in multiples of the poll interval. Default is 6. Note that data logging has nothing to do with syslog. Examples:
pollint 10 ; logint 6   # log data once per minute
pollint 10 ; logint 3   # log data once every 30 seconds
pollint 5 ; logint 12   # log data once per minute
logint 1                # log data with every poll cycle
You can also send SIGUSR1 to poll and log immediately. See also the "logfile" option below, and the "log" directive further below.
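The relationship between the two intervals is simple arithmetic, sketched here in Python. The function names are hypothetical, and which poll cycle counts as the first logging cycle is an assumption, since this manual does not say.

```python
def log_period_seconds(pollint=10, logint=6):
    """Seconds between data-log entries: logint poll cycles of pollint seconds."""
    return pollint * logint


def is_log_cycle(cycle, logint=6):
    """True when poll cycle number `cycle` (counted from 0) also writes the data log.

    Assumption: logging happens on every logint-th cycle, starting with cycle 0.
    """
    return cycle % logint == 0
```

With the defaults (pollint 10, logint 6), data is logged once every 60 seconds.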
alert
How often to alert when out-of-bounds, in seconds. The default is 36000 seconds (10 hours). Ten hours is somewhat arbitrary, but will at least walk around the clock (so warnings do not occur at the same time every day, perhaps being missed).
alert 36000
reset
How long before trouble is considered cleared, in seconds. The warn interval counter resets if things are good for this long. The default is 3600 (1 hour), again somewhat arbitrary.
reset 3600
logfile
File to log data (sensor readings) to. If no log file is specified, no data logging will be done (default). Send SIGHUP to re-open the log file (useful after log rotation). See also the "log" and "logint" directives (below and above, respectively). Events and alerts are always logged to syslog.
logfile /var/log/sensors
Log to /dev/stdout to have it log to your terminal (useful for testing).
logfile /dev/stdout
mail
Email alarm reports to this address. Omit for no email reports (default). Requires the mail(1) utility to be available and working.
mail guardcat@example.com
wall
Boolean. Use wall(1) to write alert messages to all users and terminals. Default is disabled.
Specifying Paths
guardcat is intended to mainly work with the Linux kernel "hwmon" interface (which usually appears in the filesystem under /sys/class/hwmon/). Every input (temperature or fan speed sensor) and output (fan speed control) will correspond to a hwmon file path. These paths must be specified in the configuration you provide.
guardcat provides multiple ways to specify a hwmon path.
hwmlabel
You can specify the hwmon device name and channel label. This is the best approach when labels are available, as it is clearer to read, more obvious what the config directive does, and will withstand changes in device ordering (as can happen when the kernel changes). However, labels are not always provided. The combination of device and channel name/label must also be unique.
input CPUin - - hwmlabel nct6798 CPUTIN
hwmfile
You can specify the hwmon device name, and a file name under that device. This is usually the best choice for fan-related sensors. This is also the only type (besides "path") available for output, init, and exit declarations.
input CPUin - - hwmfile nct6798 temp2_input
path
You can specify the path directly. This works even when names are ambiguous (duplicated), or if guardcat cannot find labels/names/etc. It can also work for things which are not hwmon nodes — even regular files, although that may be of limited usefulness.
input CPUin - - path /sys/class/hwmon/hwmon0/temp2_input
wwid
If the "drivetemp" kernel module is loaded (and your disk drives are compatible), you can specify the WWID (world-wide identifier, also called a WWN, world-wide name) of the drive, and guardcat will find it. You must specify the WWID as hwmon presents it, which might not match the printed label of the drive.
input SSD - - temp wwid naa.5002536e42630f9d
Discovering paths
A quick way to see the available hwmon devices is to run the shell command:
grep . /sys/class/hwmon/hwmon*/name
Likewise, to see the labeled inputs:
grep . /sys/class/hwmon/hwmon*/*_label
To see the drives and WWIDs that hwmon sees:
grep . /sys/class/hwmon/hwmon*/device/{model,wwid}
(In all these examples, I am (ab)using grep(1) simply to put the path names in front of the file contents.)
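To make the idea behind "hwmfile" concrete, here is a rough Python sketch of how a device name and file name could be turned into a path by scanning the same name files the grep command shows. It is not guardcat's code. The helper resolve_hwmfile and its root parameter are hypothetical, and raising an error on duplicate device names is an assumption.

```python
import glob
import os


def resolve_hwmfile(device, filename, root="/sys/class/hwmon"):
    """Return the path of `filename` under the hwmon device named `device`.

    Hypothetical helper: looks for a hwmon*/name file whose contents equal
    `device`. Raises LookupError if there is no match, or more than one
    (assumed behavior for ambiguous names).
    """
    matches = []
    for name_file in sorted(glob.glob(os.path.join(root, "hwmon*", "name"))):
        with open(name_file) as fh:
            if fh.read().strip() == device:
                matches.append(os.path.dirname(name_file))
    if len(matches) != 1:
        raise LookupError(f"{device!r} matched {len(matches)} hwmon devices")
    return os.path.join(matches[0], filename)
```

On a machine where hwmon1 is the nct6798 device, resolve_hwmfile("nct6798", "pwm1") would give /sys/class/hwmon/hwmon1/pwm1, and would keep working if the kernel renumbered the device.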
Startup Values
It may be useful to have guardcat write fixed values to hwmon nodes during program startup. They can be specified with "hwmfile" or "path".
For example, some mainboards require the fan control mode be set to "manual control" before they will honor the PWM output setting. This is done by writing "1" to the appropriate hwmon pwm*_enable node. Examples:
init 1 path /sys/class/hwmon/hwmon0/pwm1_enable
init 1 hwmfile nct6798 pwm2_enable
Values are written in the order defined. An error writing an "init" value will abort the program.
Shutdown Values
If guardcat is being used for fan control, you will almost certainly want it to do some kind of "restore" action upon exit. Otherwise, fans may stay at a slow speed as things heat up, and with guardcat not controlling them any longer, the system may overheat. Examples:
# On exit, set fan #1 to maximum speed
exit 255 hwmfile nct6798 pwm1
# On exit, set fan #1 to "Smart Fan" mode
exit 5 hwmfile nct6798 pwm1_enable
# On exit, set fan #1 to "control disabled" (max speed)
exit 0 hwmfile nct6798 pwm1_enable
Values are written in the order defined. Errors result in warnings, but the program will continue trying the rest. guardcat normally tries to write these values even if it is aborting/crashing. The exception is if no init or output values have been written yet — the theory being, if no changes have been made, nothing needs to be put back.
Input Channels
Input channels are intended for sensors, such as temperature or fan speed. These are sometimes called "direct inputs", to differentiate them from aggregate inputs (see below). An input declaration consists of the following:
input <name> <lo> <hi> [options] <type> <arguments>
The name is arbitrary, but must be unique. It may not contain spaces.
The lo and hi fields define the range of the sensor values expected during normal operation. They are used to compute a normalized value for each input, which in turn is used to adjust outputs proportionally. If an input is not being used to control an output (i.e., it is used only for monitoring/logging), the lo and/or hi fields may be given as "-".
Zero or more options may be specified. Options and any arguments are all separated by spaces. Options end when a recognized input type is seen. See INPUT OPTIONS below for available options.
The type specifies the mechanism used to obtain the input value. Most types specify a (virtual) file to read from, typically a hwmon node. See the section SPECIFYING PATHS for details. All the types given in that section work for inputs.
The other available input type is "prog", which specifies a program to run. The program is expected to write the value to stdout (followed by a newline), and then exit. See SECURITY in guardcat(1) for implications. Since the program must be (re)started with each read, there is a performance cost, perhaps a significant one. See the related "slow" option below. guardcat will wait for the program to produce output. If the program hangs, so will guardcat. That could be bad if guardcat is providing fan control.
Some examples:
#     name  lo  hi  options     type      arguments
input CPU   30  65  temp        hwmlabel  k10temp Tdie
input HDD   20  35  temp        wwid      naa.5002538d42660d9e
input FAN   -   -               hwmfile   nct6798 fan1_input
input ROOM  -   -   slow round  prog      /usr/local/bin/tempered -n
Aggregate Inputs
Inputs can be combined into an aggregate input channel. An aggregate input is the same as a direct input for purposes of monitoring, logging, and control, but is based on the values of other inputs, rather than an external file or program.
An aggregate declaration consists of the following:
agg <name> [options] <type> <constituents> ...
The names for aggregate inputs follow the name rules, and use the same namespace, as direct inputs.
The only option currently supported for "agg" is the "round" option. See INPUT OPTIONS for usage.
Three types of aggregate inputs are currently supported, with the keywords "min", "max", and "avg". They correspond to minimum, maximum, and average (arithmetic mean), respectively. The value of the aggregate channel will become the lowest, highest, or average value of the specified constituent inputs.
Following the type is a list of one or more constituent inputs. Any such inputs must have been previously defined (e.g., come earlier in the config file). The constituent inputs to an aggregate can include other aggregate inputs.
Examples:
agg MIN round min CPU1 CPU2 SSDa
agg MAX round max CPU2 CPU1 SSDa
agg AVG round avg CPU1 CPU2 SSDa
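The three aggregate types amount to the following, shown as a small Python sketch. The function name aggregate is hypothetical. It only illustrates the arithmetic described above, taking the current values of the constituent inputs as a list.

```python
def aggregate(kind, values):
    """Combine constituent input values; kind is "min", "max", or "avg"."""
    if not values:
        raise ValueError("an aggregate needs at least one constituent")
    if kind == "min":
        return min(values)
    if kind == "max":
        return max(values)
    if kind == "avg":
        return sum(values) / len(values)   # arithmetic mean
    raise ValueError(f"unknown aggregate type: {kind}")
```

Because an aggregate is itself an input, its result could be passed in as one of the values of another aggregate.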
Input Options
Options may be specified for "input" or "agg" (aggregate input) channels.
slow
Flags the input as a slow input. Slow inputs are only read when the poll cycle is one which will write to the data log (see "logint" in GENERAL OPTIONS). The envisioned use case is: Have guardcat provide fan control based on hwmon inputs on a fast poll cycle, while also logging an external program on a less-frequent basis.
mult
Specifies a multiplier, a value applied to the input value before further processing. Intended for scaling. The "mult" keyword is followed by the number to multiply by (which need not be an integer).
offset
Specifies an offset, a value added to the input value, after the above, but before any further processing. The "offset" keyword is followed by the number to add (which need not be an integer, nor positive).
round
Rounds the input value to the nearest tenth (one decimal place), after the above, but before display or further processing.
temp
Flags the input channel as a temperature value. Implicitly applies the options "mult 0.001" and "round", both of which will generally be appropriate for hwmon temperature values. Additionally, if the "america" general option is in effect, the input is assumed to be in Celsius, and will be converted to Fahrenheit.
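The order in which these options apply can be sketched in Python as below. This is a rough illustration, not guardcat's implementation. The function apply_options and its parameter names are hypothetical. Two details are assumptions: that an explicit "mult" takes precedence over the one implied by "temp", and that the Fahrenheit conversion happens before rounding. The 0.001 multiplier fits hwmon temperature files, which report millidegrees Celsius.

```python
def apply_options(raw, mult=None, offset=None, do_round=False,
                  temp=False, america=False):
    """Apply input options to a raw reading: mult, then offset, then round."""
    if temp:
        # "temp" implies "mult 0.001" and "round".
        mult = 0.001 if mult is None else mult   # assumption: explicit mult wins
        do_round = True
    value = float(raw)
    if mult is not None:
        value *= mult
    if offset is not None:
        value += offset
    if temp and america:
        value = value * 9 / 5 + 32   # Celsius to Fahrenheit (assumed to precede rounding)
    if do_round:
        value = round(value, 1)      # nearest tenth
    return value
```

For example, a raw hwmon reading of 45500 on a "temp" input becomes 45.5, or 113.9 when "america" is in effect.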
Data Log Channels
Once input and/or aggregate channels are defined, you can specify which ones (if any) you want logged to the data log. See also "logint" and "logfile" in GENERAL OPTIONS. By default, no data logging is done. Example:
log CPU1 CPU2 SSD AVG
Alarm Limits
An alarm declaration consists of the following:
alarm <name> <lo> <hi>
The name must match an already-defined input/aggregate channel name. The lo and hi values must be present, but either may be "-". If the actual value for the corresponding channel goes below lo, or above hi, it is considered out-of-bounds (faulted). If a field is given as "-" then that end of the range is not checked.
If one or more sensors become faulted, an alert is raised. The alert will be repeated on the interval given by the "alert" option. If all sensors return in-bounds and stay there for the "reset" interval, the alarm is considered cleared.
Raising an alert consists of one or more of:
- Logging to syslog at priority LOG_ALERT
- Sending an email to the address specified by the "mail" option
- Sending a message to all logged-in users, per the "wall" option
Examples:
#     name  lo   hi
alarm ROOM  10   30   # Alert if it gets too warm, or really cold
alarm CPU   -    150  # Just alert on over-temperature for CPUs
alarm FANS  200  -    # Alert on low speed for fans
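The bounds check described in this section can be written as a short Python sketch. The function name is_faulted is hypothetical, and None stands in for a field given as "-". A value exactly equal to lo or hi is treated as in-bounds, following the "below lo, or above hi" wording.

```python
def is_faulted(value, lo=None, hi=None):
    """True if value is below lo or above hi; a None bound is not checked."""
    if lo is not None and value < lo:
        return True
    if hi is not None and value > hi:
        return True
    return False
```

Using the ROOM limits from the examples, is_faulted(35, 10, 30) is true and is_faulted(20, 10, 30) is false.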
Output Channels
Output channels are provided for fan control. An output declaration consists of the following:
output <input> <lo> <hi> [options] <type> <arguments>
The first field is the name of an already-defined input or aggregate channel, which will control the output. Unlike "input", "agg", and "alarm" declarations, an output with the same input channel name may be repeated. This allows multiple outputs to be controlled by the same input.
The lo and hi fields define the range over which the output varies, in proportion to the input range. Typically, a hwmon fan control channel uses a range from 0 (zero, stopped) to 255. However, the actual fans will typically cease turning well above the 0 point, so some experimentation will be required to determine the proper values for your hardware.
The only option currently supported for "output" is the "floor" option. It specifies a minimum value for the output, independent of the lo/hi range. Again, this may be useful for fans, which have some minimum threshold.
Two types are supported for outputs: "hwmfile" and "path". See the section SPECIFYING PATHS for usage of these.
Examples:
#      in   lo  hi   option    type     data
output CPU  30  255            hwmfile  nct6798 pwm1
output ALL  30  255            hwmfile  nct6798 pwm2  # top
output ALL  30  255  floor 55  hwmfile  nct6798 pwm3  # bottom
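As a rough picture of how an input's lo/hi range and an output's lo/hi range fit together, here is a Python sketch of a proportional mapping. It is not guardcat's code. The function fan_output is hypothetical, and the linear mapping, the clamping of out-of-range readings, and the final rounding to an integer are all assumptions.

```python
def fan_output(value, in_lo, in_hi, out_lo, out_hi, floor=None):
    """Map an input reading onto an output range.

    Assumptions (not confirmed by this manual): the mapping is linear, and
    readings outside in_lo..in_hi are clamped to the ends of the range.
    """
    norm = (value - in_lo) / (in_hi - in_lo)   # normalized input
    norm = min(1.0, max(0.0, norm))            # clamp to 0.0 .. 1.0
    out = out_lo + norm * (out_hi - out_lo)
    if floor is not None:
        out = max(out, floor)                  # "floor" sets a minimum output
    return round(out)
```

With an input range of 30 to 65 and an output range of 30 to 255, a reading of 30 or less gives 30 and a reading of 65 or more gives 255. Adding "floor 55" raises the low end of the result to 55.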
See Also
guardcat(1), sensors(1), pwmconfig(1)
In particular, see the WARNINGS and SECURITY sections of guardcat(1) for important safety tips.
License
This manual page and accompanying software are licensed under the terms of the Unlicense. For the full terms, see the accompanying UNLICENSE file, or the http://unlicense.org/ website. To summarize, you can do anything you want with it, and there is NO WARRANTY, nor any assumption of liability. Any and all use is strictly AT YOUR OWN RISK.