Theory and pragmatics of the tz code and data
Scope of the tz database
The tz database attempts to record the history and predicted future of
civil time scales.
It organizes time zone and daylight saving time
data by partitioning the world into timezones
whose clocks all agree about timestamps that occur after the POSIX Epoch
(1970-01-01 00:00:00 UTC).
Although 1970 is a somewhat-arbitrary cutoff, there are significant
challenges to moving the cutoff earlier even by a decade or two, due
to the wide variety of local practices before computer timekeeping
became prevalent.
Most timezones correspond to a notable location and the database
records all known clock transitions for that location;
some timezones correspond instead to a fixed UTC offset.
Each timezone typically corresponds to a geographical region that is
smaller than a traditional time zone, because clocks in a timezone
all agree after 1970 whereas a traditional time zone merely
specifies current standard time. For example, applications that deal
with current and future timestamps in the traditional North
American mountain time zone can choose from the timezones
America/Denver which observes US-style daylight saving
time (DST),
and America/Phoenix which does not observe DST.
Applications that also deal with past timestamps in the mountain time
zone can choose from over a dozen timezones, such as
America/Boise, America/Edmonton, and
America/Hermosillo, each of which currently uses mountain
time but differs from other timezones for some timestamps after 1970.
Clock transitions before 1970 are recorded for location-based timezones,
because most systems support timestamps before 1970 and could
misbehave if data entries were omitted for pre-1970 transitions.
However, the database is not designed for and does not suffice for
applications requiring accurate handling of all past times everywhere,
as it would take far too much effort and guesswork to record all
details of pre-1970 civil timekeeping.
Although some information outside the scope of the database is
collected in a file backzone that is distributed along
with the database proper, this file is less reliable and does not
necessarily follow database guidelines.
As described below, reference source code for using the
tz database is also available.
The tz code is upwards compatible with POSIX, an international
standard for UNIX-like systems.
As of this writing, the current edition of POSIX is POSIX.1-2024,
which has been published but not yet in HTML form.
Unlike its predecessor POSIX.1-2017 ( The Open
Group Base Specifications Issue 7, IEEE Std 1003.1-2017, 2018
Edition), POSIX.1-2024 requires support for the
tz database, which has a
model for describing civil time that is more complex than the
standard and daylight saving times required by POSIX.1-2017.
A tz timezone corresponds to a ruleset that can
have more than two changes per year; these changes need not merely
flip back and forth between two alternatives, and the rules themselves
can change at times.
Whether and when a timezone changes its clock,
and even the timezone's notional base offset from UTC,
are variable.
It does not always make sense to talk about a timezone's
"base offset", which is not necessarily a single number.
Timezone identifiers
Each timezone has a name that uniquely identifies the timezone.
Inexperienced users are not expected to select these names unaided.
Distributors should provide documentation and/or a simple selection
interface that explains each name via a map or via descriptive text like
"Czech Republic" instead of the timezone name "Europe/Prague".
If geolocation information is available, a selection interface can
locate the user on a timezone map or prioritize names that are
geographically close. For an example selection interface, see the
tzselect program in the tz code.
Unicode's Common Locale Data
Repository (CLDR)
contains data that may be useful for other selection
interfaces; it maps timezone names like Europe/Prague to
locale-dependent strings like "Prague", "Praha", "Прага", and "布拉格".
The naming conventions attempt to strike a balance
among the following goals:
Uniquely identify every timezone where clocks have agreed since 1970.
This is essential for the intended use: static clocks keeping local
civil time.
Indicate to experts where the timezone's clocks typically are.
Be robust in the presence of political changes.
For example, names are typically not tied to countries, to avoid
incompatibilities when countries change their name (e.g.,
Swaziland→Eswatini) or when locations change countries (e.g., Hong
Kong from UK colony to China).
There is no requirement that every country or national
capital must have a timezone name.
Be portable to a wide variety of implementations.
Use consistent naming conventions over the entire world.
Names normally have the format
AREA/LOCATION, where
AREA is a continent or ocean, and
LOCATION is a specific location within the area.
North and South America share the same area, 'America'.
Typical names are 'Africa/Cairo',
'America/New_York', and 'Pacific/Honolulu'.
Some names are further qualified to help avoid confusion; for example,
'America/Indiana/Petersburg' distinguishes Petersburg,
Indiana from other Petersburgs in America.
Here are the general guidelines used for
choosing timezone names,
in decreasing order of importance:
Use only valid POSIX file name components (i.e., the parts of
names other than '/').
Do not use the file name components '.' and
'..'.
Within a file name component, use only ASCII letters,
'.', '-' and '_'.
Do not use digits, as that might create an ambiguity with POSIX's proleptic
TZ strings.
A file name component must not exceed 14 characters or start with
'-'.
E.g., prefer America/Noronha to
America/Fernando_de_Noronha.
Exceptions: see the discussion of legacy names below.
A name must not be empty, or contain '//', or
start or end with '/'.
Also, a name must not be 'Etc/Unknown', as
CLDR uses that string for an unknown or invalid timezone.
Do not use names that differ only in case.
Although the reference implementation is case-sensitive, some
other implementations are not, and they would mishandle names
differing only in case.
If one name A is an initial prefix of another
name AB (ignoring case), then B must not
start with '/', as a regular file cannot have the
same name as a directory in POSIX.
For example, America/New_York precludes
America/New_York/Bronx.
Uninhabited regions like the North Pole and Bouvet Island
do not need locations, since local time is not defined there.
If all clocks in a region have agreed since 1970,
give them just one name even if some of the clocks disagreed before 1970,
or reside in different countries or in notable or faraway locations.
Otherwise these tables would become annoyingly large.
For example, do not create a name Indian/Crozet
as a near-duplicate or alias of Asia/Dubai
merely because they are different countries or territories,
or their clocks disagreed before 1970, or the
Crozet Islands
are notable in their own right,
or the Crozet Islands are not adjacent to other locations
that use Asia/Dubai.
If boundaries between regions are fluid, such as during a war or
insurrection, do not bother to create a new timezone merely
because of yet another boundary change. This helps prevent table
bloat and simplifies maintenance.
If a name is ambiguous, use a less ambiguous alternative;
e.g., many cities are named San José and Georgetown, so
prefer America/Costa_Rica to
America/San_Jose and America/Guyana
to America/Georgetown.
Keep locations compact.
Use cities or small islands, not countries or regions, so that any
future changes do not split individual locations into different
timezones.
E.g., prefer Europe/Paris to Europe/France,
since France has had multiple time zones.
Use mainstream English spelling, e.g., prefer
Europe/Rome to Europa/Roma, and
prefer Europe/Athens to the Greek
Ευρώπη/Αθήνα or the Romanized
Evrópi/Athína.
The POSIX file name restrictions encourage this guideline.
Use the most populous among locations in a region,
e.g., prefer Asia/Shanghai to
Asia/Beijing.
Among locations with similar populations, pick the best-known
location, e.g., prefer Europe/Rome to
Europe/Milan.
Use the singular form, e.g., prefer Atlantic/Canary to
Atlantic/Canaries.
Omit common suffixes like '_Islands' and
'_City', unless that would lead to ambiguity.
E.g., prefer America/Cayman to
America/Cayman_Islands and
America/Guatemala to
America/Guatemala_City, but prefer
America/Mexico_City to America/Mexico
because the country of Mexico has several time zones.
Use '_' to represent a space.
Omit '.' from abbreviations in names.
E.g., prefer Atlantic/St_Helena to
Atlantic/St._Helena.
Do not change established names if they only marginally violate
the above guidelines.
For example, do not change the existing name Europe/Rome to
Europe/Milan merely because Milan's population has grown
to be somewhat greater than Rome's.
If a name is changed, put its old spelling in the
'backward' file as a link to the new spelling.
This means old spellings will continue to work.
Ordinarily a name change should occur only in the rare case when
a location's consensus English-language spelling changes; for example,
in 2008 Asia/Calcutta was renamed to Asia/Kolkata
due to long-time widespread use of the new city name instead of the old.
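The mechanical rules near the top of this list (allowed characters, component length, empty components, the reserved name, names differing only in case, and the file-versus-directory prefix rule) lend themselves to an automated check. The following Python is a minimal sketch under that reading of the guidelines; the function names name_problems and set_conflicts are illustrative and are not part of the tz code. It does not attempt to judge the editorial guidelines such as population or spelling.

```python
import re

# Characters allowed inside one file name component, per the guidelines:
# ASCII letters, '.', '-' and '_' (digits are deliberately excluded).
_COMPONENT_RE = re.compile(r"[A-Za-z._-]+")
MAX_COMPONENT_LEN = 14


def name_problems(name):
    """Return a list of guideline violations for one proposed timezone name."""
    if name == "":
        return ["name is empty"]
    problems = []
    if name == "Etc/Unknown":
        problems.append("reserved by CLDR for an unknown or invalid timezone")
    if name.startswith("/") or name.endswith("/") or "//" in name:
        problems.append("leading, trailing, or doubled '/'")
    for part in name.split("/"):
        if part == "":
            continue  # already reported by the '/' check above
        if part in (".", ".."):
            problems.append(f"component {part!r} is not allowed")
        elif not _COMPONENT_RE.fullmatch(part):
            problems.append(f"component {part!r} uses characters outside [A-Za-z._-]")
        if len(part) > MAX_COMPONENT_LEN:
            problems.append(f"component {part!r} exceeds {MAX_COMPONENT_LEN} characters")
        if part.startswith("-"):
            problems.append(f"component {part!r} starts with '-'")
    return problems


def set_conflicts(names):
    """Return (a, b, reason) triples for names that clash when case is ignored."""
    conflicts = []
    ordered = sorted(names)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            la, lb = a.lower(), b.lower()
            if la == lb:
                conflicts.append((a, b, "differ only in case"))
            elif lb.startswith(la + "/") or la.startswith(lb + "/"):
                conflicts.append((a, b, "one name is a directory prefix of the other"))
    return conflicts


print(name_problems("America/Noronha"))              # no violations
print(name_problems("America/Fernando_de_Noronha"))  # component too long
print(set_conflicts(["America/New_York", "America/New_York/Bronx"]))
```

Legacy names that contain digits, such as 'Etc/GMT0', are reported as violations by this sketch; they are the exceptions covered by the discussion of legacy names.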
Guidelines have evolved with time, and names following old versions of
these guidelines might not follow the current version. When guidelines
have changed, old names continue to be supported. Guideline changes
have included the following:
Older versions of this package used a different naming scheme.
See the file 'backward' for most of these older names
(e.g., 'US/Eastern' instead of 'America/New_York').
The other old-fashioned names still supported are
'WET', 'CET', 'MET', and
'EET' (see the file 'europe').
Older versions of this package defined legacy names that are
incompatible with the first guideline of location names, but which are
still supported.
These legacy names are mostly defined in the file
'etcetera'.
Also, the file 'backward' defines the legacy names
'Etc/GMT0', 'Etc/GMT-0', 'Etc/GMT+0',
'GMT0', 'GMT-0' and 'GMT+0',
and the file 'northamerica' defines the legacy names
'EST5EDT', 'CST6CDT',
'MST7MDT', and 'PST8PDT'.
Older versions of these guidelines said that
there should typically be at least one name for each ISO
3166-1 officially assigned two-letter code for an inhabited
country or territory.
This old guideline has been dropped, as it was not needed to handle
timestamps correctly and it increased maintenance burden.
The file zone1970.tab lists geographical locations used
to name timezones.
It is intended to be an exhaustive list of names for geographic
regions as described above; this is a subset of the timezones in the data.
Although a zone1970.tab location's longitude
corresponds to its local mean time (LMT) offset,
with one hour for every 15° of east longitude,
this relationship is not exact.
The backward-compatibility file zone.tab is similar
but conforms to the older-version guidelines related to ISO 3166-1;
it lists only one country code per entry and unlike zone1970.tab
it can list names defined in backward.
Applications that process only timestamps from now on can instead use the file
zonenow.tab, which partitions the world more coarsely,
into regions where clocks agree now and in the predicted future;
this file is smaller and simpler than zone1970.tab
and zone.tab.
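The approximate relationship between a zone1970.tab longitude and the LMT offset, one hour per 15° of east longitude, works out to 240 seconds per degree. The sketch below only illustrates that arithmetic; the helper names are hypothetical, and the sample longitude of 2°20′ E (roughly that of Paris) is an assumption chosen for illustration rather than a value read from the file.

```python
def lmt_offset_seconds(longitude_degrees):
    """Approximate LMT offset in seconds east of UT for a longitude in degrees east."""
    # One hour per 15 degrees is 3600 / 15 = 240 seconds per degree.
    return round(longitude_degrees * 240)


def format_hms(seconds):
    """Format a signed number of seconds as +hh:mm:ss or -hh:mm:ss."""
    sign = "-" if seconds < 0 else "+"
    hours, rem = divmod(abs(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{secs:02d}"


# Assumed example longitude: 2 degrees 20 minutes east.
paris_like = 2 + 20 / 60
print(format_hms(lmt_offset_seconds(paris_like)))  # +00:09:20
print(format_hms(lmt_offset_seconds(-75)))         # -05:00:00
```

The first result is within a second of the legal French offset of UT +00:09:21 cited later in this document, which is the kind of small discrepancy meant by "not exact".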
The database defines each timezone name to be a zone, or a link to a zone.
The source file backward defines links for backward
compatibility; it does not define zones.
Although backward was originally designed to be optional,
nowadays distributions typically use it
and no great weight should be attached to whether a link
is defined in backward or in some other file.
The source file etcetera defines names that may be useful
on platforms that do not support proleptic TZ strings
like <+08>-8;
no source file other than backward
contains links to its zones.
One of etcetera's names is Etc/UTC,
used by functions like gmtime to obtain leap
second information on platforms that support leap seconds.
Another etcetera name, GMT,
is used by older code releases.
Time zone abbreviations
When this package is installed, it generates time zone abbreviations
like 'EST' to be compatible with human tradition and POSIX.
Here are the general guidelines used for choosing time zone abbreviations,
in decreasing order of importance:
Use three to six characters that are ASCII alphanumerics or
'+' or '-'.
Previous editions of this database also used characters like
space and '?', but these characters have a
special meaning to the
UNIX shell
and cause commands like 'set `date`'
to have unexpected effects.
Previous editions of this guideline required upper-case letters, but the
Congressman who introduced
Chamorro
Standard Time preferred "ChST", so lower-case letters are now
allowed.
Also, POSIX from 2001 on relaxed the rule to allow '-',
'+', and alphanumeric characters from the portable
character set in the current locale.
In practice ASCII alphanumerics and '+' and
'-' are safe in all locales.
In other words, in the C locale the POSIX extended regular
expression [-+[:alnum:]]{3,6} should match the
abbreviation.
This guarantees that all abbreviations could have been specified
explicitly by a POSIX proleptic TZ string.
Use abbreviations that are in common use among English-speakers,
e.g., 'EST' for Eastern Standard Time in North America.
We assume that applications translate them to other languages
as part of the normal localization process; for example,
a French application might translate 'EST' to 'HNE'.
These abbreviations (for standard/daylight/etc. time) are:
ACST/ACDT Australian Central,
AST/ADT/APT/AWT/ADDT Atlantic,
AEST/AEDT Australian Eastern,
AHST/AHDT Alaska-Hawaii,
AKST/AKDT Alaska,
AWST/AWDT Australian Western,
BST/BDT Bering,
CAT/CAST Central Africa,
CET/CEST/CEMT Central European,
ChST Chamorro,
CST/CDT/CWT/CPT Central [North America],
CST/CDT China,
GMT/BST/IST/BDST Greenwich,
EAT East Africa,
EST/EDT/EWT/EPT Eastern [North America],
EET/EEST Eastern European,
GST/GDT Guam,
HST/HDT/HWT/HPT Hawaii,
HKT/HKST/HKWT Hong Kong,
IST India,
IST/GMT Irish,
IST/IDT/IDDT Israel,
JST/JDT Japan,
KST/KDT Korea,
MET/MEST Middle European (a backward-compatibility alias for
Central European),
MSK/MSD Moscow,
MST/MDT/MWT/MPT Mountain,
NST/NDT/NWT/NPT/NDDT Newfoundland,
NST/NDT/NWT/NPT Nome,
NZMT/NZST New Zealand through 1945,
NZST/NZDT New Zealand 1946–present,
PKT/PKST Pakistan,
PST/PDT/PWT/PPT Pacific,
PST/PDT Philippine,
SAST South Africa,
SST Samoa,
UTC Universal,
WAT/WAST West Africa,
WET/WEST/WEMT Western European,
WIB Waktu Indonesia Barat,
WIT Waktu Indonesia Timur,
WITA Waktu Indonesia Tengah,
YST/YDT/YWT/YPT/YDDT Yukon.
For times taken from a city's longitude, use the
traditional xMT notation.
The only abbreviation like this in current use is 'GMT'.
The others are for timestamps before 1960,
except that Monrovia Mean Time persisted until 1972.
Typically, numeric abbreviations (e.g., '-004430' for
MMT) would cause trouble here, as the numeric strings would exceed
the POSIX length limit.
A few abbreviations also follow the pattern that
GMT/BST established for time in the UK.
They are:
BMT/BST for Bermuda 1890–1930,
CMT/BST for Calamarca Mean Time and Bolivian Summer Time
1890–1932,
DMT/IST for Dublin/Dunsink Mean Time and Irish Summer Time
1880–1916,
MMT/MST/MDST for Moscow 1880–1919, and
RMT/LST for Riga Mean Time and Latvian Summer Time 1880–1926.
Use 'LMT' for local mean time of locations before the
introduction of standard time; see "Scope of the
tz database".
If there is no common English abbreviation, use numeric offsets like
-05 and +0530 that are generated
by zic's %z notation.
Use current abbreviations for older timestamps to avoid confusion.
For example, in 1910 a common English abbreviation for time
in central Europe was 'MEZ' (short for both "Middle European
Zone" and for "Mitteleuropäische Zeit" in German).
Nowadays 'CET' ("Central European Time") is more common in
English, and the database uses 'CET' even for circa-1910
timestamps as this is less confusing for modern users and avoids
the need for determining when 'CET' supplanted 'MEZ' in common
usage.
Use a consistent style in a timezone's history.
For example, if a history tends to use numeric
abbreviations and a particular entry could go either way, use a
numeric abbreviation.
Use Universal Time (UT) (with time zone abbreviation '-00') for
locations while uninhabited.
The leading '-' is a flag that the UT offset is in
some sense undefined; this notation is derived
from Internet
RFC 3339.
(The abbreviation 'Z' that
Internet
RFC 9557 uses for this concept
would violate the POSIX requirement
of at least three characters in an abbreviation.)
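Two of the guidelines above are mechanical enough to sketch in code: the 3-to-6 character rule expressed by the POSIX extended regular expression [-+[:alnum:]]{3,6}, and the numeric abbreviations such as -05 and +0530. The Python below is an illustration only. Python's re module has no POSIX bracket classes, so the C-locale class is spelled out by hand, and numeric_abbreviation is a hypothetical helper that imitates the shortest-form output described for zic's %z notation; it is not zic's implementation.

```python
import re

# C-locale equivalent of the POSIX ERE [-+[:alnum:]]{3,6}.
ABBREVIATION_RE = re.compile(r"[-+A-Za-z0-9]{3,6}")


def is_valid_abbreviation(abbr):
    """True if the whole string is 3 to 6 ASCII alphanumerics, '+' or '-'."""
    return ABBREVIATION_RE.fullmatch(abbr) is not None


def numeric_abbreviation(offset_seconds):
    """Format seconds east of UT as +hh, +hhmm, or +hhmmss, whichever is shortest."""
    sign = "-" if offset_seconds < 0 else "+"
    hours, rem = divmod(abs(offset_seconds), 3600)
    minutes, seconds = divmod(rem, 60)
    text = f"{sign}{hours:02d}"
    if minutes or seconds:
        text += f"{minutes:02d}"
    if seconds:
        text += f"{seconds:02d}"
    return text


for abbr in ["EST", "ChST", "-00", "Z", "E?T"]:
    print(abbr, is_valid_abbreviation(abbr))

print(numeric_abbreviation(-5 * 3600))              # -05
print(numeric_abbreviation(5 * 3600 + 30 * 60))     # +0530
print(numeric_abbreviation(-(44 * 60 + 30)))        # -004430
```

The last value has seven characters and fails is_valid_abbreviation, which is why the guidelines keep an alphabetic abbreviation such as 'MMT' for that offset.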
Application writers should note that these abbreviations are ambiguous
in practice: e.g., 'CST' means one thing in China and something else
in North America, and 'IST' can refer to time in India, Ireland or
Israel.
To avoid ambiguity, use numeric UT offsets like
'-0600' instead of time zone abbreviations like 'CST'.
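As a small illustration of this advice, the Python sketch below formats one instant for two fixed-offset zones that both carry the abbreviation 'CST'. The zone objects are hand-built stand-ins with assumed offsets of +08:00 and -06:00; they are not read from the tz database.

```python
from datetime import datetime, timedelta, timezone

# Hand-built fixed-offset zones sharing one abbreviation (illustrative only).
CHINA_CST = timezone(timedelta(hours=8), "CST")
US_CENTRAL_CST = timezone(timedelta(hours=-6), "CST")


def describe(instant, tz):
    """Return the same instant formatted with an abbreviation and with a numeric offset."""
    local = instant.astimezone(tz)
    return (local.strftime("%Y-%m-%d %H:%M %Z"),
            local.strftime("%Y-%m-%d %H:%M %z"))


instant = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(describe(instant, CHINA_CST))       # ('2024-01-15 20:00 CST', '2024-01-15 20:00 +0800')
print(describe(instant, US_CENTRAL_CST))  # ('2024-01-15 06:00 CST', '2024-01-15 06:00 -0600')
```

Only the numeric form lets a reader recover the instant without knowing which 'CST' was meant.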
Accuracy of the tz database
The tz database is not authoritative, and it
surely has errors.
Corrections are welcome and encouraged; see the file CONTRIBUTING.
Users requiring authoritative data should consult national standards
bodies and the references cited in the database's comments.
Errors in the tz database arise from many sources:
The tz database predicts future
timestamps, and current predictions
will be incorrect after future governments change the rules.
For example, if today someone schedules a meeting for 13:00 next
October 1, Casablanca time, and tomorrow Morocco changes its
daylight saving rules, software can mess up after the rule change
if it blithely relies on conversions made before the change.
The pre-1970 entries in this database cover only a tiny sliver of how
clocks actually behaved; the vast majority of the necessary
information was lost or never recorded.
Thousands more timezones would be needed if
the tz database's scope were extended to
cover even just the known or guessed history of standard time; for
example, the current single entry for France would need to split
into dozens of entries, perhaps hundreds.
And in most of the world even this approach would be misleading
due to widespread disagreement or indifference about what times
should be observed.
In her 2015 book The Global Transformation of Time, 1870–1950,
Vanessa Ogle writes
"Outside of Europe and North America there was no system of time
zones at all, often not even a stable landscape of mean times,
prior to the middle decades of the twentieth century".
See: Timothy Shenk, Booked:
A Global History of Time. Dissent 2015-12-17.
Most of the pre-1970 data entries come from unreliable sources, often
astrology books that lack citations and whose compilers evidently
invented entries when the true facts were unknown, without
reporting which entries were known and which were invented.
These books often contradict each other or give implausible entries,
and on the rare occasions when they are checked they are
typically found to be incorrect.
For the UK the tz database relies on
years of first-class work done by
Joseph Myers and others; see
"History of
legal time in Britain".
Other countries are not done nearly as well.
Sometimes, different people in the same city maintain clocks
that differ significantly.
Historically, railway time was used by railroad companies (which
did not always
agree with each other), church-clock time was used for birth
certificates, etc.
More recently, competing political groups might disagree about
clock settings. Often this is merely common practice, but
sometimes it is set by law.
For example, from 1891 to 1911 the UT offset in France
was legally UT +00:09:21 outside train stations and
UT +00:04:21 inside. Other examples include
Chillicothe in 1920, Palm Springs in 1946/7, and Jerusalem and
Ürümqi to this day.
Although a named location in the tz
database stands for the containing region, its pre-1970 data
entries are often accurate for only a small subset of that region.
For example, Europe/London stands for the United
Kingdom, but its pre-1847 times are valid only for locations that
have London's exact meridian, and its 1847 transition
to GMT is known to be valid only for the L&NW and
the Caledonian railways.
The tz database does not record the
earliest time for which a timezone's
data entries are thereafter valid for every location in the region.
For example, Europe/London is valid for all locations
in its region after GMT was made the standard time,
but the date of standardization (1880-08-02) is not in the
tz database, other than in commentary.
For many timezones the earliest time of
validity is unknown.
The tz database does not record a
region's boundaries, and in many cases the boundaries are not known.
For example, the timezone
America/Kentucky/Louisville represents a region
around the city of Louisville, the boundaries of which are
unclear.
Changes that are modeled as instantaneous transitions in the
tz
database were often spread out over hours, days, or even decades.
Even if the time is specified by law, locations sometimes
deliberately flout the law.
Early timekeeping practices, even assuming perfect clocks, were
often not specified to the accuracy that the
tz database requires.
The tz database cannot represent stopped clocks.
However, on 1911-03-11 at 00:00, some public-facing French clocks
were changed by stopping them for a few minutes to effect a transition.
The tz database models this via a
backward transition; the relevant French legislation does not
specify exactly how the transition was to occur.
Sometimes historical timekeeping was specified more precisely
than what the tz code can handle.
For example, from 1880 to 1916 clocks in Ireland observed Dublin Mean
Time (estimated to be UT
−00:25:21.1); although the tz
source data can represent the .1 second, TZif files and the code cannot.
In practice these old specifications were rarely if ever
implemented to subsecond precision.
Even when all the timestamp transitions recorded by the
tz database are correct, the
tz rules that generate them may not
faithfully reflect the historical rules.
For example, from 1922 until World War II the UK moved clocks
forward the day following the third Saturday in April unless that
was Easter, in which case it moved clocks forward the previous
Sunday.
Because the tz database has no
way to specify Easter, these exceptional years are entered as
separate tz Rule lines, even though the
legal rules did not change.
When transitions are known but the historical rules behind them are not,
the database contains Zone and Rule
entries that are intended to represent only the generated
transitions, not any underlying historical rules; however, this
intent is recorded at best only in commentary.
The tz database models time
using the proleptic
Gregorian calendar with days containing 24 equal-length hours
numbered 00 through 23, except when clock transitions occur.
Pre-standard time is modeled as local mean time.
However, historically many people used other calendars and other timescales.
For example, the Roman Empire used the Julian calendar,
and Roman timekeeping had twelve varying-length daytime hours with a
non-hour-based system at night.
And even today, some local practices diverge from the Gregorian
calendar with 24-hour days. These divergences range from
relatively minor, such as Japanese bars giving times like "24:30" for the
wee hours of the morning, to more-significant differences such as the
east African practice of starting the day at dawn, renumbering
the Western 06:00 to be 12:00. These practices are largely outside
the scope of the tz code and data, which
provide only limited support for date and time localization
such as that required by POSIX.
If DST is not used, a different time zone
can often do the trick; for example, in Kenya a TZ setting
like <-03>3 or America/Cayenne starts
the day six hours later than Africa/Nairobi does.
Early clocks were less reliable, and data entries do not represent
clock error.
The tz database assumes Universal Time
(UT) as an origin, even though UT is not
standardized for older timestamps.
In the tz database commentary,
UT denotes a family of time standards that includes
Coordinated Universal Time (UTC) along with other
variants such as UT1 and GMT,
with days starting at midnight.
Although UT equals UTC for modern
timestamps, UTC was not defined until 1960, so
commentary uses the more general abbreviation UT for
timestamps that might predate 1960.
Since UT, UT1, etc. disagree slightly,
and since pre-1972 UTC seconds varied in length,
interpretation of older timestamps can be problematic when
subsecond accuracy is needed.
The relationship between POSIX time (that is, UTC but
ignoring leap
seconds) and UTC is not agreed upon.
This affects timestamps during the leap second era (1972–2035).
Although the POSIX
clock officially stops during an inserted leap second, at least one
proposed standard has it jumping back a second instead; and in
practice POSIX clocks more typically either progress glacially during
a leap second, or are slightly slowed while near a leap second.
The tz database does not represent how
uncertain its information is.
Ideally it would contain information about when data entries are
incomplete or dicey.
Partial temporal knowledge is a field of active research, though,
and it is not clear how to apply it here.
In short, many, perhaps most, of the tz
database's pre-1970 and future timestamps are either wrong or
misleading.
Any attempt to pass the
tz database off as the definition of time
should be unacceptable to anybody who cares about the facts.
In particular, the tz database's
LMT offsets should not be considered meaningful, and
should not prompt creation of timezones
merely because two locations
differ in LMT or transitioned to standard time at
different dates.
Time and date functions
The tz code contains time and date functions
that are upwards compatible with those of POSIX.
Code compatible with this package is already
part of many platforms, where the
primary use of this package is to update obsolete time-related files.
To do this, you may need to compile the time zone compiler
zic supplied with this package instead of using the
system zic, since the format of zic's
input is occasionally extended, and a platform may still be shipping
an older zic.
In POSIX, time display in a process is controlled by the
environment variable TZ, which can have two forms:
A proleptic TZ value
like CET-1CEST,M3.5.0,M10.5.0/3 uses a complex
notation that specifies a single standard time along with daylight
saving rules that apply to all years past, present, and future.
A geographical TZ value
like Europe/Berlin names a location that stands for
civil time near that location, which can have more than
one standard time and more than one set of daylight saving rules,
to record timekeeping practice more accurately.
These names are defined by the tz database.
POSIX.1-2017 properties and limitations
Some platforms support only the features required by POSIX.1-2017,
and have not yet upgraded to POSIX.1-2024.
Code intended to be portable to these platforms must deal
with problems that were fixed in later POSIX editions.
POSIX.1-2017 does not require support for geographical TZ,
and there is no convenient and efficient way to determine
the UT offset and time zone abbreviation of arbitrary
timestamps, particularly for timezones
that do not fit into the POSIX model.
The proleptic TZ string,
which is all that POSIX.1-2017 requires,
has a format that is hard to describe and is error-prone in practice.
Also, proleptic TZ strings cannot deal with daylight
saving time rules not based on the Gregorian calendar (as in
Morocco), or with situations where more than two time zone
abbreviations or UT offsets are used in an area.
A proleptic TZ string has the following format:
stdoffset[dst[offset][,date[/time],date[/time]]]
where:
std and dst
are 3 or more characters specifying the standard
and daylight saving time (DST) zone abbreviations.
Starting with POSIX.1-2001, std and dst
may also be in a quoted form like '<+09>';
this allows "+" and "-" in the names.
offset
is of the form
'[±]hh[:mm[:ss]]'
and specifies the offset west of UT.
'hh' may be a single digit;
0≤hh≤24.
The default DST offset is one hour ahead of
standard time.
date[/time],date[/time]
specifies the beginning and end of DST.
If this is absent, the system supplies its own ruleset
for DST, typically current US
DST rules.
time
takes the form
'hh[:mm[:ss]]'
and defaults to 02:00.
This is the same format as the offset, except that a
leading '+' or '-' is not allowed.
date
takes one of the following forms:
Jn (1≤n≤365)
origin-1 day number not counting February 29
n (0≤n≤365)
origin-0 day number counting February 29 if present
Mm.n.d
(0[Sunday]≤d≤6[Saturday], 1≤n≤5,
1≤m≤12)
for the dth day of week n of
month m of the year, where week 1 is the first
week in which day d appears, and
'5' stands for the last week in which
day d appears (which may be either the 4th or
5th week).
Typically, this is the only useful form; the n
and Jn forms are rarely used.
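To make the format above concrete, here is a minimal regex-based sketch in Python that splits a proleptic TZ string into its parts. It is an illustration under simplifying assumptions, not the parser used by the tz code: the function name and dictionary keys are hypothetical, values are not range-checked (for example, 0≤hh≤24 is not enforced), and signed transition times are accepted even though POSIX.1-2017 does not allow them.

```python
import re

# Building blocks, written as plain raw strings so their braces stay literal.
_ABBR = r"(?:[A-Za-z]{3,}|<[-+A-Za-z0-9]{3,}>)"
_OFFSET = r"[-+]?\d{1,2}(?::\d{2}(?::\d{2})?)?"
_DATE = r"(?:J\d{1,3}|\d{1,3}|M\d{1,2}\.[1-5]\.[0-6])"
_TIME = r"[-+]?\d{1,3}(?::\d{2}(?::\d{2})?)?"

_TZ_RE = re.compile(
    rf"(?P<std>{_ABBR})(?P<stdoff>{_OFFSET})"
    rf"(?:(?P<dst>{_ABBR})(?P<dstoff>{_OFFSET})?"
    rf"(?:,(?P<start>{_DATE})(?:/(?P<start_time>{_TIME}))?"
    rf",(?P<end>{_DATE})(?:/(?P<end_time>{_TIME}))?)?)?"
)


def _to_seconds(text):
    """Convert [+-]hh[:mm[:ss]] to a signed number of seconds."""
    sign = -1 if text.startswith("-") else 1
    fields = [int(f) for f in text.lstrip("+-").split(":")]
    fields += [0] * (3 - len(fields))
    hours, minutes, seconds = fields
    return sign * (hours * 3600 + minutes * 60 + seconds)


def parse_proleptic_tz(value):
    """Split a proleptic TZ string into a dict; raise ValueError if it does not match."""
    m = _TZ_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a proleptic TZ string: {value!r}")
    # TZ offsets count west of UT, so negate to get seconds east of UT.
    std_utoff = -_to_seconds(m["stdoff"])
    result = {"std": m["std"].strip("<>"), "std_utoff": std_utoff, "dst": None}
    if m["dst"]:
        result["dst"] = m["dst"].strip("<>")
        # The default DST offset is one hour ahead of standard time.
        result["dst_utoff"] = (-_to_seconds(m["dstoff"]) if m["dstoff"]
                               else std_utoff + 3600)
        if m["start"]:
            # Transition times default to 02:00 local time.
            result["start"] = (m["start"], _to_seconds(m["start_time"] or "2"))
            result["end"] = (m["end"], _to_seconds(m["end_time"] or "2"))
    return result


print(parse_proleptic_tz("CET-1CEST,M3.5.0,M10.5.0/3"))
print(parse_proleptic_tz("EST5"))
```

The results report offsets in seconds east of UT, the opposite sign convention from the TZ string itself.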
Here is an example proleptic TZ string for New
Zealand after 2007.
It says that standard time (NZST) is 12 hours ahead
of UT, and that daylight saving time
(NZDT) is observed from September's last Sunday at
02:00 until April's first Sunday at 03:00:
TZ='NZST-12NZDT,M9.5.0,M4.1.0/3'
This proleptic TZ string is hard to remember, and
mishandles some timestamps before 2008.
With this package you can use a geographical TZ instead:
TZ='Pacific/Auckland'
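To show what the Mm.n.d rules in the New Zealand string resolve to, the Python sketch below computes the calendar date of such a rule for a given year. The function name resolve_m_rule is hypothetical and the code is only an illustration of the rule as described above; it uses the standard calendar module rather than any tz code.

```python
import calendar
from datetime import date


def resolve_m_rule(year, month, week, weekday):
    """Return the date of Mm.n.d in a year; weekday 0 is Sunday, week 5 means last."""
    # calendar numbers Monday as 0, so convert from the TZ numbering (Sunday = 0).
    py_weekday = (weekday - 1) % 7
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Day of month on which the requested weekday first appears.
    first = 1 + (py_weekday - first_weekday) % 7
    day = first + 7 * (week - 1)
    if day > days_in_month:
        # Only possible for week 5: fall back to the 4th, which is then the last.
        day -= 7
    return date(year, month, day)


# NZST-12NZDT,M9.5.0,M4.1.0/3 evaluated for 2024:
print(resolve_m_rule(2024, 9, 5, 0))  # 2024-09-29, September's last Sunday
print(resolve_m_rule(2024, 4, 1, 0))  # 2024-04-07, April's first Sunday
```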
POSIX.1-2017 also has the limitations of POSIX.1-2024,
discussed in the next section.
POSIX.1-2024 properties and limitations
POSIX.1-2024 extends POSIX.1-2017 in the following significant ways:
POSIX.1-2024 requires support for geographical TZ.
Earlier POSIX editions require support only for proleptic TZ.
POSIX.1-2024 requires struct tm
to have a UT offset member tm_gmtoff
and a time zone abbreviation member tm_zone.
Earlier POSIX editions lack this requirement.
DST transition times can range from −167:59:59
to 167:59:59 instead of merely from 00:00:00 to 24:59:59.
This allows for proleptic TZ strings
like "<-02>2<-01>,M3.5.0/-1,M10.5.0/0"
where the transition time −1:00 means 23:00 the previous day.
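The extended range of transition times can be illustrated with a short Python sketch. It parses a signed time, checks it against the ±167:59:59 bound mentioned above, and adds it to local midnight of the rule's date, so that a time of -1 lands at 23:00 on the previous day. The function names are hypothetical, and the code is a sketch of the arithmetic rather than an excerpt from any implementation.

```python
from datetime import date, datetime, timedelta

MAX_MAGNITUDE = timedelta(hours=167, minutes=59, seconds=59)


def parse_transition_time(text):
    """Parse a transition time such as '-1', '3', or '26:30' into a timedelta."""
    sign = -1 if text.startswith("-") else 1
    fields = [int(f) for f in text.lstrip("+-").split(":")]
    fields += [0] * (3 - len(fields))
    hours, minutes, seconds = fields
    if minutes > 59 or seconds > 59:
        raise ValueError(f"bad minutes or seconds in {text!r}")
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if delta > MAX_MAGNITUDE:
        raise ValueError(f"{text!r} is outside -167:59:59 through 167:59:59")
    return sign * delta


def transition_wall_time(rule_date, time_text):
    """Local wall-clock datetime of a transition, relative to midnight of rule_date."""
    midnight = datetime(rule_date.year, rule_date.month, rule_date.day)
    return midnight + parse_transition_time(time_text)


# M3.5.0/-1 in 2024: the last Sunday of March is 2024-03-31.
print(transition_wall_time(date(2024, 3, 31), "-1"))  # 2024-03-30 23:00:00
```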
However, POSIX.1-2024, like earlier POSIX editions, has some limitations:
The TZ environment variable is process-global, which
makes it hard to write efficient, thread-safe applications that
need access to multiple timezones.
In POSIX, there is no tamper-proof way for a process to learn the
system's best idea of local (wall clock) time.
This is important for applications that an administrator wants
used only at certain times – without regard to whether the
user has fiddled the
TZ environment variable.
While an administrator can "do everything in UT" to
get around the problem, doing so is inconvenient and precludes
handling daylight saving time shifts – as might be required to
limit phone calls to off-peak hours.
POSIX requires that time_t clock counts exclude leap
seconds.
POSIX does not define the DST transitions
for TZ values like
"EST5EDT".
Traditionally the current US DST rules
were used to interpret such values, but this meant that the
US DST rules were compiled into each
time conversion package, and when
these rules changed (as in the United
States in 1987 and again in 2007), all packages that
interpreted TZ values had to be updated
to ensure proper results.
Extensions to POSIX in the tz code
The tz code defines some properties
left unspecified by POSIX, and attempts to support some
extensions to POSIX.
The tz code attempts to support all the
time_t implementations allowed by POSIX.
The time_t type represents a nonnegative count of seconds
since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
In practice, time_t is usually a signed 64- or 32-bit
integer; 32-bit signed time_t values stop working after
2038-01-19 03:14:07 UTC, so new implementations these
days typically use a signed 64-bit integer.
Unsigned 32-bit integers are used on one or two platforms, and 36-bit
and 40-bit integers are also used occasionally.
Although earlier POSIX versions allowed time_t to be a
floating-point type, no practical system supported this,
and POSIX.1-2013 and later, like the tz code,
require time_t to be an integer type.
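The 32-bit limit mentioned above can be checked with a short C sketch.
The function names are hypothetical, and the code assumes a platform
whose time_t can represent the value 2147483647.

```c
#include <stdint.h>
#include <time.h>

/* Nonzero if time_t is a signed type on this platform. */
int time_t_is_signed(void)
{
    return (time_t)-1 < 0;
}

/* Format t as "YYYY-MM-DD HH:MM:SS" in UTC.
   Returns buf, or NULL if the conversion fails or buf is too small. */
const char *format_utc(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t);

    if (!tm || strftime(buf, size, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return NULL;
    return buf;
}

/* Format the last second that a signed 32-bit time_t can represent. */
const char *last_32bit_second(char *buf, size_t size)
{
    return format_utc((time_t)INT32_MAX, buf, size);
}
```

Given a buffer of at least 20 bytes, last_32bit_second should produce
"2038-01-19 03:14:07".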
If the TZ environment variable uses the geographical format,
it is used in generating
the name of a file from which time-related information is read.
The file's format is TZif,
a timezone information format that contains binary data; see
Internet
RFC 9636.
The daylight saving time rules to be used for a
particular timezone are encoded in the
TZif file; the format of the file allows US,
Australian, and other rules to be encoded, and
allows for situations where more than two time zone
abbreviations are used.
When the tz code was developed in the 1980s,
it was recognized that allowing the TZ environment
variable to take on values such as 'America/New_York'
might cause "old" programs (that expect TZ to have a
certain format) to operate incorrectly; consideration was given to using
some other environment variable (for example, TIMEZONE)
to hold the string used to generate the TZif file's name.
In the end, however, it was decided to continue using
TZ: it is widely used for time zone purposes;
separately maintaining both TZ
and TIMEZONE seemed a nuisance; and systems where
"new" forms of TZ might cause problems can simply
use legacy TZ values such as "EST5EDT" which
can be used by "new" programs as well as by "old" programs that
assume pre-POSIX TZ values.
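To give a feel for the binary layout of a TZif file, the following
sketch parses its fixed-size header from a memory buffer.
It is not the tz code's own reader: the names tzif_header and
tzif_parse_header are hypothetical, and the field layout (a 4-byte
magic number "TZif", a version byte, 15 reserved bytes, and six
big-endian 32-bit counts) is taken here as an assumption to be checked
against RFC 9636.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TZIF_HEADER_SIZE = 44 };

struct tzif_header {
    int valid;              /* nonzero if the header was recognized */
    unsigned char version;  /* NUL for version 1, else '2', '3', ... */
    uint32_t isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt;
};

/* Decode a big-endian 32-bit unsigned integer. */
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Parse the header at the start of data.  The result has valid == 0
   if the buffer is too short or lacks the "TZif" magic number. */
struct tzif_header tzif_parse_header(const unsigned char *data, size_t len)
{
    struct tzif_header h = {0};

    if (len < TZIF_HEADER_SIZE || memcmp(data, "TZif", 4) != 0)
        return h;
    h.valid = 1;
    h.version = data[4];
    /* Bytes 5 through 19 are reserved. */
    h.isutcnt = be32(data + 20);
    h.isstdcnt = be32(data + 24);
    h.leapcnt = be32(data + 28);
    h.timecnt = be32(data + 32);
    h.typecnt = be32(data + 36);
    h.charcnt = be32(data + 40);
    return h;
}
```

A caller would typically read the first 44 bytes of the file selected
by TZ and pass them to tzif_parse_header; a nonzero leapcnt indicates
that the file carries leap second records.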
The tz code provides the functions tzalloc, tzfree,
localtime_rz, and mktime_z for
more-efficient thread-safe applications that need to use multiple
timezones.
The tzalloc and tzfree functions
allocate and free objects of type timezone_t,
and localtime_rz and mktime_z are
like localtime_r and mktime with an
extra timezone_t argument.
The functions were inspired by NetBSD.
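For contrast, the sketch below shows the fallback that these functions
are meant to replace: temporarily switching the process-global
TZ setting.
It is an illustration only; the helper names localtime_in_tz and
hour_in_tz are hypothetical, and the code assumes a C library that
offers setenv, unsetenv, tzset and localtime_r but not the functions
above.

```c
/* Assumption: this feature-test macro exposes setenv, unsetenv, tzset
   and localtime_r on glibc and musl; other platforms may differ. */
#define _DEFAULT_SOURCE 1
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert t to broken-down local time under the TZ string tz, then
   restore the previous TZ setting.  Returns 0 on success, -1 on failure.
   NOT thread-safe: other threads see the changed TZ while this runs.
   Any tm_zone pointer in *result may become stale after the restore. */
int localtime_in_tz(const char *tz, time_t t, struct tm *result)
{
    const char *old = getenv("TZ");
    char *saved = NULL;
    int ok;

    if (old) {
        /* Copy the old value; setenv may invalidate the pointer. */
        saved = malloc(strlen(old) + 1);
        if (!saved)
            return -1;
        strcpy(saved, old);
    }
    if (setenv("TZ", tz, 1) != 0) {
        free(saved);
        return -1;
    }
    tzset();
    ok = localtime_r(&t, result) != NULL;

    /* Restore the caller's setting. */
    if (saved)
        setenv("TZ", saved, 1);
    else
        unsetenv("TZ");
    tzset();
    free(saved);
    return ok ? 0 : -1;
}

/* Hour of day (0-23) at time t under tz, or -1 on failure. */
int hour_in_tz(const char *tz, time_t t)
{
    struct tm tm;
    return localtime_in_tz(tz, t, &tm) == 0 ? tm.tm_hour : -1;
}
```

Each call pays for two tzset invocations and must be serialized by the
caller.
With the tz functions the same conversion needs no global state: a
timezone_t is allocated once with tzalloc, passed to localtime_rz as
often as needed, and released with tzfree.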
Negative time_t values are supported, on systems
where time_t is signed.
The tz time conversion functions can account for leap seconds;
see Leap seconds below.
POSIX features no longer needed
POSIX and ISO C
define some APIs that are vestigial:
they are not needed, and are relics of a too-simple model that does
not suffice to handle many real-world timestamps.
Although the tz code supports these
vestigial APIs for backwards compatibility, they should
be avoided in portable applications.
The vestigial APIs are:
The POSIX tzname variable does not suffice and is no
longer needed.
It is planned to be removed in a future edition of POSIX.
To get a timestamp's time zone abbreviation, consult
the tm_zone member if available; otherwise,
use strftime's "%Z" conversion
specification.
The POSIX daylight and timezone
variables do not suffice and are no longer needed.
They are planned to be removed in a future edition of POSIX.
To get a timestamp's UT offset, consult
the tm_gmtoff member if available; otherwise,
subtract values returned by localtime
and gmtime using the rules of the Gregorian calendar,
or use strftime's "%z" conversion
specification if a string like "+0900" suffices.
The tm_isdst member is almost never needed and most of
its uses should be discouraged in favor of the abovementioned
APIs.
It was intended as an index into the tzname variable,
but as mentioned previously that usage is obsolete.
Although it can still be used in arguments to
mktime to disambiguate timestamps near
a DST transition when the clock jumps back on
platforms lacking tm_gmtoff, this
disambiguation works only for proleptic TZ strings;
it does not work in general for geographical timezones,
such as when a location changes to a time zone with a
lesser UT offset.
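On platforms lacking tm_gmtoff, the subtraction of
gmtime from localtime mentioned above might look
like the following sketch.
The function names are hypothetical, and the day arithmetic assumes
that the UT offset is less than one day in magnitude, so that the two
broken-down dates differ by at most one day.

```c
#include <limits.h>
#include <time.h>

/* Seconds by which *local is ahead of *utc, where both describe the
   same instant.  Assumes the two dates differ by at most one day. */
long tm_diff_seconds(const struct tm *local, const struct tm *utc)
{
    long days;

    if (local->tm_year != utc->tm_year)
        days = local->tm_year > utc->tm_year ? 1 : -1;
    else
        days = local->tm_yday - utc->tm_yday;
    return days * 86400L
         + (local->tm_hour - utc->tm_hour) * 3600L
         + (local->tm_min - utc->tm_min) * 60L
         + (local->tm_sec - utc->tm_sec);
}

/* UT offset in seconds at time t in the current timezone, positive
   east of Greenwich, or LONG_MIN if a conversion fails. */
long ut_offset_at(time_t t)
{
    struct tm local, utc;
    struct tm *p;

    p = localtime(&t);
    if (!p)
        return LONG_MIN;
    local = *p;  /* copy: gmtime may reuse the same storage */
    p = gmtime(&t);
    if (!p)
        return LONG_MIN;
    utc = *p;
    return tm_diff_seconds(&local, &utc);
}
```

Where tm_gmtoff is available it is simpler and should be preferred,
as the text above recommends.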
Other portability notes
The 7th Edition UNIX timezone function is not present in this
package; it is impossible to reliably map timezone's
arguments (a "minutes west of GMT" value and a
"daylight saving time in effect" flag) to a time zone
abbreviation, and we refuse to guess.
Programs that in the past used the timezone function
may now examine localtime(&clock)->tm_zone
(if TM_ZONE is defined) or
use strftime with a %Z conversion specification
to learn the correct time
zone abbreviation to use.
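A portable replacement along these lines can be sketched with
strftime.
The helper name zone_abbreviation is hypothetical, and its optional
tz argument, which sets the process-global TZ before converting, is an
addition made here for illustration.

```c
/* Assumption: this feature-test macro exposes setenv, tzset and
   localtime_r on glibc and musl; other platforms may differ. */
#define _DEFAULT_SOURCE 1
#include <stdlib.h>
#include <time.h>

/* Store in buf the time zone abbreviation in effect at time t.
   If tz is non-null, TZ is first set to tz (and left that way);
   otherwise the current setting is used.
   Returns buf, or NULL on failure.  Not thread-safe. */
const char *zone_abbreviation(const char *tz, time_t t,
                              char *buf, size_t size)
{
    struct tm tm;

    if (tz && setenv("TZ", tz, 1) != 0)
        return NULL;
    tzset();
    if (!localtime_r(&t, &tm) || strftime(buf, size, "%Z", &tm) == 0)
        return NULL;
    return buf;
}
```

Unlike the old timezone function, this asks for the abbreviation at a
specific timestamp, so it does not need to guess from an offset and a
DST flag.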
The 4.2BSD gettimeofday function is not
used in this package.
This formerly let users obtain the current UTC offset
and DST flag, but this functionality was removed in
later versions of BSD.
In SVR2, time conversion fails for near-minimum or
near-maximum time_t values when doing conversions
for places that do not use UT.
This package takes care to do these conversions correctly.
A comment in the source code tells how to get compatibly wrong
results.
The functions that are conditionally compiled
if STD_INSPIRED is nonzero should, at this point, be
looked on primarily as food for thought.
They are not in any sense "standard compatible" – some are
not, in fact, specified in any standard.
They do, however, represent responses of various authors to
standardization proposals.
Other time conversion proposals, in particular those supported by the
Time Zone
Database Parser, offer a wider selection of functions
that provide capabilities beyond those provided here.
The absence of such functions from this package is not meant to
discourage the development, standardization, or use of such
functions.
Rather, their absence reflects the decision to make this package
contain valid extensions to POSIX, to ensure its broad
acceptability.
If more powerful time conversion functions can be standardized, so
much the better.
Interface stability
The tz code and data supply the following interfaces:
The programs tzselect, zdump,
and zic, documented in their man pages.
The format of zic input files, documented in
the zic man page.
The format of zic output files, documented in
the tzfile man page.
The format of zone table files, documented in zone1970.tab.
The format of the country code file, documented in iso3166.tab.
The version number of the code and data, as the first line of
the text file 'version' in each release.
Interface changes in a release attempt to preserve compatibility with
recent releases.
For example, tz data files typically do not
rely on recently added zic features, so that users can
run older zic versions to process newer data files.
Downloading
the tz database describes how releases
are tagged and distributed.
Interfaces not listed above are less stable.
For example, users should not rely on particular UT
offsets or abbreviations for timestamps, as data entries are often
based on guesswork and these guesses may be corrected or improved.
Timezone boundaries are not part of the stable interface.
For example, even though the Asia/Bangkok timezone
currently includes Chiang Mai, Hanoi, and Phnom Penh, this is not part
of the stable interface and the timezone can split at any time.
If a calendar application records a future event in some location other
than Bangkok by putting "Asia/Bangkok" in the event's record,
the application should be robust in the presence of timezone splits
between now and the future time.
Leap seconds
Leap seconds were introduced in 1972 to accommodate the
difference between atomic time and the less regular rotation of the earth.
Unfortunately they have caused so many problems with civil
timekeeping that there are
plans
to discontinue them by 2035.
Even if these plans come to fruition, a record of leap seconds will still be
needed to resolve timestamps from 1972 through 2035,
and there may also be a need to record whatever mechanism replaces them.
The tz code and data can account for leap seconds,
thanks to code contributed by Bradley White.
However, the leap second support of this package is rarely used directly
because POSIX requires leap seconds to be excluded and many
software packages would mishandle leap seconds if they were present.
Instead, leap seconds are more commonly handled by occasionally adjusting
the operating system kernel clock as described in
Precision timekeeping,
and this package by default installs a leapseconds file
commonly used by
NTP
software that adjusts the kernel clock.
However, kernel-clock twiddling approximates UTC only roughly,
and systems needing more precise UTC can use this package's leap
second support directly.
The directly supported mechanism assumes that time_t
counts of seconds since the POSIX epoch normally include leap seconds,
as opposed to POSIX time_t counts which exclude leap seconds.
This modified timescale is converted to UTC
at the same point that time zone and DST
adjustments are applied –
namely, at calls to localtime and analogous functions –
and the process is driven by leap second information
stored in alternate versions of the TZif files.
Because a leap second adjustment may be needed even
if no time zone correction is desired,
calls to gmtime-like functions
also need to consult a TZif file,
conventionally named Etc/UTC
(GMT in previous versions),
to see whether leap second corrections are needed.
To convert an application's time_t timestamps to or from
POSIX time_t timestamps (for use when, say,
embedding or interpreting timestamps in portable
tar
files),
the application can call the utility functions
time2posix and posix2time
included with this package.
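The arithmetic behind such a conversion can be sketched as follows.
This is not the package's implementation of time2posix
and posix2time: the names to_posix and from_posix are
hypothetical, the table holds only the first two leap seconds (at the
end of 1972-06-30 and of 1972-12-31 UTC) for illustration, and only
positive leap seconds are handled.

```c
#include <stddef.h>

/* One leap second record: from time trans onward, counted with leap
   seconds included, corr leap seconds in total have occurred. */
struct leap {
    long long trans;
    int corr;
};

/* Illustrative table with the first two leap seconds only;
   a real table is longer. */
static const struct leap leaps[] = {
    { 78796800LL, 1 },  /* 1972-06-30 23:59:60 UTC */
    { 94694401LL, 2 },  /* 1972-12-31 23:59:60 UTC */
};

/* Total leap seconds that have occurred at or before time t. */
static int leap_correction(long long t)
{
    size_t i = sizeof leaps / sizeof leaps[0];

    while (i > 0) {
        i--;
        if (t >= leaps[i].trans)
            return leaps[i].corr;
    }
    return 0;
}

/* Convert a count that includes leap seconds to a POSIX count. */
long long to_posix(long long t)
{
    return t - leap_correction(t);
}

/* Convert a POSIX count to a count that includes leap seconds.
   When two counts map to the same POSIX value, return the earlier. */
long long from_posix(long long p)
{
    long long t = p;

    while (to_posix(t) < p)
        t++;
    return t;
}
```

With this table, to_posix maps both 78796799 and 78796800 to the POSIX
value 78796799, because POSIX has no distinct value for the leap
second itself, and from_posix(78796800) returns 78796801.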
If the POSIX-compatible TZif file set is installed
in a directory whose basename is zoneinfo, the
leap-second-aware file set is by default installed in a separate
directory zoneinfo-leaps.
Although each process can have its own time zone by setting
its TZ environment variable, there is no support for some
processes being leap-second aware while other processes are
POSIX-compatible; the leap-second choice is system-wide.
So if you configure your kernel to count leap seconds, you should also
discard zoneinfo and rename zoneinfo-leaps
to zoneinfo.
Alternatively, you can install just one set of TZif files
in the first place; see the REDO variable in this package's
makefile.
Calendrical issues
Calendrical issues are a bit out of scope for a time zone database,
but they indicate the sort of problems that we would run into if we
extended the time zone database further into the past.
An excellent resource in this area is Edward M. Reingold
and Nachum Dershowitz, Calendrical
Calculations: The Ultimate Edition, Cambridge University Press (2018).
Other information and sources are given in the file 'calendars'
in the tz distribution.
They sometimes disagree.
Time and time zones off Earth
The European Space Agency is considering
the establishment of a reference timescale for the Moon, which has
days roughly equivalent to 29.5 Earth days, and where relativistic
effects cause clocks to tick slightly faster than on Earth.
Also, NASA
has been ordered
to consider the establishment of Coordinated Lunar Time (LTC).
It is not yet known whether the US and European efforts will result in
multiple timescales on the Moon.
Some people's work schedules have used
Mars time.
Jet Propulsion Laboratory (JPL) coordinators kept Mars time on
and off during the
Mars
Pathfinder mission (1997).
Some of their family members also adapted to Mars time.
Dozens of special Mars watches were built for JPL workers who kept
Mars time during the
Mars
Exploration Rovers (MER) mission (2004–2018).
These timepieces looked like normal Seikos and Citizens but were adjusted
to use Mars seconds rather than terrestrial seconds, although
unfortunately the adjusted watches were unreliable and appear to have
had only limited use.
A Mars solar day is called a "sol" and has a mean period equal to
about 24 hours 39 minutes 35.244 seconds in terrestrial time.
It is divided into a conventional 24-hour clock, so each Mars second
equals about 1.02749125 terrestrial seconds.
(One MER worker noted, "If I am working Mars hours, and Mars hours are
2.5% more than Earth hours, shouldn't I get an extra 2.5% pay raise?")
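The length of a Mars second follows from dividing the sol by the
86400 seconds of a 24-hour clock, as this small sketch shows.
The function names are hypothetical.

```c
/* Mean length of a sol in terrestrial seconds:
   24 hours 39 minutes 35.244 seconds. */
double sol_in_earth_seconds(void)
{
    return 24 * 3600.0 + 39 * 60.0 + 35.244;
}

/* Length of one Mars second in terrestrial seconds, given that a
   sol is divided into 24 * 60 * 60 = 86400 Mars seconds. */
double mars_second_in_earth_seconds(void)
{
    return sol_in_earth_seconds() / 86400.0;
}

/* Convert a duration in terrestrial seconds to Mars seconds. */
double earth_to_mars_seconds(double earth_seconds)
{
    return earth_seconds / mars_second_in_earth_seconds();
}
```

The sol comes to 88775.244 terrestrial seconds, so a Mars second is
about 1.02749125 terrestrial seconds, or roughly 2.75% longer.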
The prime
meridian of Mars goes through the center of the crater
Airy-0, named in
honor of the British astronomer who built the Greenwich telescope that
defines Earth's prime meridian.
Mean solar time on the Mars prime meridian is
called Mars Coordinated Time (MTC).
Each landed mission on Mars has adopted a different reference for
solar timekeeping, so there is no real standard for Mars time zones.
For example, the MER mission defined two time zones "Local
Solar Time A" and "Local Solar Time B" for its two missions, each zone
designed so that its time equals local true solar time at
approximately the middle of the nominal mission.
The A and B zones differ enough so that an MER worker assigned to
the A zone might suffer "Mars lag" when switching to work in the B zone.
Such a "time zone" is not particularly suited for any application
other than the mission itself.
Many calendars have been proposed for Mars, but none have achieved
wide acceptance.
Astronomers often use Mars Sol Date (MSD) which is a
sequential count of Mars solar days elapsed since about 1873-12-29
12:00 GMT.
In our solar system, Mars is the planet with time and calendar most
like Earth's.
On other planets, Sun-based time and calendars would work quite
differently.
For example, although Mercury's
sidereal
rotation period is 58.646 Earth days, Mercury revolves around the
Sun so rapidly that an observer on Mercury's equator would see a
sunrise only every 175.97 Earth days, i.e., a Mercury year is 0.5 of a
Mercury day.
Venus is more complicated, partly because its rotation is slightly
retrograde:
its year is 1.92 of its days.
Gas giants like Jupiter are trickier still, as their polar and
equatorial regions rotate at different rates, so that the length of a
day depends on latitude.
This effect is most pronounced on Neptune, where the day is about 12
hours at the poles and 18 hours at the equator.
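The Mercury and Venus ratios above can be approximated from orbital
and rotation periods.
In the sketch below the function name solar_day is hypothetical, and
the orbital periods (about 87.969 and 224.701 Earth days) and the Venus
sidereal rotation period (about 243.025 Earth days, retrograde) are
assumed approximate values that do not come from this document.

```c
/* Length of a solar day, given the orbital period and the sidereal
   rotation period in the same unit.  A retrograde rotation is passed
   as a negative period.  The result is not meaningful when the two
   periods are equal, because the Sun then never rises or sets. */
double solar_day(double orbital_period, double sidereal_rotation)
{
    /* Apparent rotation rate of the Sun as seen from the surface. */
    double rate = 1.0 / sidereal_rotation - 1.0 / orbital_period;

    if (rate < 0)
        rate = -rate;
    return 1.0 / rate;
}

/* Number of solar days in one year. */
double days_per_year(double orbital_period, double sidereal_rotation)
{
    return orbital_period / solar_day(orbital_period, sidereal_rotation);
}
```

With the assumed values, days_per_year(87.969, 58.646) comes to about
0.5 for Mercury and days_per_year(224.701, -243.025) to about 1.92 for
Venus, in line with the figures above.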
Although the tz database does not support
time on other planets, it is documented here in the hopes that support
will be added eventually.