.. structure.rst

.. _structure:

********************
The AstroData Object
********************

The |AstroData| object represents the data and metadata of a single file on
disk.  As of this version, |AstroData| has a default implementation supporting
the FITS file format. If you wish to extend |AstroData| to support other file
formats, see :ref:`astrodata`.

The internal structure of the |AstroData| object makes use of astropy's
:class:`~astropy.nddata.NDData`, :mod:`~astropy.table`, and
:class:`~astropy.io.fits.Header`, the latter simply because it is a convenient
ordered dictionary.
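
As a brief illustration of the ordered-dictionary behaviour mentioned above,
the sketch below uses :class:`~astropy.io.fits.Header` on its own, without
|AstroData|. The keyword names and values are made up for the example.

```python
from astropy.io import fits

# Build a header by hand; keyword names and values are illustrative only.
hdr = fits.Header()
hdr["INSTRUME"] = ("MYCAM", "hypothetical instrument name")
hdr["EXPTIME"] = (30.0, "exposure time in seconds")
hdr["OBJECT"] = "M42"

# Keys come back in insertion order, as in an ordered dictionary.
keys = list(hdr.keys())

# Dictionary-style access, with an optional default for missing keywords.
exptime = hdr["EXPTIME"]
airmass = hdr.get("AIRMASS", None)

# Unlike a plain dict, each card can also carry a comment.
comment = hdr.comments["EXPTIME"]
```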

Walkthrough
-----------

Global vs Extension-specific
============================

At the top level, the |AstroData| structure is divided into two types of
information.  In the first category, there is the information that applies to
the data globally, for example the information that would be stored in a FITS
Primary Header Unit, a table from a catalog that matches the RA and DEC of the
field, etc.  In the second category, there is the information specific to
individual science pixel extensions, for example the gain of the amplifier, the
data themselves, the error on those data, etc.

.. todo:: Turn the below code blocks into an example

The composition and amount of information depends on the contents of the file
itself. This information varies dramatically between observatories, so ensure
that you have characterized your data well. A summary of the contents of an
|AstroData| object is displayed by the :meth:`~astrodata.AstroData.info`
method.

.. testsetup::

    import os


    example_fits_file = os.path.dirname(__file__)
    example_fits_file = os.path.join(
        example_fits_file,
        "../../examples/data/example_mef_file.fits"
    )

.. code:: python

    >>> import astrodata

    # You can find the example file in the examples/data directory.
    >>> ad = astrodata.from_file(example_fits_file)
    >>> ad.info()

    Filename: example_mef_file.fits
    Tags: MY_TAG1 MY_TAG2 MY_TAG3

    Pixels Extensions
    Index  Content                  Type              Dimensions     Format
    [ 0]   science                  NDAstroData       (2112, 256)    float32
            .variance             ndarray           (2112, 256)    float32
    [ 1]   science                  NDAstroData       (2112, 256)    float32
            .variance             ndarray           (2112, 256)    float32
    [ 2]   science                  NDAstroData       (2112, 256)    float32
            .variance             ndarray           (2112, 256)    float32
    [ 3]   science                  NDAstroData       (2112, 256)    float32
            .variance             ndarray           (2112, 256)    float32

    Other Extensions
                Type        Dimensions
    .REFERENCE  Table      (245, 16)

..
    Let us look at an example.  The :meth:`~astrodata.AstroData.info` method shows
    the content of the |AstroData| object and its organization, from the user's
    perspective.::

        >>> import astrodata
        >>> import gemini_instruments

        >>> ad = astrodata.open('../playdata/N20170609S0154_varAdded.fits')
        >>> ad.info()
        Filename: N20170609S0154_varAdded.fits
        Tags: ACQUISITION GEMINI GMOS IMAGE NORTH OVERSCAN_SUBTRACTED OVERSCAN_TRIMMED
            PREPARED SIDEREAL

        Pixels Extensions
        Index  Content                  Type              Dimensions     Format
        [ 0]   science                  NDAstroData       (2112, 256)    float32
                .variance             ndarray           (2112, 256)    float32
                .mask                 ndarray           (2112, 256)    uint16
                .OBJCAT               Table             (6, 43)        n/a
                .OBJMASK              ndarray           (2112, 256)    uint8
        [ 1]   science                  NDAstroData       (2112, 256)    float32
                .variance             ndarray           (2112, 256)    float32
                .mask                 ndarray           (2112, 256)    uint16
                .OBJCAT               Table             (8, 43)        n/a
                .OBJMASK              ndarray           (2112, 256)    uint8
        [ 2]   science                  NDAstroData       (2112, 256)    float32
                .variance             ndarray           (2112, 256)    float32
                .mask                 ndarray           (2112, 256)    uint16
                .OBJCAT               Table             (7, 43)        n/a
                .OBJMASK              ndarray           (2112, 256)    uint8
        [ 3]   science                  NDAstroData       (2112, 256)    float32
                .variance             ndarray           (2112, 256)    float32
                .mask                 ndarray           (2112, 256)    uint16
                .OBJCAT               Table             (5, 43)        n/a
                .OBJMASK              ndarray           (2112, 256)    uint8

        Other Extensions
                    Type        Dimensions
        .REFCAT        Table       (245, 16)


The "Pixel Extensions" contain the pixel data (in this case, something specific
to our data type).  Each extension is represented individually in a list
(0-indexed like all Python lists).  The science pixel data, its associated
metadata (extension header), and any other pixel or table extensions directly
associated with that science pixel data are stored in an |NDAstroData| object,
which subclasses astropy's |NDData|. An |AstroData| extension is accessed like
any list: ``ad[0]`` will return the first image. To access the science pixels,
one uses ``ad[0].data``; for the object mask of the first extension,
``ad[0].OBJMASK``; etc.

.. todo:: incorporate this into the example
    In the example above, the "Other Extensions" at the bottom of the
    :meth:`~astrodata.AstroData.info` display contains a ``REFCAT`` table which in
    this case is a list of stars from a catalog that overlaps the field of view
    covered by the pixel data. The "Other Extensions" are global extensions. They
    are not attached to any pixel extension in particular. To access a global
    extension one simply uses the name of that extension: ``ad.REFCAT``.


Organization of Global Information
==================================

All the global information can be accessed as attributes of the |AstroData|
object.  The global header, or Primary Header Unit (PHU), is stored in the
``phu`` attribute as an :class:`astropy.io.fits.Header`.

.. todo:: Put in a link to a good gemini example below where it says
    GEMINI_EXAMPLE

Any global tables are stored in the private attribute ``_tables``, a Python
dictionary with the table name (eg. "REFCAT") as the key.  For example, a
``REFCAT`` global table that is part of the data (see example
:needs_replacement:`GEMINI_EXAMPLE`) is stored under the key ``"REFCAT"``.
All tables are stored as :class:`astropy.table.Table`.  Those tables are
accessed by using the key directly as if it were a normal attribute, eg.
``ad.REFCAT``. Header information for the table, if read in from a FITS table,
is stored in the ``meta`` attribute of the :class:`astropy.table.Table`, eg.
``ad.REFCAT.meta['header']``. It is kept for information only; it is not used.


Organization of the Extension-specific Information
==================================================

The pixel data are stored in the |AstroData| attribute ``nddata`` as a list
of |NDAstroData| objects. The |NDAstroData| object is a subclass of astropy's
|NDData| and it is fully compatible with any function expecting an |NDData| as
input.  The pixel extensions are accessible through slicing, eg. ``ad[0]`` or
even ``ad[0:2]``. A slice of an |AstroData| object is an |AstroData| object, and
all the global attributes are kept. For example::

    >>> ad[0].info()
    Filename: N20170609S0154_varAdded.fits
    Tags: ACQUISITION GEMINI GMOS IMAGE NORTH OVERSCAN_SUBTRACTED OVERSCAN_TRIMMED
        PREPARED SIDEREAL

    Pixels Extensions
    Index  Content                  Type              Dimensions     Format
    [ 0]   science                  NDAstroData       (2112, 256)    float32
              .variance             ndarray           (2112, 256)    float32
              .mask                 ndarray           (2112, 256)    uint16
              .OBJCAT               Table             (6, 43)        n/a
              .OBJMASK              ndarray           (2112, 256)    uint8

    Other Extensions
                   Type        Dimensions
    .REFCAT        Table       (245, 16)

Note how ``REFCAT`` is still present.
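
To make the slicing behaviour concrete, here is a toy model in plain Python. It
is not the |AstroData| implementation; the class ``ToyAstroData`` and its
contents are hypothetical and only mirror the attribute names used in this
chapter (``phu``, ``nddata``, ``_tables``).

```python
class ToyAstroData:
    """Toy stand-in for AstroData; NOT the real implementation."""

    def __init__(self, phu, nddata, tables):
        self.phu = phu          # global header (a plain dict here)
        self.nddata = nddata    # list of per-extension objects
        self._tables = tables   # global tables keyed by name

    def __len__(self):
        return len(self.nddata)

    def __getitem__(self, index):
        # Turn a single index into a one-element slice so that the result
        # is always another ToyAstroData.
        if isinstance(index, int):
            if not -len(self) <= index < len(self):
                raise IndexError("extension index out of range")
            index = slice(index, index + 1 if index != -1 else None)
        # Global information (phu, tables) is shared with the parent.
        return ToyAstroData(self.phu, self.nddata[index], self._tables)

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose global tables as
        # attributes, e.g. ``ad.REFCAT``.
        tables = self.__dict__.get("_tables", {})
        if name in tables:
            return tables[name]
        raise AttributeError(name)


ad = ToyAstroData(
    phu={"OBJECT": "M42"},
    nddata=["ext0", "ext1", "ext2", "ext3"],
    tables={"REFCAT": ["star1", "star2"]},
)

first = ad[0]    # one extension, same globals
pair = ad[0:2]   # two extensions, same globals
```

In this sketch both ``first.REFCAT`` and ``pair.phu`` refer to the same objects
as on ``ad``, which is the behaviour the ``info()`` output above shows for the
real class.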

The science data are accessed as ``ad[0].data``, the variance as
``ad[0].variance``, and the data quality plane as ``ad[0].mask``. Those familiar
with astropy's |NDData| will recognize the structure "data, error, mask", and
will notice some differences. First, |AstroData| uses the variance for the error
plane, not the standard deviation. Another difference is evident only when one
looks at the content of the mask. |NDData| masks contain booleans, whereas
|AstroData| masks are ``uint16`` bit masks that contain information about the
type of bad pixels rather than just flagging them as bad or not. Since ``0`` is
equivalent to ``False`` (good pixel), the |AstroData| mask is fully compatible
with the |NDData| mask.
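
The two differences can be illustrated with plain NumPy arrays. This is a
minimal sketch: the bit values ``BAD_PIXEL`` and ``SATURATED`` are hypothetical
and chosen only for the example; the actual bit definitions depend on the
instrument package.

```python
import numpy as np

# Hypothetical bit definitions, for illustration only.
BAD_PIXEL = np.uint16(1)
SATURATED = np.uint16(4)

variance = np.array([[4.0, 9.0], [16.0, 25.0]], dtype=np.float32)
mask = np.array([[0, 1], [4, 5]], dtype=np.uint16)

# The standard deviation is recovered from the variance when needed.
stddev = np.sqrt(variance)

# Any non-zero value counts as "bad" when the mask is read as booleans.
is_bad = mask.astype(bool)

# The bit mask additionally says *why* a pixel is bad.
is_saturated = (mask & SATURATED) != 0
```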

Header information for the extension is stored in the |NDAstroData| ``meta``
attribute.  All table and pixel extensions directly associated with the
science extension are also stored in the ``meta`` attribute.

Technically, an extension header is located in ``ad.nddata[0].meta['header']``.
However, for convenience, the normal way to access that header is
``ad[0].hdr``.

Tables and pixel arrays associated with a science extension are
stored in ``ad.nddata[0].meta['other']`` as a dictionary keyed on the array
name, eg. ``OBJCAT``, ``OBJMASK``.   As it is for global tables, astropy tables
are used for extension tables.  The extension tables and extra pixel arrays are
accessed, like the global tables, by using the table name rather than the long
format, for example ``ad[0].OBJCAT`` and ``ad[0].OBJMASK``.

When reading a FITS Table, the header information is stored in the
``meta['header']`` of the table, eg. ``ad[0].OBJCAT.meta['header']``.  That
information is not used; it is simply a place to store what was read from disk.

The header of a pixel extension directly associated with the science extension
should match that of the science extension.  Therefore such headers are not
stored in |AstroData|. For example, the header of ``ad[0].OBJMASK`` is the
same as that of the science, ``ad[0].hdr``.

The world coordinate system (WCS) is stored internally in the ``wcs`` attribute
of the |NDAstroData| object. It is constructed from the header keywords when
the FITS file is read from disk, or directly from the ``WCS`` extension if
present (see :ref:`the next chapter <fitskeys>`). If the WCS is modified (for
example, by refining the pointing or attaching a more accurate wavelength
calibration), the FITS header keywords are not updated and therefore they should
never be used to determine the world coordinates of any pixel. These keywords are
only updated when the object is written to disk as a FITS file.  The WCS is
retrieved as follows: ``ad[0].wcs``.


.. todo:: Need to rephrase or replace the following subsection
    A Note on Memory Usage
    ======================

    When a file is opened, the headers are loaded into memory, but the pixels
    are not. The pixel data are loaded into memory only when they are first
    needed. This is not real "memory mapping", more of a delayed loading. This
    is useful when someone is only interested in the metadata, especially when
    the files are very large.