metacsv.io package

Submodules

metacsv.io.converters module

Utilities for converting between metacsv-compatible data formats

metacsv.io.converters.to_csv(container, fp, attrs=None, coords=None, variables=None, header_file=None, *args, **kwargs)[source]

Write a CSV, Series, DataFrame, Panel, DataArray, or Dataset to a metacsv-formatted csv

Note

If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • fp (str) – Path to which to write the metacsv-formatted CSV
  • attrs (dict) – Container attributes
  • coords (dict) – Container coordinates
  • variables (dict) – Variable-specific attributes
  • header_file (str_or_buffer) – A separate metacsv-formatted header file
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> np.random.seed(1)
>>> index = pd.MultiIndex.from_tuples(
...     [('X', 1), ('X', 2), ('Y', 1)],
...     names=['alpha', 'beta'])
...
>>> df = pd.DataFrame(
...     np.random.random((3,4)),
...     index=index,
...     columns=list('ABCD'))
...
>>> to_csv(
...     df,
...     fp='my-metacsv-data.csv',
...     attrs={'author': 'my name'},
...     coords=['alpha', 'beta'])
...

This metacsv-formatted CSV can then be read by metacsv or converted using any of the converters in this module:


>>> to_xarray('my-metacsv-data.csv')
<xarray.Dataset>
Dimensions:  (alpha: 2, beta: 2)
Coordinates:
  * alpha    (alpha) object 'X' 'Y'
  * beta     (beta) int64 1 2
Data variables:
    A        (alpha, beta) float64 0.417 0.1468 0.3968 nan
    B        (alpha, beta) float64 0.7203 0.09234 0.5388 nan
    C        (alpha, beta) float64 0.0001144 0.1863 0.4192 nan
    D        (alpha, beta) float64 0.3023 0.3456 0.6852 nan
Attributes:
    author: my name

metacsv.io.converters.to_dataarray(container, attrs=None, coords=None, variables=None, *args, **kwargs)[source]

Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to an xarray.DataArray

Note

If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • attrs (dict) – Container attributes
  • coords (dict) – Container coordinates
  • variables (dict) – Variable-specific attributes
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> np.random.seed(1)
>>> to_dataarray(
...     pd.DataFrame(np.random.random((3,4)), index=list('ABC')),
...     attrs={'author': 'my name'})  
...
<xarray.DataArray (ind_0: 3, coldim_0: 4)>
array([[  4.17022005e-01,   7.20324493e-01,   1.14374817e-04,
          3.02332573e-01],
       [  1.46755891e-01,   9.23385948e-02,   1.86260211e-01,
          3.45560727e-01],
       [  3.96767474e-01,   5.38816734e-01,   4.19194514e-01,
          6.85219500e-01]])
Coordinates:
  * ind_0     (ind_0) object 'A' 'B' 'C'
  * coldim_0  (coldim_0) int64 0 1 2 3
Attributes:
    author: my name
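
The second dimension in the output above (coldim_0) comes from the stacking step described in the note: every column label becomes a coordinate value alongside the index label. The following is a minimal pure-Python sketch of that idea, not metacsv's implementation. stack_columns is a hypothetical helper and the table values are made up for illustration.

```python
def stack_columns(rows, columns, index):
    """Flatten a 2-D table into {(index_label, column_label): value}.

    Hypothetical helper for illustration only; metacsv performs the
    equivalent step on pandas objects.
    """
    stacked = {}
    for row_label, row in zip(index, rows):
        for col_label, value in zip(columns, row):
            stacked[(row_label, col_label)] = value
    return stacked


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stacked = stack_columns(table, columns=[0, 1], index=list('ABC'))

# Each cell is now addressed by one label per dimension (ind_0, coldim_0).
print(stacked[('B', 1)])
```

After stacking, a 3 x 2 table holds six values, each keyed by a pair of labels, which is the shape a two-dimensional DataArray needs.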

metacsv.io.converters.to_dataset(container, attrs=None, coords=None, variables=None, *args, **kwargs)[source]

Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to an xarray.Dataset

Note

If a Series is passed, the variable will be named ‘data’. to_dataset is not implemented for Panel data.

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • attrs (dict) – Container attributes
  • coords (dict) – Container coordinates
  • variables (dict) – Variable-specific attributes
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> np.random.seed(1)
>>>
>>> to_dataset(
...     pd.DataFrame(np.random.random((3,4))),
...     attrs={'author': 'my name'})
...
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    0        (index) float64 0.417 0.1468 0.3968
    1        (index) float64 0.7203 0.09234 0.5388
    2        (index) float64 0.0001144 0.1863 0.4192
    3        (index) float64 0.3023 0.3456 0.6852
Attributes:
    author: my name

metacsv.io.converters.to_header(fp, container=None, attrs=None, coords=None, variables=None, *args, **kwargs)[source]

Write metacsv attributes directly to a metacsv-formatted header file

Parameters:
  • fp (str) – Path to which to write the metacsv-formatted header file
  • container (object) – A metacsv Series, DataFrame, or Panel, or a metacsv-formatted csv file from which to derive attrs, coords, and variables (optional)
  • attrs (dict) – Attributes to write to header file (optional). If container is also supplied, these attrs will update the attrs dict on the provided container.
  • coords (dict) – Coordinates to write to header file (optional). If container is also supplied, these coords will update the coords dict on the provided container.
  • variables (dict) – Variable metadata to write to header file (optional). If container is also supplied, these variable metadata will update the variables dict on the provided container.
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> to_header('mycsv.header', attrs={'author': 'me'}, coords='index')
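
The parameter descriptions above say that explicitly passed attrs, coords, and variables "update" the corresponding dict on a supplied container. Assuming this follows ordinary dict.update semantics (an assumption, since the implementation is not shown here), explicit values win on conflicting keys and other keys are kept. The values below are hypothetical.

```python
# Metadata already attached to a container (hypothetical values).
container_attrs = {'author': 'original author', 'source': 'survey'}

# Attributes passed explicitly to to_header.
explicit_attrs = {'author': 'me'}

# Copy first so the original dict is left untouched, then update.
# Whether metacsv itself copies or updates in place is not stated here.
merged = dict(container_attrs)
merged.update(explicit_attrs)

print(merged)
```

Under that assumption the header would record 'me' as the author while still carrying the 'source' entry from the container.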

metacsv.io.converters.to_netcdf(container, fp, attrs=None, coords=None, variables=None, *args, **kwargs)[source]

Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to a NetCDF file

Note

If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • attrs (dict) – Container attributes
  • coords (dict) – Container coordinates
  • variables (dict) – Variable-specific attributes
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> np.random.seed(1)
>>>
>>> to_netcdf(
...     pd.DataFrame(np.random.random((3,4)), columns=list('ABCD')),
...     'test.nc',
...     attrs={'author': 'my name'})
...
>>> xr.open_dataset('test.nc')
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    A        (index) float64 0.417 0.1468 0.3968
    B        (index) float64 0.7203 0.09234 0.5388
    C        (index) float64 0.0001144 0.1863 0.4192
    D        (index) float64 0.3023 0.3456 0.6852
Attributes:
    author: my name

metacsv.io.converters.to_pandas(container, *args, **kwargs)[source]

Convert a metacsv object to a pandas Series, DataFrame, or Panel

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • *args

    Additional positional arguments passed to metacsv.to_csv

  • **kwargs

    Additional keyword arguments passed to metacsv.to_csv

Example

>>> import metacsv
>>> import numpy as np, pandas as pd
>>>
>>> np.random.seed(1)
>>>
>>> df = metacsv.DataFrame(
...     np.random.random((3,4)),
...     columns=['col'+str(i) for i in range(4)])
...
>>> df.index = pd.MultiIndex.from_tuples(
...     [('a','X'),('b','Y'),('c','Z')], names=['abc','xyz'])
...
>>> df.attrs={'author': 'my name'}
>>> df.coords = {'abc': None, 'xyz': ['abc']}
>>> df 
<metacsv.core.containers.DataFrame (3, 4)>
             col0      col1      col2      col3
abc xyz
a   X    0.417022  0.720324  0.000114  0.302333
b   Y    0.146756  0.092339  0.186260  0.345561
c   Z    0.396767  0.538817  0.419195  0.685220

Coordinates
  * abc        (abc) object a, b, c
    xyz        (abc) object X, Y, Z
Attributes
    author:    my name

>>> to_pandas(df)
             col0      col1      col2      col3
abc xyz
a   X    0.417022  0.720324  0.000114  0.302333
b   Y    0.146756  0.092339  0.186260  0.345561
c   Z    0.396767  0.538817  0.419195  0.685220

metacsv.io.converters.to_xarray(container, attrs=None, coords=None, variables=None, *args, **kwargs)[source]

Convert a Series to an xarray.DataArray and a CSV or DataFrame to an xarray.Dataset

Note

If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.

Parameters:
  • container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
  • attrs (dict) – Container attributes
  • coords (dict) – Container coordinates
  • variables (dict) – Variable-specific attributes
  • *args

    Additional positional arguments passed to metacsv.read_csv if container is a filepath

  • **kwargs

    Additional keyword arguments passed to metacsv.read_csv if container is a filepath

Example

>>> import metacsv
>>> import numpy as np, pandas as pd
>>>
>>> np.random.seed(1)
>>>
>>> df = metacsv.DataFrame(
...     np.random.random((3,4)),
...     columns=['col'+str(i) for i in range(4)])
...
>>> df.index = pd.MultiIndex.from_tuples(
...     [('a','X'),('b','Y'),('c','Z')], names=['abc','xyz'])
...
>>> df.attrs={'author': 'my name'}
>>> df.coords = {'abc': None, 'xyz': ['abc']}
>>> df 
<metacsv.core.containers.DataFrame (3, 4)>
             col0      col1      col2      col3
abc xyz
a   X    0.417022  0.720324  0.000114  0.302333
b   Y    0.146756  0.092339  0.186260  0.345561
c   Z    0.396767  0.538817  0.419195  0.685220

Coordinates
  * abc        (abc) object a, b, c
    xyz        (abc) object X, Y, Z
Attributes
    author:         my name

>>> to_xarray(df) 
<xarray.Dataset>
Dimensions:  (abc: 3)
Coordinates:
  * abc      (abc) object 'a' 'b' 'c'
    xyz      (abc) object 'X' 'Y' 'Z'
Data variables:
    col0     (abc) float64 0.417 0.1468 0.3968
    col1     (abc) float64 0.7203 0.09234 0.5388
    col2     (abc) float64 0.0001144 0.1863 0.4192
    col3     (abc) float64 0.3023 0.3456 0.6852
Attributes:
    author: my name
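
In the example above, coords maps 'abc' to None and 'xyz' to ['abc'], and the output marks abc with an asterisk while xyz is indexed along abc. One reading, inferred from this example rather than from the library's source, is that a None value marks a base (dimension) coordinate and a list names the coordinates that a dependent coordinate varies along. A small sketch of that split:

```python
coords = {'abc': None, 'xyz': ['abc']}

# Coordinates mapped to None stand on their own and act as dimensions.
base = [name for name, deps in coords.items() if deps is None]

# The remaining coordinates are indexed along the coordinates they list.
dependent = {name: deps for name, deps in coords.items() if deps is not None}

print(base)
print(dependent)
```

This matches the Dataset printed above, where abc is the only dimension and xyz appears as a non-index coordinate on it.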

metacsv.io.parsers module

metacsv.io.parsers.find_yaml_start(line)[source]
metacsv.io.parsers.find_yaml_stop(line)[source]
metacsv.io.parsers.read_csv(fp, header_file=None, parse_vars=False, assertions=None, *args, **kwargs)[source]

Read a csv or metacsv-formatted csv into a metacsv.DataFrame

Parameters:
  • fp (str or buffer) – csv or metacsv-formatted filepath or buffer to read
  • header_file (str or buffer) – Optional supplemental yaml header file
  • parse_vars (bool) – Parse compact-style variable definitions (see example)
  • assertions (dict-like) – Dictionary of values to assert in file header
  • *args

    Additional positional arguments passed to pandas.read_csv

  • **kwargs

    Additional keyword arguments passed to pandas.read_csv

Example

>>> import metacsv, numpy as np
>>> import io  # on Python 2, use: import StringIO as io
>>> doc = io.StringIO('''
... ---
... author: A Person
... date:   2000-01-01
... variables:
...     pop:
...       name: Population
...       unit: millions
...     gdp:
...       name: Product
...       unit: 2005 $Bn
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> df = metacsv.read_csv(doc, index_col=[0,1])
>>> df 
<metacsv.core.containers.DataFrame (4, 2)>
               pop      gdp
region year
USA    2010  309.3  13599.3
       2011  311.7  13817.0
CAN    2010   34.0   1240.0
       2011   34.3   1276.7

Variables
    gdp:
        name            Product
        unit            2005 $Bn
    pop:
        name            Population
        unit            millions
Attributes
    author:         A Person
    date:           2000-01-01
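
The document in the example above has two parts: a YAML header opened by a '---' line and closed by a '...' line, followed by ordinary CSV. Locating those markers is presumably the job of find_yaml_start and find_yaml_stop, although their behavior is not documented here. The sketch below shows the layout using only the standard library. split_header is a hypothetical helper, not part of metacsv, and it leaves the YAML lines unparsed.

```python
import csv
import io

doc = '''
---
author: A Person
date:   2000-01-01
...
region,year,pop,gdp
USA,2010,309.3,13599.3
CAN,2010,34.0,1240.0
'''


def split_header(text):
    """Return (yaml_header_lines, csv_text) for a metacsv-style document.

    Hypothetical helper: assumes the header opens with a '---' line and
    closes with a '...' line, as in the example above.
    """
    lines = text.strip().splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != '---':
        return [], text  # no header: treat everything as CSV
    stop = stripped.index('...', 1)  # raises ValueError if never closed
    return lines[1:stop], '\n'.join(lines[stop + 1:])


header, body = split_header(doc)
rows = list(csv.reader(io.StringIO(body)))

print(header)   # the raw YAML lines, still unparsed
print(rows[0])  # the CSV column names
```

Everything before the closing marker is metadata and everything after it can be handed to a normal CSV reader, which is why extra arguments to read_csv are forwarded to pandas.read_csv.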

parse_vars

The read_csv argument parse_vars allows parsing of one-line variable definitions in the format var: description [unit]:

Example

>>> doc = io.StringIO('''
... ---
... author: A Person
... date:   2000-01-01
... variables:
...     pop: Population [millions]
...     gdp: Product [2005 $Bn]
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> metacsv.read_csv(doc, index_col=0, parse_vars=True) 
<metacsv.core.containers.DataFrame (4, 3)>
        year    pop      gdp
region
USA     2010  309.3  13599.3
USA     2011  311.7  13817.0
CAN     2010   34.0   1240.0
CAN     2011   34.3   1276.7

Variables
    gdp:
        description     Product
        unit            2005 $Bn
    pop:
        description     Population
        unit            millions
Attributes
    author:         A Person
    date:           2000-01-01
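
The compact format shown above, var: description [unit], can be parsed with a short regular expression. This is an illustrative sketch only: parse_compact and PATTERN are hypothetical names, the unit is treated as optional, and metacsv's own parsing rules may differ.

```python
import re

# "description [unit]" where the bracketed unit is optional in this sketch.
PATTERN = re.compile(r'^(?P<description>.*?)\s*(?:\[(?P<unit>[^\]]*)\])?$')


def parse_compact(definition):
    """Split 'description [unit]' into a dict of its parts."""
    match = PATTERN.match(definition.strip())
    parsed = {'description': match.group('description')}
    if match.group('unit') is not None:
        parsed['unit'] = match.group('unit')
    return parsed


print(parse_compact('Population [millions]'))
print(parse_compact('Product [2005 $Bn]'))
```

The resulting keys, description and unit, are the same ones that appear under Variables in the output above.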

metacsv.io.parsers.read_header(fp, header_file=None, parse_vars=False, assertions=None, *args, **kwargs)[source]

Read a metacsv-formatted header

Parameters:
  • fp (str or buffer) – csv or metacsv-formatted filepath or buffer to read
  • header_file (str or buffer) – Optional supplemental yaml header file
  • parse_vars (bool) – Parse compact-style variable definitions (see example)
  • assertions (dict-like) – Dictionary of values to assert in file header

Returns: attrs, coords, variables

Example

>>> import metacsv
>>> import io  # on Python 2, use: import StringIO as io
>>> doc = io.StringIO('''
... ---
... author: A Person
... date:   2000-01-01
... variables:
...     pop:
...       name: Population
...       unit: millions
...     gdp:
...       name: Product
...       unit: 2005 $Bn
... ...
... other data, not csv-formatted
... ''')
>>> attrs, coords, variables = metacsv.read_header(doc, index_col=[0,1])
>>> variables 
Variables
    gdp:
        name            Product
        unit            2005 $Bn
    pop:
        name            Population
        unit            millions
>>> attrs 
Attributes
    author:         A Person
    date:           2000-01-01
>>> coords
<Empty Coordinates>

parse_vars

The read_header argument parse_vars allows parsing of one-line variable definitions in the format var: description [unit]:

Example

>>> doc = io.StringIO('''
... ---
... author: A Person
... date:   2000-01-01
... variables:
...     pop: Population [millions]
...     gdp: Product [2005 $Bn]
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> attrs, coords, variables = metacsv.read_header(doc, parse_vars=True)
>>> variables 
Variables
    gdp:
        description     Product
        unit            2005 $Bn
    pop:
        description     Population
        unit            millions

metacsv.io.parsers.read_pickle(fp, assertions=None, *args, **kwargs)[source]

Read a pandas or metacsv pickle file into a metacsv container

Parameters:
  • fp (str or buffer) – filepath or buffer to read
  • assertions (dict-like) – Dictionary of values to assert in file header
  • *args

    Additional positional arguments passed to pandas.read_pickle

  • **kwargs

    Additional keyword arguments passed to pandas.read_pickle
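
The assertions argument is described only as a "dictionary of values to assert in file header". One plausible reading, offered as an assumption rather than as metacsv's actual behavior, is that each key must be present in the header with an equal value. check_header below is a hypothetical helper sketching that reading.

```python
def check_header(header, assertions):
    """Raise AssertionError if any asserted key is missing or differs.

    Hypothetical helper; the real comparison rules may be richer.
    """
    for key, expected in assertions.items():
        actual = header.get(key)
        assert actual == expected, (
            '{}: expected {!r}, found {!r}'.format(key, expected, actual))


header = {'author': 'A Person', 'date': '2000-01-01'}

check_header(header, {'author': 'A Person'})  # passes silently

try:
    check_header(header, {'author': 'Someone Else'})
except AssertionError as err:
    print(err)
```

A check of this kind lets a reader fail early when a file's metadata does not match what downstream code expects.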

metacsv.io.to_csv module

metacsv.io.to_csv.metacsv_to_csv(container, fp, header_file=None, *args, **kwargs)[source]
metacsv.io.to_csv.metacsv_to_header(fp, attrs=None, coords=None, variables=None)[source]

metacsv.io.to_xarray module

Utilities for converting metacsv Containers to xarray containers

metacsv.io.to_xarray.metacsv_dataframe_to_dataarray(dataframe, names=None, attrs=None)[source]
metacsv.io.to_xarray.metacsv_dataframe_to_dataset(dataframe, name='data', attrs=None)[source]
metacsv.io.to_xarray.metacsv_series_to_dataarray(series, attrs=None)[source]
metacsv.io.to_xarray.metacsv_series_to_dataset(series, name='data', attrs=None)[source]

metacsv.io.yaml_tools module

metacsv.io.yaml_tools.ordered_dump(data, stream=None, Dumper=<class 'yaml.dumper.SafeDumper'>, **kwds)[source]
metacsv.io.yaml_tools.ordered_load(stream, Loader=<class 'yaml.loader.SafeLoader'>, object_pairs_hook=<class 'collections.OrderedDict'>)[source]

Module contents