metacsv.io package
Submodules
metacsv.io.converters module
Utilities for converting between metacsv-compatible data formats
-
metacsv.io.converters.to_csv(container, fp, attrs=None, coords=None, variables=None, header_file=None, *args, **kwargs)
Write a CSV, Series, DataFrame, Panel, DataArray, or Dataset to a metacsv-formatted csv
Note
If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- fp (str) – Path to which to write the metacsv-formatted CSV
- attrs (dict) – Container attributes
- coords (dict) – Container coordinates
- variables (dict) – Variable-specific attributes
- header_file (str_or_buffer) – A separate metacsv-formatted header file
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> np.random.seed(1)
>>> index = pd.MultiIndex.from_tuples(
...     [('X', 1), ('X', 2), ('Y', 1)],
...     names=['alpha', 'beta'])
...
>>> df = pd.DataFrame(
...     np.random.random((3,4)),
...     index=index,
...     columns=list('ABCD'))
...
>>> to_csv(
...     df,
...     fp='my-metacsv-data.csv',
...     attrs={'author': 'my name'},
...     coords=['alpha', 'beta'])
...
This metacsv-formatted CSV can then be read by metacsv or converted using any of the converters in this module:
>>> to_xarray('my-metacsv-data.csv')
<xarray.Dataset>
Dimensions:  (alpha: 2, beta: 2)
Coordinates:
  * alpha    (alpha) object 'X' 'Y'
  * beta     (beta) int64 1 2
Data variables:
    A        (alpha, beta) float64 0.417 0.1468 0.3968 nan
    B        (alpha, beta) float64 0.7203 0.09234 0.5388 nan
    C        (alpha, beta) float64 0.0001144 0.1863 0.4192 nan
    D        (alpha, beta) float64 0.3023 0.3456 0.6852 nan
Attributes:
    author:   my name
-
metacsv.io.converters.to_dataarray(container, attrs=None, coords=None, variables=None, *args, **kwargs)
Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to an xarray.DataArray
Note
If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- attrs (dict) – Container attributes
- coords (dict) – Container coordinates
- variables (dict) – Variable-specific attributes
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> np.random.seed(1)
>>> to_dataarray(
...     pd.DataFrame(np.random.random((3,4)), index=list('ABC')),
...     attrs={'author': 'my name'})
...
<xarray.DataArray (ind_0: 3, coldim_0: 4)>
array([[  4.17022005e-01,   7.20324493e-01,   1.14374817e-04,   3.02332573e-01],
       [  1.46755891e-01,   9.23385948e-02,   1.86260211e-01,   3.45560727e-01],
       [  3.96767474e-01,   5.38816734e-01,   4.19194514e-01,   6.85219500e-01]])
Coordinates:
  * ind_0     (ind_0) object 'A' 'B' 'C'
  * coldim_0  (coldim_0) int64 0 1 2 3
Attributes:
    author:   my name
-
metacsv.io.converters.to_dataset(container, attrs=None, coords=None, variables=None, *args, **kwargs)
Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to an xarray.Dataset
Note
If a Series is passed, the variable will be named ‘data’. to_dataset is not implemented for Panel data.
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- attrs (dict) – Container attributes
- coords (dict) – Container coordinates
- variables (dict) – Variable-specific attributes
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> np.random.seed(1)
>>>
>>> to_dataset(
...     pd.DataFrame(np.random.random((3,4))),
...     attrs={'author': 'my name'})
...
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    0        (index) float64 0.417 0.1468 0.3968
    1        (index) float64 0.7203 0.09234 0.5388
    2        (index) float64 0.0001144 0.1863 0.4192
    3        (index) float64 0.3023 0.3456 0.6852
Attributes:
    author:   my name
-
metacsv.io.converters.to_header(fp, container=None, attrs=None, coords=None, variables=None, *args, **kwargs)
Write metacsv attributes directly to a metacsv-formatted header file
Parameters: - fp (str) – Path to which to write the metacsv-formatted header file
- container (object) – A metacsv Series, DataFrame, or Panel, or a metacsv-formatted csv file from which to derive attrs, coords, and variables (optional)
- attrs (dict) – Attributes to write to header file (optional). If container is also supplied, these attrs will update the attrs dict on the provided container.
- coords (dict) – Coordinates to write to header file (optional). If container is also supplied, these coords will update the coords dict on the provided container.
- variables (dict) – Variable metadata to write to header file (optional). If container is also supplied, these variable metadata will update the variables dict on the provided container.
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> to_header('mycsv.header', attrs={'author': 'me'}, coords='index')
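The parameter descriptions above say that explicitly passed attrs, coords, and variables "update" the corresponding dicts on a supplied container. The sketch below illustrates that merge with plain dictionaries, assuming the behaviour matches Python's dict.update (passed values win on conflicting keys, other keys are kept). The dictionaries are hypothetical stand-ins, not metacsv objects, and whether metacsv modifies the container in place is not stated here.

```python
# Stand-ins for metadata already attached to a container (hypothetical values).
container_attrs = {'author': 'original author', 'source': 'survey'}

# Values passed explicitly, as in to_header(..., attrs={'author': 'me'}).
passed_attrs = {'author': 'me'}

# Copy first so the container's own dict is left untouched in this sketch.
merged = dict(container_attrs)
merged.update(passed_attrs)

print(merged)
```

Under this assumption the header would carry the passed author together with the container's untouched source field.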
-
metacsv.io.converters.to_netcdf(container, fp, attrs=None, coords=None, variables=None, *args, **kwargs)
Convert a CSV, Series, DataFrame, Panel, DataArray, or Dataset to a NetCDF file
Note
If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- attrs (dict) – Container attributes
- coords (dict) – Container coordinates
- variables (dict) – Variable-specific attributes
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> np.random.seed(1)
>>>
>>> to_netcdf(
...     pd.DataFrame(np.random.random((3,4)), columns=list('ABCD')),
...     'test.nc',
...     attrs={'author': 'my name'})
...
>>> xr.open_dataset('test.nc')
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    A        (index) float64 0.417 0.1468 0.3968
    B        (index) float64 0.7203 0.09234 0.5388
    C        (index) float64 0.0001144 0.1863 0.4192
    D        (index) float64 0.3023 0.3456 0.6852
Attributes:
    author:   my name
-
metacsv.io.converters.to_pandas(container, *args, **kwargs)
Convert a metacsv object to a pandas Series, DataFrame, or Panel
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- *args –
Additional positional arguments passed to metacsv.to_csv
- **kwargs –
Additional keyword arguments passed to metacsv.to_csv
Example
>>> import metacsv
>>> import numpy as np, pandas as pd
>>>
>>> np.random.seed(1)
>>>
>>> df = metacsv.DataFrame(
...     np.random.random((3,4)),
...     columns=['col'+str(i) for i in range(4)])
...
>>> df.index = pd.MultiIndex.from_tuples(
...     [('a','X'),('b','Y'),('c','Z')], names=['abc','xyz'])
...
>>> df.attrs={'author': 'my name'}
>>> df.coords = {'abc': None, 'xyz': ['abc']}
>>> df
<metacsv.core.containers.DataFrame (3, 4)>
             col0      col1      col2      col3
abc xyz
a   X    0.328389  0.598790  0.299902  0.265052
b   Y    0.720712  0.617109  0.331346  0.558522
c   Z    0.954494  0.143843  0.058968  0.069010
Coordinates
  * abc        (abc) object a, b, c
    xyz        (abc) object X, Y, Z
Attributes
    author:         my name
>>> to_pandas(df)
             col0      col1      col2      col3
abc xyz
a   X    0.328389  0.598790  0.299902  0.265052
b   Y    0.720712  0.617109  0.331346  0.558522
c   Z    0.954494  0.143843  0.058968  0.069010
-
metacsv.io.converters.to_xarray(container, attrs=None, coords=None, variables=None, *args, **kwargs)
Convert a Series to an xarray.DataArray and a CSV or DataFrame to an xarray.Dataset
Note
If a DataFrame is passed, columns will be stacked and treated as coordinates. to_dataset is not implemented for Panel data.
Parameters: - container (object) – A pandas or metacsv Series, DataFrame, or Panel, an xarray DataArray or Dataset, or a filepath to a csv or netcdf file.
- attrs (dict) – Container attributes
- coords (dict) – Container coordinates
- variables (dict) – Variable-specific attributes
- *args –
Additional positional arguments passed to metacsv.read_csv if container is a filepath
- **kwargs –
Additional keyword arguments passed to metacsv.read_csv if container is a filepath
Example
>>> import metacsv
>>> import numpy as np, pandas as pd
>>>
>>> np.random.seed(1)
>>>
>>> df = metacsv.DataFrame(
...     np.random.random((3,4)), columns=['col'+str(i) for i in range(4)])
>>> df.index = pd.MultiIndex.from_tuples([('a','X'),('b','Y'),('c','Z')],
...     names=['abc','xyz'])
>>> df.attrs={'author': 'my name'}
>>> df.coords = {'abc': None, 'xyz': ['abc']}
>>> df
<metacsv.core.containers.DataFrame (3, 4)>
             col0      col1      col2      col3
abc xyz
a   X    0.417022  0.720324  0.000114  0.302333
b   Y    0.146756  0.092339  0.186260  0.345561
c   Z    0.396767  0.538817  0.419195  0.685220
Coordinates
  * abc        (abc) object a, b, c
    xyz        (abc) object X, Y, Z
Attributes
    author:         my name
>>> to_xarray(df)
<xarray.Dataset>
Dimensions:  (abc: 3)
Coordinates:
  * abc      (abc) object 'a' 'b' 'c'
    xyz      (abc) object 'X' 'Y' 'Z'
Data variables:
    col0     (abc) float64 0.417 0.1468 0.3968
    col1     (abc) float64 0.7203 0.09234 0.5388
    col2     (abc) float64 0.0001144 0.1863 0.4192
    col3     (abc) float64 0.3023 0.3456 0.6852
Attributes:
    author:   my name
metacsv.io.parsers module
-
metacsv.io.parsers.read_csv(fp, header_file=None, parse_vars=False, assertions=None, *args, **kwargs)
Read a csv or metacsv-formatted csv into a metacsv.DataFrame
Parameters: - fp (str or buffer) – csv or metacsv-formatted filepath or buffer to read
- header_file (str or buffer) – optional supplemental yaml header file
- parse_vars (bool) – parse compact-style variable definitions (see example)
- assertions (dict-like) – dictionary of values to assert in file header
- *args, **kwargs –
Additional arguments passed to pandas.read_csv
Example
>>> import metacsv, numpy as np
>>> import StringIO as io # import io for python 3
>>> doc = io.StringIO('''
... ---
... author: A Person
... date: 2000-01-01
... variables:
...     pop:
...         name: Population
...         unit: millions
...     gdp:
...         name: Product
...         unit: 2005 $Bn
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> df = metacsv.read_csv(doc, index_col=[0,1])
>>> df
<metacsv.core.containers.DataFrame (4, 2)>
               pop      gdp
region year
USA    2010  309.3  13599.3
       2011  311.7  13817.0
CAN    2010   34.0   1240.0
       2011   34.3   1276.7
Variables
    gdp:
        name       Product
        unit       2005 $Bn
    pop:
        name       Population
        unit       millions
Attributes
    author:         A Person
    date:           2000-01-01
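The example above relies on the metacsv file layout: a YAML header that opens with a `---` line and closes with a `...` line, followed by ordinary CSV rows. The sketch below only illustrates that layout using the standard library. `split_metacsv` is a hypothetical helper, not part of metacsv, and it returns the header as unparsed text rather than reading the YAML.

```python
import csv
import io


def split_metacsv(text):
    """Split metacsv-style text into (header_text, csv_text).

    Hypothetical helper for illustration; not part of the metacsv API.
    Assumes the header opens with a '---' line and closes with a '...' line.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != '---':
        # No header marker: treat the whole text as plain CSV.
        return '', '\n'.join(lines)
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == '...':
            return '\n'.join(lines[1:i]), '\n'.join(lines[i + 1:])
    raise ValueError("header opened with '---' but no closing '...' line found")


doc = '''
---
author: A Person
date: 2000-01-01
...
region,year,pop,gdp
USA,2010,309.3,13599.3
CAN,2010,34.0,1240.0
'''

header_text, csv_text = split_metacsv(doc)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(header_text)
print(rows[0]['region'], rows[0]['pop'])
```

Because the body after the closing marker is plain CSV, any CSV reader can consume it once the header has been set aside, which is consistent with read_csv forwarding its extra arguments to pandas.read_csv.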
parse_vars
The read_csv argument parse_vars allows parsing of one-line variable definitions in the format var: description [unit]:
Example
>>> doc = io.StringIO('''
... ---
... author: A Person
... date: 2000-01-01
... variables:
...     pop: Population [millions]
...     gdp: Product [2005 $Bn]
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> metacsv.read_csv(doc, index_col=0, parse_vars=True)
<metacsv.core.containers.DataFrame (4, 3)>
        year    pop      gdp
region
USA     2010  309.3  13599.3
USA     2011  311.7  13817.0
CAN     2010   34.0   1240.0
CAN     2011   34.3   1276.7
Variables
    gdp:
        description   Product
        unit          2005 $Bn
    pop:
        description   Population
        unit          millions
Attributes
    author:         A Person
    date:           2000-01-01
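To make the compact format concrete, here is a minimal sketch of how a definition such as `Population [millions]` could be split into a description and a unit. The regular expression and the `parse_compact_var` helper are assumptions made for illustration; metacsv's own parser may handle edge cases differently.

```python
import re

# Assumed pattern: free-text description followed by a unit in square brackets.
COMPACT_VAR = re.compile(r'^(?P<description>.*?)\s*\[(?P<unit>[^\]]*)\]\s*$')


def parse_compact_var(definition):
    """Parse 'description [unit]' into a dict (hypothetical helper)."""
    match = COMPACT_VAR.match(definition)
    if match is None:
        # No bracketed unit: keep the whole string as the description.
        return {'description': definition.strip()}
    return {'description': match.group('description').strip(),
            'unit': match.group('unit').strip()}


print(parse_compact_var('Population [millions]'))
print(parse_compact_var('Product [2005 $Bn]'))
```

The resulting keys mirror the `description` and `unit` fields shown in the Variables output above.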
-
metacsv.io.parsers.read_header(fp, header_file=None, parse_vars=False, assertions=None, *args, **kwargs)
Read a metacsv-formatted header
Parameters: - fp (str or buffer) – csv or metacsv-formatted filepath or buffer to read
- header_file (str or buffer) – optional supplemental yaml header file
- parse_vars (bool) – parse compact-style variable definitions (see example)
- assertions (dict-like) – dictionary of values to assert in file header
Returns: attrs, coords, variables
Example
>>> import metacsv
>>> import StringIO as io # import io for python 3
>>> doc = io.StringIO('''
... ---
... author: A Person
... date: 2000-01-01
... variables:
...     pop:
...         name: Population
...         unit: millions
...     gdp:
...         name: Product
...         unit: 2005 $Bn
... ...
... other data, not csv-formatted
... ''')
>>> attrs, coords, variables = metacsv.read_header(doc, index_col=[0,1])
>>> variables
Variables
    gdp:
        name       Product
        unit       2005 $Bn
    pop:
        name       Population
        unit       millions
>>> attrs
Attributes
    author:         A Person
    date:           2000-01-01
>>> coords
<Empty Coordinates>
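The assertions argument is described above only as a "dictionary of values to assert in file header". As a rough sketch of what such a check could look like, the helper below compares expected values against header attributes using plain equality on top-level keys. `check_header` is hypothetical and not part of metacsv, whose actual checks may be richer than this.

```python
def check_header(attrs, assertions):
    """Raise AssertionError if any asserted header value does not match.

    Hypothetical helper; assumes simple equality checks on top-level keys.
    """
    for key, expected in assertions.items():
        if key not in attrs:
            raise AssertionError('missing header field: {!r}'.format(key))
        if attrs[key] != expected:
            raise AssertionError(
                '{!r}: expected {!r}, found {!r}'.format(key, expected, attrs[key]))


attrs = {'author': 'A Person', 'date': '2000-01-01'}
check_header(attrs, {'author': 'A Person'})  # passes silently

try:
    check_header(attrs, {'author': 'Someone Else'})
except AssertionError as err:
    print('assertion failed:', err)
```

Failing fast at read time, as sketched here, would catch a file whose header does not match what downstream code expects before any data is processed.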
parse_vars
The read_header argument parse_vars allows parsing of one-line variable definitions in the format var: description [unit]:
Example
>>> doc = io.StringIO('''
... ---
... author: A Person
... date: 2000-01-01
... variables:
...     pop: Population [millions]
...     gdp: Product [2005 $Bn]
... ...
... region,year,pop,gdp
... USA,2010,309.3,13599.3
... USA,2011,311.7,13817.0
... CAN,2010,34.0,1240.0
... CAN,2011,34.3,1276.7
... ''')
>>> attrs, coords, variables = metacsv.read_header(doc, parse_vars=True)
>>> variables
Variables
    gdp:
        description   Product
        unit          2005 $Bn
    pop:
        description   Population
        unit          millions
metacsv.io.to_csv module
metacsv.io.to_xarray module
Utilities for converting metacsv Containers to xarray containers