mirror of
https://github.com/wassname/greater_tables_project.git
synced 2026-06-27 17:30:44 +08:00
3.2.0 integration of tikz, better column width
More TeX snippets.
@@ -1,17 +1,9 @@
|
||||
sort out the variety of readmes... this is the main one
|
||||
# `greater_tables` Project
|
||||
|
||||
https://shields.io/badges/read-the-docs
|
||||

|
||||
|
||||
|
||||
# v 3.0 update
|
||||
|
||||
* config files
|
||||
* unified col width and info dataframe
|
||||
* de-texing
|
||||
* cli for config and writeout a csv etc.
|
||||
* tests
|
||||
|
||||
# TODO
|
||||
## TODO
|
||||
|
||||
* Ratio cols with multi index columns
|
||||
* % in tex output - never allow comments?
|
||||
@@ -19,109 +11,14 @@ https://shields.io/badges/read-the-docs
|
||||
* ?Option to hide index
|
||||
* Bring over the roll your own logger
|
||||
|
||||
***
|
||||
|
||||
## Greater Tables
|
||||
|
||||
# from GPT
|
||||
Display graphics vs. data tables
|
||||
|
||||
no colors, sparklines, shading, ...
|
||||
|
||||
Absolutely—here’s a structured summary of everything we’ve covered, organized by topic.
|
||||
|
||||
---
|
||||
|
||||
## 🧱 Project Structure & Philosophy
|
||||
|
||||
* Your `GreaterTables` class formats a `pandas.DataFrame` to **HTML, text, or LaTeX**.
|
||||
* The class is **immutable**: formatting is fixed at construction time, like a pure value object.
|
||||
* You avoid branchy, incremental APIs (like `ggplot`) and prefer creating fresh objects.
|
||||
* You wanted a way to handle growing config complexity—→ led to a **YAML config + Pydantic schema** design.
|
||||
|
||||
---
|
||||
|
||||
## 📁 Project Layout
|
||||
|
||||
```
|
||||
greater_tables_project/
|
||||
├── greater_tables/
|
||||
│ ├── __init__.py
|
||||
│ ├── gtconfig.py ← config model + loader
|
||||
│ ├── gtcore.py ← GreaterTables class
|
||||
│ └── defaults/
|
||||
│ └── config_template.yaml
|
||||
├── tests/
|
||||
├── pyproject.toml
|
||||
```
|
||||
|
||||
* `GTConfigModel` = schema + default source of truth
|
||||
* `GTConfig` = singleton loader and validator
|
||||
* `config_template.yaml` = editable fallback + documentation base
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Config Management
|
||||
|
||||
* All defaults and types are declared in `GTConfigModel` (Pydantic).
|
||||
* Config is **loaded from YAML**, validated by `GTConfigModel`.
|
||||
* You can **generate** a valid config file from the model using `.model_dump() → YAML`.
|
||||
* Singleton pattern (`GTConfig.__new__`) caches the config at runtime.
|
||||
|
||||
### Helpers
|
||||
|
||||
* `GTConfig().get(overrides=...)` gives a safe, override-able config
|
||||
* `write_template(path)` writes a default config YAML for user to edit
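The schema-plus-loader split described above can be sketched with the standard library alone. This is an illustration only, not the package's code: a frozen `dataclass` stands in for the Pydantic `GTConfigModel`, an `lru_cache` stands in for the singleton, and the names `ConfigSketch`, `base_config`, `get_config` and the default for `max_table_width` are assumptions made for the example.

```python
from dataclasses import dataclass, replace, asdict
from functools import lru_cache


@dataclass(frozen=True)
class ConfigSketch:
    """Stand-in for the schema: field types and defaults live in one place."""
    max_table_width: int = 200   # hypothetical default
    tikz_scale: float = 0.717
    equal: bool = False
    debug: bool = False


@lru_cache(maxsize=1)
def base_config() -> ConfigSketch:
    """Build the shared base config once and cache it (singleton-like)."""
    return ConfigSketch()


def get_config(**overrides) -> ConfigSketch:
    """Return a new config with overrides applied; the cached base is untouched."""
    return replace(base_config(), **overrides)


cfg = get_config(debug=True)
print(asdict(cfg))
```

Because the dataclass is frozen, overrides always produce a fresh object, which matches the immutable, construct-once style the summary describes. An unknown override name raises `TypeError` rather than being silently accepted.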
|
||||
|
||||
---
|
||||
|
||||
## 🛠 Git Workflow (Solo Dev, Linear)
|
||||
|
||||
* Use **tags** (`git tag v0.2.0`) to label stable versions
|
||||
* Use **`git reset --hard <tag>`** to roll back and discard later commits
|
||||
* Avoid branches entirely—keep a **single linear history**
|
||||
* Tags let you bounce around safely, with names instead of hashes
|
||||
* Releases on GitHub are tags + metadata, optional for publishing
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ CLI Tool
|
||||
|
||||
* Built with `click`, with subcommands:
|
||||
|
||||
* `gt render data.csv --format html`
|
||||
* `gt write-template`
|
||||
* Reads any Pandas-supported file (`.csv`, `.feather`, `.pkl`, etc.)
|
||||
* Outputs to console or to file
|
||||
* Uses current config by default, or override with `--config path.yaml`
|
||||
|
||||
---
|
||||
|
||||
## 🧠 Design Principles You’re Following
|
||||
|
||||
| Principle | Your Approach |
|
||||
| ---------------------------- | ------------------------------------------- |
|
||||
| Immutability | `GT(df, config)` is fixed once created |
|
||||
| Separation of concerns | `GTConfigModel` holds defaults/types |
|
||||
| Config as code/documentation | `config_template.yaml` generated from model |
|
||||
| CLI-first mindset | `click` used to expose functionality |
|
||||
| Linear Git workflow | Tags for rollback, no branches |
|
||||
|
||||
---
|
||||
|
||||
Let me know if you want me to generate:
|
||||
|
||||
* a Markdown doc for contributors
|
||||
* a `.bat` script to roll back to a tag
|
||||
* test scaffolding or release automation
|
||||
|
||||
You're in great shape. Gum-level perfection achieved.
|
||||
|
||||
|
||||
|
||||
***
|
||||
|
||||
# OLD
|
||||
|
||||
# Greater Tables
|
||||
|
||||
Creating presentation quality tables from pandas dataframes is frustrating. It is hard to left-align text and right-align numbers using pandas `display` or `df.to_html`. The `great_tables` package does a really nice job with pandas and polars dataframes but does not support indexes or TeX output.
|
||||
Creating presentation quality tables is difficult. It is hard to left-align text and right-align numbers using pandas `display` or `df.to_html`. The `great_tables` package does a really nice job with pandas and polars dataframes but does not support indexes or TeX output.
|
||||
|
||||
This package provides consistent HTML and TeX table output with flexible type-based formatting and table rules. Neither output relies on the pandas `to_html` or `to_latex` functions. TeX output uses TikZ tables for very tight control over layout and grid lines. The package is designed for use in Jupyter Lab notebooks and Quarto documents.
|
||||
|
||||
@@ -137,9 +34,19 @@ been maintaining a set of macros called
|
||||
reusable, extensible actuarial tools) in VBA and Python since the late
|
||||
1990s, and call all my macro packages "GREAT".
|
||||
|
||||
## Documentation
|
||||
|
||||

|
||||
|
||||
Available on
|
||||
[readthedocs](https://greater-tables-project.readthedocs.io/en/latest).
|
||||
|
||||
|
||||
## Installation
|
||||
|
||||
``` python
|
||||

|
||||
|
||||
```python
|
||||
pip install greater-tables
|
||||
```
|
||||
|
||||
@@ -148,7 +55,7 @@ pip install greater-tables
|
||||
The following example shows quite a hard table. It is formatted using
|
||||
the `sGT` class, which is a subclass of `GT` with a few defaults set.
|
||||
|
||||
``` {.python .cell-code}
|
||||
```python
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from greater_tables import sGT
|
||||
@@ -196,21 +103,37 @@ The output illustrates:
|
||||
|
||||
More coming soon.
|
||||
|
||||
## Documentation
|
||||
|
||||

|
||||
## History
|
||||
|
||||
Available on
|
||||
[readthedocs](https://greater-tables-project.readthedocs.io/en/latest).
|
||||
3.2.0
|
||||
-------
|
||||
* Added more tex snippets!
|
||||
* Refactored tikz and column width behavior
|
||||
|
||||
## Versions
|
||||
3.1.0
|
||||
-------
|
||||
* adjustments for auto format
|
||||
* rearranged gtcore order of methods
|
||||
|
||||
3.0.0
|
||||
-------
|
||||
|
||||
* config files / pydantic config input
|
||||
* unified col width and info dataframe
|
||||
* de-texing
|
||||
* cli for config and writeout a csv etc.
|
||||
|
||||
* testdf suite
|
||||
* Automated TeX to SVG
|
||||
|
||||
2.0.0
|
||||
------
|
||||
|
||||
* **v2.0.0** solid release old-style, all-argument GT
|
||||
* Better column widths
|
||||
* Custom text output
|
||||
* Rich table output
|
||||
1.1.1
|
||||
-------
|
||||
* Added logo, updated docs.
|
||||
@@ -251,3 +174,65 @@ Early development
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## 📁 Project Layout
|
||||
|
||||
```
|
||||
greater_tables_project/
|
||||
| LICENSE
|
||||
| pyproject.toml
|
||||
| README.md
|
||||
|
|
||||
+---dist
|
||||
|
|
||||
+---docs
|
||||
| | books.bib
|
||||
| | conf.py
|
||||
| | greater_tables.data.rst
|
||||
| | greater_tables.rst
|
||||
| | index.rst
|
||||
| | library.bib
|
||||
| | make.bat
|
||||
| | Makefile
|
||||
| | modules.rst
|
||||
| | start-server.bat
|
||||
| | style.csl
|
||||
| |
|
||||
+---greater_tables
|
||||
| | __init__.py
|
||||
| | cli.py
|
||||
| | gtconfig.py
|
||||
| | gtcore.py
|
||||
| | gtenums.py
|
||||
| | gtformats.py
|
||||
| | hasher.py
|
||||
| | testdf.py
|
||||
| | tex_svg.py
|
||||
| |
|
||||
| +---data
|
||||
| | | __init__.py
|
||||
| | | tex_list.csv
|
||||
| | | tex_list.py
|
||||
| | | words-12.md
|
||||
|
|
||||
+---greater_tables.egg-info
|
||||
|
|
||||
+---img
|
||||
| hard-html.png
|
||||
| hard-tex.png
|
||||
```
|
||||
|
||||
|
||||
|
||||
## 🧠 Design Principles You’re Following
|
||||
|
||||
| Principle | Your Approach |
|
||||
| ---------------------------- | ------------------------------------------- |
|
||||
| Immutability | `GT(df, config)` is fixed once created |
|
||||
| Separation of concerns | `GTConfigModel` holds defaults/types |
|
||||
| Config as code/documentation | `config_template.yaml` generated from model |
|
||||
| CLI-first mindset | `click` used to expose functionality |
|
||||
| Linear Git workflow | Tags for rollback, no branches |
|
||||
|
||||
|
||||
|
||||
+2
-5
@@ -14,11 +14,8 @@ Welcome to greater_tables's documentation!
|
||||
greater_tables.data
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
.. mdinclude:: ../README.md
|
||||
|
||||
.. include:: ../README.md
|
||||
:parser: myst_parser.sphinx_
|
||||
|
||||
Other
|
||||
=======
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
__version__ = '3.1.0'
|
||||
__version__ = '3.2.0'
|
||||
__project__ = 'greater_tables'
|
||||
__author__ = 'Stephen J Mildenhall'
|
||||
|
||||
|
||||
+11
-5
@@ -1,18 +1,20 @@
|
||||
import click
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
from .gtconfig import GTConfig, write_template
|
||||
from .gtcore import GreaterTables
|
||||
from .gtconfig import GTConfigModel, write_template
|
||||
from .gtcore import GT
|
||||
|
||||
|
||||
@click.group()
|
||||
def cli():
|
||||
"""Greater Tables CLI tool"""
|
||||
pass
|
||||
|
||||
|
||||
@cli.command()
|
||||
@click.argument("input_file", type=click.Path(exists=True))
|
||||
@click.option("--output", "-o", type=click.Path(), help="Write rendered output to file")
|
||||
@click.option("--format", "-f", type=click.Choice(["html", "text", "latex"]), default="html")
|
||||
@click.option("--format", "-f", type=click.Choice(["html", "text", "latex", "svg", "pdf"]), default="html")
|
||||
@click.option("--config", type=click.Path(), help="Path to a YAML config file")
|
||||
def render(input_file, output, format, config):
|
||||
"""Render a table from a data file."""
|
||||
@@ -28,8 +30,8 @@ def render(input_file, output, format, config):
|
||||
else:
|
||||
raise click.UsageError(f"Unsupported extension: {ext}")
|
||||
|
||||
cfg = GTConfig(Path(config) if config else None).get()
|
||||
gt = GreaterTables(df, config=cfg)
|
||||
cfg = GTConfigModel(Path(config) if config else None).get()
|
||||
gt = GT(df, config=cfg)
|
||||
|
||||
rendered = (
|
||||
gt.render_html() if format == "html"
|
||||
@@ -37,11 +39,15 @@ def render(input_file, output, format, config):
|
||||
else gt.render_latex()
|
||||
)
|
||||
|
||||
if format in ('svg', 'pdf'):
|
||||
print('more work to do!!')
|
||||
|
||||
if output:
|
||||
Path(output).write_text(rendered, encoding="utf-8")
|
||||
else:
|
||||
print(rendered)
|
||||
|
||||
|
||||
@cli.command()
|
||||
@click.argument("path", type=click.Path(), default="config.yaml")
|
||||
def write_template(path):
|
||||
|
||||
File diff suppressed because it is too large
+16719
-5848
File diff suppressed because it is too large
@@ -19,12 +19,15 @@ class TeXMacros():
|
||||
from great2.blog
|
||||
"""
|
||||
|
||||
_macros = r"""
|
||||
\def\AA{\mathcal{A}}
|
||||
_macros = r"""\def\AA{\mathcal{A}}
|
||||
\def\atan{\mathrm{atan}}
|
||||
\def\A{\mathcal{A}}
|
||||
\def\B{\mathcal{B}}
|
||||
\def\BB{\mathbb{B}}
|
||||
\def\AVaR{\mathsf{AVaR}}
|
||||
\def\bbeta{\mathbf{\beta}}
|
||||
\def\bb{\mathbf b}
|
||||
\def\bfx{\mathbf x}
|
||||
\def\bm{\mathbf }
|
||||
\def\biTVaR{\mathsf{biTVaR}}
|
||||
\def\corr{\mathsf{Corr}}
|
||||
@@ -38,9 +41,12 @@ class TeXMacros():
|
||||
\def\ecirc{\accentset{\circ} e}
|
||||
\def\EPD{\mathsf{EPD}}
|
||||
\def\ES{\mathsf{ES}}
|
||||
\def\esssup{\mathrm{ess\,sup}}
|
||||
\def\E{\mathsf{E}}
|
||||
\def\F{\mathscr{F}}
|
||||
\def\FFF{\mathscr{F}}
|
||||
\def\FF{\mathcal{F}}
|
||||
\def\G{\mathscr{G}}
|
||||
\def\HH{\mathbf{H}}
|
||||
\def\kpx{{{}_kp_x}}
|
||||
\def\MM{\mathcal{M}}
|
||||
@@ -50,10 +56,13 @@ class TeXMacros():
|
||||
\def\OO{\mathscr{O}}
|
||||
\def\PPP{\mathscr{P}}
|
||||
\def\PP{\mathsf{P}}
|
||||
\def\P{\mathsf{Pr}}
|
||||
\def\Pr{\mathsf{Pr}}
|
||||
\def\QQ{\mathsf{Q}}
|
||||
\def\Q{\mathbb{Q}}
|
||||
\def\RR{\mathbb{R}}
|
||||
\def\SD{\mathsf{SD}}
|
||||
\def\spcer{\ }
|
||||
\def\TCE{\mathsf{TCE}}
|
||||
\def\TVaR{\mathsf{TVaR}}
|
||||
\def\Var{\mathsf{Var}}
|
||||
@@ -66,8 +75,7 @@ class TeXMacros():
|
||||
\def\XX{\mathbf{X}}
|
||||
\def\yy{\mathbf{y}}
|
||||
\def\ZZZ{\mathcal{Z}}
|
||||
\def\ZZ{\mathbb{Z}}
|
||||
"""
|
||||
\def\ZZ{\mathbb{Z}}"""
|
||||
|
||||
@staticmethod
|
||||
def process_tex_macros(text):
|
||||
@@ -96,22 +104,37 @@ class TeXMacros():
|
||||
i = x.find('{')
|
||||
return x[:i], x[i + 1:-1]
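The name/body split above (everything before the first `{` is the macro name, everything up to the final `}` is the body) is the core of macro expansion. The following is a minimal sketch of how such pairs might be applied to a text; `split_macro` and `expand` are hypothetical names, and the real `process_tex_macros` may work differently, since most of its body is not shown in this diff.

```python
import re


def split_macro(line):
    """Split r'\def\AA{\mathcal{A}}' into (r'\AA', r'\mathcal{A}')."""
    rest = line.strip()[len(r'\def'):]
    i = rest.find('{')
    return rest[:i], rest[i + 1:-1]


def expand(text, macro_lines):
    """Replace each macro name in text with its body."""
    macros = dict(split_macro(m) for m in macro_lines if m.strip())
    for name, body in macros.items():
        # The lookahead stops \A from matching the start of \AA or \AVaR.
        pattern = re.escape(name) + r'(?![A-Za-z])'
        # A function replacement avoids backslash processing of the body.
        text = re.sub(pattern, lambda m, b=body: b, text)
    return text


macro_lines = [r'\def\AA{\mathcal{A}}', r'\def\A{\mathcal{A}}']
print(expand(r'$\AA \subset \A$', macro_lines))
```

The negative lookahead matters for a macro list like this one, where `\A`, `\AA` and `\AVaR` share prefixes; a plain `str.replace` would corrupt the longer names.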
|
||||
|
||||
def find_tex_snippeets(in_dir='\\S\\TELOS\\PIR\\docs',
|
||||
|
||||
def find_tex_snippets(in_dir='\\S\\TELOS\\PIR\\docs',
|
||||
out_file='tex_list.csv'):
|
||||
"""Ripgrep / TeX macro expand list of TeX snippets."""
|
||||
# prod run with \\s\\telos\\ (!)
|
||||
in_dir = str(Path(in_dir))
|
||||
cmd = ['rg', '-N', '-o', '--no-filename',
|
||||
'-g', '*.md',
|
||||
'-g', '*.qmd',
|
||||
r'\$.+?\$',
|
||||
in_dir]
|
||||
result = subprocess.run(
|
||||
['rg', '-N', '-o', '--no-filename', '-g', '*.md', r'\$.+?\$', in_dir],
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
check=True
|
||||
check=True,
|
||||
encoding='utf-8'
|
||||
)
|
||||
output_text = result.stdout
|
||||
tm = TeXMacros()
|
||||
txt = tm.process_tex_macros(output_text)
|
||||
tex = txt.split('\n')
|
||||
stex = set(tex)
|
||||
stext = [i for i in stex if len(i) and i.find('\\PP') < 0 and i.find('$$') < 0]
|
||||
stext = [i for i in stex if len(i)
|
||||
and i.find('$$') < 0
|
||||
and i.find('lcroof') < 0
|
||||
and i.find('#') < 0
|
||||
and i.find(r'\\') < 0
|
||||
]
|
||||
df = pd.DataFrame({'expr': stext})
|
||||
print(f'Found {len(df)} snippets!')
|
||||
if out_file != '':
|
||||
p = Path(__file__).parent / out_file
|
||||
print(p)
|
||||
|
||||
+542
-109
@@ -30,6 +30,7 @@ from pandas.api.types import is_datetime64_any_dtype, is_integer_dtype, \
|
||||
from pydantic import ValidationError
|
||||
from rich import box
|
||||
from rich.table import Table
|
||||
from IPython.display import display, SVG
|
||||
|
||||
from . gtenums import Breakability, Alignment
|
||||
from . gtformats import GT_Format, TableFormat, Line, DataRow
|
||||
@@ -578,12 +579,19 @@ class GT(object):
|
||||
if tabs is None:
|
||||
self.tabs = None
|
||||
elif isinstance(tabs, (int, float)):
|
||||
self.tabs = (tabs,)
|
||||
elif isinstance(tabs, (np.ndarray, list, tuple)):
|
||||
self.tabs = tabs # Already iterable, self.tabs = as is
|
||||
self.tabs = (tabs,) * self.ncols
|
||||
elif isinstance(tabs, (np.ndarray, pd.Series, list, tuple)):
|
||||
if len(tabs) == self.ncols:
|
||||
self.tabs = tabs # Already iterable and right length, self.tabs = as is
|
||||
else:
|
||||
logger.error(
|
||||
f'{tabs=} has wrong length. Ignoring.')
|
||||
self.tabs = None
|
||||
else:
|
||||
self.tabs = [tabs] # Fallback for anything else
|
||||
# self.equal = equal
|
||||
logger.error(
|
||||
f'{tabs=} must be None, a single number, or a list of '
|
||||
'numbers of the correct length. Ignoring.')
|
||||
self.tabs = None
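The validation rule in this hunk (a scalar is broadcast to every column, a sequence is accepted only at the right length, anything else is ignored) can be stated as a small standalone function. This is a sketch under assumptions: `normalize_tabs` is a hypothetical name, it handles only `list` and `tuple` to stay within the standard library (the source also accepts `np.ndarray` and `pd.Series`), and it returns `None` silently where the source logs an error.

```python
def normalize_tabs(tabs, ncols):
    """Return one width per column, or None when tabs is unusable."""
    if tabs is None:
        return None
    if isinstance(tabs, (int, float)):
        # a single number applies to every column
        return (tabs,) * ncols
    if isinstance(tabs, (list, tuple)):
        if len(tabs) == ncols:
            return tuple(tabs)
        return None  # wrong length: ignored
    return None      # unsupported type: ignored


print(normalize_tabs(5, 3))
print(normalize_tabs([1, 2], 3))
```

Doing this once at construction time means later code can assume the widths are either absent or exactly one per column.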
|
||||
|
||||
if self.config.padding_trbl is not None:
|
||||
padding_trbl = self.config.padding_trbl
|
||||
@@ -609,6 +617,7 @@ class GT(object):
|
||||
self._clean_tex = ''
|
||||
self._rich_table = None
|
||||
self._string = ''
|
||||
self._column_width_df = None
|
||||
# finally config.sparsify and then apply formatters
|
||||
# this radically alters the df, so keep a copy for now...
|
||||
self.df_pre_applying_formatters = self.df.copy()
|
||||
@@ -957,6 +966,32 @@ class GT(object):
|
||||
level_changes = tf.idxmax(axis=1)
|
||||
return level_changes
|
||||
|
||||
@property
|
||||
def column_width_df(self):
|
||||
"""
|
||||
The single source of truth for all info about column widths.
|
||||
|
||||
Adds `estimate_column_widths` columns to `make_column_width_df`.
|
||||
"""
|
||||
if self._column_width_df is None:
|
||||
self._column_width_df = self.make_column_width_df()
|
||||
tikz_colw, tabs, scaled_tabs = self.estimate_column_widths()
|
||||
self._column_width_df['tikz_colw'] = tikz_colw
|
||||
self._column_width_df['estimated_tabs'] = tabs
|
||||
self._column_width_df['estimated_scaled_tabs'] = scaled_tabs
|
||||
if self.tabs is not None:
|
||||
self._column_width_df['input_tabs'] = self.tabs
|
||||
else:
|
||||
self._column_width_df['input_tabs'] = -1
|
||||
# this column should be used in place of tabs from estimate_column_widths
|
||||
# in make html and make tikz
|
||||
self._column_width_df['tabs'] = np.maximum(self._column_width_df['input_tabs'],
|
||||
self._column_width_df['estimated_tabs'])
|
||||
self._column_width_df['scaled_tabs'] = np.maximum(self._column_width_df['input_tabs'],
|
||||
self._column_width_df['estimated_scaled_tabs'])
|
||||
|
||||
return self._column_width_df
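The way the property merges user-supplied and estimated widths can be shown without pandas. In this sketch `combine_tabs` is a hypothetical helper; it mirrors the element-wise maximum and the `-1` sentinel used above, with plain lists in place of DataFrame columns.

```python
def combine_tabs(estimated, user=None):
    """Element-wise maximum of estimated widths and optional user widths."""
    if user is None:
        # sentinel: -1 never wins the maximum, so the estimates pass through
        user = [-1] * len(estimated)
    return [max(u, e) for u, e in zip(user, estimated)]


print(combine_tabs([4, 10, 6]))
print(combine_tabs([4, 10, 6], user=[8, 8, 8]))
```

One consequence worth noting: with a maximum, a user width acts as a floor rather than an override, so a requested width narrower than the estimate has no effect.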
|
||||
|
||||
def make_column_width_df(self):
|
||||
"""
|
||||
Return dataframe of width information.
|
||||
@@ -1276,7 +1311,7 @@ class GT(object):
|
||||
For example, a column named "location category (float)" might take 3 lines with 0 extra space, but perhaps only 2 lines with 2 extra space, and 1 line with 5 extra space. The relationship is provided in a Pandas DataFrame with columns `col`, `extra`, and `num_lines`.
|
||||
|
||||
Algorithm: Binary Search on the Answer
|
||||
-------------------------------------
|
||||
----------------------------------------
|
||||
|
||||
The problem exhibits a monotonic property: if a table layout can be achieved with a maximum height of `X` lines, it can also be achieved with any maximum height `Y > X` lines (by simply using the same or more `extra` space). This property makes binary search on the *minimum possible maximum lines* an efficient solution.
|
||||
|
||||
@@ -1386,13 +1421,12 @@ class GT(object):
|
||||
The function returns the `optimal_max_lines` and the `best_allocation` dictionary, mapping each column name to the minimal `extra_space` it needs to achieve that optimal height.
|
||||
|
||||
Why this approach is effective:
|
||||
------------------------------
|
||||
---------------------------------
|
||||
|
||||
* **Optimal Solution:** The binary search guarantees finding the absolute minimum possible `max_lines` because it systematically explores the entire solution space.
|
||||
* **Efficiency:** The `check` function runs in time proportional to the number of columns times the average number of `extra` options per column. The binary search itself performs `log(range_of_num_lines)` iterations. This makes the overall complexity efficient for typical table sizes.
|
||||
* **Flexibility:** It does not assume any particular mathematical function relating `extra` space to `num_lines`. It works with arbitrary discrete relationships provided in the input DataFrame, as long as `num_lines` is non-increasing as `extra` increases (which is the natural expectation for this problem).
|
||||
|
||||
|
||||
"""
|
||||
# Pre-processing
|
||||
unique_cols = input_df['col'].unique().tolist()
|
||||
@@ -1465,10 +1499,14 @@ class GT(object):
|
||||
|
||||
return optimal_max_lines, best_allocation
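The docstring's "binary search on the answer" idea can be illustrated in a few lines. This is a sketch of the technique, not the method's actual body (which is largely hidden in this diff): `min_max_lines` is a hypothetical name, the options are passed as a dict of `(extra, num_lines)` pairs instead of a DataFrame, and `available_extra` (the total extra space that may be handed out) is an assumed input.

```python
def min_max_lines(options, available_extra):
    """
    options: dict mapping column name -> list of (extra, num_lines) pairs,
             with num_lines non-increasing as extra increases.
    Returns (optimal_max_lines, allocation), or (None, {}) if nothing fits.
    """
    def check(limit):
        """Cheapest allocation keeping every column at or under `limit` lines."""
        allocation = {}
        for col, pairs in options.items():
            feasible = [extra for extra, lines in pairs if lines <= limit]
            if not feasible:
                return None
            allocation[col] = min(feasible)
        if sum(allocation.values()) <= available_extra:
            return allocation
        return None

    lo = 1
    hi = max(lines for pairs in options.values() for _, lines in pairs)
    best = (None, {})
    while lo <= hi:
        mid = (lo + hi) // 2
        allocation = check(mid)
        if allocation is not None:
            best = (mid, allocation)   # feasible: try a smaller height
            hi = mid - 1
        else:
            lo = mid + 1               # infeasible: allow more lines
    return best


options = {'a': [(0, 3), (2, 2), (5, 1)], 'b': [(0, 2), (3, 1)]}
print(min_max_lines(options, available_extra=5))
```

The search is valid because feasibility is monotone in the height limit: any allocation that achieves a maximum of `X` lines also satisfies every larger limit.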
|
||||
|
||||
@staticmethod
|
||||
def estimate_column_widths(df, target_width, nc_index, scale, equal=False):
|
||||
def estimate_column_widths(self):
|
||||
"""
|
||||
Estimate sensible column widths for the dataframe [in what units?]
|
||||
Estimate sensible column widths for the dataframe in character units.
|
||||
|
||||
Used by HTML and TeX output. Returns tikz_colw, used by TeX output to print
|
||||
the tikz (no impact on output, just makes the produced TeX align nicely),
|
||||
tabs and scaled_tabs (reflecting scale). These three columns are added
|
||||
to the column_width_df.
|
||||
|
||||
Internal variables:
|
||||
mxmn affects alignment: are all columns the same width?
|
||||
@@ -1480,18 +1518,28 @@ class GT(object):
|
||||
:param nc_index: number of columns in the index...these are not counted as "data columns"
|
||||
:param config.equal: if True, try to make all data columns the same width (hint can be rejected)
|
||||
:return:
|
||||
colw affects how the tex is printed to ensure it "looks neat" (actual width of data elements)
|
||||
tikz_colw affects how the tex is printed to ensure it "looks neat" (actual width of data elements)
|
||||
tabs affects the actual output
|
||||
"""
|
||||
# this
|
||||
# local variables (conversion from global method)
|
||||
df = self.df
|
||||
target_width = self.config.max_table_width
|
||||
nc_index = self.nindex
|
||||
scale = self.config.tikz_scale
|
||||
equal = self.config.equal
|
||||
|
||||
# tabs (from _tabs) is an estimate of column widths; it determines the size of the table columns as displayed
|
||||
# print(f'{nc_index=}, {scale=}, {config.equal=}')
|
||||
colw = dict.fromkeys(df.columns, 0)
|
||||
# without tex adjustment
|
||||
tikz_colw = dict.fromkeys(df.columns, 0)
|
||||
# with tex adjustment
|
||||
tex_colw = dict.fromkeys(df.columns, 0)
|
||||
headw = dict.fromkeys(df.columns, 0)
|
||||
tabs = []
|
||||
scaled_tabs = []
|
||||
mxmn = {}
|
||||
if df.empty:
|
||||
return colw, tabs
|
||||
return tikz_colw, tabs, scaled_tabs
|
||||
nl = nc_index
|
||||
for i, c in enumerate(df.columns):
|
||||
# figure width of the column labels; if index c= str, if MI then c = tuple
|
||||
@@ -1527,21 +1575,26 @@ class GT(object):
|
||||
try:
|
||||
lens = df.iloc[:, i].map(
|
||||
lambda x: GT.text_display_len(str(x)))
|
||||
colw[c] = lens.max()
|
||||
tex_colw[c] = lens.max()
|
||||
mxmn[c] = (lens.max(), lens.min())
|
||||
raw_lens = df.iloc[:, i].map(len)
|
||||
tikz_colw[c] = raw_lens.max()
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
f'{c} error {e} DO SOMETHING ABOUT THIS...if it never occurs dont need the if')
|
||||
colw[c] = df[c].str.len().max()
|
||||
mxmn[c] = (df[c].str.len().max(), df[c].str.len().min())
|
||||
raise
|
||||
# logger.error(
|
||||
# f'{c} error {e} DO SOMETHING ABOUT THIS...if it never occurs dont need the if')
|
||||
# tikz_colw[c] = df[c].str.len().max()
|
||||
# mxmn[c] = (df[c].str.len().max(), df[c].str.len().min())
|
||||
else:
|
||||
lens = df.iloc[:, i].map(lambda x: GT.text_display_len(str(x)))
|
||||
colw[c] = lens.max()
|
||||
tex_colw[c] = lens.max()
|
||||
mxmn[c] = (lens.max(), lens.min())
|
||||
# print(f'{headw[c]=}, {colw[c]=}, {mxmn[c]=}, {c=}')
|
||||
raw_lens = df.iloc[:, i].map(len)
|
||||
tikz_colw[c] = raw_lens.max()
|
||||
# print(tikz_colw)
|
||||
# now know all column widths...decide what to do
|
||||
# are all the data columns about the same width?
|
||||
data_cols = np.array([colw[k] for k in df.columns[nl:]])
|
||||
data_cols = np.array([tex_colw[k] for k in df.columns[nl:]])
|
||||
same_size = (data_cols.std() <= 0.1 * data_cols.mean())
|
||||
# print(f'same size test requires {data_cols.std()} <= {0.1 * data_cols.mean()}')
|
||||
common_size = 0
|
||||
@@ -1552,7 +1605,7 @@ class GT(object):
|
||||
for i, c in enumerate(df.columns):
|
||||
if i < nl or not same_size:
|
||||
# index columns
|
||||
tabs.append(int(max(colw[c], headw[c])))
|
||||
tabs.append(int(max(tex_colw[c], headw[c])))
|
||||
else:
|
||||
# data all seems about the same width
|
||||
tabs.append(common_size)
|
||||
@@ -1574,11 +1627,14 @@ class GT(object):
|
||||
if data_width and data_width / target_width < 0.9:
|
||||
# don't rescale above 1:1 - don't want too large
|
||||
rescale = min(1 / scale, target_width / data_width)
|
||||
tabs = [w if i < nl else w * rescale for i, w in enumerate(tabs)]
|
||||
scaled_tabs = [w if i < nl else
|
||||
int(w * rescale) for i, w in enumerate(tabs)]
|
||||
logger.info(f'Rescale {rescale} applied; tabs = {tabs}')
|
||||
else:
|
||||
scaled_tabs = tabs
|
||||
# print(f'Rescale {rescale} applied; tabs = {tabs}')
|
||||
# print(f'{colw.values()=}\n{tabs=}')
|
||||
return colw, tabs
|
||||
# print(f'{tikz_colw.values()=}\n{tabs=}')
|
||||
return tikz_colw, tabs, scaled_tabs
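Two decisions in this method are easy to check in isolation: the "about the same width" test and the rescale step. The sketch below uses hypothetical function names and the standard library in place of NumPy (`pstdev` matches NumPy's default population standard deviation). It assumes `data_width` is the sum of the data-column widths; that computation falls outside the lines shown in this diff.

```python
from statistics import mean, pstdev


def about_same_size(widths, tolerance=0.1):
    """True when the spread of widths is small relative to their mean."""
    return pstdev(widths) <= tolerance * mean(widths)


def rescale_tabs(tabs, n_index, target_width, scale):
    """Widen data columns when the table uses under 90% of the target width."""
    data_width = sum(tabs[n_index:])   # assumption: sum of data-column widths
    if data_width and data_width / target_width < 0.9:
        # never rescale beyond 1/scale, so the scaled table is not enlarged
        rescale = min(1 / scale, target_width / data_width)
        return [w if i < n_index else int(w * rescale)
                for i, w in enumerate(tabs)]
    return list(tabs)


print(about_same_size([10, 10, 11]))
print(rescale_tabs([5, 10, 10], n_index=1, target_width=100, scale=0.5))
```

Index columns are left at their natural width; only the data columns absorb the spare room.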
|
||||
|
||||
@staticmethod
|
||||
def text_display_len(s: str) -> int:
|
||||
@@ -1672,6 +1728,14 @@ class GT(object):
|
||||
.greater-table > table {{
|
||||
display: inline-table;
|
||||
}} */
|
||||
/* try to turn off Jupyter and other formats for greater-table
|
||||
all: unset => reset all inherited styles
|
||||
display: revert -> put back to defaults
|
||||
#greater-table * {{
|
||||
all: unset;
|
||||
display: revert;
|
||||
}}
|
||||
*/
|
||||
/* tag formats */
|
||||
#{self.df_id} caption {{
|
||||
padding: {2 * padt}px {padr}px {padb}px {padl}px;
|
||||
@@ -1774,19 +1838,8 @@ class GT(object):
|
||||
idx_header = bit.iloc[:self.nindex, :self.ncolumns]
|
||||
columns = bit.iloc[self.nindex:, :self.ncolumns]
|
||||
|
||||
colw, tabs = GT.estimate_column_widths(
|
||||
self.df, self.config.max_table_width, nc_index=self.nindex, scale=1, equal=self.config.equal)
|
||||
if self.config.debug:
|
||||
print(f'Make html Input {self.tabs=}\nComputed {tabs=}')
|
||||
if self.tabs is not None:
|
||||
if len(tabs) == len(self.tabs):
|
||||
tabs = self.tabs
|
||||
elif len(self.tabs) == 1:
|
||||
tabs = self.tabs * len(tabs)
|
||||
else:
|
||||
logger.error(
|
||||
f'{self.tabs=} must be None, a single number, or a list of numbers of the correct length. Ignoring.')
|
||||
# print('HTML ' + ', '.join([f'{c:,.2f}' for c in tabs]))
|
||||
# figure appropriate widths
|
||||
tabs = self.column_width_df['tabs']
|
||||
|
||||
# set column widths; tabs returns lengths of strings in each column
|
||||
# for proportional fonts, average char is 0.4 to 0.5 em but numbers with
|
||||
@@ -1926,7 +1979,7 @@ class GT(object):
|
||||
return self.df_html
|
||||
|
||||
def clean_style(self, soup):
|
||||
"""Minify CSS inside <style> blocks and remove /* ... */ comments."""
|
||||
"""Minify CSS inside <style> blocks and remove slash-star comments."""
|
||||
if not self.config.debug:
|
||||
for style_tag in soup.find_all("style"):
|
||||
if style_tag.string:
|
||||
@@ -1966,6 +2019,12 @@ class GT(object):
|
||||
latex=None,
|
||||
sparsify=1):
|
||||
"""
|
||||
Write DataFrame to custom tikz matrix.
|
||||
|
||||
Updated version that uses self.df and does not need to
|
||||
reapply formatters or sparsify. Various HTML->TeX replacements
|
||||
are still needed, e.g., dealing with % and _ outside formulas.
|
||||
|
||||
Write DataFrame to custom tikz matrix to allow greater control of
|
||||
formatting and insertion of horizontal and vertical divider lines
|
||||
|
||||
@@ -2012,7 +2071,6 @@ class GT(object):
|
||||
lines lines below these rows, -1 for next to last row
|
||||
etc.; list of ints
|
||||
post_process e.g., non-line commands put at bottom of table
|
||||
|
||||
latex arguments after \begin{table}[latex]
|
||||
caption text for caption
|
||||
|
||||
@@ -2028,7 +2086,7 @@ class GT(object):
|
||||
:return:
|
||||
"""
|
||||
# local variable - with all formatters already applied
|
||||
df = self.apply_formatters(self.raw_df.copy(), mode='raw')
|
||||
df = self.df.copy() # self.apply_formatters(self.raw_df.copy(), mode='raw')
|
||||
caption = self.caption
|
||||
label = self.label
|
||||
# prepare label and caption
|
||||
@@ -2079,31 +2137,16 @@ class GT(object):
|
||||
# always a good idea to do this...need to deal with underscores, %
|
||||
# and it handles index types that are not strings
|
||||
df = GT.clean_index(df)
|
||||
if not np.all([i == 'object' for i in df.dtypes]) and not df.empty:
|
||||
logger.warning('cols of df not all objects (expect all obs at this '
'point):\n%s', df.dtypes)
|
||||
# make sure percents are escaped, but not if already escaped
|
||||
df = df.replace(r"(?<!\\)%", r"\%", regex=True)
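The negative lookbehind in the regex above is what makes the escaping idempotent. A standalone sketch with the `re` module follows; `escape_percent` is a hypothetical helper name, and the pattern is the same one passed to `DataFrame.replace`.

```python
import re

# a percent sign not immediately preceded by a backslash
_UNESCAPED_PERCENT = re.compile(r"(?<!\\)%")


def escape_percent(text):
    """Escape % for TeX, leaving already-escaped \\% untouched."""
    # in a replacement template, r"\\%" yields a literal backslash plus %
    return _UNESCAPED_PERCENT.sub(r"\\%", text)


print(escape_percent("growth of 50%"))
print(escape_percent(escape_percent("growth of 50%")))
```

Applying the function twice gives the same result as applying it once. One limitation of this pattern: a `%` that follows an escaped backslash (`\\%` in the TeX source) is also left alone, although TeX would treat that `%` as a comment character.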
|
||||
|
||||
# pres_maker.df_to_tikz code line 1931
|
||||
if self.show_index:
|
||||
nc_index = df.index.nlevels
|
||||
with warnings.catch_warnings():
|
||||
warnings.simplefilter(
|
||||
"ignore", category=pd.errors.PerformanceWarning)
|
||||
df = df.reset_index(
|
||||
drop=False, col_level=df.columns.nlevels - 1)
|
||||
if sparsify:
|
||||
if hrule is None:
|
||||
hrule = set()
|
||||
for i in range(sparsify):
|
||||
# TODO update to new sparsify!!
|
||||
df.iloc[:, i], rules = GT.sparsify_old(df.iloc[:, i])
|
||||
# don't want lines everywhere
|
||||
if len(rules) < len(df) - 1:
|
||||
hrule = set(hrule).union(rules)
|
||||
else:
|
||||
nc_index = 0
|
||||
# set in init
|
||||
# self.nindex = self.df.index.nlevels if self.show_index else 0
|
||||
# self.ncolumns = self.df.columns.nlevels
|
||||
# self.ncols = self.df.shape[1]
|
||||
|
||||
nc_index = self.nindex
|
||||
nr_columns = self.ncolumns
|
||||
|
||||
if vrule is None:
|
||||
vrule = set()
|
||||
@@ -2112,7 +2155,6 @@ class GT(object):
|
||||
# to the left of... +1
|
||||
vrule.add(nc_index + 1)
|
||||
|
||||
nr_columns = df.columns.nlevels
|
||||
logger.info(
|
||||
f'rows in columns {nr_columns}, columns in index {nc_index}')
|
||||
|
||||
@@ -2123,21 +2165,14 @@ class GT(object):
|
||||
# number of index columns
|
||||
# have also converted everything to formatted strings
|
||||
# estimate... originally called guess_column_widths, with more parameters
|
||||
colw, tabs = GT.estimate_column_widths(df, self.config.max_table_width, nc_index=nc_index, scale=self.config.tikz_scale, equal=self.config.equal) # noqa
|
||||
if self.config.debug:
|
||||
print(f'Make TikZ Input {self.tabs=}\nComputed {tabs=}')
|
||||
if self.tabs is not None:
|
||||
if len(tabs) == len(self.tabs):
|
||||
tabs = self.tabs
|
||||
elif len(self.tabs) == 1:
|
||||
tabs = self.tabs * len(tabs)
|
||||
else:
|
||||
logger.error(
|
||||
f'{self.tabs=} must be None, a single number, or a list of numbers of the correct length. Ignoring.')
|
||||
# print('TIKZ ' + ', '.join([f'{c:,.2f}' for c in tabs]))
|
||||
# print(f'TIKZ {colw=}, {tabs=}')
|
||||
logger.info(f'tabs: {tabs}')
|
||||
logger.info(f'colw: {colw}')
|
||||
colw = self.column_width_df['tikz_colw']
|
||||
tabs = self.column_width_df['scaled_tabs']
|
||||
# these are indexed with pre-TeX mangling names
|
||||
colw.index = df.columns
|
||||
tabs.index = df.columns
|
||||
|
||||
logger.info('colw: %s', colw)
logger.info('tabs: %s', tabs)
|
||||
|
||||
# alignment dictionaries - these are still used below
|
||||
ad = {'l': 'left', 'r': 'right', 'c': 'center'}
|
||||
@@ -2261,6 +2296,7 @@ class GT(object):
|
||||
# function to convert row numbers to TeX table format (edge case on last row -1 is nr and is caught, -2
|
||||
# is below second to last row = above last row)
|
||||
# shift down extra 1 for the spacer row at the top
|
||||
|
||||
def python_2_tex(x): return x + nr_columns + \
|
||||
2 if x >= 0 else nr + x + 3
|
||||
tb_rules = [nr_columns + 1, python_2_tex(-1)]
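The row-number conversion is compact enough that a worked example helps. Below, the same expression is written as a standalone function with `nr_columns` and `nr` passed explicitly. The exact meaning of `nr` (the row count used by the source) is not visible in this diff, so the values here are purely illustrative.

```python
def python_2_tex(x, nr_columns, nr):
    """Map a Python row number to a TeX matrix row number.

    Non-negative x is shifted past the header rows and the spacer row;
    negative x counts back from the end, as with Python indexing.
    """
    return x + nr_columns + 2 if x >= 0 else nr + x + 3


# illustrative values: one header row, nr = 10
for x in (0, 1, -1, -2):
    print(x, python_2_tex(x, nr_columns=1, nr=10))
```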
|
||||
@@ -2344,6 +2380,389 @@ class GT(object):
|
||||
|
||||
return sio.getvalue()
|
||||
|
||||
# def make_tikz_original(self,
|
||||
# column_sep=4 / 8, # was 3/8
|
||||
# row_sep=1 / 8,
|
||||
# container_env='table',
|
||||
# extra_defs='',
|
||||
# hrule=None,
|
||||
# vrule=None,
|
||||
# post_process='',
|
||||
# label='',
|
||||
# latex=None,
|
||||
# sparsify=1):
|
||||
# """
|
||||
# Write DataFrame to custom tikz matrix to allow greater control of
|
||||
# formatting and insertion of horizontal and vertical divider lines
|
||||
|
||||
# Estimates tabs from text width of fields (not so great if includes
|
||||
# a lot of TeX macros) with a manual override available. Tabs gives
|
||||
# the widths of each field in em (width of M)
|
||||
|
||||
# Standard row height = 1.5em seems to work - set in meta.
|
||||
|
||||
# first and last thick rules
|
||||
# others below (Python, zero-based) row number, excluding title row
|
||||
|
||||
# keyword arguments : value (no newlines in value) escape back slashes!
|
||||
# ``#keyword...`` rows ignored
|
||||
# passed in as a string to facilitate using them with %%pmt?
|
||||
|
||||
# **Rules**
|
||||
|
||||
# * hrule at i means below row i of the table. (1-based) Top, bottom and
|
||||
# below index lines are inserted automatically. Top and bottom lines
|
||||
# are thicker.
|
||||
# * vrule at i means to the left of table column i (1-based); there will
|
||||
# never be a rule to the far right...it looks plebby; remember you must
|
||||
# include the index columns!
|
||||
|
||||
# sparsify number of cols of multi index to sparsify
|
||||
|
||||
# Issue: column with floats and spaces or missing causes problems (VaR,
|
||||
# TVaR, EPD, mean and CV table)
|
||||
|
||||
# From great.pres_maker.df_to_tikz
|
||||
|
||||
# keyword args:
|
||||
|
||||
# scale picks up self.config.tikz_scale; scale applied to whole
|
||||
# table - default 0.717
|
||||
# height row height, rec. 1 (em)
|
||||
# column_sep col sep in em
|
||||
# row_sep row sep in em
|
||||
# container_env table, figure or sidewaysfigure
|
||||
# color color for text boxes (helps debugging)
|
||||
# extra_defs TeX definitions and commands put at top of table,
|
||||
# e.g., \\centering
|
||||
# lines lines below these rows, -1 for next to last row
|
||||
# etc.; list of ints
|
||||
# post_process e.g., non-line commands put at bottom of table
|
||||
# latex arguments after \begin{table}[latex]
|
||||
# caption text for caption
|
||||
|
||||
# Previous version see great.pres_maker
|
||||
# Original version see: C:\\S\\TELOS\\CAS\\AR_Min_Bias\\cvs_to_md.py
|
||||
|
||||
# :param column_sep:
|
||||
# :param row_sep:
|
||||
# :param figure:
|
||||
# :param extra_defs:
|
||||
# :param post_process:
|
||||
# :param label:
|
||||
# :return:
|
||||
# """
|
||||
# # local variable - with all formatters already applied
|
||||
# df = self.apply_formatters(self.raw_df.copy(), mode='raw')
|
||||
# caption = self.caption
|
||||
# label = self.label
|
||||
# # prepare label and caption
|
||||
# if label == '':
|
||||
# lt = ''
|
||||
# label = ''
|
||||
# else:
|
||||
# lt = label
|
||||
# label = f'\\label{{{label}}}'
|
||||
# if caption == '':
|
||||
# if lt != '':
|
||||
# logger.info(
|
||||
# f'You have a label but no caption; the label {label} will be ignored.')
|
||||
# caption = '% caption placeholder'
|
||||
# else:
|
||||
# caption = f'\\caption{{{self.caption}}}\n{label}'
|
||||
|
||||
# if not df.columns.is_unique:
|
||||
# # possible index/body column interaction
|
||||
# raise ValueError('tikz routine requires unique column names')
|
||||
# # {extra_defs}
|
||||
# # centering handled by quarto
|
||||
# header = """
|
||||
# \\begin{{{container_env}}}{latex}
|
||||
# {caption}
|
||||
# \\centering{{
|
||||
# \\begin{{tikzpicture}}[
|
||||
# auto,
|
||||
# transform shape,
|
||||
# nosep/.style={{inner sep=0}},
|
||||
# table/.style={{
|
||||
# matrix of nodes,
|
||||
# row sep={row_sep}em,
|
||||
# column sep={column_sep}em,
|
||||
# nodes in empty cells,
|
||||
# nodes={{rectangle, scale={scale}, text badly ragged {debug}}},
|
||||
# """
|
||||
# # put draw=blue!10 or so in nodes to see the node
|
||||
|
||||
# footer = """
|
||||
# {post_process}
|
||||
|
||||
# \\end{{tikzpicture}}
|
||||
# }} % close centering
|
||||
# \\end{{{container_env}}}
|
||||
# """
|
||||
|
||||
# # always a good idea to do this...need to deal with underscores, %
|
||||
# # and it handles index types that are not strings
|
||||
# df = GT.clean_index(df)
|
||||
# if not np.all([i == 'object' for i in df.dtypes]) and not df.empty:
|
||||
# logger.warning('cols of df not all objects (expect all obs at this '
|
||||
# 'point):\n%s', df.dtypes)
|
||||
# # make sure percents are escaped, but not if already escaped
|
||||
# df = df.replace(r"(?<!\\)%", r"\%", regex=True)
|
||||
|
||||
# # pres_maker.df_to_tikz code line 1931
|
||||
# if self.show_index:
|
||||
# nc_index = df.index.nlevels
|
||||
# with warnings.catch_warnings():
|
||||
# warnings.simplefilter(
|
||||
# "ignore", category=pd.errors.PerformanceWarning)
|
||||
# df = df.reset_index(
|
||||
# drop=False, col_level=df.columns.nlevels - 1)
|
||||
# if sparsify:
|
||||
# if hrule is None:
|
||||
# hrule = set()
|
||||
# for i in range(sparsify):
|
||||
# # TODO update to new sparsify!!
|
||||
# df.iloc[:, i], rules = GT.sparsify_old(df.iloc[:, i])
|
||||
# # don't want lines everywhere
|
||||
# if len(rules) < len(df) - 1:
|
||||
# hrule = set(hrule).union(rules)
|
||||
# else:
|
||||
# nc_index = 0
|
||||
|
||||
# if vrule is None:
|
||||
# vrule = set()
|
||||
# else:
|
||||
# vrule = set(vrule)
|
||||
# # to the left of... +1
|
||||
# vrule.add(nc_index + 1)
|
||||
|
||||
# nr_columns = df.columns.nlevels
|
||||
# logger.info(
|
||||
# f'rows in columns {nr_columns}, columns in index {nc_index}')
|
||||
|
||||
# # internal TeX code (same as HTML code)
|
||||
# matrix_name = self.df_id
|
||||
|
||||
# # note this happens AFTER you have reset the index...need to pass
|
||||
# # number of index columns
|
||||
# # have also converted everything to formatted strings
|
||||
# # estimate... originally called guess_column_widths, with more parameters
|
||||
|
||||
# colw, tabs = GT.estimate_column_widths(df, self.config.max_table_width, nc_index=nc_index, scale=self.config.tikz_scale, equal=self.config.equal) # noqa
|
||||
|
||||
# if self.config.debug:
|
||||
# print(f'Make TikZ Input {self.tabs=}\nComputed {tabs=}')
|
||||
# # print('TIKZ ' + ', '.join([f'{c:,.2f}' for c in tabs]))
|
||||
# # print(f'TIKZ {colw=}, {tabs=}')
|
||||
# logger.info(f'tabs: {tabs}')
|
||||
# logger.info(f'colw: {colw}')
|
||||
|
||||
# # alignment dictionaries - these are still used below
|
||||
# ad = {'l': 'left', 'r': 'right', 'c': 'center'}
|
||||
# ad2 = {'l': '<', 'r': '>', 'c': '^'}
|
||||
# # use df_aligners, at this point the index has been reset
|
||||
# align = []
|
||||
# for n, i in zip(df.columns, self.df_aligners):
|
||||
# if i == 'grt-left':
|
||||
# align.append('l')
|
||||
# elif i == 'grt-right':
|
||||
# align.append('r')
|
||||
# elif i == 'grt-center':
|
||||
# align.append('c')
|
||||
# else:
|
||||
# align.append('l')
|
||||
|
||||
# # start writing
|
||||
# sio = StringIO()
|
||||
# if latex is None:
|
||||
# latex = ''
|
||||
# else:
|
||||
# latex = f'[{latex}]'
|
||||
# debug = ''
|
||||
# if self.config.debug:
|
||||
# # color all boxes
|
||||
# debug = ', draw=blue!10'
|
||||
# else:
|
||||
# debug = ''
|
||||
# sio.write(header.format(container_env=container_env,
|
||||
# caption=caption,
|
||||
# extra_defs=extra_defs,
|
||||
# scale=self.config.tikz_scale,
|
||||
# column_sep=column_sep,
|
||||
# row_sep=row_sep,
|
||||
# latex=latex,
|
||||
# debug=debug))
|
||||
|
||||
# # table header
|
||||
# # title rows, start with the empty spacer row
|
||||
# i = 1
|
||||
# sio.write(
|
||||
# f'\trow {i}/.style={{nodes={{text=black, anchor=north, inner ysep=0, text height=0, text depth=0}}}},\n')
|
||||
# for i in range(2, nr_columns + 2):
|
||||
# sio.write(
|
||||
# f'\trow {i}/.style={{nodes={{text=black, anchor=south, inner ysep=.2em, minimum height=1.3em, font=\\bfseries}}}},\n')
|
||||
|
||||
# # write column spec
|
||||
# for i, w, al in zip(range(1, len(align) + 1), tabs, align):
|
||||
# # average char is only 0.48 of M
|
||||
# https://en.wikipedia.org/wiki/Em_(typography)
|
||||
# if i == 1:
|
||||
# # first column sets row height for entire row
|
||||
# sio.write(f'\tcolumn {i:>2d}/.style={{'
|
||||
# f'nodes={{align={ad[al]:<6s}}}, '
|
||||
# 'text height=0.9em, text depth=0.2em, '
|
||||
# f'inner xsep={column_sep}em, inner ysep=0, '
|
||||
# f'text width={max(2, 0.6 * w):.2f}em}},\n')
|
||||
# else:
|
||||
# sio.write(f'\tcolumn {i:>2d}/.style={{'
|
||||
# f'nodes={{align={ad[al]:<6s}}}, nosep, text width={max(2, 0.6 * w):.2f}em}},\n')
|
||||
# # extra col to right which enforces row height
|
||||
# sio.write(
|
||||
# f'\tcolumn {i+1:>2d}/.style={{text height=0.9em, text depth=0.2em, nosep, text width=0em}}')
|
||||
# sio.write('\t}]\n')
|
||||
|
||||
# sio.write("\\matrix ({matrix_name}) [table, ampersand replacement=\\&]{{\n".format(
|
||||
# matrix_name=matrix_name))
|
||||
|
||||
# # body of table, starting with the column headers
|
||||
# # spacer row
|
||||
# nl = ''
|
||||
# for cn, al in zip(df.columns, align):
|
||||
# s = f'{nl} {{cell:{ad2[al]}{colw[cn]}s}} '
|
||||
# nl = '\\&'
|
||||
# sio.write(s.format(cell=' '))
|
||||
# # include the blank extra last column
|
||||
# sio.write('\\& \\\\\n')
|
||||
# # write header rows (again, issues with multi index)
|
||||
# mi_vrules = {}
|
||||
# sparse_columns = {}
|
||||
# if isinstance(df.columns, pd.MultiIndex):
|
||||
# for lvl in range(len(df.columns.levels)):
|
||||
# nl = ''
|
||||
# sparse_columns[lvl], mi_vrules[lvl] = GT.sparsify_mi(df.columns.get_level_values(lvl),
|
||||
# lvl == len(df.columns.levels) - 1)
|
||||
# for cn, c, al in zip(df.columns, sparse_columns[lvl], align):
|
||||
# # c = wfloat_format(c)
|
||||
# s = f'{nl} {{cell:{ad2[al]}{colw[cn]}s}} '
|
||||
# nl = '\\&'
|
||||
# sio.write(s.format(cell=c + '\\grtspacer'))
|
||||
# # include the blank extra last column
|
||||
# sio.write('\\& \\\\\n')
|
||||
# else:
|
||||
# nl = ''
|
||||
# for c, al in zip(df.columns, align):
|
||||
# # c = wfloat_format(c)
|
||||
# s = f'{nl} {{cell:{ad2[al]}{colw[c]}s}} '
|
||||
# nl = '\\&'
|
||||
# sio.write(s.format(cell=c + '\\grtspacer'))
|
||||
# sio.write('\\& \\\\\n')
|
||||
|
||||
# # write table entries
|
||||
# for idx, row in df.iterrows():
|
||||
# nl = ''
|
||||
# for c, cell, al in zip(df.columns, row, align):
|
||||
# # cell = wfloat_format(cell)
|
||||
# s = f'{nl} {{cell:{ad2[al]}{colw[c]}s}} '
|
||||
# nl = '\\&'
|
||||
# sio.write(s.format(cell=cell))
|
||||
# # if c=='p':
|
||||
# # print('COLp', cell, type(cell), s, s.format(cell=cell))
|
||||
# sio.write('\\& \\\\\n')
|
||||
# sio.write(f'}};\n\n')
|
||||
|
||||
# # decorations and post processing - horizontal and vertical lines
|
||||
# nr, nc = df.shape
|
||||
# # add for the index and the last row plus 1 for the added spacer row at the top
|
||||
# nr += nr_columns + 1
|
||||
# # always include top and bottom
|
||||
# # you input a table row number and get a line below it; it is implemented as a line ABOVE the next row
|
||||
# # function to convert row numbers to TeX table format (edge case on last row -1 is nr and is caught, -2
|
||||
# # is below second to last row = above last row)
|
||||
# # shift down extra 1 for the spacer row at the top
|
||||
# def python_2_tex(x): return x + nr_columns + \
|
||||
# 2 if x >= 0 else nr + x + 3
|
||||
# tb_rules = [nr_columns + 1, python_2_tex(-1)]
|
||||
# if hrule:
|
||||
# hrule = set(map(python_2_tex, hrule)).union(tb_rules)
|
||||
# else:
|
||||
# hrule = list(tb_rules)
|
||||
# logger.debug(f'hlines: {hrule}')
|
||||
|
||||
# # why
|
||||
# yshift = row_sep / 2
|
||||
# xshift = -column_sep / 2
|
||||
# descender_proportion = 0.25
|
||||
|
||||
# # top rule is special
|
||||
# ls = 'thick'
|
||||
# ln = 1
|
||||
# sio.write(
|
||||
# f'\\path[draw, {ls}] ({matrix_name}-{ln}-1.south west) -- ({matrix_name}-{ln}-{nc+1}.south east);\n')
|
||||
|
||||
# for ln in hrule:
|
||||
# ls = 'thick' if ln == nr + nr_columns + \
|
||||
# 1 else ('semithick' if ln == 1 + nr_columns else 'very thin')
|
||||
# if ln < nr:
|
||||
# # line above TeX row ln+1 that exists
|
||||
# sio.write(f'\\path[draw, {ls}] ([yshift={-yshift}em]{matrix_name}-{ln}-1.south west) -- '
|
||||
# f'([yshift={-yshift}em]{matrix_name}-{ln}-{nc+1}.south east);\n')
|
||||
# else:
|
||||
# # line above row below bottom = line below last row
|
||||
# # descenders are 200 to 300 below baseline
|
||||
# ln = nr
|
||||
# sio.write(f'\\path[draw, thick] ([yshift={-descender_proportion-yshift}em]{matrix_name}-{ln}-1.base west) -- '
|
||||
# f'([yshift={-descender_proportion-yshift}em]{matrix_name}-{ln}-{nc+1}.base east);\n')
|
||||
|
||||
# # if multi index put in lines within the index TODO make this better!
|
||||
# if nr_columns > 1:
|
||||
# for ln in range(2, nr_columns + 1):
|
||||
# sio.write(f'\\path[draw, very thin] ([xshift={xshift}em, yshift={-yshift}em]'
|
||||
# f'{matrix_name}-{ln}-{nc_index+1}.south west) -- '
|
||||
# f'([yshift={-yshift}em]{matrix_name}-{ln}-{nc+1}.south east);\n')
|
||||
|
||||
# written = set(range(1, nc_index + 1))
|
||||
# if vrule and self.show_index:
|
||||
# # to left of col, 1 based, includes index
|
||||
# # write these first
|
||||
# # TODO fix madness vrule is to the left, mi_vrules are to the right...
|
||||
# ls = 'very thin'
|
||||
# for cn in vrule:
|
||||
# if cn not in written:
|
||||
# sio.write(f'\\path[draw, {ls}] ([xshift={xshift}em]{matrix_name}-1-{cn}.south west) -- '
|
||||
# f'([yshift={-descender_proportion-yshift}em, xshift={xshift}em]{matrix_name}-{nr}-{cn}.base west);\n')
|
||||
# written.add(cn - 1)
|
||||
|
||||
# if len(mi_vrules) > 0:
|
||||
# logger.debug(
|
||||
# f'Generated vlines {mi_vrules}; already written {written}')
|
||||
# # vertical rules for the multi index
|
||||
# # these go to the RIGHT of the relevant column and reflect the index columns already
|
||||
# # mi_vrules = {level of index: [list of vrule columns]
|
||||
# # written keeps track of which vrules have been done already; start by cutting out the index columns
|
||||
# ls = 'ultra thin'
|
||||
# for k, cols in mi_vrules.items():
|
||||
# # don't write the lowest level
|
||||
# if k == len(mi_vrules) - 1:
|
||||
# break
|
||||
# for cn in cols:
|
||||
# if cn in written:
|
||||
# pass
|
||||
# else:
|
||||
# written.add(cn)
|
||||
# top = k + 1
|
||||
# if top == 0:
|
||||
# sio.write(f'\\path[draw, {ls}] ([xshift={-xshift}em]{matrix_name}-{top}-{cn}.south east) -- '
|
||||
# f'([yshift={-descender_proportion-yshift}em, xshift={-xshift}em]{matrix_name}-{nr}-{cn}.base east);\n')
|
||||
# else:
|
||||
# sio.write(f'\\path[draw, {ls}] ([xshift={-xshift}em, yshift={-yshift}em]{matrix_name}-{top}-{cn}.south east) -- '
|
||||
# f'([yshift={-descender_proportion-yshift}em, xshift={-xshift}em]{matrix_name}-{nr}-{cn}.base east);\n')
|
||||
|
||||
# sio.write(footer.format(container_env=container_env,
|
||||
# post_process=post_process))
|
||||
|
||||
# return sio.getvalue()
|
||||
|
||||
@staticmethod
|
||||
def sparsify(df, cs):
|
||||
out = df.copy()
|
||||
@@ -2417,15 +2836,16 @@ class GT(object):
|
||||
except:
|
||||
return str(n)
|
||||
|
||||
# @staticmethod
|
||||
# def clean_underscores(s):
|
||||
# """
|
||||
# check s for unescaped _s
|
||||
# returns true if all _ escaped else false
|
||||
# :param s:
|
||||
# :return:
|
||||
# """
|
||||
# return np.all([s[x.start() - 1] == '\\' for x in re.finditer('_', s)])
|
||||
@staticmethod
|
||||
def clean_index(df):
|
||||
"""
|
||||
escape _ for columns and index, being careful about subscripts
|
||||
in TeX formulas.
|
||||
|
||||
:param df:
|
||||
:return:
|
||||
"""
|
||||
return df.rename(index=GT.clean_name, columns=GT.clean_name)
|
||||
|
||||
@staticmethod
|
||||
def clean_html_tex(text):
|
||||
@@ -2439,17 +2859,6 @@ class GT(object):
|
||||
text = re.sub(r'(?<!\$)\$(.*?)(?<!\\)\$(?!\$)', r'\\(\1\\)', text)
|
||||
return text
|
||||
|
||||
@staticmethod
|
||||
def clean_index(df):
|
||||
"""
|
||||
escape _ for columns and index, being careful about subscripts
|
||||
in TeX formulas.
|
||||
|
||||
:param df:
|
||||
:return:
|
||||
"""
|
||||
return df.rename(index=GT.clean_name, columns=GT.clean_name)
|
||||
|
||||
@staticmethod
|
||||
def md_to_df(txt):
|
||||
"""Convert markdown text string table to DataFrame."""
|
||||
@@ -2523,9 +2932,8 @@ class GT(object):
|
||||
if self.df.empty:
|
||||
return ""
|
||||
if self._string == "":
|
||||
cw_df = self.make_column_width_df()
|
||||
cw = cw_df['recommended']
|
||||
aligners = cw_df['alignment']
|
||||
cw = self.column_width_df['recommended']
|
||||
aligners = self.column_width_df['alignment']
|
||||
self._string = GT.make_text_table(
|
||||
self.df, cw, aligners, index_levels=self.nindex)
|
||||
return self._string
|
||||
@@ -2728,9 +3136,8 @@ class GT(object):
|
||||
self.config.table_width_mode = 'explicit'
|
||||
self.config.max_table_width = console.width
|
||||
# figure col widths and aligners
|
||||
cw_df = self.make_column_width_df()
|
||||
cw = cw_df['recommended']
|
||||
aligners = cw_df['alignment']
|
||||
cw = self.column_widths['recommended']
|
||||
aligners = self.column_widths['alignment']
|
||||
show_lines = self.config.hrule_widths[0] > 0
|
||||
|
||||
self._rich_table = table = GT.make_rich_table(self.df,
|
||||
@@ -2747,13 +3154,27 @@ class GT(object):
|
||||
|
||||
def make_svg(self):
|
||||
"""Render tikz into svg text."""
|
||||
tz = TikzProcessor(self._repr_latex_(), file_name=self.df_id)
|
||||
tz = TikzProcessor(self._repr_latex_(),
|
||||
file_name=self.df_id, debug=self.config.debug)
|
||||
p = tz.file_path.with_suffix('.svg')
|
||||
if not p.exists():
|
||||
tz.process_tikz(verbose=False)
|
||||
try:
|
||||
tz.process_tikz()
|
||||
except ValueError as e:
|
||||
print(e)
|
||||
return "no svg output"
|
||||
|
||||
txt = p.read_text()
|
||||
return txt
|
||||
|
||||
def show_svg(self):
|
||||
"""Display svg in Jupyter."""
|
||||
svg = self.make_svg()
|
||||
if svg != 'no svg output':
|
||||
display(SVG(svg))
|
||||
else:
|
||||
print('No SVG file available (TeX compile error).')
|
||||
|
||||
def save_html(self, fn):
|
||||
"""Save HTML to file."""
|
||||
p = Path(fn)
|
||||
@@ -2762,3 +3183,15 @@ class GT(object):
|
||||
soup = BeautifulSoup(self.html, 'html.parser')
|
||||
p.write_text(soup.prettify(), encoding='utf-8')
|
||||
logger.info(f'Saved to {p}')
|
||||
|
||||
@staticmethod
|
||||
def uber_test(df, **kwargs):
|
||||
"""Print various diagnostics and all the formats."""
|
||||
f = GT(df, **kwargs)
|
||||
display(f)
|
||||
print(f)
|
||||
f.show_svg()
|
||||
display(df)
|
||||
display(f.column_width_df)
|
||||
print(f.make_tikz())
|
||||
return f
|
||||
|
||||
@@ -4,6 +4,7 @@ Make fake dataframes for testing.
|
||||
GPT from SJMM design.
|
||||
"""
|
||||
|
||||
from collections import deque
|
||||
from datetime import datetime, timedelta
|
||||
from importlib.resources import files
|
||||
from itertools import cycle, chain
|
||||
@@ -117,6 +118,14 @@ class TestDataFrameFactory:
|
||||
|
||||
# lengths of index (word count) sampled from:
|
||||
self.index_value_lengths = [1]*10 + [2] * 4 + [3]
|
||||
self.cache = deque(maxlen=10)
|
||||
|
||||
# def cache(self, n=0):
|
||||
# """Get nth item ago from cache, default = 0, latest."""
|
||||
# if n < len(self._cache):
|
||||
# return self._cache[n]
|
||||
# else:
|
||||
# print(f'Cache only contains {len(self._cache)} < {n} items.')
|
||||
|
||||
def make(self, rows: int, columns: Union[int, str], index: Union[int, str] = 0,
|
||||
col_index: Union[int, str] = 0, missing: float = 0.0) -> pd.DataFrame:
|
||||
@@ -168,14 +177,15 @@ class TestDataFrameFactory:
|
||||
self.rng = np.random.default_rng(self.seed)
|
||||
return self._generate(**self._last_args)
|
||||
|
||||
def random(self, index_levels: int = 0, column_levels: int = 0) -> pd.DataFrame:
|
||||
def random(self, index_levels: int = 0, column_levels: int = 0, omit: str = 'p') -> pd.DataFrame:
|
||||
"""
|
||||
Generate a DataFrame with randomly chosen settings.
|
||||
|
||||
|
||||
Args:
|
||||
index_levels: Number of index levels to use.
|
||||
column_levels: Number of column MultiIndex levels.
|
||||
|
||||
omit: String of column datatype codes to exclude; a code is skipped if it appears in this string.
|
||||
Returns:
|
||||
DataFrame
|
||||
"""
|
||||
@@ -184,8 +194,10 @@ class TestDataFrameFactory:
|
||||
if column_levels == 0:
|
||||
column_levels = random.choice([1, 1, 1, 1, 1, 2, 2, 3])
|
||||
rows = self.rng.integers(5 * index_levels, 10 * index_levels)
|
||||
valid_types = [i for i in ['d', 'f', 'i', 's3', 'l', 'h', 't', 'p', 'x', 'r', 'y']
|
||||
if i not in omit]
|
||||
col_types = self.rng.choice(
|
||||
['d', 'f', 'i', 's3', 'l', 'h', 't', 'p'], size=self.rng.integers(3, 7))
|
||||
valid_types, size=self.rng.integers(3, 7))
|
||||
missing = round(float(self.rng.uniform(0, 0.15)), 2)
|
||||
index = ''.join(self.rng.choice(
|
||||
['t', 'd', 'y', 'i', 's2'], size=index_levels))
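
The `omit` argument added to `random` is a string, so `i not in omit` is a substring test, not a list-membership test. The standalone sketch below isolates that filter to show the consequence. The names `ALL_TYPES` and `valid_types` are hypothetical and introduced only for illustration; the type codes are copied from the hunk above.

```python
# Type codes copied from the diff; the constant name is an assumption.
ALL_TYPES = ['d', 'f', 'i', 's3', 'l', 'h', 't', 'p', 'x', 'r', 'y']


def valid_types(omit='p'):
    """Return the type codes that do not occur as a substring of omit."""
    return [t for t in ALL_TYPES if t not in omit]


if __name__ == '__main__':
    print(valid_types())       # default drops 'p' only
    print(valid_types('pxy'))  # several one-letter codes can be packed into one string
    print(valid_types('s'))    # 's3' is kept: 's3' is not a substring of 's'
    print(valid_types('s3'))   # 's3' is dropped
```

Multi-character codes such as `s3` must therefore be spelled out in full inside `omit`; passing a list of codes instead of a string would also work with the same expression.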
|
||||
@@ -223,6 +235,7 @@ class TestDataFrameFactory:
|
||||
df.columns = col_idx
|
||||
df.index = self._make_index(index, rows)
|
||||
df = self._insert_missing(df, missing)
|
||||
self.cache.appendleft(df)
|
||||
return df
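
The factory now keeps recent frames in `deque(maxlen=10)` and pushes each new one with `appendleft`. A minimal sketch of that behavior, using short strings in place of DataFrames and a smaller `maxlen` (both are assumptions, chosen for readability):

```python
from collections import deque

# Stand-in for the factory cache; maxlen=3 here instead of 10 to keep the demo short.
cache = deque(maxlen=3)

for name in ['df_a', 'df_b', 'df_c', 'df_d']:
    # Newest item goes to the front; once full, the oldest falls off the back.
    cache.appendleft(name)

latest = cache[0]      # most recently generated item
previous = cache[1]    # the one before that
contents = list(cache)

print(latest, previous, contents)
```

With this layout `cache[n]` is the item generated `n` calls ago, which matches the intent of the commented-out `cache(n)` accessor.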
|
||||
|
||||
def _parse_colspec(self, spec: str) -> list[str]:
|
||||
@@ -278,7 +291,7 @@ class TestDataFrameFactory:
|
||||
if len(desc) == 1:
|
||||
if desc[0] == 'i':
|
||||
return pd.RangeIndex(n, name=self.index_name())
|
||||
elif desc[0] in ('d', 't', 'x'):
|
||||
elif desc[0] in ('d', 't', 'x', 'y'):
|
||||
vals = self._generate_column(desc[0], n)
|
||||
return pd.Index(vals, name=self.index_name())
|
||||
elif not all(i[0] == 's' for i in desc):
|
||||
|
||||
+36
-23
@@ -15,6 +15,7 @@ from .hasher import txt_short_hash
|
||||
|
||||
|
||||
class TikzProcessor:
|
||||
"""Create PDF and SVG files from Tikz blocks."""
|
||||
# Full TeX preamble to generate a .fmt if needed
|
||||
_tex_template_full = r"""\documentclass[10pt, border=5mm]{standalone}
|
||||
\usepackage{amsfonts}
|
||||
@@ -41,7 +42,8 @@ class TikzProcessor:
|
||||
"""
|
||||
|
||||
|
||||
def __init__(self, txt, file_name='', base_path='.', tex_engine='pdflatex'):
|
||||
def __init__(self, txt, file_name='', base_path='.', tex_engine='pdflatex', debug=False):
|
||||
"""Create object from txt, a TeX blob containing a tikzpicture."""
|
||||
self.txt = txt
|
||||
self.tex_engine = tex_engine
|
||||
self.base_path = Path(base_path).resolve()
|
||||
@@ -50,6 +52,7 @@ class TikzProcessor:
|
||||
file_name = file_name or txt_short_hash(txt)
|
||||
self.file_path = self.out_path / file_name
|
||||
self.format_file = self.out_path / 'tikz_format.fmt'
|
||||
self.debug = debug
|
||||
|
||||
def split_tikz(self):
|
||||
"""Split text to extract the TikZ picture."""
|
||||
@@ -59,7 +62,7 @@ class TikzProcessor:
|
||||
"""Create format file for faster compilation if missing."""
|
||||
if self.format_file.exists():
|
||||
return
|
||||
print('building format file...')
|
||||
print('TikzProcessor: building TeX format fmt file...', end='')
|
||||
tmp = self.out_path / 'tikz_format.tex'
|
||||
tmp.write_text(self._tex_template_full, encoding='utf-8')
|
||||
self.run_command([
|
||||
@@ -71,9 +74,9 @@ class TikzProcessor:
|
||||
], raise_on_error=True, cwd=self.out_path)
|
||||
# tmp.unlink()
|
||||
(self.out_path / f'{self.format_file.stem}.log').unlink()
|
||||
print('building format file...success', self.format_file.resolve())
|
||||
print('...success...format file built', self.format_file.resolve())
|
||||
|
||||
def process_tikz(self, verbose=False):
|
||||
def process_tikz(self):
|
||||
"""Compile TikZ to PDF and convert to SVG."""
|
||||
tikz_begin, tikz_code, tikz_end = self.split_tikz()[1:4]
|
||||
tex_code = self._tex_template.format(
|
||||
@@ -95,40 +98,50 @@ class TikzProcessor:
|
||||
f'--output-directory={str(tex_path.parent)}',
|
||||
str(tex_path)
|
||||
]
|
||||
if verbose:
|
||||
if self.debug:
|
||||
print("Running:", " ".join(tex_cmd))
|
||||
self.run_command(tex_cmd)
|
||||
if self.run_command(tex_cmd):
|
||||
raise ValueError('TeX failed to compile, no pdf or svg output.')
|
||||
# no tidying up
|
||||
else:
|
||||
# continue
|
||||
|
||||
(tex_path.parent / 'make_tikz.bat').write_text(" ".join(tex_cmd), encoding='utf-8')
|
||||
(tex_path.parent / 'make_tikz.bat').write_text(" ".join(tex_cmd), encoding='utf-8')
|
||||
|
||||
svg_cmd = [
|
||||
'C:\\temp\\pdf2svg-windows\\dist-64bits\\pdf2svg',
|
||||
str(pdf_path),
|
||||
str(svg_path)
|
||||
]
|
||||
if verbose:
|
||||
print("Running:", " ".join(svg_cmd))
|
||||
self.run_command(svg_cmd, raise_on_error=False)
|
||||
svg_cmd = [
|
||||
# 'C:\\temp\\pdf2svg-windows\\dist-64bits\\pdf2svg',
|
||||
'pdf2svg',
|
||||
str(pdf_path),
|
||||
str(svg_path)
|
||||
]
|
||||
if self.debug:
|
||||
print("Running:", " ".join(svg_cmd))
|
||||
self.run_command(svg_cmd, raise_on_error=True)
|
||||
|
||||
if not verbose:
|
||||
for ext in ('.tex', '.aux', '.log', '.pdf'):
|
||||
path = tex_path.with_suffix(ext)
|
||||
if path.exists():
|
||||
path.unlink()
|
||||
if not self.debug:
|
||||
for ext in ('.tex', '.aux', '.log', '.pdf'):
|
||||
path = tex_path.with_suffix(ext)
|
||||
if path.exists():
|
||||
path.unlink()
|
||||
|
||||
def display(self):
|
||||
"""Display the SVG in Jupyter."""
|
||||
display(SVG(self.file_path.with_suffix('.svg')))
|
||||
|
||||
@staticmethod
|
||||
def run_command(command, raise_on_error=True, cwd=None):
|
||||
def run_command(self, command, raise_on_error=True, cwd=None):
|
||||
"""Run command with subprocess and show output."""
|
||||
with Popen(command, cwd=cwd, stdout=PIPE, stderr=PIPE, universal_newlines=True) as p:
|
||||
stdout, stderr = p.communicate()
|
||||
if stdout and self.debug:
|
||||
print('Run command output ends\n', stdout.strip()[-250:])
|
||||
if stdout:
|
||||
print(stdout.strip()[-250:])
|
||||
if 'no output PDF file produced' in stdout:
|
||||
print("ERROR no pdf output\n"*5)
|
||||
return -1
|
||||
if stderr:
|
||||
if raise_on_error:
|
||||
raise RuntimeError(stderr.strip())
|
||||
else:
|
||||
print(stderr.strip())
|
||||
return -2
|
||||
return 0
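
The revised `run_command` returns a status code (`0`, `-1`, `-2`) that `process_tikz` checks. The sketch below shows the same contract as a free function. It is not the project's implementation: it swaps `Popen` for `subprocess.run` and takes `debug` as a parameter rather than reading `self.debug`, both of which are assumptions made to keep the example self-contained.

```python
import subprocess
import sys


def run_command(command, raise_on_error=True, cwd=None, debug=False):
    """Run command; return 0 on success, -1 if TeX reports no PDF, -2 on stderr output."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    stdout, stderr = result.stdout, result.stderr
    if stdout and debug:
        # Only the tail of the output is shown, as in the diff.
        print('Run command output ends\n', stdout.strip()[-250:])
    # Membership test also catches the message at position 0.
    if 'no output PDF file produced' in stdout:
        return -1
    if stderr:
        if raise_on_error:
            raise RuntimeError(stderr.strip())
        print(stderr.strip())
        return -2
    return 0


if __name__ == '__main__':
    # Use the running interpreter as a harmless stand-in for pdflatex.
    status = run_command([sys.executable, '-c', "print('hello')"])
    print(status)
```

A caller can then treat any non-zero status as a failed compile, which is how the `if self.run_command(tex_cmd):` check in `process_tikz` reads.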
|
||||
|
||||