3.2.0 integration of tikz, better column width

More TeX snippets.
This commit is contained in:
Stephen Mildenhall
2025-06-15 12:10:48 +01:00
parent 85a11fb61b
commit f1643f0bda
12 changed files with 30257 additions and 6121 deletions
+103 -118
@@ -1,17 +1,9 @@
sort out the variety of readmes... this is the main one
# `greater_tables` Project
https://shields.io/badges/read-the-docs
![GitHub commit activity](https://img.shields.io/github/commit-activity/y/mynl/greater_tables_project)
# v 3.0 update
* config files
* unified col width and info dataframe
* de-texing
* cli for config and writeout a csv etc.
* tests
# TODO
## TODO
* Ratio cols with multi index columns
* % in tex output - never allow comments?
@@ -19,109 +11,14 @@ https://shields.io/badges/read-the-docs
* ?Option to hide index
* Bring over the roll your own logger
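On the `%` TODO above: the TikZ writer in this diff escapes unescaped percent signs with `df.replace(r"(?<!\\)%", r"\%", regex=True)`. Below is a minimal sketch of the same negative-lookbehind idea applied to a single string with the standard `re` module. `escape_percent` is a hypothetical helper, not part of the package.

```python
import re

# Match a % that is NOT already preceded by a backslash.
_UNESCAPED_PCT = re.compile(r"(?<!\\)%")


def escape_percent(text: str) -> str:
    """Escape bare % so TeX does not treat the rest of the line as a comment."""
    # r"\\%" in a replacement string produces a literal backslash followed by %.
    return _UNESCAPED_PCT.sub(r"\\%", text)


print(escape_percent("10% and 5\\%"))  # 10\% and 5\%
```

One limitation: the lookbehind checks only the single preceding character, so a `%` that follows a TeX line break `\\` is also left untouched.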
***
## Greater Tables
# from GPT
Display graphics vs. data tables
no colors, sparklines, shading, ...
Absolutely, here's a structured summary of everything we've covered, organized by topic.
---
## 🧱 Project Structure & Philosophy
* Your `GreaterTables` class formats a `pandas.DataFrame` to **HTML, text, or LaTeX**.
* The class is **immutable**: formatting is fixed at construction time, like a pure value object.
* You avoid branchy, incremental APIs (like `ggplot`) and prefer creating fresh objects.
* You wanted a way to handle growing config complexity, which led to a **YAML config + Pydantic schema** design.
---
## 📁 Project Layout
```
greater_tables_project/
├── greater_tables/
│ ├── __init__.py
│ ├── gtconfig.py ← config model + loader
│ ├── gtcore.py ← GreaterTables class
│ └── defaults/
│ └── config_template.yaml
├── tests/
├── pyproject.toml
```
* `GTConfigModel` = schema + default source of truth
* `GTConfig` = singleton loader and validator
* `config_template.yaml` = editable fallback + documentation base
---
## 🔧 Config Management
* All defaults and types are declared in `GTConfigModel` (Pydantic).
* Config is **loaded from YAML**, validated by `GTConfigModel`.
* You can **generate** a valid config file from the model using `.model_dump() → YAML`.
* Singleton pattern (`GTConfig.__new__`) caches the config at runtime.
### Helpers
* `GTConfig().get(overrides=...)` gives a safe, override-able config
* `write_template(path)` writes a default config YAML for user to edit
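Here is a minimal sketch of the singleton-plus-overrides pattern described above. To stay self-contained it swaps the Pydantic model for a frozen standard-library `dataclass`, so there is no type validation beyond rejecting unknown keys. `ConfigModel`, `ConfigLoader`, `template_text` and the default values are hypothetical stand-ins. They are not the real `GTConfigModel`/`GTConfig` API.

```python
import json
from dataclasses import asdict, dataclass, fields, replace


@dataclass(frozen=True)
class ConfigModel:
    """Hypothetical stand-in for GTConfigModel: field names, types and defaults."""
    max_table_width: int = 200   # illustrative default only
    tikz_scale: float = 0.717
    equal: bool = False
    debug: bool = False


class ConfigLoader:
    """Hypothetical stand-in for GTConfig: one cached base config per process."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            inst = super().__new__(cls)
            inst._base = ConfigModel()   # real code would load and validate YAML here
            cls._instance = inst
        return cls._instance

    def get(self, overrides=None):
        """Return a new frozen config; the cached base is never mutated."""
        overrides = dict(overrides or {})
        known = {f.name for f in fields(ConfigModel)}
        unknown = sorted(set(overrides) - known)
        if unknown:
            raise ValueError(f"Unknown config keys: {unknown}")
        return replace(self._base, **overrides)


def template_text(cfg: ConfigModel) -> str:
    """Dump settings as 'key: value' lines; JSON scalars are valid YAML scalars."""
    return "\n".join(f"{k}: {json.dumps(v)}" for k, v in asdict(cfg).items())


cfg = ConfigLoader().get({"debug": True})
print(cfg.debug, ConfigLoader().get().debug)   # True False
print(template_text(ConfigModel()))
```

Because `get` returns a fresh frozen object, the immutability principle holds: a later override cannot change a table that was already built with an earlier config.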
---
## 🛠 Git Workflow (Solo Dev, Linear)
* Use **tags** (`git tag v0.2.0`) to label stable versions
* Use **`git reset --hard <tag>`** to roll back and discard later commits
* Avoid branches entirely; keep a **single linear history**
* Tags let you bounce around safely, with names instead of hashes
* Releases on GitHub are tags + metadata, optional for publishing
---
## ⚙️ CLI Tool
* Built with `click`, with subcommands:
* `gt render data.csv --format html`
* `gt write-template`
* Reads any Pandas-supported file (`.csv`, `.feather`, `.pkl`, etc.)
* Outputs to console or to file
* Uses current config by default, or override with `--config path.yaml`
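Reading files by extension can be sketched as a lookup table from file suffix to pandas reader. This illustrates the idea and is not the code in `cli.py`. The names `READERS`, `reader_for` and `read_any` are hypothetical, and the sketch raises `ValueError` where the real CLI raises `click.UsageError`.

```python
from pathlib import Path

import pandas as pd

# Suffix -> pandas reader; extend with other pandas readers as needed.
READERS = {
    ".csv": pd.read_csv,
    ".feather": pd.read_feather,   # needs pyarrow when actually called
    ".pkl": pd.read_pickle,
}


def reader_for(path):
    """Return the pandas reader for a file's suffix (case-insensitive), or None."""
    return READERS.get(Path(path).suffix.lower())


def read_any(path) -> pd.DataFrame:
    """Read a data file with the reader matching its extension."""
    reader = reader_for(path)
    if reader is None:
        raise ValueError(f"Unsupported extension: {Path(path).suffix}")
    return reader(path)
```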
---
## 🧠 Design Principles You're Following
| Principle | Your Approach |
| ---------------------------- | ------------------------------------------- |
| Immutability | `GT(df, config)` is fixed once created |
| Separation of concerns | `GTConfigModel` holds defaults/types |
| Config as code/documentation | `config_template.yaml` generated from model |
| CLI-first mindset | `click` used to expose functionality |
| Linear Git workflow | Tags for rollback, no branches |
---
Let me know if you want me to generate:
* a Markdown doc for contributors
* a `.bat` script to roll back to a tag
* test scaffolding or release automation
You're in great shape. Gum-level perfection achieved.
***
# OLD
# Greater Tables
Creating presentation quality tables from pandas dataframes is frustrating. It is hard to left-align text and right-align numbers using pandas `display` or `df.to_html`. The `great_tables` package does a really nice job with pandas and polars dataframes but does not support indexes or TeX output.
Creating presentation quality tables is difficult. It is hard to left-align text and right-align numbers using pandas `display` or `df.to_html`. The `great_tables` package does a really nice job with pandas and polars dataframes but does not support indexes or TeX output.
This package provides consistent HTML and TeX table output with flexible type-based formatting and table rules. Neither output relies on the pandas `to_html` or `to_latex` functions. TeX output uses TikZ tables for very tight control over layout and grid lines. The package is designed for use in Jupyter Lab notebooks and Quarto documents.
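A type-based default of left-aligned text and right-aligned numbers can be sketched with pandas dtype checks. `default_alignment` is a hypothetical helper. The `grt-left`/`grt-right` labels follow the aligner names used in `gtcore.py`, but the package's actual rules (for dates or booleans, for example) may differ.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def default_alignment(df: pd.DataFrame) -> dict:
    """Map each column label to an alignment class based on its dtype."""
    return {
        col: "grt-right" if is_numeric_dtype(df[col]) else "grt-left"
        for col in df.columns
    }


df = pd.DataFrame({"line": ["Auto", "Home"], "premium": [1250.0, 830.5], "count": [12, 7]})
print(default_alignment(df))
# {'line': 'grt-left', 'premium': 'grt-right', 'count': 'grt-right'}
```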
@@ -137,9 +34,19 @@ been maintaining a set of macros called
reusable, extensible actuarial tools) in VBA and Python since the late
1990s, and call all my macro packages "GREAT".
## Documentation
![](https://img.shields.io/readthedocs/greater_tables_project)
Available on
[readthedocs](https://greater-tables-project.readthedocs.io/en/latest).
## Installation
``` python
![](https://img.shields.io/pypi/format/greater_tables)
```python
pip install greater-tables
```
@@ -148,7 +55,7 @@ pip install greater-tables
The following example shows quite a hard table. It is formatted using
the `sGT` class, which is a subclass of `GT` with a few defaults set.
``` {.python .cell-code}
```python
import pandas as pd
import numpy as np
from greater_tables import sGT
@@ -196,21 +103,37 @@ The output illustrates:
More coming soon.
## Documentation
![](https://img.shields.io/readthedocs/greater_tables_project)
## History
Available on
[readthedocs](https://greater-tables-project.readthedocs.io/en/latest).
3.2.0
-------
* Added more TeX snippets
* Refactored TikZ and column width behavior
## Versions
3.1.0
-------
* adjustments for auto format
* rearranged gtcore order of methods
3.0.0
-------
* config files / pydantic config input
* unified col width and info dataframe
* de-texing
* cli for config and writeout a csv etc.
* testdf suite
* Automated TeX to SVG
2.0.0
------
* **v2.0.0** solid release old-style, all-argument GT
* Better column widths
* Custom text output
* Rich table output
1.1.1
-------
* Added logo, updated docs.
@@ -251,3 +174,65 @@ Early development
## 📁 Project Layout
```
greater_tables_project/
| LICENSE
| pyproject.toml
| README.md
|
+---dist
|
+---docs
| | books.bib
| | conf.py
| | greater_tables.data.rst
| | greater_tables.rst
| | index.rst
| | library.bib
| | make.bat
| | Makefile
| | modules.rst
| | start-server.bat
| | style.csl
| |
+---greater_tables
| | __init__.py
| | cli.py
| | gtconfig.py
| | gtcore.py
| | gtenums.py
| | gtformats.py
| | hasher.py
| | testdf.py
| | tex_svg.py
| |
| +---data
| | | __init__.py
| | | tex_list.csv
| | | tex_list.py
| | | words-12.md
|
+---greater_tables.egg-info
|
+---img
| hard-html.png
| hard-tex.png
```
## 🧠 Design Principles You're Following
| Principle | Your Approach |
| ---------------------------- | ------------------------------------------- |
| Immutability | `GT(df, config)` is fixed once created |
| Separation of concerns | `GTConfigModel` holds defaults/types |
| Config as code/documentation | `config_template.yaml` generated from model |
| CLI-first mindset | `click` used to expose functionality |
| Linear Git workflow | Tags for rollback, no branches |
+2 -5
@@ -14,11 +14,8 @@ Welcome to greater_tables's documentation!
greater_tables.data
Introduction
============
.. mdinclude:: ../README.md
.. include:: ../README.md
:parser: myst_parser.sphinx_
Other
=======
+1 -1
@@ -1,4 +1,4 @@
__version__ = '3.1.0'
__version__ = '3.2.0'
__project__ = 'greater_tables'
__author__ = 'Stephen J Mildenhall'
+11 -5
@@ -1,18 +1,20 @@
import click
import pandas as pd
from pathlib import Path
from .gtconfig import GTConfig, write_template
from .gtcore import GreaterTables
from .gtconfig import GTConfigModel, write_template
from .gtcore import GT
@click.group()
def cli():
"""Greater Tables CLI tool"""
pass
@cli.command()
@click.argument("input_file", type=click.Path(exists=True))
@click.option("--output", "-o", type=click.Path(), help="Write rendered output to file")
@click.option("--format", "-f", type=click.Choice(["html", "text", "latex"]), default="html")
@click.option("--format", "-f", type=click.Choice(["html", "text", "latex", "svg", "pdf"]), default="html")
@click.option("--config", type=click.Path(), help="Path to a YAML config file")
def render(input_file, output, format, config):
"""Render a table from a data file."""
@@ -28,8 +30,8 @@ def render(input_file, output, format, config):
else:
raise click.UsageError(f"Unsupported extension: {ext}")
cfg = GTConfig(Path(config) if config else None).get()
gt = GreaterTables(df, config=cfg)
cfg = GTConfigModel(Path(config) if config else None).get()
gt = GT(df, config=cfg)
rendered = (
gt.render_html() if format == "html"
@@ -37,11 +39,15 @@ def render(input_file, output, format, config):
else gt.render_latex()
)
if format in ('svg', 'pdf'):
print('more work to do!!')
if output:
Path(output).write_text(rendered, encoding="utf-8")
else:
print(rendered)
@cli.command()
@click.argument("path", type=click.Path(), default="config.yaml")
def write_template(path):
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
+31 -8
@@ -19,12 +19,15 @@ class TeXMacros():
from great2.blog
"""
_macros = r"""
\def\AA{\mathcal{A}}
_macros = r"""\def\AA{\mathcal{A}}
\def\atan{\mathrm{atan}}
\def\A{\mathcal{A}}
\def\B{\mathcal{B}}
\def\BB{\mathbb{B}}
\def\AVaR{\mathsf{AVaR}}
\def\bbeta{\mathbf{\beta}}
\def\bb{\mathbf b}
\def\bfx{\mathbf x}
\def\bm{\mathbf }
\def\biTVaR{\mathsf{biTVaR}}
\def\corr{\mathsf{Corr}}
@@ -38,9 +41,12 @@ class TeXMacros():
\def\ecirc{\accentset{\circ} e}
\def\EPD{\mathsf{EPD}}
\def\ES{\mathsf{ES}}
\def\esssup{\mathrm{ess\,sup}}
\def\E{\mathsf{E}}
\def\F{\mathscr{F}}
\def\FFF{\mathscr{F}}
\def\FF{\mathcal{F}}
\def\G{\mathscr{G}}
\def\HH{\mathbf{H}}
\def\kpx{{{}_kp_x}}
\def\MM{\mathcal{M}}
@@ -50,10 +56,13 @@ class TeXMacros():
\def\OO{\mathscr{O}}
\def\PPP{\mathscr{P}}
\def\PP{\mathsf{P}}
\def\P{\mathsf{Pr}}
\def\Pr{\mathsf{Pr}}
\def\QQ{\mathsf{Q}}
\def\Q{\mathbb{Q}}
\def\RR{\mathbb{R}}
\def\SD{\mathsf{SD}}
\def\spcer{\ }
\def\TCE{\mathsf{TCE}}
\def\TVaR{\mathsf{TVaR}}
\def\Var{\mathsf{Var}}
@@ -66,8 +75,7 @@ class TeXMacros():
\def\XX{\mathbf{X}}
\def\yy{\mathbf{y}}
\def\ZZZ{\mathcal{Z}}
\def\ZZ{\mathbb{Z}}
"""
\def\ZZ{\mathbb{Z}}"""
@staticmethod
def process_tex_macros(text):
@@ -96,22 +104,37 @@ class TeXMacros():
i = x.find('{')
return x[:i], x[i + 1:-1]
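Macro expansion of this kind can be sketched in two steps. First, split each `\def` line into name and body the same way the helper above does: the text before the first `{`, and everything between that brace and the final character. Second, replace each whole TeX control word using a regex. Matching the full run of letters after a backslash keeps `\P`, `\PP` and `\PPP` distinct. This is a single-pass illustration with hypothetical names (`parse_defs`, `expand_macros`), not the package's `process_tex_macros`. Macros whose bodies use other macros are not re-expanded.

```python
import re

_CONTROL_WORD = re.compile(r"\\([A-Za-z]+)")


def parse_defs(defs: str) -> dict:
    """Parse lines like \\def\\PP{\\mathsf{P}} into {'PP': '\\mathsf{P}'}."""
    macros = {}
    for line in defs.splitlines():
        line = line.strip()
        if not line.startswith("\\def\\"):
            continue
        rest = line[len("\\def\\"):]
        i = rest.find("{")
        if i < 0:
            continue                        # malformed line: no body
        macros[rest[:i]] = rest[i + 1:-1]   # body between first { and final }
    return macros


def expand_macros(text: str, macros: dict) -> str:
    """Replace known control words; unknown ones (e.g. \\alpha) are left alone."""
    return _CONTROL_WORD.sub(lambda m: macros.get(m.group(1), m.group(0)), text)


defs = r"""\def\PP{\mathsf{P}}
\def\P{\mathsf{Pr}}
\def\kpx{{{}_kp_x}}"""
print(expand_macros(r"$\P(X) + \PP \alpha + \kpx$", parse_defs(defs)))
# $\mathsf{Pr}(X) + \mathsf{P} \alpha + {{}_kp_x}$
```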
def find_tex_snippeets(in_dir='\\S\\TELOS\\PIR\\docs',
def find_tex_snippets(in_dir='\\S\\TELOS\\PIR\\docs',
out_file='tex_list.csv'):
"""Ripgrep / TeX macro expand list of TeX snippets."""
# prod run with \\s\\telos\\ (!)
in_dir = str(Path(in_dir))
cmd = ['rg', '-N', '-o', '--no-filename',
'-g', '*.md',
'-g', '*.qmd',
r'\$.+?\$',
in_dir]
result = subprocess.run(
['rg', '-N', '-o', '--no-filename', '-g', '*.md', r'\$.+?\$', in_dir],
cmd,
capture_output=True,
text=True,
check=True
check=True,
encoding='utf-8'
)
output_text = result.stdout
tm = TeXMacros()
txt = tm.process_tex_macros(output_text)
tex = txt.split('\n')
stex = set(tex)
stext = [i for i in stex if len(i) and i.find('\\PP') < 0 and i.find('$$') < 0]
stext = [i for i in stex if len(i)
and i.find('$$') < 0
and i.find('lcroof') < 0
and i.find('#') < 0
and i.find(r'\\') < 0
]
df = pd.DataFrame({'expr': stext})
print(f'Found {len(df)} snippets!')
if out_file != '':
p = Path(__file__).parent / out_file
print(p)
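After the `rg` search, the clean-up step de-duplicates lines and drops any that contain display math (`$$`), `lcroof`, `#` or a TeX line break (`\\`). Below is a standalone sketch of that filter, using a hypothetical `filter_snippets`. The diff builds its list from a `set`, so the order is arbitrary; this sketch sorts the result so the CSV is reproducible.

```python
# Markers that disqualify a snippet; "\\\\" is the two-character TeX line break \\.
EXCLUDE = ("$$", "lcroof", "#", "\\\\")


def filter_snippets(lines):
    """Return sorted, de-duplicated, non-empty snippets free of excluded markers."""
    return sorted(
        s for s in set(lines)
        if s and not any(marker in s for marker in EXCLUDE)
    )


raw = ["$x$", "$x$", "", "$$y$$", r"$a \\ b$", "$#1$", r"$\lcroof$", "$z$"]
print(filter_snippets(raw))   # ['$x$', '$z$']
```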
+542 -109
@@ -30,6 +30,7 @@ from pandas.api.types import is_datetime64_any_dtype, is_integer_dtype, \
from pydantic import ValidationError
from rich import box
from rich.table import Table
from IPython.display import display, SVG
from . gtenums import Breakability, Alignment
from . gtformats import GT_Format, TableFormat, Line, DataRow
@@ -578,12 +579,19 @@ class GT(object):
if tabs is None:
self.tabs = None
elif isinstance(tabs, (int, float)):
self.tabs = (tabs,)
elif isinstance(tabs, (np.ndarray, list, tuple)):
self.tabs = tabs # Already iterable, self.tabs = as is
self.tabs = (tabs,) * self.ncols
elif isinstance(tabs, (np.ndarray, pd.Series, list, tuple)):
if len(tabs) == self.ncols:
self.tabs = tabs # Already iterable and right length, self.tabs = as is
else:
logger.error(
f'{tabs=} has wrong length; expected {self.ncols} values. Ignoring.')
self.tabs = None
else:
self.tabs = [tabs] # Fallback for anything else
# self.equal = equal
logger.error(
f'{tabs=} must be None, a single number, or a list of '
'numbers of the correct length. Ignoring.')
self.tabs = None
if self.config.padding_trbl is not None:
padding_trbl = self.config.padding_trbl
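The new `tabs` handling in `__init__` normalizes the user's input once. A single number is broadcast to every column, a sequence is kept only if it has one entry per column, and anything else is ignored. Below is a standalone sketch of that rule, using a hypothetical `normalize_tabs`. It accepts only lists and tuples, whereas the diff also accepts NumPy arrays and pandas Series.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_tabs(tabs, ncols):
    """Return a tuple with one width per column, or None to use estimated widths."""
    if tabs is None:
        return None
    if isinstance(tabs, (int, float)):
        return (tabs,) * ncols          # broadcast a single width to every column
    if isinstance(tabs, (list, tuple)):
        if len(tabs) == ncols:
            return tuple(tabs)
        logger.error("tabs=%r has %d entries, expected %d. Ignoring.", tabs, len(tabs), ncols)
        return None
    logger.error("tabs=%r must be None, a number, or a sequence of numbers. Ignoring.", tabs)
    return None
```

Validating once at construction is what allows the HTML and TikZ writers in this diff to drop their own per-call length checks.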
@@ -609,6 +617,7 @@ class GT(object):
self._clean_tex = ''
self._rich_table = None
self._string = ''
self._column_width_df = None
# finally config.sparsify and then apply formaters
# this radically alters the df, so keep a copy for now...
self.df_pre_applying_formatters = self.df.copy()
@@ -957,6 +966,32 @@ class GT(object):
level_changes = tf.idxmax(axis=1)
return level_changes
@property
def column_width_df(self):
"""
The single source of truth for all info about column widths.
Adds `estimate_column_widths` columns to `make_column_width_df`.
"""
if self._column_width_df is None:
self._column_width_df = self.make_column_width_df()
tikz_colw, tabs, scaled_tabs = self.estimate_column_widths()
self._column_width_df['tikz_colw'] = tikz_colw
self._column_width_df['estimated_tabs'] = tabs
self._column_width_df['estimated_scaled_tabs'] = scaled_tabs
if self.tabs is not None:
self._column_width_df['input_tabs'] = self.tabs
else:
self._column_width_df['input_tabs'] = -1
# this column should be used in place of tabs from estimate_column_widths
# in make html and make tikz
self._column_width_df['tabs'] = np.maximum(self._column_width_df['input_tabs'],
self._column_width_df['estimated_tabs'])
self._column_width_df['scaled_tabs'] = np.maximum(self._column_width_df['input_tabs'],
self._column_width_df['estimated_scaled_tabs'])
return self._column_width_df
def make_column_width_df(self):
"""
Return dataframe of width information.
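`column_width_df` is computed lazily and cached. Its `tabs` column is the element-wise maximum of the user's `input_tabs` (where `-1` means "not given") and the estimated widths. As written, user-supplied tabs can therefore widen a column but never make it narrower than the estimate. Below is a minimal sketch of the same rule that uses `functools.cached_property` in place of the manual `_column_width_df` cache. `WidthInfo` is hypothetical.

```python
from functools import cached_property


class WidthInfo:
    """Hypothetical stand-in for the column-width bookkeeping on GT."""

    def __init__(self, estimated, input_tabs=None):
        self.estimated = list(estimated)
        self.input_tabs = input_tabs

    @cached_property
    def tabs(self):
        # -1 sentinel when no tabs were supplied, so the estimate always wins.
        user = self.input_tabs if self.input_tabs is not None else [-1] * len(self.estimated)
        return [max(u, e) for u, e in zip(user, self.estimated)]


print(WidthInfo([8, 12]).tabs)            # [8, 12]
print(WidthInfo([8, 12], [10, 5]).tabs)   # [10, 12]: the 5 loses to the estimate 12
```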
@@ -1276,7 +1311,7 @@ class GT(object):
For example, a column named "location category (float)" might take 3 lines with 0 extra space, but perhaps only 2 lines with 2 extra space, and 1 line with 5 extra space. The relationship is provided in a Pandas DataFrame with columns `col`, `extra`, and `num_lines`.
Algorithm: Binary Search on the Answer
-------------------------------------
----------------------------------------
The problem exhibits a monotonic property: if a table layout can be achieved with a maximum height of `X` lines, it can also be achieved with any maximum height `Y > X` lines (by simply using the same or more `extra` space). This property makes binary search on the *minimum possible maximum lines* an efficient solution.
@@ -1386,13 +1421,12 @@ class GT(object):
The function returns the `optimal_max_lines` and the `best_allocation` dictionary, mapping each column name to the minimal `extra_space` it needs to achieve that optimal height.
Why this approach is effective:
------------------------------
---------------------------------
* **Optimal Solution:** The binary search guarantees finding the absolute minimum possible `max_lines` because it systematically explores the entire solution space.
* **Efficiency:** The `check` function runs in time proportional to the number of columns times the average number of `extra` options per column. The binary search itself performs `log(range_of_num_lines)` iterations. This makes the overall complexity efficient for typical table sizes.
* **Flexibility:** It does not assume any particular mathematical function relating `extra` space to `num_lines`. It works with arbitrary discrete relationships provided in the input DataFrame, as long as `num_lines` is non-increasing as `extra` increases (which is the natural expectation for this problem).
"""
# Pre-processing
unique_cols = input_df['col'].unique().tolist()
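The "binary search on the answer" described in the docstring can be sketched as follows. The full signature is not shown here, so this sketch makes two assumptions. First, options arrive as `(col, extra, num_lines)` tuples rather than a DataFrame. Second, a height limit counts as feasible when the per-column minimum `extra` needed to reach it fits within a total `budget` of extra space. `min_max_lines` and `budget` are hypothetical names.

```python
from collections import defaultdict


def min_max_lines(options, budget):
    """Return (optimal_max_lines, allocation) or None if nothing fits the budget."""
    by_col = defaultdict(list)
    for col, extra, num_lines in options:
        by_col[col].append((extra, num_lines))
    for opts in by_col.values():
        opts.sort()                      # cheapest extra first

    def allocation(limit):
        """Cheapest extra per column giving <= limit lines, or None if infeasible."""
        alloc = {}
        for col, opts in by_col.items():
            extra = next((e for e, n in opts if n <= limit), None)
            if extra is None:
                return None
            alloc[col] = extra
        return alloc if sum(alloc.values()) <= budget else None

    heights = [n for opts in by_col.values() for _, n in opts]
    lo, hi, best = min(heights), max(heights), None
    while lo <= hi:                      # feasibility is monotone in the limit
        mid = (lo + hi) // 2
        alloc = allocation(mid)
        if alloc is not None:
            best, hi = (mid, alloc), mid - 1
        else:
            lo = mid + 1
    return best


opts = [("a", 0, 3), ("a", 2, 2), ("a", 5, 1), ("b", 0, 2), ("b", 3, 1)]
print(min_max_lines(opts, budget=5))     # (2, {'a': 2, 'b': 0})
print(min_max_lines(opts, budget=8))     # (1, {'a': 5, 'b': 3})
```

Sorting each column's options by `extra` makes the first option that meets the limit the cheapest one, so each feasibility check is linear in the number of options.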
@@ -1465,10 +1499,14 @@ class GT(object):
return optimal_max_lines, best_allocation
@staticmethod
def estimate_column_widths(df, target_width, nc_index, scale, equal=False):
def estimate_column_widths(self):
"""
Estimate sensible column widths for the dataframe [in what units?]
Estimate sensible column widths for the dataframe in character units.
Used by HTML and TeX output. Returns tikz_colw, used by TeX output to print
the tikz (no impact on output, just makes the produced TeX align nicely),
tabs and scaled_tabs (reflecting scale). These three columns are added
to the column_width_df.
Internal variables:
mxmn affects alignment: are all columns the same width?
@@ -1480,18 +1518,28 @@ class GT(object):
:param nc_index: number of columns in the index...these are not counted as "data columns"
:param config.equal: if True, try to make all data columns the same width (hint can be rejected)
:return:
colw affects how the tex is printed to ensure it "looks neat" (actual width of data elements)
tikz_colw affects how the tex is printed to ensure it "looks neat" (actual width of data elements)
tabs affects the actual output
"""
# this
# local variables (conversion from global method)
df = self.df
target_width = self.config.max_table_width
nc_index = self.nindex
scale = self.config.tikz_scale
equal = self.config.equal
# tabs from _tabs, an estimate column widths, determines the size of the table columns as displayed
# print(f'{nc_index=}, {scale=}, {config.equal=}')
colw = dict.fromkeys(df.columns, 0)
# without tex adjustment
tikz_colw = dict.fromkeys(df.columns, 0)
# with tex adjustment
tex_colw = dict.fromkeys(df.columns, 0)
headw = dict.fromkeys(df.columns, 0)
tabs = []
scaled_tabs = []
mxmn = {}
if df.empty:
return colw, tabs
return tikz_colw, tabs, scaled_tabs
nl = nc_index
for i, c in enumerate(df.columns):
# figure width of the column labels; if index c= str, if MI then c = tuple
@@ -1527,21 +1575,26 @@ class GT(object):
try:
lens = df.iloc[:, i].map(
lambda x: GT.text_display_len(str(x)))
colw[c] = lens.max()
tex_colw[c] = lens.max()
mxmn[c] = (lens.max(), lens.min())
raw_lens = df.iloc[:, i].map(len)
tikz_colw[c] = raw_lens.max()
except Exception as e:
logger.error(
f'{c} error {e} DO SOMETHING ABOUT THIS...if it never occurs dont need the if')
colw[c] = df[c].str.len().max()
mxmn[c] = (df[c].str.len().max(), df[c].str.len().min())
raise
# logger.error(
# f'{c} error {e} DO SOMETHING ABOUT THIS...if it never occurs dont need the if')
# tikz_colw[c] = df[c].str.len().max()
# mxmn[c] = (df[c].str.len().max(), df[c].str.len().min())
else:
lens = df.iloc[:, i].map(lambda x: GT.text_display_len(str(x)))
colw[c] = lens.max()
tex_colw[c] = lens.max()
mxmn[c] = (lens.max(), lens.min())
# print(f'{headw[c]=}, {colw[c]=}, {mxmn[c]=}, {c=}')
raw_lens = df.iloc[:, i].map(len)
tikz_colw[c] = raw_lens.max()
# print(tikz_colw)
# now know all column widths...decide what to do
# are all the data columns about the same width?
data_cols = np.array([colw[k] for k in df.columns[nl:]])
data_cols = np.array([tex_colw[k] for k in df.columns[nl:]])
same_size = (data_cols.std() <= 0.1 * data_cols.mean())
# print(f'same size test requires {data_cols.std()} <= {0.1 * data_cols.mean()}')
common_size = 0
@@ -1552,7 +1605,7 @@ class GT(object):
for i, c in enumerate(df.columns):
if i < nl or not same_size:
# index columns
tabs.append(int(max(colw[c], headw[c])))
tabs.append(int(max(tex_colw[c], headw[c])))
else:
# data all seems about the same width
tabs.append(common_size)
@@ -1574,11 +1627,14 @@ class GT(object):
if data_width and data_width / target_width < 0.9:
# don't rescale above 1:1 - don't want too large
rescale = min(1 / scale, target_width / data_width)
tabs = [w if i < nl else w * rescale for i, w in enumerate(tabs)]
scaled_tabs = [w if i < nl else
int(w * rescale) for i, w in enumerate(tabs)]
logger.info(f'Rescale {rescale} applied; tabs = {tabs}')
else:
scaled_tabs = tabs
# print(f'Rescale {rescale} applied; tabs = {tabs}')
# print(f'{colw.values()=}\n{tabs=}')
return colw, tabs
# print(f'{tikz_colw.values()=}\n{tabs=}')
return tikz_colw, tabs, scaled_tabs
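Two decisions in `estimate_column_widths` can be isolated. The first is the "same size" test: data-column widths count as equal when their population standard deviation (NumPy's default `std`) is at most 10% of their mean. The second is the rescale applied when the data columns use less than 90% of `target_width`, capped at `1 / scale`. The sketch below uses hypothetical helper names. Computing `data_width` as the sum of the data-column tabs is an assumption, because that line is not shown in the diff.

```python
import statistics


def same_size(data_widths):
    """True when data columns are 'about the same width' (std <= 10% of mean)."""
    return statistics.pstdev(data_widths) <= 0.1 * statistics.mean(data_widths)


def scaled_tabs(tabs, n_index, target_width, scale):
    """Widen data columns (not index columns) when the table is well under target."""
    data_width = sum(tabs[n_index:])            # assumption: not shown in the diff
    if data_width and data_width / target_width < 0.9:
        rescale = min(1 / scale, target_width / data_width)
        return [w if i < n_index else int(w * rescale) for i, w in enumerate(tabs)]
    return list(tabs)


print(same_size([10, 10, 11]), same_size([5, 20]))      # True False
print(scaled_tabs([8, 10, 10], 1, 100, scale=0.717))    # [8, 13, 13]
```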
@staticmethod
def text_display_len(s: str) -> int:
@@ -1672,6 +1728,14 @@ class GT(object):
.greater-table > table {{
display: inline-table;
}} */
/* try to turn off Jupyter and other formats for greater-table
all: unset => reset all inherited styles
display: revert -> put back to defaults
#greater-table * {{
all: unset;
display: revert;
}}
*/
/* tag formats */
#{self.df_id} caption {{
padding: {2 * padt}px {padr}px {padb}px {padl}px;
@@ -1774,19 +1838,8 @@ class GT(object):
idx_header = bit.iloc[:self.nindex, :self.ncolumns]
columns = bit.iloc[self.nindex:, :self.ncolumns]
colw, tabs = GT.estimate_column_widths(
self.df, self.config.max_table_width, nc_index=self.nindex, scale=1, equal=self.config.equal)
if self.config.debug:
print(f'Make html Input {self.tabs=}\nComputed {tabs=}')
if self.tabs is not None:
if len(tabs) == len(self.tabs):
tabs = self.tabs
elif len(self.tabs) == 1:
tabs = self.tabs * len(tabs)
else:
logger.error(
f'{self.tabs=} must be None, a single number, or a list of numbers of the correct length. Ignoring.')
# print('HTML ' + ', '.join([f'{c:,.2f}' for c in tabs]))
# figure appropriate widths
tabs = self.column_width_df['tabs']
# set column widths; tabs returns lengths of strings in each column
# for proportional fonts, average char is 0.4 to 0.5 em but numbers with
@@ -1926,7 +1979,7 @@ class GT(object):
return self.df_html
def clean_style(self, soup):
"""Minify CSS inside <style> blocks and remove /* ... */ comments."""
"""Minify CSS inside <style> blocks and remove slash-star comments."""
if not self.config.debug:
for style_tag in soup.find_all("style"):
if style_tag.string:
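The `clean_style` step (minify CSS inside `<style>` blocks and remove slash-star comments) can be approximated on a plain string with the standard `re` module. This is a hypothetical `minify_css`, not the package's implementation. It trims whitespace only around `{`, `}` and `;`, because removing spaces around `:` could turn a descendant selector such as `div :hover` into `div:hover`.

```python
import re


def minify_css(css: str) -> str:
    """Strip slash-star comments and collapse whitespace in a CSS string."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)   # comments, even multi-line
    css = re.sub(r"\s+", " ", css)                         # runs of whitespace -> one space
    css = re.sub(r"\s*([{};])\s*", r"\1", css)             # no space around braces and ;
    return css.strip()


print(minify_css("/* reset */\n#t caption {\n  padding: 4px;\n}\n"))
# #t caption{padding: 4px;}
```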
@@ -1966,6 +2019,12 @@ class GT(object):
latex=None,
sparsify=1):
"""
Write DataFrame to custom tikz matrix.
Updated version that uses self.df and does not need to
reapply formatters or sparsify. Various HTML->TeX replacements
are still needed, e.g., dealing with % and _ outside formulas.
Write DataFrame to custom tikz matrix to allow greater control of
formatting and insertion of horizontal and vertical divider lines
@@ -2012,7 +2071,6 @@ class GT(object):
lines lines below these rows, -1 for next to last row
etc.; list of ints
post_process e.g., non-line commands put at bottom of table
latex arguments after \begin{table}[latex]
caption text for caption
@@ -2028,7 +2086,7 @@ class GT(object):
:return:
"""
# local variable - with all formatters already applied
df = self.apply_formatters(self.raw_df.copy(), mode='raw')
df = self.df.copy() # self.apply_formatters(self.raw_df.copy(), mode='raw')
caption = self.caption
label = self.label
# prepare label and caption
@@ -2079,31 +2137,16 @@ class GT(object):
# always a good idea to do this...need to deal with underscores, %
# and it handles index types that are not strings
df = GT.clean_index(df)
if not np.all([i == 'object' for i in df.dtypes]) and not df.empty:
logger.warning('cols of df not all objects (expect all obs at this '
'point): ', df.dtypes, sep='\n')
# make sure percents are escaped, but not if already escaped
df = df.replace(r"(?<!\\)%", r"\%", regex=True)
# pres_maker.df_to_tikz code line 1931
if self.show_index:
nc_index = df.index.nlevels
with warnings.catch_warnings():
warnings.simplefilter(
"ignore", category=pd.errors.PerformanceWarning)
df = df.reset_index(
drop=False, col_level=df.columns.nlevels - 1)
if sparsify:
if hrule is None:
hrule = set()
for i in range(sparsify):
# TODO update to new sparsify!!
df.iloc[:, i], rules = GT.sparsify_old(df.iloc[:, i])
# don't want lines everywhere
if len(rules) < len(df) - 1:
hrule = set(hrule).union(rules)
else:
nc_index = 0
# set in init
# self.nindex = self.df.index.nlevels if self.show_index else 0
# self.ncolumns = self.df.columns.nlevels
# self.ncols = self.df.shape[1]
nc_index = self.nindex
nr_columns = self.ncolumns
if vrule is None:
vrule = set()
@@ -2112,7 +2155,6 @@ class GT(object):
# to the left of... +1
vrule.add(nc_index + 1)
nr_columns = df.columns.nlevels
logger.info(
f'rows in columns {nr_columns}, columns in index {nc_index}')
@@ -2123,21 +2165,14 @@ class GT(object):
# number of index columns
# have also converted everything to formatted strings
# estimate... originally called guess_column_widths, with more parameters
colw, tabs = GT.estimate_column_widths(df, self.config.max_table_width, nc_index=nc_index, scale=self.config.tikz_scale, equal=self.config.equal) # noqa
if self.config.debug:
print(f'Make TikZ Input {self.tabs=}\nComputed {tabs=}')
if self.tabs is not None:
if len(tabs) == len(self.tabs):
tabs = self.tabs
elif len(self.tabs) == 1:
tabs = self.tabs * len(tabs)
else:
logger.error(
f'{self.tabs=} must be None, a single number, or a list of numbers of the correct length. Ignoring.')
# print('TIKZ ' + ', '.join([f'{c:,.2f}' for c in tabs]))
# print(f'TIKZ {colw=}, {tabs=}')
logger.info(f'tabs: {tabs}')
logger.info(f'colw: {colw}')
# copy so that re-indexing below cannot leak into the cached column_width_df
colw = self.column_width_df['tikz_colw'].copy()
tabs = self.column_width_df['scaled_tabs'].copy()
# these are indexed with pre-TeX mangling names
colw.index = df.columns
tabs.index = df.columns
logger.info('colw: %s', colw)
logger.info('tabs: %s', tabs)
# alignment dictionaries - these are still used below
ad = {'l': 'left', 'r': 'right', 'c': 'center'}
@@ -2261,6 +2296,7 @@ class GT(object):
# function to convert row numbers to TeX table format (edge case on last row -1 is nr and is caught, -2
# is below second to last row = above last row)
# shift down extra 1 for the spacer row at the top
def python_2_tex(x): return x + nr_columns + \
2 if x >= 0 else nr + x + 3
tb_rules = [nr_columns + 1, python_2_tex(-1)]
@@ -2344,6 +2380,389 @@ class GT(object):
return sio.getvalue()
# def make_tikz_original(self,
# column_sep=4 / 8, # was 3/8
# row_sep=1 / 8,
# container_env='table',
# extra_defs='',
# hrule=None,
# vrule=None,
# post_process='',
# label='',
# latex=None,
# sparsify=1):
# """
# Write DataFrame to custom tikz matrix to allow greater control of
# formatting and insertion of horizontal and vertical divider lines
# Estimates tabs from text width of fields (not so great if includes
# a lot of TeX macros) with a manual override available. Tabs gives
# the widths of each field in em (width of M)
# Standard row height = 1.5em seems to work - set in meta.
# first and last thick rules
# others below (Python, zero-based) row number, excluding title row
# keyword arguments : value (no newlines in value) escape back slashes!
# ``#keyword...`` rows ignored
# passed in as a string to facilitate using them with %%pmt?
# **Rules**
# * hrule at i means below row i of the table. (1-based) Top, bottom and
# below index lines are inserted automatically. Top and bottom lines
# are thicker.
# * vrule at i means to the left of table column i (1-based); there will
# never be a rule to the far right...it looks plebby; remember you must
# include the index columns!
# sparsify number of cols of multi index to sparsify
# Issue: column with floats and spaces or missing causes problems (VaR,
# TVaR, EPD, mean and CV table)
# From great.pres_maker.df_to_tikz
# keyword args:
# scale picks up self.config.tikz_scale; scale applied to whole
# table - default 0.717
# height row height, rec. 1 (em)
# column_sep col sep in em
# row_sep row sep in em
# container_env table, figure or sidewaysfigure
# color color for text boxes (helps debugging)
# extra_defs TeX definitions and commands put at top of table,
# e.g., \\centering
# lines lines below these rows, -1 for next to last row
# etc.; list of ints
# post_process e.g., non-line commands put at bottom of table
# latex arguments after \begin{table}[latex]
# caption text for caption
# Previous version see great.pres_maker
# Original version see: C:\\S\\TELOS\\CAS\\AR_Min_Bias\\cvs_to_md.py
# :param column_sep:
# :param row_sep:
# :param figure:
# :param extra_defs:
# :param post_process:
# :param label:
# :return:
# """
# # local variable - with all formatters already applied
# df = self.apply_formatters(self.raw_df.copy(), mode='raw')
# caption = self.caption
# label = self.label
# # prepare label and caption
# if label == '':
# lt = ''
# label = ''
# else:
# lt = label
# label = f'\\label{{{label}}}'
# if caption == '':
# if lt != '':
# logger.info(
# f'You have a label but no caption; the label {label} will be ignored.')
# caption = '% caption placeholder'
# else:
# caption = f'\\caption{{{self.caption}}}\n{label}'
# if not df.columns.is_unique:
# # possible index/body column interaction
# raise ValueError('tikz routine requires unique column names')
# # {extra_defs}
# # centering handled by quarto
# header = """
# \\begin{{{container_env}}}{latex}
# {caption}
# \\centering{{
# \\begin{{tikzpicture}}[
# auto,
# transform shape,
# nosep/.style={{inner sep=0}},
# table/.style={{
# matrix of nodes,
# row sep={row_sep}em,
# column sep={column_sep}em,
# nodes in empty cells,
# nodes={{rectangle, scale={scale}, text badly ragged {debug}}},
# """
# # put draw=blue!10 or so in nodes to see the node
# footer = """
# {post_process}
# \\end{{tikzpicture}}
# }} % close centering
# \\end{{{container_env}}}
# """
# # always a good idea to do this...need to deal with underscores, %
# # and it handles index types that are not strings
# df = GT.clean_index(df)
# if not np.all([i == 'object' for i in df.dtypes]) and not df.empty:
# logger.warning('cols of df not all objects (expect all obs at this '
# 'point): ', df.dtypes, sep='\n')
# # make sure percents are escaped, but not if already escaped
# df = df.replace(r"(?<!\\)%", r"\%", regex=True)
# # pres_maker.df_to_tikz code line 1931
# if self.show_index:
# nc_index = df.index.nlevels
# with warnings.catch_warnings():
# warnings.simplefilter(
# "ignore", category=pd.errors.PerformanceWarning)
# df = df.reset_index(
# drop=False, col_level=df.columns.nlevels - 1)
# if sparsify:
# if hrule is None:
# hrule = set()
# for i in range(sparsify):
# # TODO update to new sparsify!!
# df.iloc[:, i], rules = GT.sparsify_old(df.iloc[:, i])
# # don't want lines everywhere
# if len(rules) < len(df) - 1:
# hrule = set(hrule).union(rules)
# else:
# nc_index = 0
# if vrule is None:
# vrule = set()
# else:
# vrule = set(vrule)
# # to the left of... +1
# vrule.add(nc_index + 1)
# nr_columns = df.columns.nlevels
# logger.info(
# f'rows in columns {nr_columns}, columns in index {nc_index}')
# # internal TeX code (same as HTML code)
# matrix_name = self.df_id
# # note this happens AFTER you have reset the index...need to pass
# # number of index columns
# # have also converted everything to formatted strings
# # estimate... originally called guess_column_widths, with more parameters
# colw, tabs = GT.estimate_column_widths(df, self.config.max_table_width, nc_index=nc_index, scale=self.config.tikz_scale, equal=self.config.equal) # noqa
# if self.config.debug:
# print(f'Make TikZ Input {self.tabs=}\nComputed {tabs=}')
# # print('TIKZ ' + ', '.join([f'{c:,.2f}' for c in tabs]))
# # print(f'TIKZ {colw=}, {tabs=}')
# logger.info(f'tabs: {tabs}')
# logger.info(f'colw: {colw}')
# # alignment dictionaries - these are still used below
# ad = {'l': 'left', 'r': 'right', 'c': 'center'}
# ad2 = {'l': '<', 'r': '>', 'c': '^'}
# # use df_aligners, at this point the index has been reset
# align = []
# for n, i in zip(df.columns, self.df_aligners):
# if i == 'grt-left':
# align.append('l')
# elif i == 'grt-right':
# align.append('r')
# elif i == 'grt-center':
# align.append('c')
# else:
# align.append('l')
# # start writing
# sio = StringIO()
# if latex is None:
# latex = ''
# else:
# latex = f'[{latex}]'
# debug = ''
# if self.config.debug:
# # color all boxes
# debug = ', draw=blue!10'
# else:
# debug = ''
# sio.write(header.format(container_env=container_env,
# caption=caption,
# extra_defs=extra_defs,
# scale=self.config.tikz_scale,
# column_sep=column_sep,
# row_sep=row_sep,
# latex=latex,
# debug=debug))
# # table header
# # title rows, start with the empty spacer row
# i = 1
# sio.write(
# f'\trow {i}/.style={{nodes={{text=black, anchor=north, inner ysep=0, text height=0, text depth=0}}}},\n')
# for i in range(2, nr_columns + 2):
# sio.write(
# f'\trow {i}/.style={{nodes={{text=black, anchor=south, inner ysep=.2em, minimum height=1.3em, font=\\bfseries}}}},\n')
# # write column spec
# for i, w, al in zip(range(1, len(align) + 1), tabs, align):
# # average char is only 0.48 of M
# # https://en.wikipedia.org/wiki/Em_(typography)
# if i == 1:
# # first column sets row height for entire row
# sio.write(f'\tcolumn {i:>2d}/.style={{'
# f'nodes={{align={ad[al]:<6s}}}, '
# 'text height=0.9em, text depth=0.2em, '
# f'inner xsep={column_sep}em, inner ysep=0, '
# f'text width={max(2, 0.6 * w):.2f}em}},\n')
# else:
# sio.write(f'\tcolumn {i:>2d}/.style={{'
# f'nodes={{align={ad[al]:<6s}}}, nosep, text width={max(2, 0.6 * w):.2f}em}},\n')
# # extra col to right which enforces row height
# sio.write(
# f'\tcolumn {i+1:>2d}/.style={{text height=0.9em, text depth=0.2em, nosep, text width=0em}}')
# sio.write('\t}]\n')
# sio.write("\\matrix ({matrix_name}) [table, ampersand replacement=\\&]{{\n".format(
# matrix_name=matrix_name))
# # body of table, starting with the column headers
# # spacer row
# nl = ''
# for cn, al in zip(df.columns, align):
# s = f'{nl} {{cell:{ad2[al]}{colw[cn]}s}} '
# nl = '\\&'
# sio.write(s.format(cell=' '))
# # include the blank extra last column
# sio.write('\\& \\\\\n')
# # write header rows (again, issues with multi index)
# mi_vrules = {}
# sparse_columns = {}
# if isinstance(df.columns, pd.MultiIndex):
# for lvl in range(len(df.columns.levels)):
# nl = ''
# sparse_columns[lvl], mi_vrules[lvl] = GT.sparsify_mi(df.columns.get_level_values(lvl),
# lvl == len(df.columns.levels) - 1)
# for cn, c, al in zip(df.columns, sparse_columns[lvl], align):
# # c = wfloat_format(c)
# s = f'{nl} {{cell:{ad2[al]}{colw[cn]}s}} '
# nl = '\\&'
# sio.write(s.format(cell=c + '\\grtspacer'))
# # include the blank extra last column
# sio.write('\\& \\\\\n')
# else:
# nl = ''
# for c, al in zip(df.columns, align):
# # c = wfloat_format(c)
# s = f'{nl} {{cell:{ad2[al]}{colw[c]}s}} '
# nl = '\\&'
# sio.write(s.format(cell=c + '\\grtspacer'))
# sio.write('\\& \\\\\n')
# # write table entries
# for idx, row in df.iterrows():
# nl = ''
# for c, cell, al in zip(df.columns, row, align):
# # cell = wfloat_format(cell)
# s = f'{nl} {{cell:{ad2[al]}{colw[c]}s}} '
# nl = '\\&'
# sio.write(s.format(cell=cell))
# # if c=='p':
# # print('COLp', cell, type(cell), s, s.format(cell=cell))
# sio.write('\\& \\\\\n')
# sio.write(f'}};\n\n')
# # decorations and post processing - horizontal and vertical lines
# nr, nc = df.shape
# # add for the index and the last row plus 1 for the added spacer row at the top
# nr += nr_columns + 1
# # always include top and bottom
# # you input a table row number and get a line below it; it is implemented as a line ABOVE the next row
# # function to convert row numbers to TeX table format (edge case on last row -1 is nr and is caught, -2
# # is below second to last row = above last row)
# # shift down extra 1 for the spacer row at the top
# def python_2_tex(x): return x + nr_columns + \
# 2 if x >= 0 else nr + x + 3
# tb_rules = [nr_columns + 1, python_2_tex(-1)]
# if hrule:
# hrule = set(map(python_2_tex, hrule)).union(tb_rules)
# else:
# hrule = list(tb_rules)
# logger.debug(f'hlines: {hrule}')
# # why
# yshift = row_sep / 2
# xshift = -column_sep / 2
# descender_proportion = 0.25
# # top rule is special
# ls = 'thick'
# ln = 1
# sio.write(
# f'\\path[draw, {ls}] ({matrix_name}-{ln}-1.south west) -- ({matrix_name}-{ln}-{nc+1}.south east);\n')
# for ln in hrule:
# ls = 'thick' if ln == nr + nr_columns + \
# 1 else ('semithick' if ln == 1 + nr_columns else 'very thin')
# if ln < nr:
# # line above TeX row ln+1 that exists
# sio.write(f'\\path[draw, {ls}] ([yshift={-yshift}em]{matrix_name}-{ln}-1.south west) -- '
# f'([yshift={-yshift}em]{matrix_name}-{ln}-{nc+1}.south east);\n')
# else:
# # line above row below bottom = line below last row
# # descenders are 200 to 300 font units (per 1000-unit em) below the baseline
# ln = nr
# sio.write(f'\\path[draw, thick] ([yshift={-descender_proportion-yshift}em]{matrix_name}-{ln}-1.base west) -- '
# f'([yshift={-descender_proportion-yshift}em]{matrix_name}-{ln}-{nc+1}.base east);\n')
# # if multi index put in lines within the index TODO make this better!
# if nr_columns > 1:
# for ln in range(2, nr_columns + 1):
# sio.write(f'\\path[draw, very thin] ([xshift={xshift}em, yshift={-yshift}em]'
# f'{matrix_name}-{ln}-{nc_index+1}.south west) -- '
# f'([yshift={-yshift}em]{matrix_name}-{ln}-{nc+1}.south east);\n')
# written = set(range(1, nc_index + 1))
# if vrule and self.show_index:
# # to left of col, 1 based, includes index
# # write these first
# # TODO fix madness vrule is to the left, mi_vrules are to the right...
# ls = 'very thin'
# for cn in vrule:
# if cn not in written:
# sio.write(f'\\path[draw, {ls}] ([xshift={xshift}em]{matrix_name}-1-{cn}.south west) -- '
# f'([yshift={-descender_proportion-yshift}em, xshift={xshift}em]{matrix_name}-{nr}-{cn}.base west);\n')
# written.add(cn - 1)
# if len(mi_vrules) > 0:
# logger.debug(
# f'Generated vlines {mi_vrules}; already written {written}')
# # vertical rules for the multi index
# # these go to the RIGHT of the relevant column and reflect the index columns already
# # mi_vrules = {level of index: [list of vrule columns]
# # written keeps track of which vrules have been done already; start by cutting out the index columns
# ls = 'ultra thin'
# for k, cols in mi_vrules.items():
# # don't write the lowest level
# if k == len(mi_vrules) - 1:
# break
# for cn in cols:
# if cn in written:
# pass
# else:
# written.add(cn)
# top = k + 1
# if top == 0:
# sio.write(f'\\path[draw, {ls}] ([xshift={-xshift}em]{matrix_name}-{top}-{cn}.south east) -- '
# f'([yshift={-descender_proportion-yshift}em, xshift={-xshift}em]{matrix_name}-{nr}-{cn}.base east);\n')
# else:
# sio.write(f'\\path[draw, {ls}] ([xshift={-xshift}em, yshift={-yshift}em]{matrix_name}-{top}-{cn}.south east) -- '
# f'([yshift={-descender_proportion-yshift}em, xshift={-xshift}em]{matrix_name}-{nr}-{cn}.base east);\n')
# sio.write(footer.format(container_env=container_env,
# post_process=post_process))
# return sio.getvalue()
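The commented-out `make_tikz` body above does two small string jobs that are easier to follow in isolation. It converts an estimated column width in characters to a TikZ `text width` in em using `max(2, 0.6 * w)`, because an average glyph is narrower than an M. It also writes each matrix row in two stages: an f-string with doubled braces builds a format template, and `str.format` then fills it. The following is a minimal sketch of those steps under stated assumptions. The helper names are hypothetical, the column style omits the extra keys the original adds to column 1, and this is not the library's API.

```python
# Hypothetical helpers isolating two steps of the commented-out make_tikz.
ALIGN_CODE = {'grt-left': 'l', 'grt-right': 'r', 'grt-center': 'c'}
TIKZ_ALIGN = {'l': 'left', 'r': 'right', 'c': 'center'}
FORMAT_ALIGN = {'l': '<', 'r': '>', 'c': '^'}


def align_code(aligner: str) -> str:
    """Map a grt-* aligner class to l/r/c; anything else falls back to left."""
    return ALIGN_CODE.get(aligner, 'l')


def text_width_em(chars: float) -> float:
    """Scale a width in characters to em, with a 2em floor."""
    return max(2.0, 0.6 * chars)


def column_style(i: int, code: str, chars: float) -> str:
    """Simplified pgfkeys style for matrix column i (1-based)."""
    return (f'column {i}/.style={{nodes={{align={TIKZ_ALIGN[code]}}}, '
            f'nosep, text width={text_width_em(chars):.2f}em}},')


def matrix_row(cells, codes, widths) -> str:
    """One matrix row: cells separated by \\&, an empty last column, then \\\\."""
    parts, sep = [], ''
    for cell, code, width in zip(cells, codes, widths):
        # Stage 1: build a template such as '\\& {cell:>5s} ' (doubled braces are literal).
        template = f'{sep} {{cell:{FORMAT_ALIGN[code]}{width}s}} '
        # Stage 2: fill it; str.format pads but never truncates the cell text.
        parts.append(template.format(cell=cell))
        sep = '\\&'
    # Empty extra column on the right, then the TeX row terminator.
    return ''.join(parts) + '\\& \\\\\n'
```

Padding the cells only aligns the generated TeX source for readability; TikZ itself positions the text using the `align` and `text width` keys.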
@staticmethod
def sparsify(df, cs):
out = df.copy()
@@ -2417,15 +2836,16 @@ class GT(object):
except:
return str(n)
# @staticmethod
# def clean_underscores(s):
# """
# check s for unescaped _s
# returns true if all _ escaped else false
# :param s:
# :return:
# """
# return np.all([s[x.start() - 1] == '\\' for x in re.finditer('_', s)])
@staticmethod
def clean_index(df):
"""
escape _ for columns and index, being careful about subscripts
in TeX formulas.
:param df:
:return:
"""
return df.rename(index=GT.clean_name, columns=GT.clean_name)
@staticmethod
def clean_html_tex(text):
@@ -2439,17 +2859,6 @@ class GT(object):
text = re.sub(r'(?<!\$)\$(.*?)(?<!\\)\$(?!\$)', r'\\(\1\\)', text)
return text
@staticmethod
def clean_index(df):
"""
escape _ for columns and index, being careful about subscripts
in TeX formulas.
:param df:
:return:
"""
return df.rename(index=GT.clean_name, columns=GT.clean_name)
@staticmethod
def md_to_df(txt):
"""Convert markdown text string table to DataFrame."""
@@ -2523,9 +2932,8 @@ class GT(object):
if self.df.empty:
return ""
if self._string == "":
cw_df = self.make_column_width_df()
cw = cw_df['recommended']
aligners = cw_df['alignment']
cw = self.column_width_df['recommended']
aligners = self.column_width_df['alignment']
self._string = GT.make_text_table(
self.df, cw, aligners, index_levels=self.nindex)
return self._string
@@ -2728,9 +3136,8 @@ class GT(object):
self.config.table_width_mode = 'explicit'
self.config.max_table_width = console.width
# figure col widths and aligners
cw_df = self.make_column_width_df()
cw = cw_df['recommended']
aligners = cw_df['alignment']
cw = self.column_widths['recommended']
aligners = self.column_widths['alignment']
show_lines = self.config.hrule_widths[0] > 0
self._rich_table = table = GT.make_rich_table(self.df,
@@ -2747,13 +3154,27 @@ class GT(object):
def make_svg(self):
"""Render tikz into svg text."""
tz = TikzProcessor(self._repr_latex_(), file_name=self.df_id)
tz = TikzProcessor(self._repr_latex_(),
file_name=self.df_id, debug=self.config.debug)
p = tz.file_path.with_suffix('.svg')
if not p.exists():
tz.process_tikz(verbose=False)
try:
tz.process_tikz()
except ValueError as e:
print(e)
return "no svg output"
txt = p.read_text()
return txt
def show_svg(self):
"""Display svg in Jupyter."""
svg = self.make_svg()
if svg != 'no svg output':
display(SVG(svg))
else:
print('No SVG file available (TeX compile error).')
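`make_svg` compiles only when the `.svg` file is missing. It converts a TeX failure, which `process_tikz` raises as `ValueError`, into the sentinel string that `show_svg` checks. Below is a minimal sketch of that cache-then-compile pattern. It assumes a hypothetical `compile_fn` standing in for `TikzProcessor.process_tikz`.

```python
from pathlib import Path
from typing import Callable

NO_SVG = 'no svg output'  # sentinel string, as returned by make_svg in this commit


def cached_svg(svg_path: Path, compile_fn: Callable[[], None]) -> str:
    """Return SVG text, compiling only when the file is not already on disk."""
    if not svg_path.exists():
        try:
            compile_fn()  # hypothetical stand-in; expected to raise ValueError on TeX failure
        except ValueError as e:
            print(e)
            return NO_SVG
    return svg_path.read_text()
```

Because only file existence is checked, an SVG already on disk is reused without recompiling. Returning `None` instead of a sentinel string would be an alternative design that avoids string comparison in the caller.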
def save_html(self, fn):
"""Save HTML to file."""
p = Path(fn)
@@ -2762,3 +3183,15 @@ class GT(object):
soup = BeautifulSoup(self.html, 'html.parser')
p.write_text(soup.prettify(), encoding='utf-8')
logger.info(f'Saved to {p}')
@staticmethod
def uber_test(df, **kwargs):
"""Print various diagnostics and all the formats."""
f = GT(df, **kwargs)
display(f)
print(f)
f.show_svg()
display(df)
display(f.column_width_df)
print(f.make_tikz())
return f
+17 -4
@@ -4,6 +4,7 @@ Make fake dataframes for testing.
GPT from SJMM design.
"""
from collections import deque
from datetime import datetime, timedelta
from importlib.resources import files
from itertools import cycle, chain
@@ -117,6 +118,14 @@ class TestDataFrameFactory:
# lengths of index (word count) sampled from:
self.index_value_lengths = [1]*10 + [2] * 4 + [3]
self.cache = deque(maxlen=10)
# def cache(self, n=0):
# """Get nth item ago from cache, default = 0, latest."""
# if n < len(self._cache):
# return self._cache[n]
# else:
# print(f'Cache only contains {len(self._cache)} < {n} items.')
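`self.cache = deque(maxlen=10)` together with `appendleft` keeps the newest frame at index 0 and silently drops the oldest frame once ten are stored. The commented-out accessor relies on that ordering. The following minimal sketch shows the same behaviour in a hypothetical standalone class. One deliberate difference from the commented code: it raises `IndexError` instead of printing when `n` is out of range.

```python
from collections import deque


class RecentFrames:
    """Keep the most recent items, newest first (hypothetical helper)."""

    def __init__(self, maxlen: int = 10):
        self._cache = deque(maxlen=maxlen)

    def add(self, item):
        # appendleft puts the newest item at index 0; once the deque is full,
        # the oldest item is discarded from the right-hand end.
        self._cache.appendleft(item)

    def get(self, n: int = 0):
        """Return the item added n calls ago (0 = latest)."""
        if n < len(self._cache):
            return self._cache[n]
        raise IndexError(f'Cache only contains {len(self._cache)} items; asked for n={n}.')
```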
def make(self, rows: int, columns: Union[int, str], index: Union[int, str] = 0,
col_index: Union[int, str] = 0, missing: float = 0.0) -> pd.DataFrame:
@@ -168,14 +177,15 @@ class TestDataFrameFactory:
self.rng = np.random.default_rng(self.seed)
return self._generate(**self._last_args)
def random(self, index_levels: int = 0, column_levels: int = 0) -> pd.DataFrame:
def random(self, index_levels: int = 0, column_levels: int = 0, omit: str = 'p') -> pd.DataFrame:
"""
Generate a DataFrame with randomly chosen settings.
Args:
index_levels: Number of index levels to use.
column_levels: Number of column MultiIndex levels.
omit: column datatype codes to exclude from the random choice (default 'p').
Returns:
DataFrame
"""
@@ -184,8 +194,10 @@ class TestDataFrameFactory:
if column_levels == 0:
column_levels = random.choice([1, 1, 1, 1, 1, 2, 2, 3])
rows = self.rng.integers(5 * index_levels, 10 * index_levels)
valid_types = [i for i in ['d', 'f', 'i', 's3', 'l', 'h', 't', 'p', 'x', 'r', 'y']
if i not in omit]
col_types = self.rng.choice(
['d', 'f', 'i', 's3', 'l', 'h', 't', 'p'], size=self.rng.integers(3, 7))
valid_types, size=self.rng.integers(3, 7))
missing = round(float(self.rng.uniform(0, 0.15)), 2)
index = ''.join(self.rng.choice(
['t', 'd', 'y', 'i', 's2'], size=index_levels))
@@ -223,6 +235,7 @@ class TestDataFrameFactory:
df.columns = col_idx
df.index = self._make_index(index, rows)
df = self._insert_missing(df, missing)
self.cache.appendleft(df)
return df
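The new `omit` argument filters the candidate column type codes with `i not in omit`. When `omit` is a string, this is a substring test, which has a subtle effect: `omit='s3'` drops `'s3'`, but `omit='s'` keeps it. The following minimal sketch reproduces that filtering and the seeded draw as standalone functions. The function names are hypothetical, and the type codes are copied from the list in this commit.

```python
import numpy as np

# Column type codes listed in TestDataFrameFactory.random in this commit.
ALL_TYPES = ['d', 'f', 'i', 's3', 'l', 'h', 't', 'p', 'x', 'r', 'y']


def valid_types(omit: str = 'p') -> list[str]:
    # `t not in omit` is a substring test when omit is a string.
    return [t for t in ALL_TYPES if t not in omit]


def pick_column_types(rng: np.random.Generator, omit: str = 'p') -> list[str]:
    # integers(3, 7) draws 3..6 (high is exclusive); choice samples with replacement.
    size = rng.integers(3, 7)
    return [str(t) for t in rng.choice(valid_types(omit), size=size)]
```

Passing a freshly seeded `np.random.default_rng(seed)` makes the draw reproducible. This matches the role of `self.rng` in the factory.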
def _parse_colspec(self, spec: str) -> list[str]:
@@ -278,7 +291,7 @@ class TestDataFrameFactory:
if len(desc) == 1:
if desc[0] == 'i':
return pd.RangeIndex(n, name=self.index_name())
elif desc[0] in ('d', 't', 'x'):
elif desc[0] in ('d', 't', 'x', 'y'):
vals = self._generate_column(desc[0], n)
return pd.Index(vals, name=self.index_name())
elif not all(i[0] == 's' for i in desc):
+36 -23
@@ -15,6 +15,7 @@ from .hasher import txt_short_hash
class TikzProcessor:
"""Create PDF and SVG files from Tikz blocks."""
# Full TeX preamble to generate a .fmt if needed
_tex_template_full = r"""\documentclass[10pt, border=5mm]{standalone}
\usepackage{amsfonts}
@@ -41,7 +42,8 @@ class TikzProcessor:
"""
def __init__(self, txt, file_name='', base_path='.', tex_engine='pdflatex'):
def __init__(self, txt, file_name='', base_path='.', tex_engine='pdflatex', debug=False):
"""Create object from txt, a TeX blob containing a tikzpicture."""
self.txt = txt
self.tex_engine = tex_engine
self.base_path = Path(base_path).resolve()
@@ -50,6 +52,7 @@ class TikzProcessor:
file_name = file_name or txt_short_hash(txt)
self.file_path = self.out_path / file_name
self.format_file = self.out_path / 'tikz_format.fmt'
self.debug = debug
def split_tikz(self):
"""Split text to extract the TikZ picture."""
@@ -59,7 +62,7 @@ class TikzProcessor:
"""Create format file for faster compilation if missing."""
if self.format_file.exists():
return
print('building format file...')
print('TikzProcessor: building TeX format (.fmt) file...', end='')
tmp = self.out_path / 'tikz_format.tex'
tmp.write_text(self._tex_template_full, encoding='utf-8')
self.run_command([
@@ -71,9 +74,9 @@ class TikzProcessor:
], raise_on_error=True, cwd=self.out_path)
# tmp.unlink()
(self.out_path / f'{self.format_file.stem}.log').unlink()
print('building format file...success', self.format_file.resolve())
print('...success...format file built', self.format_file.resolve())
def process_tikz(self, verbose=False):
def process_tikz(self):
"""Compile TikZ to PDF and convert to SVG."""
tikz_begin, tikz_code, tikz_end = self.split_tikz()[1:4]
tex_code = self._tex_template.format(
@@ -95,40 +98,50 @@ class TikzProcessor:
f'--output-directory={str(tex_path.parent)}',
str(tex_path)
]
if verbose:
if self.debug:
print("Running:", " ".join(tex_cmd))
self.run_command(tex_cmd)
if self.run_command(tex_cmd):
raise ValueError('TeX failed to compile, no PDF or SVG output.')
# no tidying up
else:
# continue
(tex_path.parent / 'make_tikz.bat').write_text(" ".join(tex_cmd), encoding='utf-8')
(tex_path.parent / 'make_tikz.bat').write_text(" ".join(tex_cmd), encoding='utf-8')
svg_cmd = [
'C:\\temp\\pdf2svg-windows\\dist-64bits\\pdf2svg',
str(pdf_path),
str(svg_path)
]
if verbose:
print("Running:", " ".join(svg_cmd))
self.run_command(svg_cmd, raise_on_error=False)
svg_cmd = [
# 'C:\\temp\\pdf2svg-windows\\dist-64bits\\pdf2svg',
'pdf2svg',
str(pdf_path),
str(svg_path)
]
if self.debug:
print("Running:", " ".join(svg_cmd))
self.run_command(svg_cmd, raise_on_error=True)
if not verbose:
for ext in ('.tex', '.aux', '.log', '.pdf'):
path = tex_path.with_suffix(ext)
if path.exists():
path.unlink()
if not self.debug:
for ext in ('.tex', '.aux', '.log', '.pdf'):
path = tex_path.with_suffix(ext)
if path.exists():
path.unlink()
def display(self):
"""Display the SVG in Jupyter."""
display(SVG(self.file_path.with_suffix('.svg')))
@staticmethod
def run_command(command, raise_on_error=True, cwd=None):
def run_command(self, command, raise_on_error=True, cwd=None):
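"""Run command with subprocess and show output.

Return 0 on success, -1 if TeX reports no PDF output, and -2 if the
command wrote to stderr and raise_on_error is False.
"""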
with Popen(command, cwd=cwd, stdout=PIPE, stderr=PIPE, universal_newlines=True) as p:
stdout, stderr = p.communicate()
if stdout and self.debug:
print('Run command output ends\n', stdout.strip()[-250:])
if stdout:
print(stdout.strip()[-250:])
if 'no output PDF file produced' in stdout:
print("ERROR no pdf output\n"*5)
return -1
if stderr:
if raise_on_error:
raise RuntimeError(stderr.strip())
else:
print(stderr.strip())
return -2
return 0
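`run_command` now reports failure through its return value rather than only printing. `process_tikz` turns a nonzero result into `ValueError`, and `make_svg` catches that exception. Below is a minimal sketch of the same return-code contract. It swaps in `subprocess.run` for the `Popen` context manager used above and takes `debug` as a plain argument instead of reading `self.debug`. Like the original, it inspects stdout and stderr but ignores the process exit status, which `result.returncode` would provide.

```python
import subprocess


def run_command(command, raise_on_error=True, cwd=None, debug=False):
    """Run command; 0 = ok, -1 = TeX reported no PDF, -2 = stderr output."""
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    stdout, stderr = result.stdout, result.stderr
    if stdout and debug:
        print('Run command output ends\n', stdout.strip()[-250:])
    # pdflatex reports most failures on stdout, so scan for its message.
    if 'no output PDF file produced' in stdout:
        return -1
    if stderr:
        if raise_on_error:
            raise RuntimeError(stderr.strip())
        print(stderr.strip())
        return -2
    return 0
```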