Build a CLI to Process Nested Image Libraries with Mirrored or Flat Output

Flexible logging modes, ASCII-only filenames, conflict handling, and resume support

Matija Žiberna
I recently needed to process large, deeply nested image libraries and rename them using AI-generated, descriptive filenames and alt text. The catch: some clients want to keep their folder hierarchy intact; others prefer a single flat output directory with one consolidated log. Along the way, I also needed to ensure filenames contain only ASCII characters (e.g., Č/Š/Ž → c/s/z).

This guide walks through a practical, production-friendly CLI that solves exactly that.

What we'll build

  • A Python CLI that reads images from a nested input directory and, depending on your needs, either preserves the structure or flattens all outputs into one folder.
  • Flexible logging: per-folder, per-project, or a single central JSON/CSV.
  • ASCII-only filenames via diacritic replacement.
  • Conflict-safe saves in flat mode and resume/skip behavior using existing logs.

By the end, you'll be able to point the CLI at any image library and produce clean, AI-renamed outputs—mirrored or flattened—with consolidated logs.


1) Add flexible output and logging modes

We introduce a --log-mode option that controls both logging placement and the output directory structure. Options:

  • per_folder: mirror folders; put logs in each folder (default)
  • project_level: mirror folders; one log per top-level folder
  • central: mirror folders; single log in output root
  • flat: flatten outputs; single log in output root

# File: cli.py
# (Argument parser excerpt)
parser.add_argument(
    "--log-mode",
    type=str,
    default="per_folder",
    choices=["central", "project_level", "per_folder", "flat"],
    help="Where results logs are written; 'flat' also flattens the output tree.",
)

Under the hood, the processing function derives output subdirectories and log destinations from log_mode.

# File: cli.py
# (Inside process_images, once per image)
# Decide output directory layout first; the per_folder log path depends on it
relative_path = original_path.relative_to(input_dir)
if log_mode == "flat":
    output_subdir = output_dir
else:
    output_subdir = output_dir / relative_path.parent
output_subdir.mkdir(parents=True, exist_ok=True)

# Decide log file targets
if log_mode in ("central", "flat"):
    log_json_file = output_dir / "results.json"
    log_csv_file = output_dir / "results.csv"
elif log_mode == "project_level":
    # One results.json/csv per top-level folder
    # ... build project_logs[project_name]
    pass
else:  # per_folder
    log_json_file = output_subdir / "results.json"
    log_csv_file = output_subdir / "results.csv"

Conceptually: central/project-level modes consolidate logs while preserving structure; flat mode consolidates both logs and files into a single directory.
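
To make the four modes easier to compare, here is a minimal, side-effect-free sketch of the same decision. The function name resolve_targets is hypothetical, and the project_level branch assumes that a "project" is the first path component under the input directory; the original excerpt elides that part, so treat it as one possible reading.

```python
from pathlib import Path
from typing import Tuple


def resolve_targets(
    input_dir: Path, output_dir: Path, original_path: Path, log_mode: str
) -> Tuple[Path, Path]:
    """Return (output_subdir, log_dir) for one image without touching the disk."""
    relative_path = original_path.relative_to(input_dir)

    # Only flat mode drops the folder hierarchy
    if log_mode == "flat":
        output_subdir = output_dir
    else:
        output_subdir = output_dir / relative_path.parent

    if log_mode in ("central", "flat"):
        log_dir = output_dir
    elif log_mode == "project_level":
        # Assumption: the project is the first folder under input_dir;
        # images sitting directly in input_dir fall back to the output root.
        parts = relative_path.parts
        log_dir = output_dir / parts[0] if len(parts) > 1 else output_dir
    else:  # per_folder
        log_dir = output_subdir

    return output_subdir, log_dir
```

For an image at input/clientA/roof/a.jpg, this returns the mirrored folder output/clientA/roof in every mode except flat, while the log directory moves from that folder (per_folder) up to output/clientA (project_level) or to the output root (central, flat).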


2) Include filenames in results (JSON/CSV)

We log both the paths and the filenames to make downstream processing and deduping easier.

# File: app/utils/file_utils.py
# (Inside log_results)
log_entry = {
    "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    "original_path": str(original_path),
    "new_path": str(new_path),
    "original_filename": original_path.name,
    "new_filename": new_path.name,
    "alt_text": alt_text,
}

This ensures the CSV header and JSON entries always contain both filename and path fields.
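
The article shows only the entry itself, so the following is a rough sketch of how such an entry could be appended to both files. The function name append_log_entry, its signature, and the choice to store the JSON log as a list are assumptions, not the project's actual log_results.

```python
import csv
import json
import time
from pathlib import Path

FIELDS = [
    "timestamp", "original_path", "new_path",
    "original_filename", "new_filename", "alt_text",
]


def append_log_entry(
    log_json_file: Path, log_csv_file: Path,
    original_path: Path, new_path: Path, alt_text: str,
) -> dict:
    """Append one result to a JSON list and a CSV file (hypothetical helper)."""
    log_entry = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "original_path": str(original_path),
        "new_path": str(new_path),
        "original_filename": original_path.name,
        "new_filename": new_path.name,
        "alt_text": alt_text,
    }

    # JSON: load the existing list (if any), append, and rewrite the file
    entries = []
    if log_json_file.exists():
        entries = json.loads(log_json_file.read_text(encoding="utf-8"))
    entries.append(log_entry)
    log_json_file.write_text(
        json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # CSV: append a row; write the header only when creating the file
    write_header = not log_csv_file.exists()
    with log_csv_file.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(log_entry)

    return log_entry
```

Rewriting the whole JSON file on every call is simple but slows down as the log grows; for very large libraries, a line-delimited format might be a better fit.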


3) Enforce ASCII-only filenames

To keep filenames portable, we sanitize names by replacing diacritics with ASCII equivalents.

# File: app/utils/file_utils.py
# (Inside sanitize_filename)
replacements = {
    # Slovenian
    "č": "c", "Č": "C", "š": "s", "Š": "S", "ž": "z", "Ž": "Z",
    # German
    "ä": "a", "Ä": "A", "ö": "o", "Ö": "O", "ü": "u", "Ü": "U", "ß": "ss",
    # French/Latin variants (subset)
    "à": "a", "á": "a", "â": "a", "ã": "a", "Å": "A",
    "è": "e", "é": "e", "ê": "e", "ë": "e",
    "ì": "i", "í": "i", "î": "i", "ï": "i",
    "ò": "o", "ó": "o", "ô": "o", "õ": "o", "ø": "o",
    "ù": "u", "ú": "u", "û": "u",
    "ç": "c", "Ç": "C",
    # Others
    "ñ": "n", "Ñ": "N", "ł": "l", "Ł": "L",
}
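for old, new in replacements.items():
    name = name.replace(old, new)
# Drop anything still outside ASCII (unmapped accents, en dashes, emoji)
name = name.encode("ascii", "ignore").decode("ascii")
name = re.sub(r"[\\/*?:\"<>|]", "", name)
name = re.sub(r"\s+", "-", name).lower()
name = re.sub(r"-+", "-", name).strip("-")

With this, an AI output like Čista fasada – žleb ščisti becomes cista-fasada-zleb-scisti; the en dash is dropped because it is not ASCII.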
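
A hand-written map only covers the characters someone remembered to list. One possible complement, which is not part of the original code, is Unicode NFKD normalization from the standard library: it splits most accented letters into a base letter plus a combining mark, so dropping non-ASCII afterwards keeps the base letter. The function name ascii_slug and the small SPECIAL map are assumptions for illustration.

```python
import re
import unicodedata

# Letters that NFKD cannot decompose, mapped by hand (illustrative subset)
SPECIAL = {"ß": "ss", "ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def ascii_slug(name: str) -> str:
    """Return a lowercase, hyphen-separated, ASCII-only version of name."""
    for old, new in SPECIAL.items():
        name = name.replace(old, new)
    # Split accented letters into base letter + combining mark
    name = unicodedata.normalize("NFKD", name)
    # Drop combining marks and any other non-ASCII characters
    name = name.encode("ascii", "ignore").decode("ascii")
    # Remove characters that are illegal in filenames on common systems
    name = re.sub(r'[\\/*?:"<>|]', "", name)
    name = re.sub(r"\s+", "-", name).lower()
    return re.sub(r"-+", "-", name).strip("-")
```

The trade-off is that characters with no decomposition and no entry in the map are silently dropped, so an explicit table is still worth keeping for the languages you care about.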


4) Handle conflicts in flat mode

Flattening a large tree can produce name collisions. We avoid overwriting by adding a numeric suffix only when needed.

# File: cli.py
# (After building new_path)
if log_mode == "flat":
    counter = 1
    original_stem = new_filename_stem
    while new_path.exists():
        new_filename_stem = f"{original_stem}-{counter}"
        new_path = output_subdir / f"{new_filename_stem}{output_extension}"
        counter += 1

This keeps flat outputs from clobbering prior results. Which file receives which suffix depends on the order in which files are processed.
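
The same loop can be pulled into a small helper, which makes it easier to test in isolation. This is a sketch; the name unique_path is hypothetical and does not appear in the original code.

```python
from pathlib import Path


def unique_path(directory: Path, stem: str, extension: str) -> Path:
    """Return a path in directory that does not exist yet.

    The bare name is tried first, then stem-1, stem-2, and so on.
    """
    candidate = directory / f"{stem}{extension}"
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}-{counter}{extension}"
        counter += 1
    return candidate
```

Note that the existence check and the later write are separate steps, so this is safe for a single process but would need a lock or an atomic create if several workers wrote to the same folder.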


5) Make runs resume-safe with logs

We skip already processed files by loading prior results.json files based on log_mode.

# File: cli.py
processed_files = load_processed_files(output_dir, log_mode)
# Later, per file
if original_filename in processed_files:
    print(f"Skipping already processed: {original_filename}")
    continue

This is especially useful for huge libraries or when you hit rate limits and need to restart later.
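
The article does not show load_processed_files, so here is one way such a loader might look. It assumes each results.json holds a list of entries shaped like the log entry from step 2, and it collects original_path rather than the bare filename, since nested libraries often repeat names such as IMG_0001.jpg across folders and a filename-only check would skip the second one. The function name load_processed_paths is hypothetical.

```python
import json
from pathlib import Path
from typing import Set


def load_processed_paths(output_dir: Path) -> Set[str]:
    """Collect original_path values from every results.json under output_dir."""
    processed: Set[str] = set()
    # rglob finds the log wherever the mode put it: root, project, or folder
    for log_file in output_dir.rglob("results.json"):
        try:
            entries = json.loads(log_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or half-written log: reprocess those files
        if not isinstance(entries, list):
            continue
        for entry in entries:
            if isinstance(entry, dict) and "original_path" in entry:
                processed.add(entry["original_path"])
    return processed
```

In the processing loop, the check would then compare str(original_path) against this set before calling the AI service.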


6) Usage examples

Mirror structure with per-folder logs (default):

python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks \
  --lang sl

Mirror structure with a single central log:

python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks \
  --lang sl \
  --log-mode central

Mirror structure with per-project logs (top-level folder):

python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks \
  --lang sl \
  --log-mode project_level

Flatten outputs to a single directory with one log:

python cli.py \
  --input-dir input/laneks \
  --output-dir output/laneks-flat \
  --lang sl \
  --format webp \
  --max-width 1600 \
  --log-mode flat

Wrapping up

We built a practical CLI that handles real-world image libraries: AI-derived names and alt text, flexible folder strategies (mirrored or flat), consolidated logging, safe ASCII filenames, conflict handling, and resume support. Point it at any nested source and confidently generate clean, production-ready assets.

Let me know in the comments if you have questions, and subscribe for more practical development guides.

Thanks, Matija
