# Redaction Text Layer Forensic Analysis
## [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf).pdf and [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf).pdf - December 2025 vs. Re-Release Versions

**Date:** 2026-02-08
**Analyst:** Forensic PDF structure investigation
**Subject:** Determining whether "exposed text" from poorly-redacted PDFs represents hidden readable text behind black rectangles, garbled OCR, encoding corruption, or missed text layers

---

## EXECUTIVE SUMMARY

**The "exposed text" is garbled OCR of low-resolution scanned images. There is NO hidden readable text behind black rectangle overlays in these PDFs.**

Both [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf).pdf and [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf).pdf are **image-based scanned documents** with invisible OCR text layers. The OCR text layer uses PDF Text Rendering Mode 3 (invisible) and is positioned BEHIND the scanned image in the rendering order. The text appears garbled because OCR software attempted to read:
- A photograph of a financial ledger on a manila envelope ([EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf)) at only 96 DPI
- A handwritten letter in cursive blue ink on decorative paper ([EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf)) at only 96 DPI

Neither document contains text-based PDF content with black rectangle overlays hiding selectable text underneath. The viral claim of "poorly redacted" documents exposing hidden text behind copy-paste-removable black bars is **not supported** by the PDF structure of these specific files.

---

## METHODOLOGY

### Files Analyzed

| File | Version | Path | Size |
|------|---------|------|------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf).pdf | Original (Dec 19) | local analysis file `originals/december_2025/VOL00001/IMAGES/0001/EFTA00000476.pdf` | 365,781 bytes |
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf).pdf | Current (re-release) | DOJ dataset file `dataset1/DataSet 1/DataSet 1/VOL00001/IMAGES/0001/EFTA00000476.pdf` | 362,263 bytes |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf).pdf | Original (Dec 19) | local analysis file `originals/december_2025/VOL00001/IMAGES/0002/EFTA00001932.pdf` | 573,379 bytes |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf).pdf | Current (re-release) | DOJ dataset file `dataset1/DataSet 1/DataSet 1/VOL00001/IMAGES/0002/EFTA00001932.pdf` | 572,881 bytes |

### Tools Used
- PDF analysis tools for PDF structure analysis
- pdftotext for text extraction
- pdfimages for image listing
- PIL/numpy/scipy for pixel-level image comparison
- Direct content stream parsing for rendering order verification

---

## FINDING 1: PDF RENDERING PIPELINE

All four PDFs (both versions of both files) share an identical 5-layer rendering structure:

```
Layer 1: Graphics state save (q)
Layer 2: INVISIBLE OCR TEXT LAYER (Text Rendering Mode 3)
Layer 3: SCANNED IMAGE (/Im0 Do) - rendered ON TOP of text layer
Layer 4: WHITE RECTANGLE (clip mask) + BLACK EFTA LABEL at bottom
Layer 5: End
```

### Evidence from content streams:

**[EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) Original** - Content streams [23, 6, 7, 24, 25]:
- Stream 23 (1 byte): `q` - graphics state save
- Stream 6 (24,617 bytes): OCR text with `3 Tr` (invisible mode), 410 unique Tz values
- Stream 7 (34 bytes): `q`, `864 0 0 576.75 0 0 cm`, `/Im0 Do`, `Q` - image rendering
- Stream 24 (171 bytes): White rectangle fill + hex-encoded "[EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf)" label
- Stream 25: End

**[EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) Original** - Content streams [29, 22, 4, 30, 31]:
- Stream 29 (1 byte): `q` - graphics state save
- Stream 22 (15,993 bytes): OCR text with `3 Tr` (invisible mode), 279 unique Tz values
- Stream 4 (34 bytes): Image rendering
- Stream 30 (142 bytes): White rectangle fill + hex-encoded "[EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf)" label
- Stream 31: End
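
The stream order matters because PDF uses a painter's model: whatever is drawn later covers whatever was drawn earlier. The check can be sketched in Python as below, assuming the page's content streams have already been decompressed and concatenated in array order. The function name and the sample stream are illustrative and are not taken from the files.

```python
import re

# Literal strings are blanked first so text inside parentheses
# cannot be mistaken for an operator.
_STRING = re.compile(r"\((?:\\.|[^\\()])*\)")
_SHOW = re.compile(r"(?<![A-Za-z/])T[jJ](?![A-Za-z*])")   # Tj / TJ
_PAINT = re.compile(r"/\w+\s+Do(?![A-Za-z])")             # e.g. /Im0 Do


def paint_order(content: str) -> str:
    """Report whether the first image is painted after the first text run."""
    blanked = _STRING.sub(lambda m: " " * len(m.group()), content)
    text = _SHOW.search(blanked)
    image = _PAINT.search(blanked)
    if text is None or image is None:
        return "undetermined"
    # Later operations paint over earlier ones.
    return "image over text" if text.start() < image.start() else "text over image"


# Hypothetical miniature of the structure described above.
sample = (
    "q\n"
    "BT 3 Tr /OPBaseFont0 8.33 Tf (4 )Tj ET\n"
    "q 864 0 0 576.75 0 0 cm /Im0 Do Q\n"
    "Q"
)
print(paint_order(sample))
```

This only compares the first text-showing operator with the first `Do`, so it is a quick first check, not a full interpreter. A page with several images or interleaved text blocks would need each operation tracked in turn.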

### Critical Detail: Text Rendering Mode 3

From the raw content streams:
```
3 Tr
```

PDF Text Rendering Mode 3 means the text is **neither filled nor stroked** - it is completely invisible. This is the standard method used by OCR software (such as ABBYY FineReader, Adobe Acrobat's OCR, or OmniPage) to create a "searchable" text layer behind a scanned image. The text exists only for search/copy functionality, not for visual display.
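
One way to confirm that every text run is drawn in mode 3 is to track the current `Tr` value at each text-showing operator. The sketch below is a simplified regex scan over a decoded content stream, not a full PDF interpreter: it ignores `q`/`Q` state restoration and hex strings, and the function name is an assumption made for illustration. The sample is adapted from the excerpt in the appendix.

```python
import re

_STRING = re.compile(r"\((?:\\.|[^\\()])*\)")
# Either "<digit> Tr" (set rendering mode) or a text-showing operator.
_TOKEN = re.compile(r"(?<![\w./])(?:(\d)\s+Tr|(T[jJ]))(?![A-Za-z*])")


def show_modes(content: str) -> list:
    """Return the rendering mode in effect at each Tj/TJ, in stream order."""
    blanked = _STRING.sub(" ", content)   # drop literal strings
    mode = 0                              # PDF default: fill text (visible)
    modes = []
    for match in _TOKEN.finditer(blanked):
        if match.group(1) is not None:
            mode = int(match.group(1))
        else:
            modes.append(mode)
    return modes


sample = r"""q
BT
1 0 0 1 252.8 489 Tm
77.33 Tz
3 Tr/OPBaseFont0 8.33 Tf(\)1 )Tj
1 0 0 1 246.23 475.12 Tm
65.18 Tz/OPBaseFont0 15.62 Tf(4 )Tj
"""
modes = show_modes(sample)
print(modes, all(m == 3 for m in modes))
```

Checking the mode at each `Tj` is safer than simply searching for `3 Tr`, because any text shown before the first `Tr` would be drawn in the default visible mode.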

---

## FINDING 2: OCR SIGNATURE PROOF

The text layers exhibit unmistakable OCR signatures:

### Wildly Varying Font Sizes
| Document | Version | Unique Font Sizes |
|----------|---------|-------------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | Original | 197 |
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | Current | Different OCR run |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | Original | 266 |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | Current | Different OCR run |

Real PDF text typically uses 2-10 font sizes. Having 197-266 unique sizes means OCR software is assigning different sizes to each word to match the spatial dimensions detected in the scan.

### Wildly Varying Horizontal Scaling (Tz)
| Document | Unique Tz Values |
|----------|-----------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) Original | 410 |
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) Current | 272 |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) Original | 279 |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) Current | 248 |

The `Tz` operator sets horizontal text scaling. OCR software varies this per-word to fit each recognized word into the exact pixel width of the original. Born-digital text documents typically use Tz=100 (or a handful of values). Having 248-410 unique values per file is strong evidence of OCR generation.
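
Counts like those in the two tables above can be reproduced with a short scan of the decoded text stream. The following is a minimal sketch under the assumption that the stream is already decompressed. The regexes are a simplification and the helper name is hypothetical. The sample is adapted from the appendix excerpt.

```python
import re

_STRING = re.compile(r"\((?:\\.|[^\\()])*\)")
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)"
_TZ = re.compile(rf"({_NUM})\s+Tz(?![A-Za-z])")
# "/FontName size Tf": capture only the size operand.
_TF = re.compile(rf"/[^\s/()<>\[\]]+\s+({_NUM})\s+Tf(?![A-Za-z])")


def unique_operands(content: str) -> dict:
    """Count distinct horizontal-scaling (Tz) and font-size (Tf) values."""
    blanked = _STRING.sub(" ", content)   # ignore text inside literal strings
    scalings = {float(v) for v in _TZ.findall(blanked)}
    sizes = {float(v) for v in _TF.findall(blanked)}
    return {"Tz": len(scalings), "Tf": len(sizes)}


sample = r"""BT
1 0 0 1 252.8 489 Tm
77.33 Tz
3 Tr/OPBaseFont0 8.33 Tf(\)1 )Tj
1 0 0 1 246.23 475.12 Tm
65.18 Tz/OPBaseFont0 15.62 Tf(4 )Tj
ET"""
print(unique_operands(sample))
```

Even in this two-word excerpt every word carries its own scaling and font size, which is the per-word fitting behavior described above.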

### Standard OCR Font Names
All four PDFs reference the same five non-embedded fonts:
- `Courier` (OPBaseFont0)
- `Helvetica` (OPBaseFont1)
- `Helvetica-Bold` (OPBaseFont2)
- `Times-Roman` (OPBaseFont3)
- `ArialMT` (OPExtFont0)

These are the default substitute fonts an OCR engine assigns when the actual typeface is unknown. The `OPBaseFont`/`OPExtFont` resource names are characteristic of OmniPage OCR output.

---

## FINDING 3: IMAGE ANALYSIS

### Both PDFs contain a single embedded image per page

| Document | Image Size | Color | Resolution | Coverage |
|----------|-----------|-------|------------|----------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | 1152x769 | Indexed (1 component, 8 bpc) | 96 DPI | Full page |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | 1152x769 | Indexed (1 component, 8 bpc) | 96 DPI | Full page |

96 DPI is extremely low for OCR purposes (typical OCR requires 300+ DPI for good accuracy). This explains the garbled text output.
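
The 96 DPI figure follows directly from the image's pixel dimensions and the size at which it is placed on the page (the `864 0 0 576.75 0 0 cm` matrix quoted in Finding 1), since PDF user space has 72 units per inch. A small worked check in Python, with the helper name chosen here for illustration:

```python
def effective_dpi(pixels: int, points: float) -> float:
    """Pixels per inch when `pixels` span `points` PDF units (72 per inch)."""
    return pixels * 72.0 / points


# Values reported in this analysis.
width_px, height_px = 1152, 769
width_pt, height_pt = 864.0, 576.75   # from "864 0 0 576.75 0 0 cm"

dpi_x = effective_dpi(width_px, width_pt)
dpi_y = effective_dpi(height_px, height_pt)
print(dpi_x, dpi_y)
```

Both axes work out to 96, so the image is placed without anisotropic scaling and the resolution stated in the table is the effective resolution the OCR engine had to work with.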

### Images are DIFFERENT between versions

Pixel-level comparison shows the images were re-scanned or re-processed:

| Document | Changed Pixels | Percentage |
|----------|---------------|------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | 716,787 / 885,888 | 80.91% |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | 808,168 / 885,888 | 91.23% |

Despite the massive pixel-level difference, the images look visually similar to the human eye. This combination is consistent with re-scanning the same physical document (or re-processing it with different settings), which shifts most pixel values slightly without changing the visible content.
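
A changed-pixel count of this kind can be sketched in plain Python. The example assumes both images have already been decoded to flat sequences of grayscale values of equal size. The toy data and the `tolerance` parameter are illustrative additions and are not part of the original measurement.

```python
def changed_pixels(a, b, tolerance=0):
    """Count positions where two equal-length pixel sequences differ
    by more than `tolerance` gray levels."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(1 for x, y in zip(a, b) if abs(x - y) > tolerance)


# Toy 2x3 grayscale images, flattened row by row (hypothetical data).
original = bytes([0, 10, 200, 255, 128, 64])
current = bytes([0, 12, 190, 255, 120, 64])

exact = changed_pixels(original, current)
loose = changed_pixels(original, current, tolerance=5)
print(exact, loose)

# Percentages in the table above, recomputed from the reported counts.
total = 1152 * 769
print(round(100 * 716_787 / total, 2), round(100 * 808_168 / total, 2))
```

An exact-equality count is sensitive to scanner noise and recompression, so comparing it with a tolerance-based count helps separate sub-visual differences from substantive ones such as a new redaction.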

### [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) has an ADDITIONAL redaction in the current version

Visual comparison reveals the current (re-released) version of [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) has a new black rectangle that does not exist in the December 19 original:

- **Original**: 1 black rectangle at image coordinates (73,105)-(165,222) - top-left area (likely covering the greeting/name)
- **Current**: 2 black rectangles - the original one PLUS a new one at approximately (370,524)-(403,676) - middle area of the letter

This new redaction is **baked into the scanned image itself**, not a PDF annotation overlay. It was applied either by re-scanning the document with an added physical redaction or by digitally masking the raster image before the PDF was rebuilt.

---

## FINDING 4: WHAT IS THE "EXPOSED TEXT"?

### [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) (Financial Ledger)

The document is a **photograph of a financial ledger** lying on a manila envelope. The image shows:
- A spreadsheet/table with columns for dates, descriptions, and dollar amounts
- Black marker redactions covering certain cells in the physical document
- The document was photographed (not flatbed scanned) at low resolution

The "213 lines of exposed text" are OCR's attempt to read this low-resolution photograph:
```
04044 so 4,10y yentaYI ory a 4
Afaoutt W a Paso pew teoi 016.4
L290 /39 51 92100'0I
```

This is not "exposed hidden text" - it is garbled OCR of the visible printed content in the photograph, mangled by:
1. Low resolution (96 DPI)
2. Angle distortion (it's a photograph, not a scan)
3. Complex table layout confusing the OCR engine
4. Black marker redactions creating partial character occlusion

### [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) (Handwritten Victim Letter)

The document is a **handwritten letter in blue cursive ink** on decorative stationery paper with a cartoon owl design. The "47 lines of exposed text" are OCR's attempt to read cursive handwriting:

```
ear e.i-freA;
1- i.O,,-_)c ot( hac.i. a wonderi-iti 110ii-
dali SeaSO11.
```

Manual reconstruction suggests this reads approximately:
```
Dear [name],
I hope [you] had a wonderful holiday
season.
```

The letter is a **victim thank-you letter** to Epstein, expressing gratitude for:
- Holiday/Christmas celebrations
- Trips to Palm Beach, Las Vegas, Mexico, and an island
- Flying her sister out to visit
- Use of a Manhattan apartment
- Help seeing her mother
- "Pushing me to be at my best"

This is consistent with the well-documented grooming pattern where victims were conditioned to express gratitude for material benefits.

---

## FINDING 5: NO BLACK RECTANGLE OVERLAY HIDING TEXT

### Annotation Check
| Document | Version | PDF Annotations | Redaction Annotations |
|----------|---------|----------------|----------------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | Original | 0 | 0 |
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | Current | 0 | 0 |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | Original | 0 | 0 |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | Current | 0 | 0 |

**Zero PDF redaction annotations exist in any version.** There are no "overlay" black rectangles in the PDF structure.
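
A coarse first-pass version of this check is to scan the raw file bytes for annotation markers. The sketch below is an assumption-laden shortcut and not the method used for the table above: dictionaries stored inside compressed object streams are invisible to a raw byte scan, so a zero count is only meaningful once those streams have been decompressed or ruled out. The function name and sample bytes are hypothetical.

```python
import re

_ANNOTS = re.compile(rb"/Annots\b")
_REDACT = re.compile(rb"/Subtype\s*/Redact\b")


def annotation_markers(pdf_bytes: bytes) -> dict:
    """Count raw occurrences of /Annots arrays and /Redact annotation subtypes."""
    return {
        "annots": len(_ANNOTS.findall(pdf_bytes)),
        "redact": len(_REDACT.findall(pdf_bytes)),
    }


# Synthetic fragments, not taken from the analyzed files.
with_overlay = b"<< /Type /Page /Annots [5 0 R] >> << /Type /Annot /Subtype /Redact >>"
image_only = b"<< /Type /Page /Contents [23 0 R 6 0 R 7 0 R] >>"

print(annotation_markers(with_overlay))
print(annotation_markers(image_only))
```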

### Drawing Object Check
| Document | Version | Drawings | Black-Filled | Description |
|----------|---------|----------|-------------|-------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | Original | 1 | 0 | White page border only |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | Original | 1 | 0 | White page border only |

The only drawing objects are white-filled page borders. **No black rectangles exist as PDF drawing objects.**
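
The drawing check amounts to walking the content stream, tracking the current fill color, and recording each rectangle that is actually filled. Below is a minimal sketch with a whitespace tokenizer. It handles only `rg`, `g`, `re`, `q`/`Q` and the fill/stroke operators, and the sample stream is hypothetical. Other color operators (`k`, `sc`, `scn`) and strings containing spaces would need a real parser.

```python
def filled_rectangles(content: str) -> list:
    """Return (x, y, w, h, fill_rgb) for each filled rectangle in the stream."""
    fill = (0.0, 0.0, 0.0)   # PDF's initial fill color is black
    saved = []               # fill colors saved by q
    operands = []
    pending = []             # rectangles in the current path
    found = []
    for tok in content.split():
        try:
            operands.append(float(tok))
            continue
        except ValueError:
            pass             # not a number, so treat it as an operator
        if tok == "q":
            saved.append(fill)
        elif tok == "Q" and saved:
            fill = saved.pop()
        elif tok == "rg" and len(operands) >= 3:
            fill = tuple(operands[-3:])
        elif tok == "g" and operands:
            fill = (operands[-1],) * 3
        elif tok == "re" and len(operands) >= 4:
            pending.append(tuple(operands[-4:]))
        elif tok in {"f", "F", "f*", "B", "B*", "b", "b*"}:
            found.extend(rect + (fill,) for rect in pending)
            pending = []
        elif tok in {"S", "s", "n"}:
            pending = []     # stroked or discarded without a fill
        operands = []
    return found


# Hypothetical stream: one white strip, then one black box.
sample = "q 1 1 1 rg 0 0 864 20 re f Q 0 0 0 rg 100 200 50 12 re f"
rects = filled_rectangles(sample)
black = [r for r in rects if r[4] == (0.0, 0.0, 0.0)]
print(len(rects), len(black))
```

A page whose only filled rectangles are white would produce an empty `black` list, which is the situation the table above describes.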

### Where are the black rectangles?

The black rectangles visible in these documents are **baked into the scanned images themselves**. They are part of the pixel data of the embedded raster image. This means:

1. The redactions were applied before the PDF was assembled: either physically (black tape or marker) before scanning, or digitally on the raster image itself
2. The embedded image therefore already contains each redaction as pixel data
3. No text exists "behind" the black rectangles, because the OCR engine only ever saw the already-obscured image

For [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) (financial ledger):
- 143,650 near-black pixels in the image (16.2% of the 885,888-pixel image area)
- Largest black region: 998x561-pixel bounding box (the table area with multiple column redactions)
- These are physical marker/tape redactions on the printed document, captured in the photograph

For [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) (victim letter):
- Original: 1 black region at (73,105)-(165,222) = 92x117 pixels
- Current: Same region PLUS new region at (370,524)-(403,676) = 33x152 pixels
- The additional redaction in the re-release was applied to the image (re-scanned or digitally added to the raster)
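
Locating black regions in the raster can be sketched as a connected-component search over thresholded pixels. The version below uses only the standard library and a toy grid. The threshold value, the 4-connectivity choice and the function name are assumptions made for illustration, and a real run would operate on the decoded image data.

```python
from collections import deque


def dark_regions(grid, threshold=32):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected regions whose pixel
    values are below `threshold`. x1 and y1 are exclusive, so the box
    width is x1 - x0 and the height is y1 - y0."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or grid[y][x] >= threshold:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and grid[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes


W, B = 255, 0
grid = [
    [W, W, W, W, W, W],
    [W, B, B, W, W, W],
    [W, B, B, W, W, W],
    [W, W, W, W, B, W],
    [W, W, W, W, B, W],
]
print(dark_regions(grid))
```

Running the same search on both versions of an image and comparing the box lists is one way to surface a newly added redaction such as the second region listed above.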

---

## FINDING 6: ORIGINAL vs. CURRENT COMPARISON

### Text Layer Differences
| Document | Original Text Length | Current Text Length | Identical? |
|----------|---------------------|--------------------| -----------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | 2,853 chars | 2,392 chars | No |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | 1,409 chars | 1,240 chars | No |

The text layers differ because **different OCR runs produce different results** from different scans of the same physical document. The current versions were re-scanned and re-OCR'd, producing slightly different (but equally garbled) text.

Key differences for [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf):
- Original OCR produces 257 words; Current OCR produces 229 words
- The original has "Mexico", "Vegas", "Friends"; the current has "Arizona", "Christmas", "circus"
- Both are equally garbled attempts at reading the same handwriting
- The current version lost some text near the new redaction area
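
A word-level comparison like the one summarized above can be sketched with set operations. The two input strings below are hypothetical stand-ins built from words mentioned in this report. They are not the actual OCR output of either version.

```python
import re


def word_set(text: str) -> set:
    """Lower-cased alphabetic tokens of three or more letters."""
    return {w.lower() for w in re.findall(r"[A-Za-z]{3,}", text)}


def compare_ocr(original: str, current: str) -> dict:
    """Words unique to each OCR run and words shared by both."""
    a, b = word_set(original), word_set(current)
    return {
        "only_original": sorted(a - b),
        "only_current": sorted(b - a),
        "shared": sorted(a & b),
    }


# Hypothetical stand-in strings, not real extracted text.
original_text = "wonderful holiday Mexico Vegas Friends"
current_text = "wonderful holiday Arizona Christmas circus"

result = compare_ocr(original_text, current_text)
print(result)
```

The three-letter minimum is a design choice to suppress the short noise fragments that dominate garbled OCR. Lowering it would inflate both "only" lists with meaningless tokens.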

### File Size Differences
| Document | Original Size | Current Size | Difference |
|----------|--------------|-------------|------------|
| [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) | 365,781 | 362,263 | -3,518 bytes |
| [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) | 573,379 | 572,881 | -498 bytes |

The slight size differences are consistent with different compression of the re-scanned images and different OCR text content.

---

## CONCLUSION

### Answer to the Key Question

**Do the original December 19 PDFs have actual selectable text behind black visual rectangles (text-based PDFs with overlay redactions)?**

**NO.** The evidence conclusively shows:

1. **These are image-based scanned documents**, not text-based PDFs
2. **The text layer is invisible OCR** (Text Rendering Mode 3), placed behind the image
3. **The OCR is garbled** because of 96 DPI resolution, cursive handwriting, and photographic distortion
4. **The black rectangles are in the images**, not PDF overlays - they were applied before the PDF was assembled, either physically before scanning or digitally on the raster
5. **No PDF annotations or drawing objects** create the visible redactions
6. **The "exposed text" is garbled OCR artifacts**, not hidden text behind removable black bars

### Assessment of the Viral "Poorly Redacted" Claim for These Specific Files

For [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) and [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) specifically, the claim that redacted text can be "exposed" by copying/pasting from behind black rectangles is **not supported** by the file structure. The "exposed text" is the OCR engine's garbled attempt at reading visible (non-redacted) content from a low-resolution scan. It does not reveal information hidden by the redactions.

However, it is worth noting:
- [EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf) did receive an **additional redaction** in the re-release, baked into the image and covering what appears to be a signature or name area. This indicates that DOJ recognized at least some content needed additional redaction.
- The OCR text, while garbled, does provide fragmentary readable content from the non-redacted portions (e.g., recognizable words like "Christmas," "Manhattan," "Mexico," "friends," "wonderful")
- Other documents in the collection may have different redaction methods - this analysis applies only to these two specific files

### Root Cause of Garbled "Exposed Text"

The text appears garbled because:
1. **96 DPI scan resolution** - far below the 300 DPI minimum recommended for OCR
2. **Handwriting recognition failure** ([EFTA00001932](https://www.justice.gov/epstein/files/DataSet%201/EFTA00001932.pdf)) - cursive blue ink is extremely difficult for OCR
3. **Photographic distortion** ([EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf)) - the document was photographed, not flatbed scanned
4. **OmniPage OCR limitations** - the `OPBaseFont` naming in the fonts suggests OmniPage was used; OCR engines of this kind are built for printed text and handle cursive handwriting poorly
5. **Indexed color space** - both images use an indexed (palette-based) color space, reported as 1 component at 8 bits per component, which limits each image to at most 256 colors and can discard tonal detail that would help character recognition

---

## APPENDIX: Technical Evidence

### Content Stream: [EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) Original - OCR Layer (first 500 bytes)
```
%WB0AiUxr
q
1 0.06 -0.06 1 17.43 -26.32 cm
BT
0 0 0 rg
0 0 0 RG
1 0 0 1 252.8 489 Tm
77.33 Tz
3 Tr/OPBaseFont0 8.33 Tf(\)1 )Tj
1 0 0 1 246.23 475.12 Tm
65.18 Tz/OPBaseFont0 15.62 Tf(4 )Tj
```

Key operators:
- `3 Tr` = Text Rendering Mode 3 (invisible)
- `77.33 Tz` / `65.18 Tz` = per-word horizontal scaling (OCR signature)
- `/OPBaseFont0` = OmniPage default font substitute

### Content Stream: Image Rendering Layer
```
q
864 0 0 576.75 0 0 cm
/Im0 Do
Q
```
This renders the image at full page size, ON TOP of the invisible text layer.

### Font Inventory ([EFTA00000476](https://www.justice.gov/epstein/files/DataSet%201/EFTA00000476.pdf) Original)
```
(9,  'n/a', 'Type1', 'Courier',        'OPBaseFont0', 'WinAnsiEncoding')
(10, 'n/a', 'Type1', 'Helvetica',      'OPBaseFont1', 'WinAnsiEncoding')
(11, 'n/a', 'Type1', 'Helvetica-Bold', 'OPBaseFont2', 'WinAnsiEncoding')
(12, 'n/a', 'Type1', 'Times-Roman',    'OPBaseFont3', 'WinAnsiEncoding')
(13, 'n/a', 'Type1', 'ArialMT',        'OPExtFont0',  'WinAnsiEncoding')
```

All fonts are non-embedded Type1 entries with OmniPage-style resource names. Four of the five base fonts belong to the PDF standard 14; `ArialMT` does not.