# newsfresh

A fast CLI tool for querying, filtering, and analyzing GDELT Global Knowledge Graph (GKG) v2.1 data — the world's largest open dataset of global news events, updated every 15 minutes.
## Global Flags

These options are available on every subcommand.
| Flag | Description |
|---|---|
| -v, --verbose | Increase logging verbosity. Stackable: -v (info), -vv (debug), -vvv (trace) |
| -q, --quiet | Suppress non-error output |
| -h, --help | Print help for any command |
```
$ newsfresh --help
Query and analyze GDELT GKG v2.1 data

Usage: newsfresh [OPTIONS] <COMMAND>

Commands:
  fetch    Download GKG data (latest or historical)
  parse    Parse a local GKG file and output records
  query    Fetch + parse + filter in one step
  schema   Print GKG type definitions
  analyze  NL search + analyze GKG records
  help     Print this message or the help of the given subcommand(s)
```
## fetch — Download GKG Data

Downloads a GKG data file (latest 15-minute update or historical) from GDELT servers. Automatically extracts the CSV from the ZIP archive.
| Flag | Type | Description | Default |
|---|---|---|---|
| --latest | bool | Fetch the latest 15-minute update | true |
| --date <DATE> | string | Fetch a specific historical file (YYYYMMDDHHMMSS) | — |
| --translation | bool | Fetch non-English (translation) variant | false |
| -o, --output <DIR> | path | Output directory | ./data |
| --keep-zip | bool | Keep the .zip file after extraction | false |
```
# Latest 15-minute update (default)
$ newsfresh fetch

# Historical file by date
$ newsfresh fetch --date 20250217150000

# Non-English variant, custom output directory, keep zip
$ newsfresh fetch --translation -o ./my-data --keep-zip
```

```
Fetching: http://data.gdeltproject.org/gdeltv2/20260216054500.gkg.csv.zip
Extracted: data/20260216054500.gkg.csv
```
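For orientation, the URL pattern fetch requests can be sketched in a few lines of Python. This is an illustration of GDELT's published file-naming scheme, not newsfresh code; in particular, the `.translation` infix for the non-English variant is an assumption based on GDELT's documented naming.

```python
# Illustrative sketch (not newsfresh code): forming a GDELT GKG v2.1
# download URL from a YYYYMMDDHHMMSS timestamp.
BASE = "http://data.gdeltproject.org/gdeltv2"

def gkg_url(timestamp: str, translation: bool = False) -> str:
    """Return the .gkg.csv.zip URL for one 15-minute GKG file."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be YYYYMMDDHHMMSS")
    # Assumed naming: non-English files carry a ".translation" infix.
    infix = ".translation" if translation else ""
    return f"{BASE}/{timestamp}{infix}.gkg.csv.zip"

print(gkg_url("20260216054500"))
# http://data.gdeltproject.org/gdeltv2/20260216054500.gkg.csv.zip
```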
## parse — Parse a Local GKG File

Parses a local .csv or .csv.zip GKG file. Supports all filters, output formats, field projection, and offset/limit pagination.
| Flag | Type | Description | Default |
|---|---|---|---|
| <FILE> | path | Path to a local .csv or .csv.zip GKG file | required |
| -f, --format | enum | Output format: json, json-compact, tealeaf, tealeaf-compact | json |
| -o, --output | path | Output file (stdout if omitted) | stdout |
| --limit <N> | int | Maximum number of records to output | all |
| --offset <N> | int | Skip first N records | 0 |
| --fields <LIST> | string | Comma-separated field names for projection | all fields |
| + all filter options | | | |
```
# Parse a CSV file with filters
$ newsfresh parse data/20250217150000.gkg.csv \
    --country US --person "Trump" --limit 10

# Parse directly from a zip file
$ newsfresh parse data/20250217150000.gkg.csv.zip -f json

# Output specific fields only
$ newsfresh parse data/gkg.csv \
    -f json --fields document_identifier,source_common_name,tone
```
```json
[
  {
    "document_identifier": "https://www.washingtonpost.com/politics/2025/02/17/congress-budget-...",
    "source_common_name": "washingtonpost.com",
    "tone": {
      "tone": -1.82,
      "positive_score": 3.12,
      "negative_score": 4.94,
      "polarity": 8.06,
      "activity_ref_density": 15.43,
      "self_group_ref_density": 0.22,
      "word_count": 612
    }
  }
]
```
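Conceptually, --fields projection just keeps the named top-level keys of each record, as a minimal sketch shows (the field names mirror the example output above; this is an illustration, not the tool's implementation):

```python
# Hedged sketch of field projection: drop every top-level key
# that was not requested.
def project(records: list[dict], fields: list[str]) -> list[dict]:
    keep = set(fields)
    return [{k: v for k, v in rec.items() if k in keep} for rec in records]

records = [{"document_identifier": "https://example.com/a",
            "source_common_name": "example.com",
            "tone": {"tone": -1.82},
            "v1_persons": ["donald trump"]}]

print(project(records, ["document_identifier", "tone"]))
# [{'document_identifier': 'https://example.com/a', 'tone': {'tone': -1.82}}]
```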
## query — Fetch + Parse + Filter

Downloads a GKG data file, parses it, applies filters, and outputs results in one step. Combines fetch and parse.
| Flag | Type | Description | Default |
|---|---|---|---|
| --latest | bool | Fetch the latest 15-minute update | false |
| --date <DATE> | string | Fetch a specific historical file (YYYYMMDDHHMMSS) | — |
| --translation | bool | Fetch non-English variant | false |
| --persist-data-file | bool | Persist downloaded data to persisted-storage/ | false |
| -f, --format | enum | Output format | json |
| -o, --output | path | Output file | stdout |
| --limit <N> | int | Maximum records | all |
| --offset <N> | int | Skip first N records | 0 |
| --fields <LIST> | string | Comma-separated field projection | all fields |
| + all filter options | | | |
```
# Fetch latest and filter by theme
$ newsfresh query --country US --theme "CLIMATE_CHANGE" --limit 5

# Historical data with tone filter
$ newsfresh query --date 20250201120000 --tone-min=-10 --tone-max=-2

# Persist downloaded files for reuse
$ newsfresh query --persist-data-file --country US --limit 20

# Output in compact TeaLeaf format
$ newsfresh query --country UK --has-quote -f tealeaf-compact
```
## analyze — Full-Text Search & Statistics

Builds an in-memory Tantivy full-text index with BM25 ranking over GKG records, then runs natural language search queries. Optionally computes aggregate statistics using Polars DataFrames.

Before indexing, records are enriched so plain-English search terms can match GDELT's internal codes:
| Enrichment | Example |
|---|---|
| FIPS country codes expanded to full names (240+ countries) | Searching "United States" matches code US |
| ADM1 state/province codes expanded to readable names | Searching "California" matches US06 |
| Theme code canonicalization | TAX_FNCACT_PRESIDENT becomes searchable as "President" |
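The enrichment idea can be sketched generically: expand opaque codes into readable text before indexing, so a full-text query matches either form. The tiny lookup tables below are stand-ins for the full FIPS (240+ countries) and ADM1 tables, and the theme heuristic is an illustration, not the tool's actual canonicalization rules.

```python
# Illustrative enrichment sketch: append human-readable expansions of
# codes to the indexed text. Lookup tables here are tiny stand-ins.
FIPS = {"US": "United States", "UK": "United Kingdom"}
ADM1 = {"US06": "California", "US48": "Texas"}

def canonical_theme(code: str) -> str:
    # Heuristic: TAX_FNCACT_PRESIDENT -> "President" (last segment, title-cased).
    return code.rsplit("_", 1)[-1].title()

def enrich(text: str, country: str, adm1: str, theme: str) -> str:
    """Return text augmented with readable code expansions for indexing."""
    parts = [text, FIPS.get(country, country),
             ADM1.get(adm1, adm1), canonical_theme(theme)]
    return " ".join(parts)

print(enrich("budget vote", "US", "US06", "TAX_FNCACT_PRESIDENT"))
# budget vote United States California President
```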
| Flag | Type | Description | Default |
|---|---|---|---|
| [FILE] | path | Local .csv or .csv.zip file (optional — use --latest or --date instead) | — |
| --search <QUERY> | string | Natural language search query | required |
| --latest | bool | Fetch the latest 15-minute update | false |
| --date <DATE> | string | Fetch historical file (YYYYMMDDHHMMSS) | — |
| --translation | bool | Non-English variant | false |
| --persist-data-file | bool | Persist downloaded data | false |
| --limit <N> | int | Maximum number of results | 20 |
| --stats | bool | Show aggregate statistics instead of records | false |
| --stats-top-n <N> | int | Number of top entries per frequency table | 10 |
| -f, --format | enum | Output format (when not using --stats) | tealeaf |
| -o, --output | path | Output file | stdout |
| --fields <LIST> | string | Comma-separated field projection | all fields |
| + all filter options | | | |
```
# Search the latest GDELT data with natural language
$ newsfresh analyze --latest \
    --search "elections Congress US economy" --limit 20

# Search with additional structured filters
$ newsfresh analyze --latest \
    --search "climate carbon emissions policy" \
    --country US --limit 10

# From a local file
$ newsfresh analyze data/gkg.csv \
    --search "Ukraine Russia ceasefire negotiations" --limit 15

# Compact TeaLeaf output for LLM consumption (~47% fewer tokens)
$ newsfresh analyze --latest \
    --search "AI regulation technology" --limit 10 -f tealeaf-compact
```
### Statistics Mode (--stats)

```
# Aggregate statistics with Polars DataFrames
$ newsfresh analyze --latest \
    --search "US politics" --limit 50 --stats

# Top 5 per category
$ newsfresh analyze data/gkg.csv \
    --search "climate change" --stats --stats-top-n 5 --limit 100
```
### Example --stats Output

```
=== GDELT Analysis Stats (50 records) ===

--- Top Themes ---
1. GENERAL GOVERNMENT 24 (2.0%)
2. LEADER 24 (2.0%)
3. GENERAL1 23 (2.0%)
4. GOVERNMENT 21 (1.8%)
5. UNGP FORESTS RIVERS OCEANS 20 (1.7%)

--- Top Countries ---
1. United States (US) 48 (31.2%)
2. United Kingdom (UK) 9 (5.8%)
3. Canada (CA) 7 (4.5%)
4. China (CH) 7 (4.5%)
5. Australia (AS) 6 (3.9%)

--- Tone ---
Mean: -0.82 Std: 3.45 Range: [-8.12, 4.91]
Most positive: [4.91] https://www.nytimes.com/.../economy-jobs-report...
Most negative: [-8.12] https://www.washingtonpost.com/.../congress-budget-crisis...

--- Top Persons ---
1. donald trump 9 (4.9%)
2. elon musk 5 (2.7%)
3. kamala harris 3 (1.6%)
4. marco rubio 3 (1.6%)
5. jerome powell 3 (1.6%)

--- Top Organizations ---
1. congress 3 (1.7%)
2. federal reserve 3 (1.7%)
3. microsoft 3 (1.7%)
4. pentagon 2 (1.2%)
5. state department 2 (1.2%)

--- Top Sources ---
1. nytimes.com 16 (32.0%)
2. washingtonpost.com 6 (12.0%)
3. cnn.com 6 (12.0%)
4. foxnews.com 5 (10.0%)
5. politico.com 4 (8.0%)
```
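The aggregation itself is conceptually simple. A minimal stdlib sketch of what --stats computes (the real tool uses Polars DataFrames; the toy records below are invented for illustration):

```python
# Hedged sketch of the --stats aggregation: top-N frequency tables plus
# tone mean/std/range over a batch of records.
from collections import Counter
from statistics import mean, pstdev

records = [
    {"themes": ["LEADER", "GOVERNMENT"], "tone": -1.8},
    {"themes": ["LEADER"], "tone": 2.4},
    {"themes": ["GOVERNMENT", "CLIMATE"], "tone": -4.1},
]

# Frequency table over a list-valued field, like "Top Themes".
theme_counts = Counter(t for r in records for t in r["themes"])
print(theme_counts.most_common(2))

# Tone summary, like the "--- Tone ---" block.
tones = [r["tone"] for r in records]
print(f"Mean: {mean(tones):.2f} Std: {pstdev(tones):.2f} "
      f"Range: [{min(tones)}, {max(tones)}]")
```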
## schema — Print GKG Type Definitions

Prints the complete GKG v2.1 record schema in TeaLeaf or JSON Schema format.
| Flag | Type | Description | Default |
|---|---|---|---|
| -f, --format | enum | tealeaf or json-schema | tealeaf |
```
# TeaLeaf schema format
$ newsfresh schema

# JSON Schema format
$ newsfresh schema -f json-schema
```
## Filter Options

Available on parse, query, and analyze. All filters compose with AND logic.
| Flag | Description | Example |
|---|---|---|
| --person | Person name (case-insensitive substring) | --person "Trump" |
| --org | Organization name | --org "United Nations" |
| --theme | GKG theme code | --theme "TAX_POLICY" |
| --location | Location name | --location "Washington" |
| --country | FIPS country code | --country US |
| --tone-min / --tone-max | Tone score range | --tone-min=-5 --tone-max=5 |
| --date-from / --date-to | Date range (YYYYMMDD) | --date-from 20250201 |
| --source | Source name | --source "bbc" |
| --has-image | Only records with a sharing image | --has-image |
| --has-quote | Only records with quotations | --has-quote |
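The AND composition above can be sketched as a list of predicates where a record passes only if every active filter accepts it. The filter constructors below are illustrative, not the crate's RecordFilter API:

```python
# Hedged sketch of AND-composed predicate filters over record dicts.
def person_filter(name: str):
    """Case-insensitive substring match against the persons list."""
    needle = name.lower()
    return lambda r: any(needle in p.lower() for p in r.get("v1_persons", []))

def tone_min_filter(threshold: float):
    return lambda r: r.get("tone", 0.0) >= threshold

filters = [person_filter("trump"), tone_min_filter(-5.0)]
record = {"v1_persons": ["Donald Trump"], "tone": -1.8}

# AND logic: every filter must accept the record.
passes = all(f(record) for f in filters)
print(passes)  # True
```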
## Output Formats

Controlled via -f, --format. The --fields flag enables field projection (JSON only).
| Format | Flag | Description |
|---|---|---|
| JSON (pretty) | -f json | Pretty-printed JSON array (default) |
| JSON (compact) | -f json-compact | Minified single-line JSON |
| TeaLeaf | -f tealeaf | Schema-driven format, ~47% fewer tokens than JSON |
| TeaLeaf (compact) | -f tealeaf-compact | Minified TeaLeaf, additional ~21% savings |
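As a rough illustration of why compact output is smaller, the same record can be serialized pretty versus minified. (The ~47% TeaLeaf figure comes from its schema-driven encoding and is not reproduced by this JSON-only sketch.)

```python
# Pretty vs. minified JSON for one record: same data, fewer bytes.
import json

record = {"source_common_name": "example.com", "tone": {"tone": -1.82}}
pretty = json.dumps([record], indent=2)                 # like -f json
compact = json.dumps([record], separators=(",", ":"))   # like -f json-compact

print(len(pretty), len(compact))
assert len(compact) < len(pretty)
```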
## GKG Record Fields

Each record contains up to 27 tab-delimited fields. The GkgRecord struct maps all of them.
| # | Field | Type | Description |
|---|---|---|---|
| 0 | gkg_record_id | String | Unique record identifier |
| 1 | date | i64 | Publication date (YYYYMMDDHHMMSS) |
| 2 | source_collection_id | i32 | Source type (1=Web, 2=Citation, 3=Core, ...) |
| 3 | source_common_name | String | Human-readable source name |
| 4 | document_identifier | String | Article URL |
| 5 | v1_counts | Vec<CountV1> | Event counts (protests, arrests, etc.) |
| 6 | v21_counts | Vec<CountV21> | V2.1 counts with character offsets |
| 7 | v1_themes | Vec<String> | Theme codes (e.g., TAX_POLICY) |
| 8 | v2_enhanced_themes | Vec<EnhancedTheme> | Themes with character offsets |
| 9 | v1_locations | Vec<LocationV1> | Geocoded locations (country, lat/lon) |
| 10 | v2_enhanced_locations | Vec<EnhancedLocation> | V2 locations with ADM2 codes |
| 11 | v1_persons | Vec<String> | Person names mentioned |
| 12 | v2_enhanced_persons | Vec<EnhancedEntity> | Persons with character offsets |
| 13 | v1_organizations | Vec<String> | Organization names |
| 14 | v2_enhanced_organizations | Vec<EnhancedEntity> | Organizations with offsets |
| 15 | tone | Option<Tone> | Sentiment (tone, polarity, pos/neg, word count) |
| 16 | v21_enhanced_dates | Vec<EnhancedDate> | Dates mentioned in the article |
| 17 | gcam | Vec<GcamEntry> | GCAM content-analysis dimension scores |
| 18 | sharing_image | Option<String> | Primary sharing image URL |
| 19 | related_images | Vec<String> | Related image URLs |
| 20 | social_image_embeds | Vec<String> | Social media image embeds |
| 21 | social_video_embeds | Vec<String> | Social media video embeds |
| 22 | quotations | Vec<Quotation> | Direct quotes with attribution verbs |
| 23 | all_names | Vec<NameEntry> | All named entities |
| 24 | amounts | Vec<AmountEntry> | Monetary/numerical amounts |
| 25 | translation_info | Option<TranslationInfo> | Translation source language and engine |
| 26 | extras_xml | Option<String> | Extra XML content |
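A minimal sketch of how one tab-delimited GKG line maps onto named fields. GDELT v2.1 delimits list-valued fields (themes, persons, organizations) with semicolons; only a few positions from the table above are shown, and the parser here is an illustration, not the crate's streaming GkgReader:

```python
# Hedged sketch: split one GKG line on tabs, name the leading columns,
# and split one semicolon-delimited list field (v1_persons, index 11).
FIELD_NAMES = ["gkg_record_id", "date", "source_collection_id",
               "source_common_name", "document_identifier"]

def parse_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(FIELD_NAMES, cols))
    if len(cols) > 11:
        rec["v1_persons"] = [p for p in cols[11].split(";") if p]
    return rec

line = "\t".join(["20250217150000-1", "20250217150000", "1",
                  "example.com", "https://example.com/a",
                  "", "", "", "", "", "", "Donald Trump;Elon Musk"])
print(parse_line(line)["v1_persons"])  # ['Donald Trump', 'Elon Musk']
```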
## Modules

The crate is organized into 9 public modules.
| Module | Description |
|---|---|
| cli | CLI argument definitions (clap derive) — structs for all 5 subcommands |
| error | NewsfreshError enum covering HTTP, I/O, parse, ZIP, JSON, and Polars errors |
| fetch | HTTP client for downloading GKG data, ZIP extraction, lastupdate.txt parsing |
| filter | RecordFilter trait and 10 composable predicate filters |
| model | Complete GKG v2.1 data model — GkgRecord with 27 fields and 14 sub-types |
| output | OutputFormatter trait with JSON and TeaLeaf formatters, field projection, schema printing |
| parse | Streaming GKG parser — GkgReader iterator, tab-delimited field parsing |
| search | Tantivy full-text search — SearchEngine trait, BM25, FIPS/ADM1/theme enrichment |
| stats | Polars DataFrame aggregation — theme/country/person/org/source frequency, tone statistics |