TL;DR
- Raw GPT and Claude prompts are not enough for broker-grade HTS classification. They invent 10-digit statistical suffixes, miss the 2022 USITC split that moved smartphones from 8517.12.00.00 to 8517.13.00.00, cannot search CROSS for prior CBP rulings, and have no awareness of Chapter 99 overlays published after their training cutoff.
- The fix is a tool layer, not a better prompt. Connect Claude or ChatGPT to the Tandom MCP server at mcp.tandom.ai/mcp (configured with a Tandom API key from api.tandom.ai). The open Model Context Protocol spec is at modelcontextprotocol.io. The model proposes a heading and reasoning. The tools verify the code exists, attach the live duty stack, and surface AD/CVD advisories.
- Use a structured prompt that forces the proper GRI sequence. Section and chapter notes first, then GRI 1, then GRI 2 and GRI 3(a), then GRI 3(b) only if the prior steps did not resolve. The template below has the exact wording brokers can paste in.
- Post-verify every answer. Parse the htsno out of the model output and run it through tariffs.tandom.ai/hts-catalog or hts.usitc.gov before relying on it. A 200 in the catalog is the minimum bar.
- Reasonable care still sits with the importer of record under 19 USC 1484. An LLM is a productivity tool, not a legal substitute. Run the output through your normal review and use the model to widen throughput, not to skip the broker.
Why raw LLM prompts fail at HTS classification
The failure modes are predictable. They show up in roughly the same shape across raw ChatGPT, raw Claude, raw Gemini, and any other LLM without tool access.
Stat-suffix fabrication
Models know the HTSUS code shape (XXXX.XX.XX.XX) and they know real headings exist. They have not memorized every 10-digit suffix in 99 chapters. When the prompt asks for a complete 10-digit code and the model has not seen the exact suffix, it generates one that looks right. A real example caught during the first pass of our duty-calculation guide: the model returned HTS 7318.15.5095 for cap screws. The real code is 7318.15.80.66. Searching the Tandom catalog API for 7318.15.5095 returns zero results; same on hts.usitc.gov. The model wrote it with full confidence.
Stale schedule, missed re-classifications
USITC republishes the schedule on January 1 and July 1 of each year, and updates between revisions through Federal Register notices. A model trained on text up to early 2024 still cites smartphones at 8517.12.00.00, the pre-2022 location. The current code is 8517.13.00.00, issued when USITC carved smartphones out of 8517.12 in 2022 Revision 6. Confirm: a search for 8517.12.00.00 in the live schedule returns nothing; 8517.13.00.00 returns the smartphones tariff item with current rate "Free." Brokers who let an AI answer override the schedule on this code file the entry under a tariff item that no longer exists.
No CROSS, no precedent
CBP's Customs Rulings Online Search System at rulings.cbp.gov holds every binding ruling, scope ruling, and HQ decision since 1989. Raw LLMs cannot search it; they have at most a stale subset in training data and they cannot tell you whether a ruling has been modified or revoked. The result: a model reasons through GRI 3(b) for a composite article and arrives at a heading the agency itself has already rejected for an identical product in a published ruling.
No Chapter 99 awareness
Chapter 99 carries the Section 301 surcharges, Section 232 steel and aluminum and copper duties, Section 122 IEEPA surcharges, AD/CVD references, and every special trade-action overlay. New 9903 provisions are added by Federal Register notice every few weeks. A raw LLM either forgets to mention Chapter 99 entirely (returning a "free" rate that ignores 25 to 50 percent in stacked surcharges) or hallucinates a 9903 provision that does not exist. Either way, the duty answer is wrong by a factor of two or more.
Confidently wrong reasoning chains
The most insidious failure mode is when the prompt forces the model to "show its work." The model produces a clean, plausible reasoning chain that ends in a wrong answer. The reasoning mentions GRI 1, mentions the chapter note, cites the heading text correctly, and then plugs in a fabricated suffix at the end. A reader scanning the rationale agrees with each step and does not notice the suffix never existed.
What an MCP-augmented prompt changes
MCP is the Model Context Protocol, an open spec for letting an AI model call external tools over JSON-RPC. Anthropic introduced it in November 2024; OpenAI added MCP connector support to ChatGPT in 2026 Q2. The same Tandom MCP server is reachable from both clients.
The Tandom MCP server at mcp.tandom.ai/mcp exposes a set of tools the model can call on demand, the same way it would call any function. Get an API key from api.tandom.ai and configure it in your MCP client.
| Tool name | What it does | Returns |
|---|---|---|
| lookup_hts_code | Verify a 10-digit code exists; pull MFN rate, special rates, units, chapter. | Full HTSUS detail or "not found" |
| search_hts_codes | Search by code prefix or description keywords; filter by chapter. | Ranked list of candidate codes |
| tandom_hts_hierarchy | Walk the chapter, heading, subheading, tariff item, statistical suffix tree for any code. | Full hierarchy with parents and children |
| tandom_hts_notes | Pull the section and chapter notes for any heading. | Note text in full |
| tandom_duty_calculate | Calculate every duty layer (MFN, Section 232, Section 301, Section 122, AD/CVD, MPF, HMF) for a code, country, value, and entry date. | Per-layer breakdown with totals |
| tandom_adcvd_check | Surface AD/CVD orders and scope advisories for the code, country, and entry date. | Order matches with case numbers |
The model sees the tool schemas and decides when to call them. A prompt that says "classify this product" with MCP attached produces a different output: the model proposes a heading, calls search_hts_codes to confirm candidates, calls lookup_hts_code on the chosen 10-digit code to verify it exists, calls tandom_duty_calculate for the duty stack, and returns the full picture. The fabrication failure mode goes away because every code the model returns is a code the API said exists.
The communication format is JSON-RPC 2.0 with three methods most users see: initialize, tools/list, and tools/call. Claude Desktop, Claude Code, ChatGPT (with MCP connectors), and any MCP-compatible client handle the wire format automatically; the broker writes the prompt, not the JSON.
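Brokers never write this JSON by hand, but one round trip makes the flow concrete. The sketch below is a minimal illustration in Python; the bearer-token auth header, the argument name passed to lookup_hts_code, and the exact response shape are assumptions, and a real MCP client also runs the initialize handshake and manages the session for you.

```python
# Minimal sketch of one MCP tools/call round trip (illustration only).
# Assumptions: bearer-token auth, an argument named "code" for
# lookup_hts_code, and a plain JSON response. Real MCP clients (Claude
# Desktop, ChatGPT connectors) run initialize and tools/list first and
# handle session negotiation automatically.
import os
import requests

MCP_URL = "https://mcp.tandom.ai/mcp"
API_KEY = os.environ["TANDOM_API_KEY"]  # never paste the key into a prompt

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_hts_code",
        "arguments": {"code": "8517.13.00.00"},  # hypothetical argument name
    },
}

resp = requests.post(
    MCP_URL,
    json=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        "Accept": "application/json",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # tool result: full HTSUS detail, or "not found"
```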
The broker-grade prompt template
Paste this directly into Claude or ChatGPT. The wording is deliberate; each clause closes a known failure mode.
You are a US customs classifier. Classify the product below to a 10-digit
HTSUS code per the legal sequence in 19 CFR 152.11.
# Product facts
- Name: <product name>
- Function and end use: <what it does, who buys it>
- Material breakdown by weight: <e.g., 60% cotton, 40% polyester>
- Material breakdown by value: <e.g., 70% steel housing, 30% plastic>
- Country of origin: <ISO-2, e.g., CN>
- Country of melt and pour (if steel content > 0): <ISO-2 or "unknown">
- Country of smelt and cast (if aluminum content > 0): <ISO-2 or "unknown">
- Retail or industrial: <retail / industrial / both>
- Imported as: <complete article / parts / kit / set>
- Entry date: <YYYY-MM-DD>
- Declared value: <USD>
# Classification sequence (do not skip steps)
1. Read the section note and chapter note for every candidate heading.
Quote the relevant exclusion or definition. If you cannot quote it,
call tandom_hts_notes.
2. Apply GRI 1: classify per the heading text and notes alone if they
resolve the question. Stop here if so.
3. If GRI 1 does not resolve, apply GRI 2 then GRI 3(a) (most specific
description). Stop at the first rule that resolves.
4. Apply GRI 3(b) (essential character) only if 3(a) does not resolve.
Weigh nature, bulk, quantity, weight, value, and role in use.
5. Drill via GRI 6 to the 8-digit tariff item, then to the 10-digit
statistical suffix.
# Verification (mandatory)
- Call lookup_hts_code on your final 10-digit code. If it returns
"not found," restart classification. Do NOT invent suffixes.
- Call tandom_duty_calculate with the code, country, value, and entry
date. Report every Chapter 99 overlay returned.
- Call tandom_adcvd_check. Report any AD/CVD case match.
- If the product contains steel or aluminum, restate the Section 232
layer with country of melt-pour or smelt explicitly.
# Output format
Return a single JSON object:
{
"hts10": "XXXX.XX.XX.XX",
"verifiedExists": true,
"headingRationale": "<2-4 sentences citing GRI step and notes>",
"subheadingRationale": "<1-2 sentences citing GRI 6>",
"dutyStack": <verbatim from tandom_duty_calculate>,
"adcvdAdvisory": <verbatim from tandom_adcvd_check>,
"uncertainty": "<plain language: what facts would change the answer>",
"crossSearchSuggested": <true if classification is ambiguous>
}
# Forbidden
- Do not return a 10-digit code unless lookup_hts_code confirmed it.
- Do not paraphrase the duty stack; return the verbatim tool output.
- Do not skip the section or chapter notes.
- Do not apply GRI 3(b) before GRI 1, GRI 2, or GRI 3(a).

Three things to know about this template. First, the mandatory tool calls are what eliminate the fabrication failure mode. The model can no longer return a 10-digit code that does not exist because the prompt says it cannot. Second, the forbidden list is non-trivial; LLMs ignore soft suggestions but respect hard prohibitions. Third, the JSON output shape is machine-readable, which means you can pipe it directly into your TMS or filing tool without a second pass.
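Before the JSON goes anywhere near the TMS, a thin validation layer can reject malformed output. Here is a minimal sketch; the field names come from the template above, but the regex and rejection rules are illustrative, not a filing-grade gate.

```python
# Sketch of a pre-TMS gate on the model's JSON output. Field names are taken
# from the template above; the rejection rules are illustrative only.
import json
import re

HTS10_PATTERN = re.compile(r"^\d{4}\.\d{2}\.\d{2}\.\d{2}$")

def accept_classification(raw: str) -> dict:
    """Parse the model output and reject anything that fails the hard checks."""
    result = json.loads(raw)
    if not HTS10_PATTERN.fullmatch(result.get("hts10", "")):
        raise ValueError("hts10 is not a 10-digit code in XXXX.XX.XX.XX form")
    if result.get("verifiedExists") is not True:
        raise ValueError("model did not confirm the code via lookup_hts_code")
    if not result.get("dutyStack"):
        raise ValueError("duty stack missing; tandom_duty_calculate not reported")
    return result
```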
The same template works in raw mode (no MCP) by removing the tool-call lines. In raw mode, treat the model output as a first draft and verify every code yourself before filing. See the verification section below.
Five products, three models
Each row below is a real product description run through (1) raw ChatGPT, (2) raw Claude, and (3) Claude paired with the Tandom MCP server. The "verified" column shows the actual HTSUS code per the live schedule, confirmed via the Tandom HTS Catalog at the time of writing.
| Product | Raw ChatGPT | Raw Claude | Claude + MCP | Verified |
|---|---|---|---|---|
| Men's white cotton T-shirt (knit, plain, no pockets, made in Vietnam, retail-packaged in dozens) | 6109.10.00.99 (Fabricated) | 6109.10.00 (Incomplete) | 6109.10.00.12 (OK) | 6109.10.00.12 |
| Leather wallet (bovine leather outer surface, polyester lining, billfold style, made in Italy) | 4202.31.6000 (Fabricated) | 4202.31.60.00 (OK) | 4202.31.60.00 (OK) | 4202.31.60.00 |
| Stainless-steel kitchen knife (forged 420-grade stainless blade, polypropylene handle, made in China, retail) | 8211.92.40 (Incomplete) | 8211.92.20 (Incomplete) | 8211.92.20.00 (OK) | 8211.92.20.00 |
| Smartphone (5G handset, 6.1 in OLED, lithium-ion battery, made in China, retail) | 8517.12.00.00 (Stale) | 8517.12.00 (Stale) | 8517.13.00.00 (OK) | 8517.13.00.00 |
| Wireless earbuds with charging case (Bluetooth, plastic housing, lithium-ion battery, USB-C case sold with earbuds, made in China) | 8518.30.20.50 (Fabricated) | 8518.30.20 (Incomplete) | 8518.30.20.00 (OK) | 8518.30.20.00 |
Three patterns hold across the test set:
- Raw models are right at the heading level, wrong at the suffix. Both raw ChatGPT and raw Claude pick the correct 4-digit heading on every product. Neither returns a correct, complete 10-digit code on more than one of the five; the misses are fabricated, stale, or incomplete suffixes.
- Stale training data is the smartphone tell. Both raw models cite 8517.12.00.00 for smartphones, the pre-2022 code. The current code is 8517.13.00.00. A model with MCP gets it right because it calls search_hts_codes and reads the live schedule.
- The MCP-augmented model handles GRI 5 cleanly. On the wireless-earbud composite (charging case sold with the earbuds), the MCP-equipped model reaches the right answer (8518.30.20.00) because it calls tandom_hts_notes and reads the GRI 5(a) language about cases. Raw models reach the right heading (8518) but fabricate or truncate the suffix.
Worked example, T-shirt deep dive
The textile case is instructive because it stresses the statistical-suffix layer hardest. The product: men's cotton knit T-shirt, white, no pockets or trim, made in Vietnam, imported in dozen quantities for retail. The classification drilldown goes:
- Chapter 61 covers articles of apparel and clothing accessories, knitted or crocheted. Chapter 61 note 9 specifies "garments designed for left over right closure at the front shall be regarded as men's or boys' garments." A plain T-shirt has no front closure, so this test does not settle the men's-versus-boys' question on its own; the call rests on how the garment is sized and marketed.
- Heading 6109: T-shirts, singlets, tank tops and similar garments, knitted or crocheted.
- Subheading 6109.10: Of cotton.
- Tariff item 6109.10.00: Of cotton (where the rate lives, MFN 16.5 percent).
- Statistical suffix: 6109.10.00.04 for "T-shirts, all white, short hemmed sleeves, hemmed bottom, crew or round neckline, or V-neck with a mitered seam at the center of the V, without pockets, trim or embroidery (352)" or 6109.10.00.12 for "Men's." The right answer depends on whether the SKU is plain white or printed, and whether the units are men's or boys'. A raw LLM picks one based on the prompt phrasing; the MCP-equipped model calls search_hts_codes with the description and gets back both candidates with distinguishing text.
Statistical suffix 6109.10.00.04 carries category 352 for textile-quota reporting. 6109.10.00.12 carries category 338. The two are not interchangeable in the entry summary; ACE edits enforce category alignment per the suffix the importer declares. A raw LLM that returns "6109.10.00" without picking a suffix is incomplete; one that returns "6109.10.00.99" (a fabrication, no such code exists) is a filing error.
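The fork the suffix layer creates can be written down in a few lines. The toy function below mirrors only the distinction stated above (all-white plain versus men's other); it is an illustration of why the product facts matter, not a substitute for reading the suffix text in the schedule, and other 6109.10.00 suffixes exist for other garment types.

```python
# Toy illustration of the suffix fork described above. The real decision
# comes from reading the suffix text in the live schedule; this encodes only
# the two candidates discussed in this example.
def tshirt_suffix(all_white_plain: bool, mens: bool) -> str:
    if all_white_plain:
        return "6109.10.00.04"  # all-white, plain, no pockets/trim (category 352)
    if mens:
        return "6109.10.00.12"  # men's, other than all-white plain (category 338)
    raise ValueError("facts do not resolve the suffix; read the schedule")
```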
Verifying any LLM answer before you file
Treat every LLM output as a first draft, including MCP-augmented output. The model may have called the right tools and still arrived at a defensible-but-wrong heading choice for a composite article. Four checks before the entry summary leaves your TMS:
1. Confirm the suffix exists
The 10-digit code the model returned must return a result on tariffs.tandom.ai/hts-catalog (HTTP 200 with full detail) and on hts.usitc.gov. If either returns nothing, the code is wrong; restart classification with a different candidate. Two real fabrications caught in our pipeline: 7318.15.5095 (cap screws, model wrote it; the real code is 7318.15.80.66) and 6109.10.00.99 (T-shirts, model wrote it; no such suffix exists).
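This check scripts easily. The sketch below assumes a hypothetical query route and parameter on the Tandom catalog (neither is documented here) and assumes the response is a list of matches; the point is the status-plus-payload test, not the specific URL. When in doubt, check manually at tariffs.tandom.ai/hts-catalog and hts.usitc.gov.

```python
# Sketch of check 1: does the proposed 10-digit code exist in the live
# schedule? The endpoint, query parameter, and response shape below are
# assumptions; substitute the documented Tandom catalog API route.
import requests

def code_exists(hts10: str) -> bool:
    resp = requests.get(
        "https://tariffs.tandom.ai/hts-catalog",  # hypothetical query route
        params={"query": hts10},                  # hypothetical parameter name
        timeout=30,
    )
    if resp.status_code != 200:
        return False
    results = resp.json()  # assumed: a list of matching tariff items
    # A 200 with an empty result set is still a failure: the code is not real.
    return bool(results)

for candidate in ("7318.15.80.66", "7318.15.5095"):
    print(candidate, "exists" if code_exists(candidate) else "NOT FOUND - do not file")
```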
2. Re-run the duty stack
Pipe the proposed code, country, value, and entry date through the Tandom calculator at tariffs.tandom.ai/calculator and confirm every Chapter 99 overlay the model mentioned appears in the engine output. If the engine returns a 25 percent Section 232 line and the model omitted it, the model guessed at the duty stack. Replace its narrative with the engine output verbatim.
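The comparison itself is mechanical once both sides are reduced to sets of Chapter 99 headings. A minimal sketch, assuming you have already extracted the 9903 provisions from the model's dutyStack and from the calculator output; how you do that depends on the exact response shapes.

```python
# Sketch of check 2: did the model report every Chapter 99 overlay the duty
# engine returned? Assumes both sides are sets of 9903.xx heading strings.
def missing_overlays(model_overlays: set[str], engine_overlays: set[str]) -> set[str]:
    """Overlays the engine applied that the model's narrative omitted."""
    return engine_overlays - model_overlays

engine = {"9903.88.03", "9903.80.01"}   # e.g., Section 301 and Section 232 lines
model = {"9903.88.03"}                  # model mentioned only the Section 301 line
omitted = missing_overlays(model, engine)
if omitted:
    print("Model omitted overlays:", omitted,
          "- replace its narrative with the engine output verbatim.")
```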
3. Search CROSS for adverse rulings
Search rulings.cbp.gov by HTS prefix and by product keyword. A binding ruling on a materially identical product is dispositive. A binding ruling on a similar product can either confirm the model's reasoning or reject the heading entirely. The LLM cannot do this step; you have to.
4. Reasonable care, in writing
Save the prompt, the model's full response, and the verification artifacts (catalog 200 confirmations, calculator output, CROSS search results) in your entry file. 19 USC 1484 places the reasonable-care duty on the importer of record. CBP asks how the classification was reached when it issues a Form 28 request for information or a CF-29 notice of action; "the AI said so" is not a defense. "The AI proposed it, we verified the code exists in HTSUS 2026 Revision 4, we re-ran the duty stack via Tandom, and we searched CROSS for similar rulings" is.
Common pitfalls
The mistakes that show up most often when brokers introduce LLMs into the classification workflow.
Trusting a confident-sounding suffix
The model writes 10-digit codes with the same prose confidence whether the code is real or fabricated. Tone is not evidence. Run the suffix through the catalog before relying on it. The 7318.15.5095 cap-screw fabrication looked exactly as confident as the real 7318.15.80.66.
Asking for "the HTS code" instead of a code with rationale
A prompt that asks "what is the HTS code for X" gets a one-line answer with no reasoning. That is the worst failure mode: no GRI sequence, no note read, no rationale to audit. Ask for the GRI step that resolves the classification, the note quotation, and the heading rationale. The model is more likely to be right when it has to show its work.
Skipping the entry date
The entry date controls which HTSUS revision applies, which Chapter 99 provisions are in force, and which AD/CVD orders are active. Without an entry date, the model picks a default (often the training cutoff date) and applies wrong surcharges. Always include the entry date in ISO format.
Treating Explanatory Notes as binding
The WCO Explanatory Notes are persuasive in CBP and the Court of International Trade, not binding in the US. The model treats them as authoritative because that is how the EN read in training text. The legal hierarchy is HTSUS text and notes first, EN second, prior CBP rulings third (binding only on materially identical products). Brokers who let the model decide on the EN alone get wrong answers on edge cases.
Using the same prompt for raw and MCP modes
A prompt that mandates tool calls is wrong for a raw model (the tools are not available). A prompt that omits the tool mandate is wrong for an MCP model (the tools are available and not used). Maintain two templates and pick the one that matches the runtime.
Not handling AD/CVD as a separate question
AD/CVD scope is not encoded in the HTS code. A correctly classified entry can still be subject to an AD/CVD order if the country, manufacturer, and product description match a Commerce scope. The classification prompt should explicitly call tandom_adcvd_check and report any matches, separate from the HTS code answer.
Pasting confidential commercial invoices into ChatGPT
Default ChatGPT and Claude settings train on user data unless the workspace is enterprise-tier with the training opt-out. Importers should not paste shipper details, valuation, or buyer-supplier relationships into the consumer tier. Use the enterprise tier, the API directly, or a self-hosted MCP client; redact identifiers when in doubt.
Replacing the broker review step
The throughput gain from LLM-assisted classification comes from compressing the easy 80 percent, not from skipping the hard 20. Brokers should still hand-review composite articles, novel products, and anything where the model output flags uncertainty. The reasonable-care duty under 19 USC 1484 sits with the importer regardless.
Forgetting that connector beta features can change
OpenAI's MCP connectors and ChatGPT custom GPT actions are actively evolving. Anthropic publishes MCP spec changes at modelcontextprotocol.io. Re-test your connection quarterly; a server that worked in January can require auth-flow changes by July.
Using the LLM to make the binding-ruling decision
The decision to file a binding ruling under 19 CFR Part 177 is a legal one. The model can suggest when classification is ambiguous enough to warrant filing; it cannot make the call. That seat is the broker's or counsel's.
Pasting the Tandom API key into the prompt
API keys belong in the MCP client configuration (Claude Desktop config, ChatGPT connector settings) where they are attached as headers. Pasting the key into a prompt sends it through the model's context window and into chat history, which is a leak path. Configure the client once; never put the key in prose.
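The same rule applies to any script that calls the Tandom API directly: the key comes from the environment or a secrets manager and travels only as a request header. A minimal sketch; the header name is an assumption, so use whatever the API documentation specifies.

```python
# Sketch of the key-handling rule: the key lives in the environment and
# travels only as a request header. The header name is an assumption.
import os
import requests

session = requests.Session()
session.headers["Authorization"] = f"Bearer {os.environ['TANDOM_API_KEY']}"

prompt = "Classify the product below to a 10-digit HTSUS code..."  # no key here
# session.post(...) calls carry the header; the prompt string never sees the key.
```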
Trusting an LLM-recited proclamation number
Proclamation numbers, FR doc numbers, and CSMS message IDs have the same fabrication risk as HTS suffixes. A model asserting "Proclamation 10895 added a 25 percent steel duty" might be right or might be paraphrasing from training data that referenced a different proclamation. Verify the proclamation by searching federalregister.gov before relying on it.
Glossary
- LLM (Large Language Model)
- A neural-network model trained on large text corpora that generates output token by token. Examples: Claude, ChatGPT, Gemini. Without external tool access, an LLM works only from training data plus the current prompt.
- MCP (Model Context Protocol)
- An open spec at modelcontextprotocol.io introduced by Anthropic in November 2024 for letting an AI model call external tools through JSON-RPC. The Tandom MCP server exposes HTSUS lookup, duty calculation, AD/CVD check, and related tools.
- MCP server
- A service that speaks the MCP wire protocol. Tools are advertised through tools/list and invoked through tools/call. The Tandom MCP server lives at mcp.tandom.ai/mcp.
- Custom GPT action
- OpenAI's parallel mechanism for letting ChatGPT call external tools. Custom GPTs configured by an enterprise can wrap the Tandom REST API directly. ChatGPT also supports MCP connectors (in beta as of 2026 Q2).
- Statistical suffix fabrication
- A failure mode where an LLM generates a plausible-looking 10-digit HTS code that does not exist in the schedule. The model has memorized the format but not the exact suffix. Real example: 7318.15.5095, written confidently for cap screws; the real code is 7318.15.80.66.
- Stale training data
- The HTSUS revision the LLM learned from, typically the schedule as of its training cutoff. A model trained through early 2024 still cites smartphones at 8517.12.00.00; the current code is 8517.13.00.00 (USITC 2022 Revision 6 split).
- Tool call
- A request the LLM makes to call a function in the runtime (e.g., lookup_hts_code), with structured arguments. The tool returns a structured response that the model incorporates into its output.
- JSON-RPC 2.0
- A simple remote-procedure-call protocol over HTTP, the wire format MCP uses. Three methods most users see: initialize, tools/list, tools/call.
- Reasonable care
- The statutory duty under 19 USC 1484 placed on the importer of record to enter, classify, and value imported merchandise correctly. Hiring a broker or using an LLM does not transfer the duty. The importer carries the legal risk.
- CROSS
- CBP's Customs Rulings Online Search System at rulings.cbp.gov. Free public database of binding rulings, scope rulings, and HQ decisions since 1989. The first place to look before filing a binding ruling.
- Chapter 99
- The HTSUS chapter that carries Section 232 (steel, aluminum, copper), Section 301 (China remedies), Section 122 (IEEPA surcharges), and other trade-action overlays. New 9903 provisions are added by Federal Register notice. Raw LLMs without tool access routinely miss Chapter 99.
- CBP Form 28
- A request for information CBP sends after entry to clarify classification, valuation, or origin. A documented classification trail (prompt, model output, verification artifacts) supports a clean Form 28 response.
- Connector (in ChatGPT)
- OpenAI's term for an external service ChatGPT can call, including MCP connectors and custom GPT actions. The Tandom MCP server is reachable from ChatGPT via either mechanism.
FAQ
High-intent questions brokers and importers ask about using AI for HTS classification.