Using Proxy with Hugging Face Datasets | Generated by AI

Home 2025.11

Yes. datasets follows your system and environment proxy settings, so you usually don’t need any datasets-specific proxy configuration. You can route Hugging Face dataset downloads through a proxy when you’re in China.

Here’s how.


How to make load_dataset use a proxy (China-friendly)

1. Set environment variables for HTTP/HTTPS proxy

datasets downloads files over HTTP with Python libraries such as requests, which automatically read the standard proxy environment variables:

Linux / macOS

export HTTP_PROXY="http://127.0.0.1:7890"
export HTTPS_PROXY="http://127.0.0.1:7890"

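If your proxy uses SOCKS5 (requests needs SOCKS support installed for this, for example pip install "requests[socks]"; use socks5h:// instead of socks5:// if DNS lookups should also go through the proxy):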

export HTTP_PROXY="socks5://127.0.0.1:1080"
export HTTPS_PROXY="socks5://127.0.0.1:1080"

Windows (PowerShell)

setx HTTP_PROXY "http://127.0.0.1:7890"
setx HTTPS_PROXY "http://127.0.0.1:7890"

Then reopen the terminal, because setx only applies to new sessions.
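To check that the variables actually reached your Python process, a minimal sketch using only the standard library is shown here. The helper name detected_proxies is an illustrative assumption, not part of datasets or requests; it reports what urllib sees, which on most setups matches what requests will pick up from the environment.

```python
import urllib.request


def detected_proxies():
    """Return the HTTP and HTTPS proxies that Python detects for this process.

    Hypothetical helper for illustration; it only reads settings and
    makes no network requests.
    """
    proxies = urllib.request.getproxies()
    # A value of None means no proxy was detected for that scheme.
    return {scheme: proxies.get(scheme) for scheme in ("http", "https")}


print(detected_proxies())
```

If both values are None, the shell that launched Python did not export the variables (for example, the terminal was not reopened after setx).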


2. If you’re running Python in a notebook (like Jupyter), set the variables inside Python

import os

os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'

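Set these at the top of the notebook, before importing datasets or calling load_dataset, so that every download picks them up.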
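If you only want the proxy for a single download rather than for the whole session, one possible approach is a small context manager that sets the variables and restores the previous values afterwards. This is a sketch under that assumption; proxy_env and PROXY_VARS are hypothetical names introduced here for illustration.

```python
import os
from contextlib import contextmanager

# The two variables used throughout this guide.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY")


@contextmanager
def proxy_env(proxy_url):
    """Temporarily point HTTP_PROXY and HTTPS_PROXY at proxy_url."""
    # Remember the old values (None means the variable was unset).
    previous = {name: os.environ.get(name) for name in PROXY_VARS}
    os.environ.update({name: proxy_url for name in PROXY_VARS})
    try:
        yield
    finally:
        # Restore the environment exactly as it was before.
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
```

Usage would look like `with proxy_env("http://127.0.0.1:7890"): ...`, with the load_dataset call placed inside the block.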


3. Confirm proxy is working

You can test with:

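import requests

# Ask an IP echo service which address it sees; the timeout avoids hanging on a dead proxy
print(requests.get("https://api.ipify.org?format=json", timeout=10).text)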

If the IP is your proxy’s exit node, then the Hugging Face dataset downloads will also go through that proxy.
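The comparison described above can be turned into a small check. This is a sketch, assuming you already know your proxy’s exit IP; exit_ip and proxy_in_use are hypothetical helper names, and the get parameter exists only so the function can be exercised without network access.

```python
def exit_ip(get, url="https://api.ipify.org?format=json", timeout=10.0):
    """Return the public IP address that the echo service reports."""
    response = get(url, timeout=timeout)
    response.raise_for_status()
    # ipify's JSON format is {"ip": "<address>"}.
    return response.json()["ip"]


def proxy_in_use(expected_proxy_ip, get=None):
    """Return True if outgoing requests appear to leave via the proxy."""
    if get is None:
        # Imported lazily so the helpers can be defined without requests.
        import requests

        get = requests.get
    return exit_ip(get) == expected_proxy_ip
```

Calling `proxy_in_use("<your proxy's exit IP>")` after setting the environment variables returns True only when the echo service sees the proxy’s address.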


4. Hugging Face’s huggingface_hub also respects these variables

datasets internally uses huggingface_hub, and it also follows the HTTP_PROXY and HTTPS_PROXY variables.

So once env variables are set, both model and dataset downloads will use your proxy automatically.
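As a rough end-to-end sketch, the snippet below sets the variables once and then fetches both a dataset and a single model file. The function names set_proxy and fetch_dataset_and_model_file are hypothetical, as is the default filename; load_dataset and hf_hub_download are the real library entry points, imported lazily so the proxy variables are in place first.

```python
import os


def set_proxy(proxy_url):
    """Export the proxy for both schemes, in upper and lower case.

    Lowercase names are set as well because some tools only read those.
    """
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url


def fetch_dataset_and_model_file(proxy_url, dataset_id, model_id,
                                 filename="config.json"):
    """Download a dataset and one model file with the proxy configured."""
    set_proxy(proxy_url)
    # Imported after the environment is configured (assumes both
    # packages are installed).
    from datasets import load_dataset
    from huggingface_hub import hf_hub_download

    dataset = load_dataset(dataset_id)
    model_file = hf_hub_download(repo_id=model_id, filename=filename)
    return dataset, model_file
```

A call might look like `fetch_dataset_and_model_file("http://127.0.0.1:7890", "stanfordnlp/imdb", "gpt2")`, where the dataset and model IDs are illustrative examples only.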


Notes about usage in China





openai/gpt-5
