Web Scraping Service Documentation
Getting Started
Learn how to integrate and begin using our web scraping service. Our API provides a simple and efficient way to extract data from websites.
import requests
# Authentication
API_KEY = 'your_api_key'
BASE_URL = 'https://api.machinelabs.in/v1'
# Basic request setup
headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}
Basic Usage
Start with simple data extraction requests. Here’s how to scrape a basic webpage:
# Basic scrape request
def scrape_page(url):
    endpoint = f"{BASE_URL}/scrape"
    payload = {
        'url': url,
        'format': 'json'
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
# Example usage
url = 'https://example.com/products'
data = scrape_page(url)
Headless Browsers
For JavaScript-heavy websites, use our headless browser functionality:
# Headless browser request
def scrape_dynamic_page(url):
    endpoint = f"{BASE_URL}/scrape/dynamic"
    payload = {
        'url': url,
        'wait_for': '.product-container',  # CSS selector to wait for
        'javascript': True,
        'browser': 'chrome'
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
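For example, a JavaScript-rendered product listing could be fetched like this; the URL is illustrative, and the hard-coded '.product-container' selector must exist on the target page:
# Example usage (illustrative URL)
listing = scrape_dynamic_page('https://example.com/products?page=1')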
Custom Headers
Customize your requests with specific headers:
# Adding custom headers
def scrape_with_headers(url, custom_headers):
    endpoint = f"{BASE_URL}/scrape"
    payload = {
        'url': url,
        'headers': custom_headers
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
# Example headers
custom_headers = {
    'User-Agent': 'Mozilla/5.0...',
    'Accept-Language': 'en-US,en;q=0.9',
    'Referer': 'https://google.com'
}
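Putting the two together, a call with the headers defined above might look like this (the URL is illustrative):
# Example usage with the custom headers defined above
data = scrape_with_headers('https://example.com/products', custom_headers)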
Sessions
Maintain session state across multiple requests:
# Session handling
def create_session():
    endpoint = f"{BASE_URL}/session/create"
    response = requests.post(endpoint, headers=headers)
    return response.json()['session_id']

def scrape_with_session(url, session_id):
    endpoint = f"{BASE_URL}/scrape"
    payload = {
        'url': url,
        'session_id': session_id
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
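A typical flow creates one session and reuses its ID across related pages, so state set by the first request carries over to the next (the URLs are illustrative):
# Example: reuse one session across related requests (illustrative URLs)
session_id = create_session()
login_page = scrape_with_session('https://example.com/login', session_id)
account_page = scrape_with_session('https://example.com/account', session_id)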
Geographic Location
Specify geographic locations for your requests:
# Geo-located requests
def scrape_from_location(url, country_code):
    endpoint = f"{BASE_URL}/scrape"
    payload = {
        'url': url,
        'geo': country_code,  # e.g., 'US', 'UK', 'DE'
        'format': 'json'
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
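For instance, to compare region-specific content you could request the same page as seen from two countries (the URL and country codes are illustrative):
# Example: fetch the same page from two locations (illustrative URL)
us_data = scrape_from_location('https://example.com/pricing', 'US')
de_data = scrape_from_location('https://example.com/pricing', 'DE')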
Premium Residential Proxies
Access our premium proxy network for improved success rates:
# Using residential proxies
def scrape_with_residential_proxy(url):
    endpoint = f"{BASE_URL}/scrape/premium"
    payload = {
        'url': url,
        'proxy_type': 'residential',
        'rotating': True
    }
    response = requests.post(endpoint, json=payload, headers=headers)
    return response.json()
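One way to keep premium usage down is to treat this endpoint as a fallback: try a standard scrape first and retry through the residential proxy network only when it fails. This is a rough sketch that assumes an exception or empty body signals a failed attempt; adapt it to the service's actual error format:
# Sketch: fall back to the premium endpoint when a standard scrape fails
def scrape_with_fallback(url):
    try:
        result = scrape_page(url)
        if result:  # assumes an empty body means the standard scrape failed
            return result
    except requests.exceptions.RequestException:
        pass
    return scrape_with_residential_proxy(url)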
Account Information
Monitor your usage and account status:
# Account status and usage
def get_account_info():
    endpoint = f"{BASE_URL}/account"
    response = requests.get(endpoint, headers=headers)
    return response.json()

def get_usage_stats():
    endpoint = f"{BASE_URL}/account/usage"
    response = requests.get(endpoint, headers=headers)
    return response.json()
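For example, you might check how much of your monthly quota remains before starting a large job. The field names 'requests_used' and 'requests_limit' below are assumptions; check the actual response payload for the exact keys:
# Example: check remaining quota before a large job
# NOTE: 'requests_used' and 'requests_limit' are assumed field names
usage = get_usage_stats()
remaining = usage.get('requests_limit', 0) - usage.get('requests_used', 0)
print(f"Requests remaining this month: {remaining}")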
Rate Limits and Quotas
- Basic Plan: 50,000 requests/month
- Professional Plan: 200,000 requests/month
- Advanced Plan: 400,000 requests/month
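If your workload can exceed these limits in bursts, it helps to pace requests on the client side. Below is a minimal throttling sketch; the one-request-per-second pace is an assumption to tune against your plan, not a documented limit:
import time

# Sketch: simple client-side throttling; the 1 request/second pace is an
# assumed default, not a documented limit -- tune it to your plan
def scrape_many(urls, delay_seconds=1.0):
    results = []
    for url in urls:
        results.append(scrape_page(url))
        time.sleep(delay_seconds)
    return results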
Error Handling
Always implement proper error handling in your code:
def safe_scrape(url):
    try:
        response = scrape_page(url)
        return response
    except requests.exceptions.RequestException as e:
        print(f"Error during scraping: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None
Best Practices
Beyond error handling, keep the following guidelines in mind (a short caching sketch follows the list):
- Implement rate limiting in your code
- Handle errors gracefully
- Cache results when possible
- Use session handling for related requests
- Monitor your usage regularly
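As a minimal sketch of the caching point above, an in-memory dictionary avoids re-scraping URLs already fetched within a run; swap in a persistent store for anything longer-lived:
# Sketch: in-memory cache so repeated URLs cost no extra API requests
_cache = {}

def cached_scrape(url):
    if url not in _cache:
        _cache[url] = safe_scrape(url)
    return _cache[url]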