FileShift is an automated file archiving service that uploads files to AWS S3 based on their age and extension. It runs as a Docker container triggered by a systemd timer, making it well suited to archiving old recordings, logs, or backups to cloud storage. It is optimized for S3 Glacier to minimize storage costs.
## Features

- 🕒 Age-based filtering: Upload files older than a specified number of days
- 📁 Extension filtering: Target specific file types (e.g., `.mkv`, `.mp4`, `.log`)
- ☁️ S3 Upload & Archival: Direct upload to AWS S3 with automatic local deletion
- 💰 Cost-optimized: Support for Glacier storage class (84-96% cheaper than standard)
- 🐳 Docker containerized: No dependency management, runs isolated
- ⏰ Automated scheduling: Runs daily via systemd timer (configurable schedule)
- 🔄 Recursive scanning: Scans subdirectories for matching files
- 🗂️ Structure preservation: Optionally maintain directory structure in S3
- 📝 Comprehensive logging: Track all operations via systemd journal and log files
- 🧪 Dry-run mode: Test configuration without uploading files
- ⚙️ Flexible configuration: JSON-based configuration file
- 🔐 AWS Integration: Support for multiple authentication methods
- ♻️ S3 Lifecycle rules: Automatic transition between storage classes
## Installation

Recommended method: Docker with a systemd timer.

Prerequisites:
- Ubuntu or any systemd-based Linux distribution
- Docker and Docker Compose installed
- AWS account with S3 bucket
- AWS credentials configured (`aws configure`)
- Clone or download this repository:

```bash
cd /path/to/fileshift
```

- Run the Docker installation script:

```bash
chmod +x docker-install.sh
./docker-install.sh
```

- Configure your settings:

```bash
nano config.json
```

Edit the configuration with your S3 bucket and file preferences:

```json
{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "your-bucket-name",
    "s3_prefix": "archived-files",
    "aws_region": "us-east-1",
    "storage_class": "GLACIER",
    "delete_after_upload": true
  },
  "age_days": 30,
  "file_extensions": [".mkv", ".mp4", ".mov", ".avi"]
}
```

- Update docker-compose.yml with your source directory:

```bash
nano docker-compose.yml
# Change: /path/to/your/source:/source
# to your actual directory
```

- Test with a dry run:

```bash
docker compose run --rm fileshift -c /config/config.json --dry-run
```

- Set up the systemd timer for daily automated runs:

```bash
sudo cp fileshift-docker.service fileshift-docker.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable fileshift-docker.timer
sudo systemctl start fileshift-docker.timer
```

- Verify the timer is active:

```bash
systemctl status fileshift-docker.timer
```

That's it! FileShift will now run automatically every day at 2 AM.
- Edit the configuration file:

```bash
nano config.json
```

- Configure AWS credentials (for S3 destinations):

```bash
# Option 1: Using AWS CLI configuration (recommended)
aws configure
```
### Optional: S3 Lifecycle Policy for Cost Savings
Set up automatic transition to Glacier for existing files:
```bash
cat > /tmp/lifecycle-policy.json << 'EOF'
{
"Rules": [
{
"ID": "TransitionToGlacier",
"Status": "Enabled",
"Filter": {
"Prefix": "archived-files/"
},
"Transitions": [
{
"Days": 1,
"StorageClass": "GLACIER"
}
]
}
]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
--bucket your-bucket-name \
  --lifecycle-configuration file:///tmp/lifecycle-policy.json
```

This ensures all files (including those uploaded before adding `storage_class: GLACIER`) are transitioned to Glacier after 1 day.
## Usage

Run FileShift manually with Docker:

```bash
docker compose run --rm fileshift -c /config/config.json
```

Run with dry-run mode:

```bash
docker compose run --rm fileshift -c /config/config.json --dry-run
```

The systemd timer runs FileShift automatically every day at 2:00 AM.
Check timer status:

```bash
systemctl status fileshift-docker.timer
systemctl list-timers | grep fileshift
```

Manually trigger a run:

```bash
sudo systemctl start fileshift-docker.service
```

View service logs:

```bash
sudo journalctl -u fileshift-docker.service -f
```

Change schedule: edit /etc/systemd/system/fileshift-docker.timer and modify the OnCalendar setting:

```ini
# Run at 3 AM instead
OnCalendar=*-*-* 03:00:00

# Run every 6 hours
OnCalendar=*-*-* 00,06,12,18:00:00
```

Then reload:

```bash
sudo systemctl daemon-reload
sudo systemctl restart fileshift-docker.timer
```

Check uploaded files:

```bash
aws s3 ls s3://your-bucket-name/archived-files/ --human-readable
```

Check remaining local files:

```bash
ls -lh /path/to/source/*.mkv | wc -l
```

Monitor Docker container:

```bash
docker ps | grep fileshift
docker stats fileshift-fileshift-run-xxxxx
```

View upload logs:

```bash
tail -f logs/fileshift.log
```

## Configuration

The configuration file is located at config.json in the project directory. This file is mounted into the Docker container at /config/config.json.
```json
{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "my-archive-bucket",
    "s3_prefix": "videos/archived",
    "aws_region": "us-east-1",
    "aws_profile": "default",
    "storage_class": "STANDARD_IA"
  }
}
```

A more complete example with all common options:

```json
{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "my-video-archive",
    "s3_prefix": "archived-files",
    "storage_class": "GLACIER",
    "delete_after_upload": true,
    "overwrite": false
  },
  "age_days": 30,
  "file_extensions": [".mkv", ".mp4", ".mov", ".avi"],
  "recursive": true,
  "preserve_structure": false,
  "time_attribute": "mtime",
  "dry_run": false,
  "logging": {
    "log_file": "/logs/fileshift.log",
    "log_level": "INFO"
  }
}
```

Note:

- `source_dir` should be `/source` when using Docker (mapped from a host directory in docker-compose.yml)
- `log_file` should be `/logs/fileshift.log` to use the mounted logs directory
- `storage_class: GLACIER` provides 84% cost savings compared to STANDARD ($3.69/month vs $23.55/month for 1TB)
| Option | Type | Description |
|---|---|---|
| `source_dir` | string | Directory to scan for files (use `/source` with Docker) |
| `destination` | string or object | Local path (string) or S3 configuration (object) |
| `age_days` | number | Files older than this many days will be moved |
| `file_extensions` | array | List of file extensions to target (with or without dot) |
| `recursive` | boolean | Whether to scan subdirectories (default: true) |
| `preserve_structure` | boolean | Maintain directory structure in destination (default: false) |
| `time_attribute` | string | Time attribute to check: "mtime" (modification), "atime" (access), or "ctime" (change) |
| `dry_run` | boolean | If true, only log what would be done without moving files |
| `logging.log_file` | string | Path to log file (use `/logs/fileshift.log` with Docker) |
| `logging.log_level` | string | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
When `destination` is an object with `s3_bucket`:

| Option | Type | Required | Description |
|---|---|---|---|
| `s3_bucket` | string | Yes | S3 bucket name |
| `s3_prefix` | string | No | Prefix (folder path) in S3 bucket |
| `storage_class` | string | No | GLACIER (recommended), STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, DEEP_ARCHIVE, GLACIER_IR |
| `delete_after_upload` | boolean | No | Delete local file after successful upload (default: true) |
| `overwrite` | boolean | No | Overwrite existing S3 objects (default: false) |

Advanced options (usually not needed):

| Option | Type | Required | Description |
|---|---|---|---|
| `aws_region` | string | No | AWS region (e.g., "us-east-1"); auto-detected from credentials |
| `aws_profile` | string | No | AWS credentials profile name; prefer mounting `~/.aws` instead |
| `server_side_encryption` | string | No | Encryption: AES256 or aws:kms |
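To illustrate how `s3_prefix` and `preserve_structure` combine: the object key is presumably derived from the local path roughly as below. This is a hypothetical helper for explanation, not FileShift's actual code:

```python
import os

def build_s3_key(local_path, source_dir, s3_prefix="", preserve_structure=False):
    """Hypothetical sketch of how an S3 object key could be derived
    from s3_prefix and preserve_structure (not FileShift's actual code)."""
    if preserve_structure:
        # Keep the path relative to source_dir, e.g. shows/ep1.mkv
        rel = os.path.relpath(local_path, source_dir)
    else:
        # Flatten: only the file name is kept
        rel = os.path.basename(local_path)
    rel = rel.replace(os.sep, "/")  # S3 keys always use forward slashes
    return f"{s3_prefix.rstrip('/')}/{rel}" if s3_prefix else rel
```

For example, with `s3_prefix` set to `archived-files` and `preserve_structure: false`, a file at `/source/shows/ep1.mkv` would land at `archived-files/ep1.mkv`; with `preserve_structure: true` it would land at `archived-files/shows/ep1.mkv`.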
**Recommended method for Docker:** Mount your AWS credentials directory into the container (already configured in docker-compose.yml):

```yaml
volumes:
  - ~/.aws:/root/.aws:ro
```

Then configure AWS CLI on your host:

```bash
aws configure
```

**Alternative methods:**

- Named profile in docker-compose.yml:

```yaml
environment:
  - AWS_PROFILE=my-profile
```

- Environment variables in docker-compose.yml:

```yaml
environment:
  - AWS_ACCESS_KEY_ID=your-key-id
  - AWS_SECRET_ACCESS_KEY=your-secret-key
  - AWS_DEFAULT_REGION=us-east-1
```

- IAM role (for EC2 instances): no configuration needed if running on EC2 with an attached IAM role.
For long-term video archival, use Glacier storage class:
Cost Comparison for 1TB:
- STANDARD: $0.023/GB/month = $23.55/month ($282.60/year)
- GLACIER: $0.0036/GB/month = $3.69/month ($44.28/year)
- Savings: 84% ($238.32/year)
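The arithmetic behind those figures can be checked directly, taking 1 TB as 1024 GB and using the per-GB rates listed above:

```python
# Reproduce the 1TB cost comparison above (1 TB taken as 1024 GB)
STANDARD = 0.023   # $/GB/month
GLACIER = 0.0036   # $/GB/month

gb = 1024
std_month = STANDARD * gb   # standard monthly cost
gla_month = GLACIER * gb    # Glacier monthly cost
savings_pct = (1 - gla_month / std_month) * 100

print(f"STANDARD: ${std_month:.2f}/month (${std_month * 12:.2f}/year)")
print(f"GLACIER:  ${gla_month:.2f}/month (${gla_month * 12:.2f}/year)")
print(f"Savings:  {savings_pct:.0f}%")
```

Small differences in the yearly totals come down to whether you round the monthly figure before multiplying by 12.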
Additional Cost Optimization: Set up S3 Lifecycle Policy to transition older files to Glacier (see installation section above). This ensures even files uploaded with STANDARD storage class will automatically transition to Glacier after 1 day.
Retrieval Considerations:
- Glacier retrieval takes 3-5 hours (Expedited: 1-5 minutes, costs extra)
- Retrieval cost: $0.01/GB for standard retrieval
- Best for archival data that's rarely accessed
Start the scheduler (runs daily at 2 AM):

```bash
docker-compose up -d fileshift-scheduler
```

Stop the scheduler:

```bash
docker-compose down
```
## Examples
### Example 1: Upload Old Videos to S3 Glacier
Upload video files older than 30 days to S3 with Glacier storage for cost savings:
```json
{
"source_dir": "/source",
"destination": {
"s3_bucket": "my-video-archive",
"s3_prefix": "archived-files",
"storage_class": "GLACIER",
"delete_after_upload": true
},
"age_days": 30,
"file_extensions": [".mkv", ".mp4", ".mov", ".avi"],
"recursive": true,
"preserve_structure": false,
"time_attribute": "mtime"
}
```

### Example 2: Archive Application Logs to S3

Move log files older than 90 days to S3 with standard infrequent access storage:

```json
{
"source_dir": "/var/log/myapp",
"destination": {
"s3_bucket": "company-logs-archive",
"s3_prefix": "application-logs",
"aws_profile": "backup-profile",
"storage_class": "STANDARD_IA",
"server_side_encryption": "AES256",
"delete_after_upload": true
},
"age_days": 90,
"file_extensions": [".log", ".log.gz"],
"recursive": true,
"preserve_structure": true,
"time_attribute": "mtime"
}
```

### Example 3: Archive Logs to a Local Directory

Move log files older than 30 days to a local archive directory:

```json
{
"source_dir": "/var/log/myapp",
"destination": "/var/log/myapp/archive",
"age_days": 30,
"file_extensions": [".log", ".log.gz"],
"recursive": true,
"preserve_structure": true,
"time_attribute": "mtime"
}
```

### Example 4: Clean Up Temporary Files

Move temporary files older than 7 days to a cleanup directory:

```json
{
"source_dir": "/tmp/workspace",
"destination": "/tmp/old",
"age_days": 7,
"file_extensions": [".tmp", ".temp", ".cache"],
"recursive": false,
"preserve_structure": false,
"time_attribute": "atime"
}
```
**Run with Docker:**

```bash
# Update docker-compose.yml with your source directory:
#   volumes:
#     - /path/to/your/videos:/source

# Test with dry run first:
docker compose run --rm fileshift -c /config/config.json --dry-run

# Run actual upload:
docker compose run --rm fileshift -c /config/config.json
```

Set up the systemd timer to run automatically every day at 2 AM:

```bash
# Already configured during installation
sudo systemctl status fileshift-docker.timer

# View next scheduled run:
systemctl list-timers | grep fileshift

# Manually trigger a run:
sudo systemctl start fileshift-docker.service

# View logs:
sudo journalctl -u fileshift-docker.service -f
```

When uploading large video files (several GB each):
```bash
# Check what Docker is doing:
docker ps | grep fileshift
docker stats <container-id>

# Check S3 bucket (files appear only after complete upload):
aws s3 ls s3://your-bucket-name/archived-files/ --human-readable

# Check local files remaining:
ls -lh /path/to/source/*.mkv | wc -l

# View detailed logs:
tail -f logs/fileshift.log
```

Note: Large files take time to upload. A 25GB video file may take an hour on typical home internet. Monitor with `docker stats` to see network I/O progress.
## Troubleshooting

Check service status:

```bash
sudo systemctl status fileshift-docker.timer
sudo systemctl status fileshift-docker.service
```

View logs:

```bash
# Systemd logs
sudo journalctl -u fileshift-docker.service -f

# Application logs
tail -f logs/fileshift.log

# Docker logs
docker ps -a | grep fileshift
docker logs <container-id>
```

Container not starting:

```bash
docker compose logs fileshift
docker compose ps
```

Rebuild image after code changes:

```bash
docker compose build --no-cache
```

Check AWS credentials in container:

```bash
docker compose run --rm fileshift sh -c "aws s3 ls s3://your-bucket-name"
```

Volume mount issues:

```bash
# Check if source directory is accessible
ls -la /path/to/source

# Check config file ownership (should be your user, not root)
ls -la config.json
chown $USER:$USER config.json
```

Cannot delete files after upload:

- Ensure the source volume mount is NOT read-only in docker-compose.yml
- It should be `/path/to/source:/source` (not `/path/to/source:/source:ro`)
Files not appearing in S3:

- Large files appear only after a complete upload
- Monitor progress with `docker stats <container-id>` and check NET I/O
- Each file is uploaded atomically (all-or-nothing)

Slow uploads:

- Normal for large video files (a 25GB file takes ≈ 1 hour on a 60Mbps upload)
- Check network: `docker stats` shows NET I/O throughput
- Consider running manually during off-hours instead of the daily timer
Duplicate containers running:

```bash
# Check for multiple running containers
docker ps | grep fileshift

# Stop all fileshift containers
docker stop $(docker ps -q --filter ancestor=fileshift)
```

Credentials not found:

```bash
# Verify AWS CLI works on host
aws s3 ls

# Verify credentials mounted correctly
docker compose run --rm fileshift sh -c "ls -la /root/.aws"
docker compose run --rm fileshift sh -c "cat /root/.aws/config"
```

Bucket access denied:

```bash
# Verify bucket exists and you have access
aws s3 ls s3://your-bucket-name

# Check IAM permissions - need s3:PutObject at minimum
aws s3api get-bucket-policy --bucket your-bucket-name
```

Storage class not applied:
- Check that config.json has `"storage_class": "GLACIER"`
- Verify after upload: `aws s3api head-object --bucket your-bucket --key archived-files/file.mkv`
- Set up a lifecycle policy to transition existing files (see installation section)

Wrong paths in config.json:

- When using Docker, `source_dir` should be `/source` (container path)
- `log_file` should be `/logs/fileshift.log` (container path)
- Host paths are specified in docker-compose.yml, not config.json
Timer not running:

```bash
# Check timer is enabled
sudo systemctl is-enabled fileshift-docker.timer

# Enable if disabled
sudo systemctl enable fileshift-docker.timer

# Check next trigger time
systemctl list-timers | grep fileshift
```

Service stays "activating":

- Normal behavior for Type=oneshot services during execution
- The service is actively running, not stuck
- Check logs to see progress: `sudo journalctl -u fileshift-docker.service -f`
## Uninstall

```bash
# Stop and remove systemd timer
sudo systemctl stop fileshift-docker.timer
sudo systemctl disable fileshift-docker.timer
sudo rm /etc/systemd/system/fileshift-docker.{service,timer}
sudo systemctl daemon-reload

# Remove Docker containers and images
docker compose down
docker rmi fileshift:latest

# Optional: Remove project directory
cd ..
rm -rf fileshift
```

To remove archived data from S3:

```bash
# List objects before deletion
aws s3 ls s3://your-bucket-name/archived-files/ --recursive --human-readable

# Delete all objects in prefix
aws s3 rm s3://your-bucket-name/archived-files/ --recursive

# Delete bucket
aws s3 rb s3://your-bucket-name --force
```

Note: Glacier objects have a minimum 90-day storage commitment. Deleting early incurs pro-rated storage charges for the remaining days.
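As a rough model of that pro-rated charge, assuming the $0.0036/GB/month Glacier rate quoted elsewhere in this README and a 30-day billing month (actual AWS billing may differ in detail):

```python
GLACIER_RATE = 0.0036  # $/GB/month (assumed, see cost comparison above)
MIN_DAYS = 90          # Glacier minimum storage duration

def early_delete_charge(size_gb, days_stored, rate=GLACIER_RATE, min_days=MIN_DAYS):
    """Approximate pro-rated charge for deleting a Glacier object early:
    you are billed for the days remaining in the 90-day minimum."""
    remaining = max(0, min_days - days_stored)
    return size_gb * rate * (remaining / 30)  # rate is per 30-day month
```

For instance, deleting a 100 GB object after 30 days leaves 60 days of the minimum to pay for, roughly $0.72 at this rate; after 90 or more days the charge is zero.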
GLACIER (recommended for archival):

- Lowest cost: $0.0036/GB/month
- 3-5 hour retrieval time
- Best for rarely accessed data

```json
"storage_class": "GLACIER"
```

GLACIER_IR (Instant Retrieval):

- Low cost: $0.004/GB/month
- Millisecond retrieval
- Best for quarterly access patterns

```json
"storage_class": "GLACIER_IR"
```

INTELLIGENT_TIERING:

- Automatically moves data between tiers
- Monitoring fee of $0.0025 per 1,000 objects per month
- Best when access patterns are unknown

```json
"storage_class": "INTELLIGENT_TIERING"
```

Edit /etc/systemd/system/fileshift-docker.timer:
```ini
[Timer]
# Run every day at 3:30 AM
OnCalendar=*-*-* 03:30:00

# Or run every 6 hours
OnCalendar=*-*-* 00,06,12,18:00:00

# Or run every Monday at 1:00 AM
OnCalendar=Mon *-*-* 01:00:00

# Run 5 minutes after boot (in case system was off)
OnBootSec=5min
```

After editing:

```bash
sudo systemctl daemon-reload
sudo systemctl restart fileshift-docker.timer
```

Create separate config files for each directory:
```
# config-videos.json
{
  "source_dir": "/source",
  ...
}

# config-photos.json
{
  "source_dir": "/source",
  ...
}
```

Update docker-compose.yml with multiple volume mounts:

```yaml
volumes:
  - /path/to/your/videos:/source
  - ./config-videos.json:/config/config.json
```

Create separate systemd services for each configuration.
FileShift logs include structured data that can be parsed for metrics:
- Files processed
- Bytes uploaded
- Errors encountered
- Processing time
Use a log parser like promtail or filebeat to export metrics to Prometheus/Grafana for visualization.
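Before wiring up promtail or filebeat, a small script can sanity-check what is extractable. The log line format below is hypothetical — inspect logs/fileshift.log for the real one and adjust the regex:

```python
import re

# Hypothetical log format - check logs/fileshift.log for the real one, e.g.:
# "2024-01-15 02:00:03 INFO Uploaded /source/ep1.mkv (25.3 GB) in 3512.4s"
LINE_RE = re.compile(r"Uploaded (?P<path>\S+) \((?P<size>[\d.]+) GB\) in (?P<secs>[\d.]+)s")

def summarize(lines):
    """Aggregate files processed, GB uploaded, and total time from log lines."""
    files, gb, secs = 0, 0.0, 0.0
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            files += 1
            gb += float(m.group("size"))
            secs += float(m.group("secs"))
    return {"files": files, "gb": gb, "seconds": secs}
```

The same regex, once corrected for the real format, can be reused directly in a promtail pipeline stage.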
## Contributing

Contributions welcome! Please:

- Fork the repository
- Create a feature branch
- Test with Docker and dry-run mode
- Submit a pull request with a clear description

## License

MIT License - see LICENSE file for details

## Support

For issues or questions:

- Check the Troubleshooting and FAQ sections
- Review logs: `tail -f logs/fileshift.log` or `sudo journalctl -u fileshift-docker.service`
- Open an issue on GitHub with:
  - Log excerpts
  - Configuration (sanitized)
  - Docker/OS version
  - Steps to reproduce
## FAQ

Q: How long does it take to upload 1TB of videos?
A: Depends on your upload speed. At 60Mbps upload (typical home internet), approximately:

- 60 Mbps = 7.5 MB/s
- 1TB / 7.5 MB/s ≈ 37 hours
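That estimate is easy to reproduce (decimal units, 8 bits per byte, protocol overhead ignored):

```python
def upload_hours(size_tb, mbps):
    """Estimate upload time: Mbps -> MB/s (divide by 8), then hours for size_tb terabytes."""
    mb_per_s = mbps / 8                        # 60 Mbps -> 7.5 MB/s
    seconds = size_tb * 1_000_000 / mb_per_s   # 1 TB = 1,000,000 MB (decimal)
    return seconds / 3600

print(round(upload_hours(1, 60)))  # roughly 37 hours
```

Plug in your own measured upload rate; real-world throughput is usually somewhat below the advertised line speed.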
Q: Can I pause and resume uploads?
A: Uploads are per-file atomic. If interrupted, completed files remain in S3 and incomplete files remain local. Re-run FileShift to continue.
Q: Why don't I see files in S3 immediately?
A: Files appear in S3 only after complete upload. Large video files (10-30GB each) take time. Monitor progress with docker stats.
Q: Will I be charged if I delete Glacier files early?
A: Yes, Glacier has a 90-day minimum storage commitment. Early deletion incurs pro-rated charges for the remaining days.

Q: Can I use this with other cloud providers (Azure, GCP)?
A: Currently only S3 is supported. A local filesystem destination works for all providers via mounted volumes.
Q: How do I retrieve files from Glacier?
A: Use the AWS Console or CLI:

```bash
# Initiate retrieval (takes 3-5 hours)
aws s3api restore-object --bucket your-bucket --key archived-files/file.mkv \
  --restore-request Days=7,GlacierJobParameters={Tier=Standard}

# After 3-5 hours, download
aws s3 cp s3://your-bucket/archived-files/file.mkv ./file.mkv
```

Q: What happens if the systemd service fails?
A: Check logs with `sudo journalctl -u fileshift-docker.service`. The timer will retry at the next scheduled time (2 AM daily).

Q: Can I run this on Windows/macOS?
A: The Docker parts work on all platforms. The systemd timer is Linux-only; on Windows/macOS, use Docker alone with Task Scheduler or cron.
## Security

- AWS credentials are mounted read-only from the `~/.aws` directory
- Configuration and logs are stored in the project directory (not system-wide)
- The Docker container runs with user permissions (User=youruser in the systemd service)
- Test with dry-run mode before enabling
- Use IAM roles or AWS CLI configuration instead of embedding keys in config
- Ensure proper file permissions on the configuration: `chmod 600 config.json`
- Consider using S3 bucket encryption for sensitive data
- S3 bucket policies: use least privilege (PutObject only if possible)