miguelamezola/fileshift

FileShift

FileShift is an automated archiving service that uploads old files to AWS S3, selecting them by age and file extension. It runs as a Docker container triggered by a systemd timer, making it well suited to archiving aging recordings, logs, or backups to cloud storage, and it is optimized for S3 Glacier to minimize storage costs.

Features

  • 🕒 Age-based filtering: Upload files older than a specified number of days
  • 📁 Extension filtering: Target specific file types (e.g., .mkv, .mp4, .log)
  • ☁️ S3 Upload & Archival: Direct upload to AWS S3 with automatic local deletion
  • Cost-optimized: Support for Glacier storage class (84-96% cheaper than standard)
  • 🐳 Docker containerized: No dependency management, runs isolated
  • Automated scheduling: Runs daily via systemd timer (configurable schedule)
  • 🔄 Recursive scanning: Scans subdirectories for matching files
  • 🗂️ Structure preservation: Optionally maintain directory structure in S3
  • 📝 Comprehensive logging: Track all operations via systemd journal and log files
  • 🧪 Dry-run mode: Test configuration without uploading files
  • ⚙️ Flexible configuration: JSON-based configuration file
  • AWS Integration: Support for multiple authentication methods
  • S3 Lifecycle rules: Automatic transition between storage classes

Installation

Recommended Installation Method: Docker with Systemd Timer

Prerequisites

  • Ubuntu or any systemd-based Linux distribution
  • Docker and Docker Compose installed
  • AWS account with S3 bucket
  • AWS credentials configured (aws configure)

Quick Install (5 minutes)

  1. Clone or download this repository:
cd /path/to/fileshift
  2. Run the Docker installation script:
chmod +x docker-install.sh
./docker-install.sh
  3. Configure your settings:
nano config.json

Edit the configuration with your S3 bucket and file preferences:

{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "your-bucket-name",
    "s3_prefix": "archived-files",
    "aws_region": "us-east-1",
    "storage_class": "GLACIER",
    "delete_after_upload": true
  },
  "age_days": 30,
  "file_extensions": [".mkv", ".mp4", ".mov", ".avi"]
}
  4. Update docker-compose.yml with your source directory:
nano docker-compose.yml
# Change: /path/to/your/source:/source
# To your actual directory
  5. Test with a dry run:
docker compose run --rm fileshift -c /config/config.json --dry-run
  6. Set up the systemd timer for daily automated runs:
sudo cp fileshift-docker.service fileshift-docker.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable fileshift-docker.timer
sudo systemctl start fileshift-docker.timer
  7. Verify the timer is active:
systemctl status fileshift-docker.timer

That's it! FileShift will now run automatically every day at 2 AM.
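Under the hood, the selection rule is simple: a file qualifies when its extension matches the configured list and its timestamp is older than age_days. The sketch below is illustrative (the function name and details are not taken from the FileShift source), but it mirrors the documented behavior, including recursive scanning:

```python
import os
import time

def find_aged_files(source_dir, age_days, extensions, recursive=True):
    """Yield files whose extension matches the configured list and whose
    modification time is older than age_days; scan subdirectories when
    recursive is true."""
    cutoff = time.time() - age_days * 86400
    exts = tuple(e.lower() if e.startswith(".") else "." + e.lower()
                 for e in extensions)
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if name.lower().endswith(exts) and os.stat(path).st_mtime < cutoff:
                yield path
        if not recursive:
            break
```

With age_days 30 and [".mkv"], only .mkv files not modified in the last 30 days are returned, which is what the dry-run output should list.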

Optional: S3 Lifecycle Policy for Cost Savings

Set up automatic transition to Glacier for existing files:

```bash
cat > /tmp/lifecycle-policy.json << 'EOF'
{
  "Rules": [
    {
      "ID": "TransitionToGlacier",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "archived-files/"
      },
      "Transitions": [
        {
          "Days": 1,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration file:///tmp/lifecycle-policy.json
```

This ensures all files (including those uploaded before adding storage_class: GLACIER) are transitioned to Glacier after 1 day.
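If you prefer to manage the policy from Python, the same rule is just a dictionary passed to boto3's put_bucket_lifecycle_configuration. The helper below only builds the payload (no AWS call is made, and the bucket name in the comment is a placeholder):

```python
def glacier_transition_rule(prefix, days=1, rule_id="TransitionToGlacier"):
    """Build a lifecycle payload equivalent to the JSON policy above.
    Apply it with: boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket="your-bucket-name", LifecycleConfiguration=rule)"""
    return {
        "Rules": [{
            "ID": rule_id,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        }]
    }
```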

Usage

Manual Execution

Run FileShift manually with Docker:

docker compose run --rm fileshift -c /config/config.json

Run with dry-run mode:

docker compose run --rm fileshift -c /config/config.json --dry-run

Systemd Timer Management

The systemd timer runs FileShift automatically every day at 2:00 AM.

Check timer status:

systemctl status fileshift-docker.timer
systemctl list-timers | grep fileshift

Manually trigger a run:

sudo systemctl start fileshift-docker.service

View service logs:

sudo journalctl -u fileshift-docker.service -f

Change schedule: Edit /etc/systemd/system/fileshift-docker.timer and modify the OnCalendar setting:

# Run at 3 AM instead
OnCalendar=*-*-* 03:00:00

# Run every 6 hours
OnCalendar=*-*-* 00,06,12,18:00:00

Then reload:

sudo systemctl daemon-reload
sudo systemctl restart fileshift-docker.timer

Monitoring Progress

Check uploaded files:

aws s3 ls s3://your-bucket-name/archived-files/ --human-readable

Check remaining local files:

ls -lh /path/to/source/*.mkv | wc -l

Monitor Docker container:

docker ps | grep fileshift
docker stats fileshift-fileshift-run-xxxxx

View upload logs:

tail -f logs/fileshift.log

Configuration

The configuration file is located at config.json in the project directory. This file is mounted into the Docker container at /config/config.json.

S3 Destination Example

{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "my-archive-bucket",
    "s3_prefix": "videos/archived",
    "aws_region": "us-east-1",
    "aws_profile": "default",
    "storage_class": "STANDARD_IA"
  }
}
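A trailing comma or a missing key in config.json only surfaces when the container runs; a few lines of Python catch it earlier. The checks below follow the options documented in this README and are illustrative, not FileShift's own validation:

```python
import json

def load_config(path):
    """Load config.json and sanity-check the documented required keys."""
    with open(path) as f:
        cfg = json.load(f)  # rejects trailing commas and other invalid JSON
    for key in ("source_dir", "destination"):
        if key not in cfg:
            raise ValueError(f"missing required key: {key}")
    dest = cfg["destination"]
    if isinstance(dest, dict) and "s3_bucket" not in dest:
        raise ValueError("S3 destination requires s3_bucket")
    return cfg
```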


Configuration Example: Video Archive to S3 Glacier

{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "my-video-archive",
    "s3_prefix": "archived-files",
    "storage_class": "GLACIER",
    "delete_after_upload": true,
    "overwrite": false
  },
  "age_days": 30,
  "file_extensions": [".mkv", ".mp4", ".mov", ".avi"],
  "recursive": true,
  "preserve_structure": false,
  "time_attribute": "mtime",
  "dry_run": false,
  "logging": {
    "log_file": "/logs/fileshift.log",
    "log_level": "INFO"
  }
}

Note:

  • source_dir should be /source when using Docker (mapped from host directory in docker-compose.yml)
  • log_file should be /logs/fileshift.log to use the mounted logs directory
  • storage_class: GLACIER provides roughly 84% cost savings compared to STANDARD (about $3.69/month vs $23.55/month for 1TB)

Configuration Options

Common Options

| Option | Type | Description |
|--------|------|-------------|
| source_dir | string | Directory to scan for files (use /source with Docker) |
| destination | string or object | Local path (string) or S3 configuration (object) |
| age_days | number | Files older than this many days will be moved |
| file_extensions | array | List of file extensions to target (with or without dot) |
| recursive | boolean | Whether to scan subdirectories (default: true) |
| preserve_structure | boolean | Maintain directory structure in destination (default: false) |
| time_attribute | string | Time attribute to check: "mtime" (modification), "atime" (access), or "ctime" (change) |
| dry_run | boolean | If true, only log what would be done without moving files |
| logging.log_file | string | Path to log file (use /logs/fileshift.log with Docker) |
| logging.log_level | string | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL |
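The three time_attribute values map directly onto fields of os.stat. A small helper makes the difference concrete (the function is illustrative, not FileShift's internals):

```python
import os
import time

def file_age_days(path, time_attribute="mtime"):
    """Age of a file in days using the configured time attribute:
    mtime = last modification, atime = last access, ctime = metadata change."""
    st = os.stat(path)
    stamp = {"mtime": st.st_mtime,
             "atime": st.st_atime,
             "ctime": st.st_ctime}[time_attribute]
    return (time.time() - stamp) / 86400
```

Comparing file_age_days(path, cfg["time_attribute"]) against cfg["age_days"] reproduces the filter described above.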

S3-Specific Options

When destination is an object with s3_bucket:

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| s3_bucket | string | Yes | S3 bucket name |
| s3_prefix | string | No | Prefix (folder path) in S3 bucket |
| storage_class | string | No | GLACIER (recommended), STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, DEEP_ARCHIVE, GLACIER_IR |
| delete_after_upload | boolean | No | Delete local file after successful upload (default: true) |
| overwrite | boolean | No | Overwrite existing S3 objects (default: false) |

Advanced Options (usually not needed):

| Option | Type | Required | Description |
|--------|------|----------|-------------|
| aws_region | string | No | AWS region (e.g., "us-east-1") - auto-detected from credentials |
| aws_profile | string | No | AWS credentials profile name - use ~/.aws mount instead |
| server_side_encryption | string | No | Encryption: AES256 or aws:kms |
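For reference, these options correspond to boto3's upload ExtraArgs. The helper below only builds the argument dict from a destination block like the examples above; it needs no AWS access, and the mapping is a sketch rather than FileShift's exact code:

```python
def s3_extra_args(dest):
    """Translate destination options into boto3 upload_file ExtraArgs.
    Usage: s3.upload_file(local_path, dest["s3_bucket"], key,
                          ExtraArgs=s3_extra_args(dest))"""
    extra = {}
    if "storage_class" in dest:
        extra["StorageClass"] = dest["storage_class"]
    if "server_side_encryption" in dest:
        extra["ServerSideEncryption"] = dest["server_side_encryption"]
    return extra
```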

AWS Credentials Configuration

Recommended Method for Docker: Mount your AWS credentials directory into the container (already configured in docker-compose.yml):

volumes:
  - ~/.aws:/root/.aws:ro

Then configure AWS CLI on your host:

aws configure

Alternative Methods:

  1. Named Profile in docker-compose.yml:
environment:
  - AWS_PROFILE=my-profile
  2. Environment Variables in docker-compose.yml:
environment:
  - AWS_ACCESS_KEY_ID=your-key-id
  - AWS_SECRET_ACCESS_KEY=your-secret-key
  - AWS_DEFAULT_REGION=us-east-1
  3. IAM Role (for EC2 instances): No configuration needed if running on EC2 with an attached IAM role.

Cost Optimization with Glacier

For long-term video archival, use Glacier storage class:

Cost Comparison for 1TB:

  • STANDARD: $0.023/GB/month = $23.55/month ($282.60/year)
  • GLACIER: $0.0036/GB/month = $3.69/month ($44.28/year)
  • Savings: 84% ($238.32/year)
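The figures above follow directly from the per-GB prices (1 TB taken as 1024 GB; prices are the us-east-1 rates quoted in this README and may change):

```python
def monthly_cost(size_tb, price_per_gb_month):
    """S3 monthly storage cost for size_tb terabytes."""
    return size_tb * 1024 * price_per_gb_month

standard = monthly_cost(1, 0.023)    # STANDARD, ~$23.55/month
glacier = monthly_cost(1, 0.0036)   # GLACIER, ~$3.69/month
savings = 1 - glacier / standard    # ~84%
```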

Additional Cost Optimization: Set up S3 Lifecycle Policy to transition older files to Glacier (see installation section above). This ensures even files uploaded with STANDARD storage class will automatically transition to Glacier after 1 day.

Retrieval Considerations:

  • Glacier retrieval takes 3-5 hours (Expedited: 1-5 minutes, costs extra)
  • Retrieval cost: $0.01/GB for standard retrieval
  • Best for archival data that's rarely accessed

Scheduler Management

Start the scheduler (runs daily at 2 AM):

docker compose up -d fileshift-scheduler

Stop the scheduler:

docker compose down

Examples

Example 1: Upload Old Videos to S3 Glacier

Upload video files older than 30 days to S3 with Glacier storage for cost savings:

```json
{
  "source_dir": "/source",
  "destination": {
    "s3_bucket": "my-video-archive",
    "s3_prefix": "archived-files",
    "storage_class": "GLACIER",
    "delete_after_upload": true
  },
  "age_days": 30,
  "file_extensions": [".mkv", ".mp4", ".mov", ".avi"],
  "recursive": true,
  "preserve_structure": false,
  "time_attribute": "mtime"
}
```

Example 2: Archive Old Log Files to S3

Move log files older than 90 days to S3 with standard infrequent access storage:

{
  "source_dir": "/var/log/myapp",
  "destination": {
    "s3_bucket": "company-logs-archive",
    "s3_prefix": "application-logs",
    "aws_profile": "backup-profile",
    "storage_class": "STANDARD_IA",
    "server_side_encryption": "AES256",
    "delete_after_upload": true
  },
  "age_days": 90,
  "file_extensions": [".log", ".log.gz"],
  "recursive": true,
  "preserve_structure": true,
  "time_attribute": "mtime"
}

Example 3: Archive Old Log Files (Local)

Move log files older than 30 days to a local archive directory:

{
  "source_dir": "/var/log/myapp",
  "destination": "/var/log/myapp/archive",
  "age_days": 30,
  "file_extensions": [".log", ".log.gz"],
  "recursive": true,
  "preserve_structure": true,
  "time_attribute": "mtime"
}
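With a local destination like this, preserve_structure: true means the path relative to source_dir is recreated under the archive directory. A minimal sketch of that move (the helper name is illustrative):

```python
import os
import shutil

def archive_local(path, source_dir, destination, preserve_structure=True):
    """Move one file into a local archive, optionally recreating its
    directory structure relative to source_dir."""
    rel = (os.path.relpath(path, source_dir) if preserve_structure
           else os.path.basename(path))
    target = os.path.join(destination, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(path, target)
    return target
```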

Example 4: Clean Up Temporary Files (Local)

Move temporary files older than 7 days to a cleanup directory:

{
  "source_dir": "/tmp/workspace",
  "destination": "/tmp/old",
  "age_days": 7,
  "file_extensions": [".tmp", ".temp", ".cache"],
  "recursive": false,
  "preserve_structure": false,
  "time_attribute": "atime"
}

Run with Docker:

```bash
# Update docker-compose.yml with your source directory:
#   volumes:
#     - /path/to/your/videos:/source

# Test with dry run first:
docker compose run --rm fileshift -c /config/config.json --dry-run

# Run actual upload:
docker compose run --rm fileshift -c /config/config.json
```
Example 5: Daily Automated Archival with Systemd

Set up the systemd timer to run automatically every day at 2 AM:

# Already configured during installation
sudo systemctl status fileshift-docker.timer

# View next scheduled run:
systemctl list-timers | grep fileshift

# Manually trigger a run:
sudo systemctl start fileshift-docker.service

# View logs:
sudo journalctl -u fileshift-docker.service -f

Example 6: Monitor Large Upload Progress

When uploading large video files (several GB each):

# Check what Docker is doing:
docker ps | grep fileshift
docker stats <container-id>

# Check S3 bucket (files appear only after complete upload):
aws s3 ls s3://your-bucket-name/archived-files/ --human-readable

# Check local files remaining:
ls -lh /path/to/source/*.mkv | wc -l

# View detailed logs:
tail -f logs/fileshift.log

Note: Large files take time to upload. A 25GB video file may take an hour on typical home internet. Monitor with docker stats to see network I/O progress.

Troubleshooting

Docker & Systemd Issues

Check service status:

sudo systemctl status fileshift-docker.timer
sudo systemctl status fileshift-docker.service

View logs:

# Systemd logs
sudo journalctl -u fileshift-docker.service -f

# Application logs
tail -f logs/fileshift.log

# Docker logs
docker ps -a | grep fileshift
docker logs <container-id>

Container not starting:

docker compose logs fileshift
docker compose ps

Rebuild image after code changes:

docker compose build --no-cache

Check AWS credentials in container:

docker compose run --rm fileshift sh -c "aws s3 ls s3://your-bucket-name"

Permission Issues

Volume mount issues:

# Check if source directory is accessible
ls -la /path/to/source

# Check config file ownership (should be your user, not root)
ls -la config.json
chown $USER:$USER config.json

Cannot delete files after upload:

  • Ensure source volume mount is NOT read-only in docker-compose.yml
  • Should be: /path/to/source:/source (not /path/to/source:/source:ro)

Upload Issues

Files not appearing in S3:

  • Large files appear only after complete upload
  • Monitor progress with docker stats <container-id> - check NET I/O
  • Each file is uploaded atomically (all-or-nothing)

Slow uploads:

  • Normal for large video files (25GB file ≈ 1 hour on 60Mbps upload)
  • Check network: docker stats shows NET I/O throughput
  • Consider running manually during off-hours instead of daily timer

Duplicate containers running:

# Check for multiple running containers
docker ps | grep fileshift

# Stop all fileshift containers
docker stop $(docker ps -q --filter ancestor=fileshift)

AWS/S3 Issues

Credentials not found:

# Verify AWS CLI works on host
aws s3 ls

# Verify credentials mounted correctly
docker compose run --rm fileshift sh -c "ls -la /root/.aws"
docker compose run --rm fileshift sh -c "cat /root/.aws/config"

Bucket access denied:

# Verify bucket exists and you have access
aws s3 ls s3://your-bucket-name

# Check IAM permissions - need s3:PutObject at minimum
aws s3api get-bucket-policy --bucket your-bucket-name

Storage class not applied:

  • Check config.json has "storage_class": "GLACIER"
  • Verify after upload: aws s3api head-object --bucket your-bucket --key archived-files/file.mkv
  • Set up lifecycle policy to transition existing files (see installation section)

Common Configuration Mistakes

Wrong paths in config.json:

  • When using Docker, source_dir should be /source (container path)
  • log_file should be /logs/fileshift.log (container path)
  • Host paths are specified in docker-compose.yml, not config.json

Timer not running:

# Check timer is enabled
sudo systemctl is-enabled fileshift-docker.timer

# Enable if disabled
sudo systemctl enable fileshift-docker.timer

# Check next trigger time
systemctl list-timers | grep fileshift

Service stays "activating":

  • Normal behavior for Type=oneshot services during execution
  • Service is actively running, not stuck
  • Check logs to see progress: sudo journalctl -u fileshift-docker.service -f

Uninstallation

Docker Installation

# Stop and remove systemd timer
sudo systemctl stop fileshift-docker.timer
sudo systemctl disable fileshift-docker.timer
sudo rm /etc/systemd/system/fileshift-docker.{service,timer}
sudo systemctl daemon-reload

# Remove Docker containers and images
docker compose down
docker rmi fileshift:latest

# Optional: Remove project directory
cd ..
rm -rf fileshift

Clean Up S3 (Optional)

# List objects before deletion
aws s3 ls s3://your-bucket-name/archived-files/ --recursive --human-readable

# Delete all objects in prefix
aws s3 rm s3://your-bucket-name/archived-files/ --recursive

# Delete bucket
aws s3 rb s3://your-bucket-name --force

Note: Glacier objects have minimum 90-day storage commitment. Deleting early incurs pro-rated storage charges for the remaining days.
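The pro-rated charge is just the Glacier storage rate applied to the days left of the 90-day minimum. Using the $0.0036/GB-month figure quoted above and a 30-day month for simplicity:

```python
def early_deletion_charge(size_gb, days_stored,
                          price_per_gb_month=0.0036, minimum_days=90):
    """Approximate charge for deleting a Glacier object before the
    90-day minimum, pro-rated over the remaining days."""
    remaining_days = max(0, minimum_days - days_stored)
    return size_gb * price_per_gb_month * remaining_days / 30

# Deleting 100 GB after 30 days leaves 60 chargeable days:
charge = early_deletion_charge(100, 30)  # 100 * 0.0036 * 2 = $0.72
```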

Advanced Usage

Using Different Storage Classes

GLACIER (Recommended for archival):

  • Lowest cost: $0.0036/GB/month
  • 3-5 hour retrieval time
  • Best for rarely accessed data
"storage_class": "GLACIER"

GLACIER_IR (Instant Retrieval):

  • Low cost: $0.004/GB/month
  • Millisecond retrieval
  • Best for quarterly access patterns
"storage_class": "GLACIER_IR"

INTELLIGENT_TIERING:

  • Automatically moves data between tiers
  • Monitoring fee of $0.0025 per 1,000 objects per month
  • Best when access patterns unknown
"storage_class": "INTELLIGENT_TIERING"

Customizing the Schedule

Edit /etc/systemd/system/fileshift-docker.timer:

[Timer]
# Run every day at 3:30 AM
OnCalendar=*-*-* 03:30:00

# Or run every 6 hours
OnCalendar=*-*-* 00,06,12,18:00:00

# Or run every Monday at 1:00 AM
OnCalendar=Mon *-*-* 01:00:00

# Run 5 minutes after boot (in case system was off)
OnBootSec=5min

After editing:

sudo systemctl daemon-reload
sudo systemctl restart fileshift-docker.timer

Running on Multiple Directories

Create separate config files for each directory:

# config-videos.json
{
  "source_dir": "/source",
  ...
}

# config-photos.json
{
  "source_dir": "/source",
  ...
}

For each configuration, update docker-compose.yml to mount the matching source directory and config file:

volumes:
  - /path/to/your/videos:/source
  - ./config-videos.json:/config/config.json

Create separate systemd services for each configuration.

Monitoring with Prometheus (Advanced)

FileShift logs include structured data that can be parsed for metrics:

  • Files processed
  • Bytes uploaded
  • Errors encountered
  • Processing time

Use a log parser like promtail or filebeat to export metrics to Prometheus/Grafana for visualization.
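As a starting point for such a parser: assuming upload log lines look something like `2024-01-01 02:00:03 INFO Uploaded /source/a.mkv (1234567 bytes)` (the exact format depends on FileShift's logger, so adapt the pattern to your actual logs), a few lines of Python produce the counters:

```python
import re

# Hypothetical log format; adjust the pattern to match logs/fileshift.log.
UPLOAD_RE = re.compile(r"Uploaded (?P<path>\S+) \((?P<size>\d+) bytes\)")

def summarize_uploads(lines):
    """Tally uploaded files and bytes from matching log lines; export
    the totals as counters to Prometheus from here."""
    files = total_bytes = 0
    for line in lines:
        m = UPLOAD_RE.search(line)
        if m:
            files += 1
            total_bytes += int(m.group("size"))
    return {"files": files, "bytes": total_bytes}
```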

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Test with Docker
  4. Submit a pull request

License

MIT License - see LICENSE file for details

Support

For issues or questions:

  • Check the Troubleshooting section above
  • Review logs: tail -f logs/fileshift.log or sudo journalctl -u fileshift-docker.service
  • Open an issue on GitHub with:
    • Log excerpts
    • Configuration (sanitized)
    • Docker/OS version
    • Steps to reproduce

FAQ

Q: How long does it take to upload 1TB of videos? A: Depends on your upload speed. At 60Mbps upload (typical home internet), approximately:

  • 60 Mbps = 7.5 MB/s
  • 1TB / 7.5 MB/s ≈ 37 hours
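That arithmetic, generalized (1 GB taken as 1000 MB, 8 bits per byte; real throughput is usually somewhat below the nominal link rate):

```python
def upload_hours(size_gb, uplink_mbps):
    """Estimated hours to upload size_gb over an uplink_mbps connection."""
    megabytes_per_second = uplink_mbps / 8
    return size_gb * 1000 / megabytes_per_second / 3600
```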

Q: Can I pause and resume uploads? A: Uploads are per-file atomic. If interrupted, completed files remain in S3, incomplete files remain local. Re-run FileShift to continue.

Q: Why don't I see files in S3 immediately? A: Files appear in S3 only after complete upload. Large video files (10-30GB each) take time. Monitor progress with docker stats.

Q: Will I be charged if I delete Glacier files early? A: Yes, Glacier has 90-day minimum storage commitment. Early deletion incurs pro-rated charges for remaining days.

Q: Can I use this with other cloud providers (Azure, GCP)? A: Currently only S3 is supported. Local filesystem destination works for all providers via mounted volumes.

Q: How do I retrieve files from Glacier? A: Use AWS Console or CLI:

# Initiate retrieval (takes 3-5 hours)
aws s3api restore-object --bucket your-bucket --key archived-files/file.mkv \
  --restore-request Days=7,GlacierJobParameters={Tier=Standard}

# After 3-5 hours, download
aws s3 cp s3://your-bucket/archived-files/file.mkv ./file.mkv

Q: What happens if the systemd service fails? A: Check logs with sudo journalctl -u fileshift-docker.service. The timer will retry at the next scheduled time (2 AM daily).

Q: Can I run this on Windows/macOS? A: Docker parts work on all platforms. Systemd timer is Linux-only. On Windows/macOS, use Docker alone with Task Scheduler/cron.

Security Considerations

  • AWS credentials mounted read-only from ~/.aws directory
  • Configuration and logs stored in project directory (not system-wide)
  • Docker container runs with user permissions (User=youruser in systemd service)
  • Test with dry-run mode before enabling
  • Use IAM roles or AWS CLI configuration instead of embedding keys in config
  • Ensure proper file permissions on configuration:
chmod 600 config.json
  • Consider using S3 bucket encryption for sensitive data
  • S3 bucket policies: Use least privilege (PutObject only if possible)

