How to read and echo the size of an uploaded file as it is being written on the server, in real time, without blocking either the server or the client?

You need to call clearstatcache() to get the real file size. With a few other bits fixed, your stream.php may look like the following:

<?php

header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
header("Connection: keep-alive");
// Check if the header's been sent to avoid `PHP Notice:  Undefined index: HTTP_LAST_EVENT_ID in stream.php on line `
// php 7+
//$lastId = $_SERVER["HTTP_LAST_EVENT_ID"] ?? 0;
// php < 7
$lastId = isset($_SERVER["HTTP_LAST_EVENT_ID"]) ? intval($_SERVER["HTTP_LAST_EVENT_ID"]) : 0;

$upload = $_GET["filename"];
$expectedSize = intval($_GET["filesize"]);
$data = 0;
// if the file already exists, its initial size can be bigger than the new one, so we need to ignore it
$wasLess = $lastId != 0;
while ($data < $expectedSize || !$wasLess) {
    // stat() system calls are expensive, so PHP caches their results on the assumption
    // that file stats rarely change; clear the cache for this file to get the up-to-date size
    clearstatcache(true, $upload);
    // filesize() returns false (and emits a warning) while the file does not exist yet
    $data = file_exists($upload) ? filesize($upload) : 0;
    $wasLess = $wasLess || $data < $expectedSize;
    // don't send a stale file size
    if ($wasLess) {
        sendMessage($lastId, $data);
        $lastId++;
    }
    // not strictly necessary, but without a pause thousands of `message` events will be dispatched,
    // millions on a poor connection with a large file.
    // sleep(1) might be too long, but 50 messages a second (20 ms) should be okay
    //sleep(1);
    usleep(20000);
}

function sendMessage($id, $data)
{
    echo "id: $id\n";
    echo "data: $data\n\n";
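    // ob_flush() raises a notice when no output buffer is active, so only call it when there is one
    if (ob_get_level() > 0) {
        ob_flush();
    }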
    // no need to flush(). It adds content length of the chunk to the stream
    // flush();
}
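Each sendMessage() call above writes one event in the text/event-stream format: an `id:` line, a `data:` line and a blank line. To show how the client side interprets those bytes, here is a simplified JavaScript parser. It is a sketch, not the browser's implementation: `parseEventStream` is a hypothetical name, and it handles only the `id` and `data` fields, ignoring `event:`, `retry:` and comment lines that the full specification also defines.

```javascript
// Simplified parser for the text/event-stream output of sendMessage().
// Handles only `id:` and `data:` fields; the full spec also covers `event:`, `retry:` and comments.
function parseEventStream(text) {
  const events = [];
  // Events are separated by a blank line
  for (const block of text.split("\n\n")) {
    if (block.trim() === "") continue;
    let id;
    const dataLines = [];
    for (const line of block.split("\n")) {
      const colon = line.indexOf(":");
      if (colon === -1) continue;
      const field = line.slice(0, colon);
      // A single space after the colon belongs to the syntax, not to the value
      let value = line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "id") id = value;
      else if (field === "data") dataLines.push(value);
    }
    // Multiple data lines of one event are joined with newlines
    events.push({ id, data: dataLines.join("\n") });
  }
  return events;
}

// Two events: id "0" with data "1024", then id "1" with data "2048"
console.log(parseEventStream("id: 0\ndata: 1024\n\nid: 1\ndata: 2048\n\n"));
```

The browser remembers the last `id` it received and sends it back in the Last-Event-ID request header when it reconnects, which is what the $lastId initialization at the top of stream.php relies on.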

A few caveats:

Security. I mean the lack of it. As I understand it, this is a proof of concept and security is the least of your concerns, yet the disclaimer should be there. This approach is fundamentally flawed and should be used only if you don’t care about DoS attacks or about information on your files leaking out.

CPU. Without usleep() the script will consume 100% of a single core. With a long sleep you risk the whole file being uploaded within a single iteration: if the first check already sees the complete file, the script never observes a size smaller than the expected one, $wasLess stays false, and the exit condition is never met. If you are testing locally, remove the usleep() completely, since uploading megabytes locally is a matter of milliseconds.

Open connections. Both Apache and nginx/PHP-FPM have a finite number of PHP processes that can serve requests. A single file upload occupies two of them (the upload request and the event stream) for as long as the upload takes. With slow bandwidth or forged requests this can be quite long, and the web server may start rejecting requests.

Client-side part. You need to analyse the responses and stop listening to the events once the file is fully uploaded; otherwise EventSource will automatically reconnect every time stream.php finishes its response.
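On the client side, a minimal sketch could look like the following. It assumes the plain stream.php above, which sends the current file size as the data of each event; `watchUpload` is a hypothetical helper name. Calling close() is what stops EventSource from reconnecting after the server ends the response.

```javascript
// Hypothetical helper: listen to `message` events from stream.php and stop
// once the reported size reaches the size of the file being uploaded.
function watchUpload(source, totalSize, onProgress) {
  source.onmessage = (event) => {
    // stream.php sends the current size as plain text in the `data:` field
    const uploaded = Number(event.data);
    onProgress(uploaded, totalSize);
    if (uploaded >= totalSize) {
      // close() prevents EventSource from reconnecting once the server ends the response
      source.close();
    }
  };
}

// Browser usage (assumes `file` is the File being uploaded and that the `filename`
// value matches the path the server writes the upload to):
// const source = new EventSource(`stream.php?filename=${encodeURIComponent(file.name)}&filesize=${file.size}`);
// watchUpload(source, file.size, (done, total) => console.log(`${done} / ${total}`));
```

Taking the source as a parameter keeps the helper independent of how the EventSource is created, so the same function works for any URL scheme.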

EDIT:

To make it more or less production-friendly, you will need in-memory storage such as Redis or Memcached to store the file metadata.

When making the POST request, add a unique token that identifies the file, plus the file size.

In your JavaScript:

const fileId = Math.random().toString(36).slice(2); // or anything more unique
...

const [request, source] = [
    new Request(`${url}?fileId=${fileId}&size=${filesize}`, {
        method:"POST", headers:headers, body:file
    })
    , new EventSource(`${stream}?fileId=${fileId}`)
];
....
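Note that new Request(...) only describes the call; nothing is sent until it is passed to fetch(). The following hypothetical sketch wires the pieces together: it opens the progress stream, sends the file, and closes the stream once the POST settles. `startUpload` and its options are illustrative names; `fetchFn`, `EventSourceCtor` and `makeId` are injectable only so the function can be exercised without a browser, and crypto.randomUUID() is used as one example of "anything more unique".

```javascript
// Hypothetical sketch of the token-based client. `url` points to data.php and
// `stream` to stream.php; the options are injectable for testing.
function startUpload(file, url, stream, onProgress, {
  fetchFn = globalThis.fetch,
  EventSourceCtor = globalThis.EventSource,
  // crypto.randomUUID() needs a secure context (HTTPS or localhost) in browsers
  makeId = () => crypto.randomUUID(),
} = {}) {
  const fileId = encodeURIComponent(makeId());

  // Open the progress stream; stream.php keeps reporting until progress reaches size
  const source = new EventSourceCtor(`${stream}?fileId=${fileId}`);
  source.onmessage = (event) => onProgress(Number(event.data), file.size);

  // A Request only describes the call; fetch() is what actually sends the file
  const request = new Request(`${url}?fileId=${fileId}&size=${file.size}`, {
    method: "POST",
    body: file,
  });

  // Close the stream when the POST settles, so EventSource does not keep reconnecting
  return fetchFn(request).finally(() => source.close());
}
```

Closing on completion of the POST complements the server-side exit condition: even if the last progress event is lost, the client stops listening when the upload request is done.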

In data.php, register the token and report progress chunk by chunk:

....

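$fileId = $_GET['fileId'];
$fileSize = intval($_GET['size']);

// setUnique() takes the id and the full size; processed starts at 0
setUnique($fileId, $fileSize);

$uploaded = 0;
// stream_copy_to_stream() returns the number of bytes copied by this call (0 at EOF),
// so keep a running total to report cumulative progress
while ($copied = stream_copy_to_stream($input, $file, 1024)) {
    $uploaded += $copied;
    updateProgress($fileId, $uploaded);
}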
....


/**
 * Check if Id is unique, and store processed as 0, and full_size as $size 
 * Set reasonable TTL for the key, e.g. 1hr 
 *
 * @param string $id
 * @param int $size
 * @throws Exception if id is not unique
 */
function setUnique($id, $size) {
    // implement with your storage of choice
}

/**
 * Updates uploaded size for the given file
 *
 * @param string $id
 * @param int $processed
 */
function updateProgress($id, $processed) {
    // implement with your storage of choice
}

This way your stream.php doesn’t need to hit the disk at all and can sleep as long as the UX allows:

....
// start with the defaults getProgress() returns for an unknown id, so the loop runs until real values arrive
$progress = 0;
$size = PHP_INT_MAX;
$lastId = 0;

while ($progress < $size) {
    list($progress, $size) = getProgress($_GET["fileId"]);
    sendMessage($lastId, $progress);
    $lastId++;
    sleep(1);
}
.....


/**
 * Get progress of the file upload.
 * If id is not there yet, returns [0, PHP_INT_MAX]
 *
 * @param string $id
 * @return array [$bytesUploaded, $fileSize]
 */
function getProgress($id) {
    // implement with your storage of choice
}
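The three storage functions are left for you to implement. To make their contract concrete, here is a hypothetical in-memory stand-in, written in JavaScript like the client snippet rather than in PHP: a Map keyed by file id, with a TTL (one hour by default, matching the "e.g. 1hr" suggestion). It lives inside a single process, so it is not a replacement for Redis or Memcached; with Redis, setUnique() would typically be an atomic "set if not exists" with an expiry (SET with NX and EX). Infinity plays the role of PHP_INT_MAX.

```javascript
// Hypothetical in-memory stand-in illustrating the setUnique()/updateProgress()/getProgress()
// contract. Single process only; not a production store.
class ProgressStore {
  constructor(ttlMs = 60 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing expiry
    this.entries = new Map(); // id -> { processed, size, expiresAt }
  }

  // Return the entry for `id`, dropping it first if its TTL has passed
  #live(id) {
    const entry = this.entries.get(id);
    if (entry && entry.expiresAt <= this.now()) {
      this.entries.delete(id);
      return undefined;
    }
    return entry;
  }

  // Register a new upload with processed = 0; reject ids that are already in use
  setUnique(id, size) {
    if (this.#live(id)) throw new Error(`id ${id} is not unique`);
    this.entries.set(id, { processed: 0, size, expiresAt: this.now() + this.ttlMs });
  }

  // Record the cumulative number of bytes processed so far
  updateProgress(id, processed) {
    const entry = this.#live(id);
    if (entry) entry.processed = processed;
  }

  // Unknown or expired ids report [0, Infinity], so a reader keeps waiting
  getProgress(id) {
    const entry = this.#live(id);
    return entry ? [entry.processed, entry.size] : [0, Infinity];
  }
}
```

One consequence of the [0, Infinity] default is worth noting: if an upload is abandoned and its key expires, a reader looping on getProgress() would wait forever, so the stream side needs its own timeout or disconnect check.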

The problem of two open connections per upload cannot be solved unless you give up EventSource in favour of good old polling. Without the loop, stream.php responds in a matter of milliseconds, so keeping the connection open all the time is quite wasteful unless you need hundreds of updates a second.
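A polling client could look like the following hypothetical sketch. It assumes a loop-free variant of stream.php (or a separate endpoint) that returns the current [bytesUploaded, fileSize] pair as JSON and exits immediately; `pollProgress` and `fetchProgress` are illustrative names, the latter standing for whatever function performs that request.

```javascript
// Hypothetical polling client: request the progress every `intervalMs` until the upload is done.
// `fetchProgress` is assumed to resolve to [bytesUploaded, fileSize].
async function pollProgress(fetchProgress, onProgress, intervalMs = 1000) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  for (;;) {
    const [uploaded, size] = await fetchProgress();
    onProgress(uploaded, size);
    if (uploaded >= size) return;
    // each request is short-lived, so no PHP process is held between polls
    await sleep(intervalMs);
  }
}

// Browser usage, with `progress.php` as a placeholder endpoint returning JSON such as [1024, 4096]:
// pollProgress(() => fetch(`progress.php?fileId=${fileId}`).then((r) => r.json()),
//              (done, total) => console.log(`${done} / ${total}`));
```

The trade-off is latency versus load: a one-second interval means up to a second of delay in the progress display, but the server holds a PHP process only for the few milliseconds each request takes.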
