boto3
How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?
You should use the s3fs module as proposed by yjk21. However, calling ParquetDataset gives you a pyarrow.parquet.ParquetDataset object rather than a DataFrame. To get the pandas DataFrame, apply `.read_pandas().to_pandas()` to it:

```python
import pyarrow.parquet as pq
import s3fs

s3 = s3fs.S3FileSystem()

pandas_dataframe = pq.ParquetDataset('s3://your-bucket/', filesystem=s3).read_pandas().to_pandas()
```
Retrieving subfolders names in S3 bucket from boto3
The piece of code below returns ONLY the 'subfolders' under a 'folder' in an S3 bucket:

```python
import boto3

bucket = 'my-bucket'
# Make sure the prefix ends with a slash
prefix = 'prefix-name-with-slash/'

client = boto3.client('s3')
result = client.list_objects(Bucket=bucket, Prefix=prefix, Delimiter='/')
for o in result.get('CommonPrefixes'):
    print('sub folder :', o.get('Prefix'))
```

For more details, you can refer to https://github.com/boto/boto3/issues/134
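The interesting part of the response is the `CommonPrefixes` list. A small helper can pull the prefixes out; the response shape below matches what `list_objects` returns, while the function name and example prefixes are just for illustration:

```python
def subfolder_names(response):
    """Extract the 'subfolder' prefixes from a list_objects response dict."""
    return [p['Prefix'] for p in response.get('CommonPrefixes', [])]

# Example response shaped like boto3's list_objects output:
response = {'CommonPrefixes': [{'Prefix': 'photos/2023/'}, {'Prefix': 'photos/2024/'}]}
print(subfolder_names(response))  # ['photos/2023/', 'photos/2024/']
```

Using `.get('CommonPrefixes', [])` also covers the case where the prefix matched no subfolders at all, in which case the key is absent from the response.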
Listing contents of a bucket with boto3
One way to see the contents would be:

```python
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)
```
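`objects.all()` handles pagination for you. If you use the low-level client instead, each page of `list_objects_v2` results comes back as a dict, and you have to flatten them yourself. A minimal sketch of that flattening, with hypothetical page data standing in for the paginator's output:

```python
def collect_keys(pages):
    """Flatten the 'Contents' entries of paginated list_objects_v2 pages into keys."""
    keys = []
    for page in pages:
        for obj in page.get('Contents', []):
            keys.append(obj['Key'])
    return keys

# Pages shaped like client.get_paginator('list_objects_v2').paginate(...) output:
pages = [
    {'Contents': [{'Key': 'a.txt'}, {'Key': 'b.txt'}]},
    {'Contents': [{'Key': 'c.txt'}]},
    {},  # an empty page has no 'Contents' key at all
]
print(collect_keys(pages))  # ['a.txt', 'b.txt', 'c.txt']
```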
How to SSH and run commands in EC2 using boto3?
This thread is a bit old, but since I've spent a frustrating afternoon discovering a simple solution, I might as well share it. NB: this is not a strict answer to the OP's question, as it doesn't use ssh. But one point of boto3 is that you don't have to, so I think in …
Error “Read-only file system” in AWS Lambda when downloading a file from S3
Only /tmp seems to be writable in AWS Lambda. Therefore this would work:

```python
filepath = '/tmp/' + key
```

References:
- https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps
- https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
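Rather than concatenating strings, the download target can be built with `os.path.join`; taking the basename also avoids trying to write into nested directories that don't exist under /tmp. A small sketch, where `key` is a hypothetical object key:

```python
import os

key = 'reports/daily.csv'

# basename strips the 'folder' part of the key, so the write lands
# directly in /tmp instead of a non-existent /tmp/reports/
filepath = os.path.join('/tmp', os.path.basename(key))
print(filepath)  # /tmp/daily.csv
```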
Boto3 Error: botocore.exceptions.NoCredentialsError: Unable to locate credentials
Try specifying the keys manually:

```python
s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_ID,
    aws_secret_access_key=ACCESS_KEY,
)
```

Make sure you don't include your ACCESS_ID and ACCESS_KEY in the code directly, for security reasons. Consider using environment configs and injecting them into the code, as suggested by @Tiger_Mike. For prod environments, consider using rotating access keys: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_RotateAccessKey
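One way to inject the keys from the environment is a helper like the sketch below. The variable names are the standard AWS ones; the helper itself is hypothetical. Returning an empty dict when the variables are absent lets boto3 fall back to its normal credential chain (config files, instance roles, and so on):

```python
import os

def credential_kwargs(env):
    """Build boto3 keyword arguments from an environment-like mapping."""
    kwargs = {}
    if 'AWS_ACCESS_KEY_ID' in env and 'AWS_SECRET_ACCESS_KEY' in env:
        kwargs['aws_access_key_id'] = env['AWS_ACCESS_KEY_ID']
        kwargs['aws_secret_access_key'] = env['AWS_SECRET_ACCESS_KEY']
    return kwargs

# Usage would look like:
# s3 = boto3.resource('s3', **credential_kwargs(os.environ))
```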
check if a key exists in a bucket in s3 using boto3
Boto 2's boto.s3.key.Key object used to have an exists method that checked whether the key existed on S3 by doing a HEAD request and looking at the result, but it seems that no longer exists. You have to do it yourself:

```python
import boto3
import botocore

s3 = boto3.resource('s3')

try:
    s3.Object('my-bucket', 'dootdoot.jpg').load()
except botocore.exceptions.ClientError …
```
Read file content from S3 bucket with boto3
boto3 offers a resource model that makes tasks like iterating through objects easier. Unfortunately, StreamingBody doesn't provide readline or readlines.

```python
s3 = boto3.resource('s3')
bucket = s3.Bucket('test-bucket')

# Iterates through all the objects, doing the pagination for you. Each obj
# is an ObjectSummary, so it doesn't contain the body. You'll need to call
# get …
```
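Since StreamingBody has no readline, one workaround is to read the whole body at once and split it yourself. A sketch where the body is anything with a `.read()` returning bytes; `io.BytesIO` stands in here for the StreamingBody you'd get from `obj.get()['Body']`:

```python
import io

def body_lines(body):
    """Read an entire body and return its decoded lines."""
    return body.read().decode('utf-8').splitlines()

# io.BytesIO stands in for the StreamingBody returned by obj.get()['Body']:
fake_body = io.BytesIO(b'first line\nsecond line\n')
print(body_lines(fake_body))  # ['first line', 'second line']
```

Note that this buffers the whole object in memory, so it only suits files that comfortably fit there.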