ó ôBQc@sdZddlZddlZddlZddlZddlZddlZddlZddlm Z ddl m Z ddl m Z ejdƒZdZdZd efd „ƒYZd efd „ƒYZd efd„ƒYZdefd„ƒYZe eddd„Zd„ZdS(sÌImplementation of wildcarding over StorageUris. StorageUri is an abstraction that Google introduced in the boto library, for representing storage provider-independent bucket and object names with a shorthand URI-like syntax (see boto/boto/storage_uri.py) The current class provides wildcarding support for StorageUri objects (including both bucket and file system objects), allowing one to express collections of objects with syntax like the following: gs://mybucket/images/*.png file:///tmp/???abc??? We provide wildcarding support as part of gsutil rather than as part of boto because wildcarding is really part of shell command-like functionality. A comment about wildcard semantics: We support both single path component wildcards (e.g., using '*') and recursive wildcards (using '**'), for both file and cloud URIs. For example, gs://bucket/doc/*/*.html would enumerate HTML files one directory down from gs://bucket/doc, while gs://bucket/**/*.html would enumerate HTML files in all objects contained in the bucket. Note also that if you use file system wildcards it's likely your shell interprets the wildcarding before passing the command to gsutil. For example: % gsutil cp /opt/eclipse/*/*.html gs://bucket/eclipse would likely be expanded by the shell into the following before running gsutil: % gsutil cp /opt/eclipse/RUNNING.html gs://bucket/eclipse Note also that most shells don't support '**' wildcarding (I think only zsh does). If you want to use '**' wildcarding with such a shell you can single quote each wildcarded string, so it gets passed uninterpreted by the shell to gsutil (at which point gsutil will perform the wildcarding expansion): % gsutil cp '/opt/eclipse/**/*.html' gs://bucket/eclipse iÿÿÿÿN(tPrefix(tBucketStorageUri(tBucketListingRefs[*?\[\]]twildcard_object_iteratortwildcard_bucket_iteratortWildcardIteratorcBseZdZd„ZRS(s<Base class for wildcarding over StorageUris. This class implements support for iterating over StorageUris that contain wildcards. The base class is abstract; you should instantiate using the wildcard_iterator() static factory method, which chooses the right implementation depending on the StorageUri. cCs d|jS(s2Returns string representation of WildcardIterator.sWildcardIterator(%s)(t wildcard_uri(tself((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyt__repr__Xs(t__name__t __module__t__doc__R(((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRMs tCloudWildcardIteratorcBsPeZdZeeddd„Zd„Zd„Zd„Z d„Z d„Z RS( s¯WildcardIterator subclass for buckets and objects. Iterates over BucketListingRef matching the StorageUri wildcard. It's much more efficient to request the Key from the BucketListingRef (via GetKey()) than to request the StorageUri and then call uri.get_key() to retrieve the key, for cases where you want to get metadata that's available in the Bucket (for example to get the name and size of each object), because that information is available in the bucket GET results. If you were to iterate over URIs for such cases and then get the name and size info from each resulting StorageUri, it would cause an additional object GET request for each of the result URIs. icCsX||_|dkr!i|_n|jƒ|_||_||_||_||_dS(sõ Instantiates an iterator over BucketListingRef matching given wildcard URI. Args: wildcard_uri: StorageUri that contains the wildcard to iterate. proj_id_handler: ProjectIdHandler to use for current command. bucket_storage_uri_class: BucketStorageUri interface. Settable for testing/mocking. headers: Dictionary containing optional HTTP headers to pass to boto. debug: Debug level to pass in to boto connection (range 0..3). N(RtNonetheaderstcopytproj_id_handlertbucket_storage_uri_classt all_versionstdebug(RRRRRRR((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyt__init__ks      c cs2t|jjƒrétj|jjƒ}g}tj|ƒ}|jjt |j|j ƒx£|jj d|j ƒD]q}|j |j ƒrqd|jjtjt|j ƒƒf}|jtj|d|jd|jdtƒƒqqqqWn|jjdƒg}|jjt|j|j ƒx|D] }|jjƒrYt|dd dd d|j ƒVq!t|jjƒs£|j|jjƒ}t|dd dd d|j ƒVq!|j|jjƒg}xlt|ƒd kr)|jd ƒ} |j | jƒ\} } } } tjtj| ƒƒ}x|j!d| d | d|j d |j"ƒD]â}|j |j j#d ƒƒr@| r¾|j j#d ƒ| kr¾t$|t%ƒr|j| j|j j#d ƒd | ƒƒqq"| j&|ƒ}t$|t%ƒrÿt|dd d|d|j ƒVq"t|d|dd d|j ƒVq@q@Wq¾Wq!Wd S(sPython iterator that gets called when iterating over cloud wildcard. Yields: BucketListingRef, or empty iterator if no matches. Rs%s://%sRRtsuppress_consec_slashesttkeytprefixit delimiterRt/N('tContainsWildcardRt bucket_nametfnmatcht translatetretcompileRtFillInProjectHeaderIfNeededtWILDCARD_BUCKET_ITERATORRtget_all_bucketstmatchtnametschemeturllibt quote_pluststrtappendtbotot storage_uriRRtFalsetclone_replace_nametWILDCARD_OBJECT_ITERATORt names_bucketRR t object_nametlentpopt_BuildBucketFilterStringst list_bucketRtrstript isinstanceRtclone_replace_key(Rtregext bucket_uristprogtbturi_strt bucket_urit uri_to_yieldturis_needing_expansionturiRRtprefix_wildcardtsuffix_wildcardRt expanded_uri((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyt__iter__ˆsd         $c Cs€tj|ƒ}|s0|}d}|}d}n |jƒdkre||jƒ }||jƒ}n d}|}|jdƒ}|dkr||d }n|p¦d|jdƒ}||jƒ}|jdƒ}|dkrêd}n||d}|jdƒdkr&d}||}d}nd}|j|ƒ} |jdkrptj j d|||||fƒn||||fS( s£ Builds strings needed for querying a bucket and filtering results to implement wildcard object name matching. Args: wildcard: The wildcard string to match to objects. Returns: (prefix, delimiter, prefix_wildcard, suffix_wildcard) where: prefix is the prefix to be sent in bucket GET request. delimiter is the delimiter to be sent in bucket GET request. prefix_wildcard is the wildcard to be used to filter bucket GET results. suffix_wildcard is wildcard to be appended to filtered bucket GET results for next wildcard expansion iteration. For example, given the wildcard gs://bucket/abc/d*e/f*.txt we would build prefix= abc/d, delimiter=/, prefix_wildcard=d*e, and suffix_wildcard=f*.txt. Using this prefix and delimiter for a bucket listing request will then produce a listing result set that can be filtered using this prefix_wildcard; and we'd use this suffix_wildcard to feed into the next call(s) to _BuildBucketFilterStrings(), for the next iteration of listing/filtering. Raises: AssertionError if wildcard doesn't contain any wildcard chars. RRiiÿÿÿÿis**sTDEBUG: wildcard=%s, prefix=%s, delimiter=%s, prefix_wildcard=%s, suffix_wildcard=%s N( tWILDCARD_REGEXtsearchtstartR tfindR6tendRtsyststderrtwrite( RtwildcardR$RRRBRCt wildcard_partRJt delim_pos((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyR4ãs>       ccs5x.|jƒD] }|jƒr |jƒVq q WdS(sð Convenience iterator that runs underlying iterator and returns Key for each iteration. Yields: Subclass of boto.s3.key.Key, or empty iterator if no matches. Raises: WildcardException: for bucket-only uri. N(REtHasKeytGetKey(Rtbucket_listing_ref((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pytIterKeys4s  ccs&x|jƒD]}|jƒVq WdS(s« Convenience iterator that runs underlying iterator and returns StorageUri for each iteration. Yields: StorageUri, or empty iterator if no matches. N(REtGetUri(RRS((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pytIterUrisCsccs5x.|jƒD] }|jƒr |jƒVq q WdS(sÎ Convenience iterator that runs underlying iterator and returns the StorageUri for each iterated BucketListingRef that has a Key. Yields: StorageUri, or empty iterator if no matches. N(RERQRU(RRS((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pytIterUrisForKeysNs N( R R R RR-R RRER4RTRVRW(((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyR ]s  [ Q  tFileWildcardIteratorcBs8eZdZddd„Zd„Zd„Zd„ZRS(s–WildcardIterator subclass for files and directories. If you use recursive wildcards ('**') only a single such wildcard is supported. For example you could use the wildcard '**/*.txt' to list all .txt files in any subdirectory of the current directory, but you couldn't use a wildcard like '**/abc/**/*.txt' (which would, if supported, let you find .txt files in any subdirectory named 'abc'). icCs||_||_||_dS(s7 Instantiate an iterator over BucketListingRefs matching given wildcard URI. Args: wildcard_uri: StorageUri that contains the wildcard to iterate. headers: Dictionary containing optional HTTP headers to pass to boto. debug: Debug level to pass in to boto connection (range 0..3). N(RRR(RRRR((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRes  c #s%|jj}tjd|ƒ}|rä||jƒd }||jƒd}|jdƒrntd|ƒ‚n|s}d}n|jtj ƒ}g}x[tj |ƒD]8\‰}}|j ‡fd†t j ||ƒDƒƒq¥Wntj|ƒ}x+|D]#}|jj|ƒ} t| ƒVqúWdS(Ns\*\*iit*s5Invalid wildcard with more than 2 consecutive *s (%s)c3s$|]}tjjˆ|ƒVqdS(N(tostpathtjoin(t.0tf(tdirpath(s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pys ‰s(RR1RRGRHt startswithtWildcardExceptiontlstripRZtseptwalktextendRtfiltertglobR.R( RRNR$tbase_dirtremaining_wildcardt filepathstunused_dirnamest filenamestfilepathRD((R_s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRErs(    cCstdƒ‚dS(sw Placeholder to allow polymorphic use of WildcardIterator. Raises: WildcardException: in all cases. s3Iterating over Keys not possible for file wildcardsN(Ra(R((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRT“sccs&x|jƒD]}|jƒVq WdS(s« Convenience iterator that runs underlying iterator and returns StorageUri for each iteration. Yields: StorageUri, or empty iterator if no matches. N(RERU(RRS((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRVsN(R R R R RRERTRV(((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRX[s  ! RacBs)eZdZd„Zd„Zd„ZRS(s+Exception thrown for invalid wildcard URIs.cCstj|ƒ||_dS(N(t StandardErrorRtreason(RRo((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyR¬s cCs d|jS(NsWildcardException: %s(Ro(R((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyR°scCs d|jS(NsWildcardException: %s(Ro(R((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyt__str__³s(R R R RRRp(((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRa©s  ic Cs¦t|tƒr9tj|d|dtd|dtƒ}n|}|jƒrpt||d|d|d|d|ƒS|jƒr’t|d|d|ƒSt d|ƒ‚dS( s Instantiate a WildCardIterator for the given StorageUri. Args: uri_or_str: StorageUri or URI string naming wildcard objects to iterate. proj_id_handler: ProjectIdHandler to use for current command. bucket_storage_uri_class: BucketStorageUri interface. Settable for testing/mocking. headers: Dictionary containing optional HTTP headers to pass to boto. debug: Debug level to pass in to boto connection (range 0..3). Returns: A WildcardIterator that handles the requested iteration. RtvalidateRRRRs"Unexpected type of StorageUri (%s)N( R7t basestringR+R,R-t is_cloud_uriR t is_file_uriRXRa(t uri_or_strRRRRRRA((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pytwildcard_iterator·s     cCs<t|tƒr"ttj|ƒƒSttj|jƒƒSdS(sChecks whether uri_or_str contains a wildcard. Args: uri_or_str: StorageUri or URI string to check. Returns: bool indicator. N(R7RrtboolRFRGRA(Ru((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyRàs (R R+RRgRZRRKR'tboto.s3.prefixRtboto.storage_uriRRSRR RFR/R"tobjectRR RXRnRaR-R RvR(((s5/tmp/tmp.yUYbTOKr8o/gsutil/gslib/wildcard_iterator.pyt8s*       þN&